Parse "2,5" So It Reads as "2.5"



  1. Making the Web Searchable Peter Mika Researcher, Data Architect Yahoo! Research
  2. Yahoo! Research (research.yahoo.com)
  3. Yahoo! Research Barcelona <ul><li>Established Jan, 2006 </li></ul><ul><li>Led by Ricardo Baeza-Yates </li></ul><ul><li>Research areas </li></ul><ul><ul><li>Web Mining </li></ul></ul><ul><ul><ul><li>content, structure, usage </li></ul></ul></ul><ul><ul><li>Distributed Web retrieval </li></ul></ul><ul><ul><li>Multimedia retrieval </li></ul></ul><ul><ul><li>NLP and Semantics </li></ul></ul>
  4. Yahoo! by numbers (Apr, 2007) <ul><li>There are approximately 500 million users of Yahoo! branded services, meaning we reach 50 percent – or 1 out of every 2 users – online, the largest audience on the Internet (Yahoo! Internal Data). </li></ul><ul><li>Yahoo! is the most visited site online with nearly 4 billion visits and an average of 30 visits per user per month in the U.S. and leads all competitors in audience reach, frequency and engagement (comScore Media Metrix, US, Feb. 2007). </li></ul><ul><li>Yahoo! accounts for the largest share of time Americans spend on the Internet with 12 percent (comScore Media Metrix, US, Feb. 2007) and approximately 8 percent of the world's online time (comScore WorldMetrix, Feb. 2007). </li></ul><ul><li>Yahoo! is the #1 home page with 85 million average daily visitors on Yahoo! homepages around the world, an increase of nearly 5 million visitors in a month (comScore WorldMetrix, Feb. 2007). </li></ul><ul><li>Yahoo!'s social media properties (Flickr, delicious, Answers, 360, Video, MyBlogLog, Jumpcut and Bix) have 115 million unique visitors worldwide (comScore WorldMetrix, Feb. 2007). </li></ul><ul><li>Yahoo! Answers is the largest collection of human knowledge on the Web with more than 90 million unique users and 250 million answers worldwide (Yahoo! Internal Data). </li></ul><ul><li>There are more than 450 million photos in Flickr in total and 1 million photos are uploaded daily. 80 percent of the photos are public (Yahoo! Internal Data). </li></ul><ul><li>Yahoo! Mail is the #1 Web mail provider in the world with 243 million users (comScore WorldMetrix, Feb. 2007) and nearly 80 million users in the U.S. (comScore Media Metrix, US, Feb. 2007) </li></ul><ul><li>Interoperability between Yahoo! Messenger and Windows Live Messenger has formed the largest IM community approaching 350 million user accounts (Yahoo! Internal Data). </li></ul><ul><li>Yahoo! Messenger is the most popular in time spent with an average of 50 minutes per user, per day (comScore WorldMetrix, Feb. 2007). </li></ul><ul><li>Nearly 1 in 10 Internet users is a member of a Yahoo! Group (Yahoo! Internal Data). </li></ul><ul><li>Yahoo! is one of just 26 companies to be on both the Fortune 500 list and Fortune's "Best Place to Work" list (2006). </li></ul>
  5. Agenda <ul><li>Part 1 </li></ul><ul><ul><li>Publishing content on the Semantic Web </li></ul></ul><ul><ul><ul><li>Intro to RDF and the Semantic Web </li></ul></ul></ul><ul><ul><ul><li>Six ways to publish data on the Semantic Web </li></ul></ul></ul><ul><ul><ul><li>History of embedded metadata on the Web </li></ul></ul></ul><ul><ul><ul><li>RDFa, best practices and tools </li></ul></ul></ul><ul><ul><ul><li>Practice </li></ul></ul></ul><ul><li>Part 2 </li></ul><ul><ul><li>Semantic Web in use </li></ul></ul><ul><ul><ul><li>SearchMonkey </li></ul></ul></ul><ul><ul><ul><li>BOSS and YQL </li></ul></ul></ul><ul><ul><ul><li>Semantic Search and Navigation </li></ul></ul></ul><ul><li>Part 3 </li></ul><ul><ul><li>Research in Semantic Search </li></ul></ul>
  6. Motivation <ul><li>Why publish data on the Semantic Web? </li></ul><ul><ul><li>Multiply the value of your data by increasing content agility </li></ul></ul><ul><ul><ul><li>The potential for reuse and aggregation with other datasets </li></ul></ul></ul><ul><ul><ul><li>Make your data more easily findable </li></ul></ul></ul><ul><li>Why develop applications using semantic technologies? </li></ul><ul><ul><li>Content agility means you can more rapidly develop applications by reusing and recombining data. Content agility leads to increased agility and robustness of your application. </li></ul></ul>
  7. Intro to the Semantic Web
  8. Basic RDF <ul><li>RDF has two basic types of entities: resources and literals </li></ul><ul><ul><li>Roughly objects and built-in types in Object Oriented Programming </li></ul></ul><ul><ul><li>Resources are identified by a URI or otherwise called a blank node </li></ul></ul><ul><ul><ul><li>URIs are a generalization of URLs </li></ul></ul></ul><ul><ul><ul><li>Notation: <http://www.example.org/Person> or ex:Person </li></ul></ul></ul><ul><ul><li>Literals have an optional language and datatype (string, integer etc.) </li></ul></ul><ul><ul><ul><li>Datatypes are identified by URIs, e.g. XML Schema datatypes </li></ul></ul></ul><ul><ul><ul><li>Two literals are the same if their components are the same </li></ul></ul></ul><ul><ul><ul><li>Notation: "Joe B." or Joe@en^^http://…#string </li></ul></ul></ul>
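The resource/literal distinction on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class names `Resource` and `Literal` and their fields are hypothetical and do not come from any RDF library. Frozen dataclasses compare field by field, which mirrors the rule that two literals are the same if their components are the same.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """A resource, identified by its URI (hypothetical helper class)."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A literal: lexical form plus optional language and datatype URI."""
    lexical: str
    language: Optional[str] = None
    datatype: Optional[str] = None


# The XML Schema string datatype, identified by a URI as the slide describes.
XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

a = Literal("Joe", language="en")
b = Literal("Joe", language="en")
c = Literal("Joe", datatype=XSD_STRING)

same_components = (a == b)       # all components agree
different_components = (a == c)  # language and datatype differ
```

Because equality is purely structural, a literal has no identity beyond its components, whereas two `Resource` values are equal exactly when their URIs are.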
  9. RDF models <ul><li>A triple, aka a statement, is a tuple of (subject, predicate, object) </li></ul><ul><ul><li>Example: (Joe, loves, Mary) </li></ul></ul><ul><ul><li>Each triple gives the value of a property for a given resource or relates two objects to one another </li></ul></ul><ul><ul><li>A predicate is always a resource with a URI </li></ul></ul><ul><ul><li>A triple is also called a statement </li></ul></ul><ul><li>An RDF model is a set of triples </li></ul><ul><ul><li>Ordering of statements in an RDF document is irrelevant (unlike XML) </li></ul></ul>
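A minimal sketch of "an RDF model is a set of triples", assuming plain Python tuples of strings stand in for RDF terms. The `ex:` identifiers and the helper `objects_of` are illustrative names, not part of the slides.

```python
# Two "documents" that list the same statements in a different order.
doc_a = {
    ("ex:Joe", "ex:loves", "ex:Mary"),
    ("ex:Joe", "foaf:name", "Joe A."),
}
doc_b = {
    ("ex:Joe", "foaf:name", "Joe A."),
    ("ex:Joe", "ex:loves", "ex:Mary"),
}

# Sets ignore ordering, so both documents describe the same model.
same_model = (doc_a == doc_b)


def objects_of(model, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return {o for (s, p, o) in model if s == subject and p == predicate}


loved_by_joe = objects_of(doc_a, "ex:Joe", "ex:loves")
```

Using a set also means that stating the same triple twice adds nothing, which fits the idea of a model as a collection of statements rather than a sequence.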
  10. Graphical and textual notation <ul><li>A number of text-based interchange formats for RDF </li></ul><ul><ul><li>RDF/XML, Turtle, N3, N-Triples </li></ul></ul><ul><ul><li>Example: http://www.cs.vu.nl/~pmika/foaf.rdf </li></ul></ul>[Figure: node my:Joe with name " Joe A." and type foaf:Person]
  11. Ontologies <ul><li>Ontologies are collections of classes and properties used to describe objects in a particular domain </li></ul><ul><ul><li>Ontologies themselves are described in RDF or OWL (the Web Ontology Language), an extension of RDF </li></ul></ul><ul><ul><li>Example: the Friend-Of-A-Friend (FOAF) ontology for personal profiles </li></ul></ul><ul><li>Classes can be described by sub- and superclasses, required properties </li></ul><ul><ul><li>Class membership in RDF is expressed using the rdf:type property </li></ul></ul><ul><ul><li>An instance can have multiple classes (types) </li></ul></ul><ul><ul><li>A class can have multiple superclasses </li></ul></ul><ul><li>Properties can be described by their domain, range, cardinalities, etc. </li></ul>
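The class-related points above (rdf:type, multiple types, multiple superclasses) can be illustrated with a small sketch. It assumes triples are plain string tuples and walks rdfs:subClassOf statements to collect inherited types. `ex:joe` and `ex:Student` are made-up identifiers, and the helper functions are hypothetical rather than taken from any toolkit.

```python
def superclasses(model, cls):
    """Collect all direct and indirect superclasses of cls."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        for (s, p, o) in model:
            if s == current and p == "rdfs:subClassOf" and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen


def all_types(model, instance):
    """Direct rdf:type values plus everything they inherit from."""
    direct = {o for (s, p, o) in model if s == instance and p == "rdf:type"}
    result = set(direct)
    for cls in direct:
        result |= superclasses(model, cls)
    return result


model = {
    ("ex:joe", "rdf:type", "foaf:Person"),   # an instance can have
    ("ex:joe", "rdf:type", "ex:Student"),    # more than one type
    ("foaf:Person", "rdfs:subClassOf", "foaf:Agent"),
}

types_of_joe = all_types(model, "ex:joe")
```

The `seen` set keeps the walk finite even if a schema contains a subclass cycle.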
  12. Advanced topic: Resources vs Literals <ul><li>Resources are objects, Literals are strings </li></ul><ul><li>Resources are instances of classes, Literals have datatypes </li></ul><ul><li>Whether something is a resource or literal sometimes depends on the detail of modeling </li></ul><ul><ul><li><meta property="myvocab:knows">Paris Hilton</meta> </li></ul></ul><ul><ul><li><item rel="foaf:knows"> </li></ul></ul><ul><ul><ul><li><meta property="foaf:name">Paris Hilton</meta> </li></ul></ul></ul><ul><ul><li></item> </li></ul></ul><ul><li>You cannot make statements about literals (literals are always the object in a triple) </li></ul><ul><li>Resources can carry a globally unique identifier, literals have no identity </li></ul><ul><li>Web resources such as documents and images are resources </li></ul><ul><ul><li><item rel="rdfs:seeAlso" resource="http://www.some.related.page.com/"/> </li></ul></ul><ul><ul><li><item rel="foaf:img" resource="http://photosite.example.org/photo.jpg"/> </li></ul></ul><ul><li>When in doubt: it's a resource </li></ul>
  13. Advanced Topic: Informational resources vs. Conceptual resources <ul><li>Informational resource: an HTML document, image, any other file on the Web </li></ul><ul><ul><li>Retrievable in its entirety from the Web </li></ul></ul><ul><ul><li>Retrieving it can return a 200 OK </li></ul></ul><ul><li>Conceptual (non-informational) resource: a person, an event, a place, etc. </li></ul><ul><ul><li>A description of it may be retrievable from the Web </li></ul></ul><ul><ul><li>When identified by a URL, retrieving it should return a 303 Redirect </li></ul></ul><ul><li>Never confuse a webpage with what it describes! </li></ul><ul><ul><li>You are not your Facebook profile: one is a document, the other is a person. A document has properties such as byte-size, media-type etc, a person has name, age, etc. </li></ul></ul><ul><ul><li>Make sure you don't use the URL of an existing webpage as the URI of a resource </li></ul></ul>
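The 200-versus-303 rule above can be sketched as a small decision function. This is an assumption-laden illustration, not a server implementation: the function name `respond`, the lookup tables and the example.org URIs are all hypothetical. `HTTPStatus.SEE_OTHER` is the standard library's name for status code 303.

```python
from http import HTTPStatus


def respond(uri, documents, descriptions):
    """Pick a status code and an optional redirect target for a URI.

    documents:    URIs of informational resources that can be served directly.
    descriptions: maps the URI of a conceptual resource to the URI of a
                  document that describes it.
    """
    if uri in documents:
        return HTTPStatus.OK, None                      # serve the document itself
    if uri in descriptions:
        return HTTPStatus.SEE_OTHER, descriptions[uri]  # point at a description
    return HTTPStatus.NOT_FOUND, None


documents = {"http://example.org/people/joe.html"}
descriptions = {"http://example.org/people/joe": "http://example.org/people/joe.html"}

page_status, _ = respond("http://example.org/people/joe.html", documents, descriptions)
person_status, person_location = respond("http://example.org/people/joe", documents, descriptions)
```

Keeping the person's URI and the page's URL distinct is what lets statements about the person (name, age) stay separate from statements about the page (byte-size, media-type).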
  14. RDF is designed for distributed systems <ul><li>URIs provide web-wide global identification across documents </li></ul><ul><ul><li>A resource may be described by multiple documents </li></ul></ul><ul><ul><li>We know it's the same resource because the same URI is used or through reasoning (advanced topic…) </li></ul></ul><ul><ul><li>URIs are intended to be reused </li></ul></ul><ul><ul><li>Unique, but not single identifiers: two URIs may denote the same thing </li></ul></ul><ul><li>URIs are dereferenceable (can be retrieved) </li></ul><ul><ul><li>A well-behaved URI returns a description of the resource </li></ul></ul><ul><ul><li>Provides authority: the definition of foaf:Person lives at that URI </li></ul></ul><ul><li>Ontologies can be looked up as well </li></ul><ul><ul><li>Typically at the root of the URIs, also known as the namespace </li></ul></ul><ul><ul><li>Example: http://xmlns.com/foaf/0.1/Person redirects to the specification </li></ul></ul>
  15. URIs implicitly link data together (#joe, #name, "Joe A.") (#joe, #email, mailto:joe@joe.com) (#mary, name, "Mary B.") (#mary, gender, "female") (#joe, #loves, #mary) Joe's homepage A dating site Mary's homepage (#name, #type, #Property) (#name, #domain, #Person) Schema doc Linked Data: following links from one document to another makes it possible to discover the entire graph (data and ontologies)
  16. When put together, they form a single 'global' graph " Joe A." #joe #name " joe@joe.com" #email #mary #loves " Mary B." " female" #name #gender
  17. The even larger picture: entire datasets connected
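Merging per-document statements into one graph can be sketched as a set union. The triples below come from the two slides above, but which document holds which triple is an assumption made for illustration, as are the variable names and the use of `#name` and `#gender` for Mary's properties.

```python
# Assumed split of the slide's triples across three documents.
joes_homepage = {
    ("#joe", "#name", "Joe A."),
    ("#joe", "#email", "mailto:joe@joe.com"),
}
dating_site = {
    ("#joe", "#loves", "#mary"),
}
marys_homepage = {
    ("#mary", "#name", "Mary B."),
    ("#mary", "#gender", "female"),
}

# Because the same identifiers are reused, a plain union connects the data.
global_graph = joes_homepage | dating_site | marys_homepage


def objects(graph, subject, predicate):
    """All objects stated for the given subject and predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


# Follow #loves from Joe, then #name from whoever he loves.
loved = objects(global_graph, "#joe", "#loves")
loved_names = {
    name for person in loved for name in objects(global_graph, person, "#name")
}
```

No single document can answer "what is the name of the person Joe loves"; the answer only appears once the three sets are combined.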
  18. Publishing data on the Web
  19. RDF on the Web II. <ul><li>Six ways of publishing RDF </li></ul><ul><ul><li>Standalone files (static or dynamically generated) </li></ul></ul><ul><ul><li>Metadata inside webpages </li></ul></ul><ul><ul><li>SPARQL endpoints </li></ul></ul><ul><ul><li>Feeds </li></ul></ul><ul><ul><li>XSLT/GRDDL </li></ul></ul><ul><ul><li>Automated tools </li></ul></ul><ul><li>Note: these are non-exclusive </li></ul>
  20. Option 1: Standalone RDF documents <ul><li>RDF documents linked to other RDF documents </li></ul><ul><ul><li>Use rdfs:seeAlso to point to a related document </li></ul></ul><ul><ul><ul><li>It says: Go and look at that document if you want to know more </li></ul></ul></ul><ul><li>Advantages: </li></ul><ul><ul><li>No change to the publishing of the HTML documents </li></ul></ul><ul><ul><li>Data can be published by a third party </li></ul></ul><ul><li>Tools </li></ul><ul><ul><li>RDB-to-RDF mappers such as D2RQ or Triplify </li></ul></ul><ul><ul><li>Linked Data browsers </li></ul></ul><ul><li>Examples: Most datasets in the Linked Data cloud </li></ul>[Figure: several linked RDF documents, each showing a graph with nodes #PeterM, #Bud, #Hun; edges born, label, capital-of, population; literals " Peter Mika", " Budapest", " 2,000,000"]
  21. Option 1: cntd. <ul><li>For discovery, the metadata is often linked from HTML pages </li></ul><ul><ul><li>< link rel=&quot;meta&quot; type=&quot;application/rdf+xml&quot; title=&quot;FOAF&quot; href=&quot;http://www.cs.vu.nl/~pmika/foaf.rdf&quot; /> </li></ul></ul><ul><li>Additional advantages: </li></ul><ul><ul><li>Discovery from the webpage </li></ul></ul><ul><ul><li>It's clear that the metadata is a machine representation of the human-targeted content of the page </li></ul></ul><ul><li>Examples: FOAF profiles, BestBuy </li></ul>[Figure: a page reading "Peter Mika was born in Budapest." linked to an RDF graph with nodes #PeterM, #Bud, #Hun; edges born, label, capital-of, population; literals " Peter Mika", " Budapest", " 2,000,000"]
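Discovery through the link element shown on this slide can be sketched with Python's standard `html.parser` module. The class name `MetaLinkFinder` and the sample page are illustrative assumptions; the rel, type and href values are the ones from the slide.

```python
from html.parser import HTMLParser


class MetaLinkFinder(HTMLParser):
    """Collect href values of <link rel="meta" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        is_rdf = attributes.get("type") == "application/rdf+xml"
        if "meta" in rels and is_rdf and attributes.get("href"):
            self.rdf_links.append(attributes["href"])


page = """
<html><head>
  <link rel="stylesheet" type="text/css" href="style.css" />
  <link rel="meta" type="application/rdf+xml" title="FOAF"
        href="http://www.cs.vu.nl/~pmika/foaf.rdf" />
</head><body>Peter Mika was born in Budapest.</body></html>
"""

finder = MetaLinkFinder()
finder.feed(page)
finder.close()
discovered = finder.rdf_links
```

A crawler could fetch each discovered URL and parse it as RDF/XML; that step is left out here because it needs a network connection and an RDF parser.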
  22. Option 2: Metadata inside web pages <ul><li>Using microformats, RDFa, MicroData (more later) </li></ul><ul><li>Advantages: </li></ul><ul><ul><li>No separate database export required </li></ul></ul><ul><ul><li>Browser plug-in friendly </li></ul></ul><ul><ul><li>Search engine friendly </li></ul></ul><ul><ul><li>Copy-paste friendly </li></ul></ul><ul><li>Tools: </li></ul><ul><ul><li>XML editors (e.g. Oxygen) </li></ul></ul><ul><ul><li>Triplr </li></ul></ul><ul><ul><li>RDFa Distiller </li></ul></ul><ul><ul><li>RDFa bookmarklet </li></ul></ul><ul><ul><li>Ubiquity RDFa plugin </li></ul></ul><ul><ul><li>Optimus microformat parser </li></ul></ul><ul><li>Examples: many, including SlideShare, YouTube, LinkedIn, Digg, Myspace, Facebook… </li></ul>[Figure: pages reading "Peter Mika was born in Budapest." with the RDF graph embedded in them: nodes #PeterM, #Bud, #Hun; edges born, label, capital-of, population; literals " Peter Mika", " Budapest", " 2,000,000"]
  23. Option 3: SPARQL endpoints <ul><li>Query access to your RDF database </li></ul><ul><ul><li>Similar to exposing your database on the Web and giving someone read-only SQL access </li></ul></ul><ul><li>Advantages: </li></ul><ul><ul><li>Most flexible and best performing access from a consumer perspective </li></ul></ul><ul><li>Tools: </li></ul><ul><ul><li>Triple stores (Oracle, Virtuoso, Sesame, Jena, OWLIM etc.) </li></ul></ul><ul><ul><li>RDB-to-RDF mappers such as D2RQ and Triplify </li></ul></ul>[Figure: RDF graph with nodes #PeterM, #Bud, #Hun; edges born, label, capital-of, population; literals " Peter Mika", " Budapest", " 2,000,000"]
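To give a feel for what query access buys a consumer, here is a toy triple-pattern matcher. It is not SPARQL and not the API of any triple store: `match` is a hypothetical helper in which `None` acts as a wildcard, loosely playing the role a variable plays in a SPARQL triple pattern. The sample triples are one assumed reading of the slide's figure.

```python
def match(graph, pattern):
    """Return the triples that agree with pattern; None matches anything."""
    return [
        triple
        for triple in graph
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Assumed reading of the figure: Peter Mika was born in Budapest.
graph = [
    ("#PeterM", "#born", "#Bud"),
    ("#PeterM", "#label", "Peter Mika"),
    ("#Bud", "#label", "Budapest"),
]

# "Where was #PeterM born?"
birthplaces = match(graph, ("#PeterM", "#born", None))

# "Which things have a label, and what is it?"
labels = match(graph, (None, "#label", None))
```

A real endpoint evaluates many such patterns jointly and returns variable bindings; this sketch only shows the single-pattern case.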
  24. Option 4: feeds <ul><li>The equivalent of a database dump </li></ul><ul><li>No standard feed format for RDF </li></ul><ul><li>Advantages </li></ul><ul><ul><li>Submit your data without making it public </li></ul></ul><ul><li>Yahoo! consumes: </li></ul><ul><ul><li>DataRSS </li></ul></ul><ul><ul><li>GoogleBase feeds </li></ul></ul><ul><ul><li>NewsML </li></ul></ul><ul><li>Submit your feed using SiteExplorer </li></ul>[Figure: a feed bundling several RDF graphs, each with nodes #PeterM, #Bud, #Hun; edges born, label, capital-of, population; literals " Peter Mika", " Budapest", " 2,000,000"]
  25. Option 5: XSLT <ul><li>Publish the transformation from HTML to structured data </li></ul><ul><ul><li>GRDDL is a standard for linking an HTML page to a transformation that produces RDF data </li></ul></ul><ul><li>Advantages </li></ul><ul><ul><li>No change to the page </li></ul></ul><ul><li>Disadvantages </li></ul><ul><ul><li>Transformation needs to be executed to get to the data </li></ul></ul><ul><li>Tools </li></ul><ul><ul><li>Intel MashMaker </li></ul></ul><ul><ul><li>Dapper </li></ul></ul><ul><ul><li>Glue API from AdaptiveBlue </li></ul></ul>[Figure: XSLT transformation diagram]
  26. Option 6: Automatic markup <ul><li>Restricted mostly to tagging entities with identifiers </li></ul><ul><li>Advantages </li></ul><ul><ul><li>Less manual effort </li></ul></ul><ul><li>Disadvantages </li></ul><ul><ul><li>Limited to finding relevant entities in text </li></ul></ul><ul><li>Tools </li></ul><ul><ul><li>OpenCalais </li></ul></ul><ul><ul><li>Zemanta API </li></ul></ul>Peter Mika was born in Budapest. <person>Peter Mika</person> was born in <location>Budapest</location>.
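The before/after sentence on this slide can be reproduced with a very small dictionary-based tagger. This is a sketch only: services such as OpenCalais or Zemanta work quite differently, and the gazetteer contents and the function name `tag_entities` are assumptions made for the example.

```python
import re

# Hypothetical lookup table mapping known names to entity types.
GAZETTEER = {
    "Peter Mika": "person",
    "Budapest": "location",
}


def tag_entities(text, gazetteer):
    """Wrap every known name in a tag named after its entity type."""
    if not gazetteer:
        return text
    # Try longer names first so a long name wins over a name it contains.
    names = sorted(gazetteer, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(name) for name in names))

    def wrap(found):
        name = found.group(0)
        kind = gazetteer[name]
        return f"<{kind}>{name}</{kind}>"

    return pattern.sub(wrap, text)


tagged = tag_entities("Peter Mika was born in Budapest.", GAZETTEER)
```

The sketch also shows the limitation named on the slide: it finds entities, but it says nothing about the relation ("was born in") between them.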
  27. Example: Zemanta <ul><li>A personal writing assistant for bloggers </li></ul><ul><ul><li>Plugin for popular blogging platforms and web mail clients </li></ul></ul><ul><li>Analyzes text as you type and suggests hyperlinks, tags, categories, images and related articles </li></ul><ul><li>API available with the same functionality </li></ul>
  28. Metadata in HTML
  29. Brief history of the Annotated Web <ul><li>1995: HTML meta tags </li></ul><ul><li>1996: Simple HTML Ontology Extensions (SHOE) </li></ul><ul><li>1998: RDF/XML </li></ul><ul><ul><li>RDF/XML in HTML </li></ul></ul><ul><ul><li>RDF linked from HTML </li></ul></ul><ul><li>2003: Web 2.0 </li></ul><ul><ul><li>Tagging </li></ul></ul><ul><ul><li>Microformats </li></ul></ul><ul><ul><li>Metadata in Wikipedia </li></ul></ul><ul><ul><li>Machine tags in Flickr </li></ul></ul><ul><li>2005: eRDF </li></ul><ul><li>2008: RDFa </li></ul>
  30. HTML meta tags <ul><li><HTML> </li></ul><ul><li><HEAD profile=&quot;http://dublincore.org/documents/dcq-html/&quot;> </li></ul><ul><li><META name=&quot;DC.author &quot; content=&quot; Peter Mika &quot;> </li></ul><ul><li><LINK rel=&quot;DC.rights copyright&quot; href=&quot; http://www.example.org/rights.html &quot; /> </li></ul><ul><li><LINK rel=&quot;meta&quot; type=&quot;application/rdf+xml&quot; title=&quot;FOAF&quot; </li></ul><ul><li> href= &quot; http://www.cs.vu.nl/~pmika/foaf.rdf &quot;> </li></ul><ul><li></HEAD> </li></ul><ul><li>… </li></ul><ul><li></HTML> </li></ul>
  31. SHOE example (Hefflin & Hendler, 1996) <ul><li><ONTOLOGY &quot;our-ontology&quot; VERSION=&quot;1.0&quot;> </li></ul><ul><li><ONTOLOGY-EXTENDS &quot;organization-ontology&quot; VERSION=&quot;2.1&quot; PREFIX=&quot;org&quot; URL=&quot;http://www.ont.org/orgont.html&quot;> </li></ul><ul><li><ONTDEF CATEGORY=&quot;Person&quot; ISA=&quot;org.Thing&quot;> </li></ul><ul><li><ONTDEF RELATION=&quot;lastName&quot; ARGS=&quot;Person STRING&quot;> </li></ul><ul><li><ONTDEF RELATION=&quot;firstName&quot; ARGS=&quot;Person STRING&quot;> </li></ul><ul><li><ONTDEF RELATION=&quot;marriedTo&quot; ARGS=&quot;Person Person&quot;> </li></ul><ul><li><ONTDEF RELATION=&quot;employee&quot; ARGS=&quot;org.Organization Person&quot;> </li></ul><ul><li></ONTOLOGY > </li></ul><HEAD> <META HTTP-EQUIV=&quot;Instance-Key&quot; CONTENT=&quot;http://www.cs.umd.edu/~george&quot;> <USE-ONTOLOGY &quot;our-ontology&quot; VERSION=&quot;1.0&quot; PREFIX=&quot;our&quot; URL=&quot;http://ont.org/our-ont.html&quot;> </HEAD> <BODY> <CATEGORY &quot;our.Person&quot;> <RELATION &quot;our.marriedTo&quot; TO=&quot;http://www.cs.umd.edu/~helena&quot;> <RELATION &quot;our.employee&quot; FROM=&quot;http://www.cs.umd.edu&quot;> My name is <ATTRIBUTE &quot;our.firstName&quot;> George </ATTRIBUTE> <ATTRIBUTE &quot;our.lastName&quot;> Cook </ATTRIBUTE> and I live at...
  32. SHOE system
  33. SHOE Text-based query interface
  34. SHOE Graphical Query Interface
  35. Example: Creative Commons <ul><li>Embedding CC license in HTML (now deprecated): </li></ul><HTML> <HEAD>… </HEAD> <BODY> … <!-- <rdf:RDF xmlns=&quot;http://creativecommons.org/ns#&quot; xmlns:dc=&quot;http://purl.org/dc/elements/1.1/&quot; xmlns:rdf=&quot;http://www.w3.org/1999/02/22-rdf-syntax-ns#&quot;> <Work rdf:about=&quot;http://www.yergler.net/averages/&quot;> <dc:title>The Law of Averages</dc:title> <dc:description>...because eventually i&apos;ll be right...</dc:description> <license rdf:resource=&quot;http://creativecommons.org/licenses/by-nc/1.0/&quot; /> </Work> <License rdf:about=&quot;http://creativecommons.org/licenses/by-nc/1.0/&quot;> <requires rdf:resource=&quot;http://web.resource.org/cc/Notice&quot; /> <permits rdf:resource=&quot;http://web.resource.org/cc/Reproduction&quot; /> <permits rdf:resource=&quot;http://web.resource.org/cc/Distribution&quot; /> <prohibits rdf:resource=&quot;http://web.resource.org/cc/CommercialUse&quot; /> </License> </rdf:RDF> -->
  36. Example: Creative Commons <ul><li>Current: rel attribute (HTML4) </li></ul>This work is licensed under a <a rel=&quot;license&quot; href=&quot;http://creativecommons.org/licenses/by/3.0/us/&quot;>Creative Commons Attribution 3.0 United States License</a>. <ul><li>Use of the "rel" attribute for semantic annotation is the birth of the microformat… </li></ul>
  37. Microformats (μf) <ul><li>Community centered around microformats.org </li></ul><ul><ul><li>Specifications and discussions are hosted there </li></ul></ul><ul><li>Agreements on the way to encode certain kinds of metadata in HTML </li></ul><ul><ul><li>Reuse of semantic-bearing HTML elements </li></ul></ul><ul><ul><li>Based on existing standards </li></ul></ul><ul><ul><li>Minimality </li></ul></ul><ul><li>Microformats exist for a limited set of objects </li></ul><ul><ul><li>hCard (persons and organizations) </li></ul></ul><ul><ul><li>hCalendar (events) </li></ul></ul><ul><ul><li>hResume </li></ul></ul><ul><ul><li>hProduct </li></ul></ul><ul><ul><li>hRecipe </li></ul></ul><ul><li>Varying degrees of support and stability </li></ul><ul><ul><li>hCard and rel-tag are widely supported </li></ul></ul>
  38. Microformats: limitations <ul><li>No shared syntax </li></ul><ul><ul><li>Each microformat has a separate syntax tailored to the vocabulary </li></ul></ul><ul><li>No formal schemas </li></ul><ul><ul><li>Limited reuse, extensibility of schemas </li></ul></ul><ul><ul><li>Unclear which combinations are allowed </li></ul></ul><ul><li>No datatypes </li></ul><ul><li>No namespaces, unique identifiers (URIs) </li></ul><ul><ul><li>no interlinking </li></ul></ul><ul><ul><li>mapping between instances is required </li></ul></ul><ul><li>Relationship to page context is often unclear </li></ul>
  39. Example: microformats <cite class=&quot; vcard &quot;> <a class=&quot; fn url &quot; rel=&quot;friend colleague met&quot; href=&quot;http://meyerweb.com/&quot;> Eric Meyer </a> </cite> wrote a post ( <cite> <a href=&quot;http://meyerweb.com/eric/thoughts/2005/12/16/tax-relief/&quot;> Tax Relief </a></cite> ) about an unintentionally humorous letter he received from the <span class=&quot; vcard &quot;> <a class=&quot; fn org url &quot; href=&quot;http://irs.gov/&quot;> Internal Revenue Service </a> </span>. <div class=&quot; vcard &quot;> <a class=&quot; email fn &quot; href=&quot;mailto:jfriday@host.com&quot;> Joe Friday </a> <div class=&quot; tel &quot;> +1-919-555-7878 </div> <div class=&quot; title &quot;> Area Administrator, Assistant </div> </div>
  40. Microformats vs. RDFa <ul><li>Choose microformats when you find a microformat that fits your needs and is supported by Yahoo! </li></ul><ul><ul><li>Microformats are the first option because they are simple </li></ul></ul><ul><ul><li>We support all major microformats, see the documentation </li></ul></ul><ul><ul><li>It's a common misconception that RDFa requires XHTML: it doesn't </li></ul></ul><ul><li>If you find none that perfectly fits your needs then you need RDFa </li></ul><ul><ul><li>Microformats have a fixed schema: you cannot add your own attributes </li></ul></ul><ul><li>Example: a social networking site with user profiles </li></ul><ul><ul><li>VCard is a good candidate, but for example it doesn't have a way to express the user's social connections </li></ul></ul><ul><ul><li>You either live without this, or go with RDFa </li></ul></ul><ul><li>The rest of this presentation is about RDFa, which is thus more powerful, but also more complex </li></ul><ul><ul><li>We will focus on the concepts that are hard to grasp </li></ul></ul>
  41. Keep an eye on HTML5 <ul><li>Currently under standardization at the W3C </li></ul><ul><ul><li>Last Call this autumn, keep an eye on it </li></ul></ul><ul><li>Introduces Microdata </li></ul><ul><ul><li>Similar to microformats </li></ul></ul><ul><ul><ul><li>Some predefined vocabularies with central registration </li></ul></ul></ul><ul><ul><li>Some of the flexibility of RDFa </li></ul></ul><ul><ul><li>Introduce new terms using reverse domain names or full URIs </li></ul></ul><ul><li>Semantic HTML elements such as <time>, <video>, <article>… </li></ul>
  42. Microdata example <div item> <p>My name is <span itemprop=&quot; name &quot;> Neil </span>.</p> <p>My band is called <span itemprop=&quot; band &quot;> Four Parts Water </span>. I was born on <time itemprop=&quot; birthday &quot; datetime=&quot; 2009-05-10 &quot;>May 10th 2009</time>. <img itemprop=&quot; image &quot; src=&quot; me.png &quot; alt=&quot;me&quot;> </p> </div>
  43. Slides courtesy of Mark Birbeck Introduction to RDFa
  44. What does RDFa look like? <ul><li>There are some metadata features in HTML already... </li></ul><ul><li>...then we give them an RDF interpretation... </li></ul><ul><li>...so we generalise them... </li></ul><ul><li>...and then we add a few more. </li></ul>
  45. HTML's metadata features (1) <ul><li><html>  <head>    <title>RDFa: Now everyone can have an API</title>    <meta name=&quot;author&quot; content=&quot;Mark Birbeck&quot; />    <meta name=&quot;created&quot; content=&quot;2009-05-09&quot; />    <link rel=&quot;license&quot; </li></ul><ul><li>     href=&quot;http://creativecommons.org/licenses/by-sa/3.0/&quot; />  </head>  .  .  . </html> </li></ul>
  46. HTML's metadata features (2) <ul><li><a href=&quot;http://creativecommons.org/licenses/by-sa/3.0/&quot;  >CC Attribution-ShareAlike</a> </li></ul><ul><li><a rel=&quot;license&quot; </li></ul><ul><li>  href=&quot;http://creativecommons.org/licenses/by-sa/3.0/&quot;  >CC Attribution-ShareAlike</a> </li></ul>
  47. RDFa extends @rel/@href to images <ul><li><img src=&quot;image01.png&quot; </li></ul><ul><li>rel=&quot;license&quot; </li></ul><ul><li>  href="http://creativecommons.org/licenses/by-sa/3.0/" </li></ul><ul><li>/> </li></ul><ul><li><img src=&quot;image02.png&quot; </li></ul><ul><li>rel=&quot;license&quot; </li></ul><ul><li>  href="http://creativecommons.org/licenses/by-sa/3.0/" </li></ul><ul><li>/> </li></ul>
  48. RDFa extends meta/@content to body <ul><li><html>  <head>    <title>RDFa: Now everyone can have an API</title>    <meta name=&quot;author&quot; content=&quot;Mark Birbeck&quot; />    <meta name=&quot;created&quot; content=&quot;2009-05-09&quot; />  </head>  <body>    <h1>RDFa: Now everyone can have an API</h1>    Author: <em>Mark Birbeck</em>    Created: <em>May 9th, 2009</em>  </body> </li></ul><ul><li></html> </li></ul>
  49. RDFa extends meta/@content to body <ul><li><html>  <head>    <title>RDFa: Now everyone can have an API</title>  </head>  <body>    <h1>RDFa: Now everyone can have an API</h1>    Author: <em property=&quot;author&quot; content=&quot;Mark Birbeck&quot;     >Mark Birbeck</em>    Created: <em property=&quot;created&quot; content=&quot;2009-05-09&quot;     >May 9th, 2009</em>  </body> </li></ul><ul><li></html> </li></ul>
  50. RDFa extends meta/@content to body <ul><li><html>  <head>    <title>RDFa: Now everyone can have an API</title>  </head>  <body>    <h1>RDFa: Now everyone can have an API</h1>    Author: <em property=&quot;author&quot;     >Mark Birbeck</em>    Created: <em property=&quot;created&quot; content=&quot;2009-05-09&quot;     >May 9th, 2009</em>  </body> </li></ul><ul><li></html> </li></ul>
  51. Vocabularies use CURIEs <ul><li><html xmlns:dc=&quot;http://purl.org/dc/terms/&quot;> </li></ul><ul><li> <head>    <title>RDFa: Now everyone can have an API</title>  </head>  <body>    <h1>RDFa: Now everyone can have an API</h1>    Author: <em property=&quot;dc:creator&quot;     >Mark Birbeck</em>    Created: <em property=&quot;dc:created&quot; content=&quot;2009-05-09&quot;     >May 9th, 2009</em>  </body> </li></ul><ul><li></html> </li></ul>
  52. CURIEs, or Compact URIs <ul><li>Named after Marie Curie, who was the first person to receive two Nobel prizes, one for physics and one for chemistry. </li></ul><ul><li>CURIEs allow a full URI to be expressed in a simple prefix:suffix form. </li></ul><ul><li>The 'suffix' part is looser than in XML namespaces, supporting formulations such as abc:123. </li></ul>
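CURIE expansion amounts to looking up the prefix and appending the suffix. The sketch below assumes a prefix table like the one an `xmlns:dc` declaration would establish; the function name `expand_curie` and the `abc` mapping are hypothetical, while the `dc` namespace URI is the one used on the slides.

```python
def expand_curie(curie, prefixes):
    """Expand 'prefix:suffix' into a full URI using a prefix table."""
    prefix, separator, suffix = curie.partition(":")
    if not separator:
        raise ValueError(f"not a CURIE (no colon): {curie!r}")
    if prefix not in prefixes:
        raise ValueError(f"unknown prefix: {prefix!r}")
    return prefixes[prefix] + suffix


PREFIXES = {
    "dc": "http://purl.org/dc/terms/",   # as declared via xmlns:dc on the slides
    "abc": "http://example.org/items/",  # made-up mapping for the abc:123 case
}

creator_uri = expand_curie("dc:creator", PREFIXES)
numeric_uri = expand_curie("abc:123", PREFIXES)
```

The second call shows the looser suffix rule: `123` could not serve as the local part of an XML qualified name, but as a CURIE suffix it is simply concatenated.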
  53. Properties can also apply to images <ul><li><img src=&quot;image01.png" </li></ul><ul><li>rel=&quot;license&quot; </li></ul><ul><li>  href="http://creativecommons.org/licenses/by-sa/3.0/" </li></ul><ul><li>/> </li></ul><ul><li><img src=&quot;image02.png" </li></ul><ul><li>rel=&quot;license&quot; </li></ul><ul><li>  href=&quot;http://creativecommons.org/licenses/by-sa/3.0/" </li></ul><ul><li>/> </li></ul>
  54. Properties can also apply to images <ul><li><img src=&quot;image01.png&quot; </li></ul><ul><li>rel=&quot;license&quot; </li></ul><ul><li>  href=&quot;http://creativecommons.org/licenses/by-sa/3.0/&quot; </li></ul><ul><li>property=&quot;dc:creator&quot; content=&quot;Mark Birbeck" </li></ul><ul><li>/> </li></ul><ul><li><img src=&quot;image02.png&quot; </li></ul><ul><li>rel=&quot;license&quot; </li></ul><ul><li>  href=&quot;http://creativecommons.org/licenses/by-sa/3.0/&quot; property=&quot;dc:creator&quot; content=&quot;Mark Birbeck&quot; </li></ul><ul><li>/> </li></ul>
  55. Relationships and properties on anything <ul><li><a </li></ul><ul><li>  href=&quot;http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds&quot;  >The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> </li></ul>
  56. Relationships and properties on anything <ul><li><a rel=&quot;license&quot; </li></ul><ul><li>  href=&quot;http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds&quot;  >The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> </li></ul><ul><li>Doesn't say what we want. </li></ul>
  57. Relationships and properties on anything <ul><li><a </li></ul><ul><li>   href=&quot;http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds&quot;  >The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> is licensed under <a </li></ul><ul><li>  href=&quot;http://creativecommons.org/licenses/by-sa/2.5/&quot; </li></ul><ul><li>  >CC BY SA</a>. </li></ul>
  58. Relationships and properties on anything <ul><li><a </li></ul><ul><li>   href=&quot;http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds&quot;  >The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> is licensed under <a about=&quot;http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds&quot; </li></ul><ul><li>  rel=&quot;license&quot; </li></ul><ul><li>  href=&quot;http://creativecommons.org/licenses/by-sa/2.5/&quot; </li></ul><ul><li>  >CC BY SA</a>. </li></ul>
  59. Relationships and properties on anything <ul><li><a </li></ul><ul><li>   href=&quot;http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds&quot;  >The 5 minute guide to RDFa...in only 6 minutes and 40 seconds</a> is licensed under <a about=&quot;http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds&quot; </li></ul><ul><li>  rel=&quot;license&quot; </li></ul><ul><li>  href=&quot;http://creativecommons.org/licenses/by-sa/2.5/&quot; </li></ul><ul><li>  property=&quot;dc:creator&quot; content=&quot;Mark Birbeck&quot;> </li></ul><ul><li>CC BY SA </li></ul><ul><li></a>. </li></ul>
  60. @about sets context <ul><li><div about=&quot;http://www.slideshare.net/mark.birbeck/the-5-minute-guide-to-rdfain-only-6-minutes-40-seconds&quot;> </li></ul><ul><li>   <h1>The 5 minute guide to RDFa...</h1>    Author: <em property=&quot;dc:creator&quot;     >Mark Birbeck</em>    Created: <em property=&quot;dc:created&quot; content=&quot;2009-05-09&quot;     >May 9th, 2009</em> </li></ul><ul><li></div> </li></ul>
  61. @about sets context <ul><li><html xmlns:dc=&quot;http://purl.org/dc/terms/&quot;> </li></ul><ul><li> <head>    <title>RDFa: Now everyone can have an API</title> </li></ul><ul><li>  </head>  <body> </li></ul><ul><li>    <h1>RDFa: Now everyone can have an API</h1>    </li></ul><ul><li>   Author: <em property=&quot;dc:creator&quot;     >Mark Birbeck</em>    Created: <em property=&quot;dc:created&quot; content=&quot;2009-05-09&quot;     >May 9th, 2009</em>  </body> </li></ul><ul><li></html> </li></ul>
  62. Basics of RDFa <ul><ul><li>generalise HTML's existing semantic features; </li></ul></ul><ul><ul><li>add support for CURIEs for property and relationship names; </li></ul></ul><ul><ul><li>add @about. </li></ul></ul>
  63. Advanced RDFa <ul><ul><li>use of @datatype to set the data type of @content; </li></ul></ul><ul><ul><li>use of @typeof to set rdf:type; </li></ul></ul><ul><ul><li>support for bnodes; </li></ul></ul><ul><ul><li>support for XML literals; </li></ul></ul><ul><ul><li>ability to chain statements together. </li></ul></ul><ul><li>Note that since RDFa supports all of the features you'll find in RDF, you can even mark up OWL documents in HTML. </li></ul>
  64. The process of annotating with RDFa <ul><ul><li>Invest in familiarizing yourself with the RDFa syntax by reading the RDFa Primer </li></ul></ul><ul><ul><ul><li>It is also highly recommended that you read the RDF Primer. RDF is the data model used by RDFa. </li></ul></ul></ul><ul><ul><li>Choose a vocabulary from the SearchMonkey documentation that fits your needs </li></ul></ul><ul><ul><ul><li>A vocabulary describes a set of types and attributes within a given domain </li></ul></ul></ul><ul><ul><ul><li>If you don't find a good candidate, extend an existing one or create a new one </li></ul></ul></ul><ul><ul><li>Annotate your page. </li></ul></ul><ul><ul><ul><li>Before you start, you might want to validate your page for (X)HTML conformance using the W3C's (X)HTML Validator to reduce the risk of errors. Choose Document Type XHTML + RDFa. </li></ul></ul></ul><ul><ul><ul><li>No specific tool support. If you have an HTML or XML editor that supports DTDs, you will have syntax checking and highlighting. </li></ul></ul></ul><ul><ul><ul><li>Use the RDFa Distiller to validate which data can be extracted from your page. </li></ul></ul></ul><ul><ul><ul><li>If you fancy, use the RDF Validator to graphically visualize the RDF graph that is output. </li></ul></ul></ul><ul><ul><li>Put the annotated page online. The data will be extracted the next time your page is crawled </li></ul></ul><ul><ul><ul><li>No need to explicitly submit anything </li></ul></ul></ul><ul><ul><ul><li>No notification when your site is crawled </li></ul></ul></ul><ul><li>See http://rdfa.info/rdfa-implementations for new tools and APIs </li></ul>
  65. RDFa pitfalls <ul><li>Validation issues can stop us from extracting information </li></ul><ul><ul><li>Use the W3C validator </li></ul></ul><ul><ul><li>Use the right DOCTYPE declaration if using XHTML </li></ul></ul><ul><ul><li>Set the encoding of your page properly (using HTTP headers or XML declaration) </li></ul></ul><ul><li>Prefixes need to be defined using the xmlns attribute </li></ul><ul><li>Unless you are making statements about the document, set the subject using the about attribute </li></ul><ul><li>Do not include HTML elements in literal values </li></ul><ul><ul><li>Wrong: <div property="foaf:name"><b>Peter Mika</b></div> </li></ul></ul><ul><li>Use absolute URIs as the value of the resource attribute </li></ul><ul><ul><li>Or make sure you specify HTML base </li></ul></ul>
  66. More pitfalls: precedence rules <ul><li>Be careful when using rel and typeof in combination because of the precedence rules </li></ul><ul><li>BAD example: </li></ul><ul><ul><li><div about="#id"> </li></ul></ul><ul><ul><li><span property="foaf:name">Peter Mika</span> </li></ul></ul><ul><ul><li><span rel="foaf:img" typeof="foaf:Image"> </li></ul></ul><ul><ul><li><span property="dc:format">jpg</span> </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><ul><li></span> </li></ul></ul><ul><ul><li></div> </li></ul></ul><ul><li>To correct, you need to put the typeof inside the <span> node with rel="foaf:img" </li></ul>
  67. More pitfalls: the typeof attribute <ul><li>Typeof does two things at once: it creates a new subject resource and assigns the type to it </li></ul><ul><li>BAD example: </li></ul><ul><ul><li><div about="#id"> </li></ul></ul><ul><ul><li><span property="foaf:name">Peter Mika</span> </li></ul></ul><ul><ul><li><span rel="foaf:img" resource="http://www.example.org/photo.jpg"> </li></ul></ul><ul><ul><li><span typeof="foaf:Image"> </li></ul></ul><ul><ul><li> <span property="dc:format">jpg</span> </li></ul></ul><ul><ul><li></span> </li></ul></ul><ul><ul><li></span> </li></ul></ul><ul><ul><li></div> </li></ul></ul><ul><li>To correct, you have to repeat the resource attribute on the span node with the typeof </li></ul>
  68. HTML markup pitfalls <ul><li>Marking up <h1>: </li></ul><ul><ul><li><h1 property="dc:title">My homepage</h1> </li></ul></ul><ul><ul><li>Not: <h1><div property="dc:title">My homepage</h1> </li></ul></ul><ul><li>  Marking up an image: </li></ul><ul><ul><li><a href=&quot;http://example.org/user/alex&quot;>      <span about=&quot;#user1&quot; rel=&quot;foaf:img media:image&quot;>         <img alt=&quot;Alex&quot; src=&quot;http://example.org/photos/alex.jpg&quot;/>      </span> </li></ul></ul><ul><ul><li></a> </li></ul></ul><ul><ul><li>This doesn't work: </li></ul></ul><ul><ul><li><img rel="foaf:img" src="photo.jpg"/> </li></ul></ul><ul><li>In the header you need </li></ul><ul><ul><li><meta property="…" content="…"> </li></ul></ul><ul><ul><li>NOT </li></ul></ul><ul><ul><li><meta name="…" content="…"> </li></ul></ul>
  69. More pitfalls: breaking up descriptions <ul><li>You cannot break up a description like this: </li></ul><ul><li><span rel="foaf:knows">    <span property="foaf:name">Peter Mika</span> </span> …. </li></ul><ul><li><span rel="foaf:knows">    <a rel="foaf:email" href="mailto:pmika@yahoo-inc.com" /> </span> </li></ul><ul><li>This is not the same as: </li></ul><ul><li><span rel="foaf:knows">    <span property="foaf:name">Peter Mika</span>    </li></ul><ul><li> <a rel="foaf:email" href="mailto:pmika@yahoo-inc.com" /> </li></ul><ul><li></span> </li></ul><ul><li>In the first example there are two related resources, with one attribute each; in the second case there is a single related resource with two attributes. </li></ul>
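The difference between the two markups on this slide is easier to see when the resulting triples are written out. The following is a minimal sketch in Python, not the output of an RDFa parser: the page subject `<>`, the blank-node labels `_:b0` and `_:b1`, and the helper name `related_resources` are all assumptions made for illustration.

```python
# Triples are modelled as (subject, predicate, object) tuples.
# PAGE and the blank-node labels are illustrative names, not parser output.
PAGE = "<>"

# Broken-up markup: each rel="foaf:knows" span introduces its own blank node.
split = {
    (PAGE, "foaf:knows", "_:b0"),
    ("_:b0", "foaf:name", "Peter Mika"),
    (PAGE, "foaf:knows", "_:b1"),
    ("_:b1", "foaf:email", "mailto:pmika@yahoo-inc.com"),
}

# Single span: both statements hang off the same blank node.
joined = {
    (PAGE, "foaf:knows", "_:b0"),
    ("_:b0", "foaf:name", "Peter Mika"),
    ("_:b0", "foaf:email", "mailto:pmika@yahoo-inc.com"),
}


def related_resources(triples):
    """Return the distinct objects the page links to via foaf:knows."""
    return {o for s, p, o in triples if s == PAGE and p == "foaf:knows"}
```

Under these assumptions `related_resources(split)` holds two nodes and `related_resources(joined)` holds one, which is the distinction the slide draws: a consumer of the first graph has no way to tell that the name and the mailbox belong to the same person.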
  70. Tips <ul><li>Hiding information from being displayed </li></ul><ul><ul><li>Links without content will not be rendered </li></ul></ul><ul><ul><li>Use <span property="foaf:name" content="Peter Mika"/> </li></ul></ul><ul><li>Use datatypes to provide the expected type of a literal. </li></ul><ul><ul><li>This helps validation because any tool can check whether the literal is indeed of that type. </li></ul></ul>
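The datatype tip can be illustrated with a small check of the kind a validating tool might run. This is a rough sketch only: it assumes the literal was declared as `xsd:date`, it ignores the optional timezone suffix that `xsd:date` permits, and the function name `looks_like_xsd_date` is hypothetical.

```python
import re
from datetime import date

# Plain YYYY-MM-DD only; the optional timezone part of xsd:date is not handled.
_DATE_SHAPE = re.compile(r"\d{4}-\d{2}-\d{2}")


def looks_like_xsd_date(literal: str) -> bool:
    """Return True if the literal is a real calendar date written as YYYY-MM-DD."""
    if not _DATE_SHAPE.fullmatch(literal):
        return False
    try:
        date.fromisoformat(literal)  # rejects impossible dates such as month 13
    except ValueError:
        return False
    return True
```

With the `dc:created` example from the earlier slides, the `content` value `2009-05-09` passes this check while the human-readable text `May 9th, 2009` does not, which is why the machine-readable value goes in `content`.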
  71. Choosing a vocabulary <ul><li>Look at SearchMonkey objects </li></ul><ul><ul><li>Video, Games, Presentations, Events, News, Businesses, Products, Discussion </li></ul></ul><ul><li>Search the Web or ask for advice on mailing lists </li></ul><ul><ul><li>[email_address] </li></ul></ul><ul><ul><li>[email_address] </li></ul></ul><ul><li>Beware of people who claim to have the vocabulary of everything </li></ul><ul><ul><li>Preferably you want something small and targeted </li></ul></ul><ul><li>Never a 100% fit: you will need to introduce vocabulary terms (classes and properties) </li></ul><ul><ul><li>Do not introduce new classes/properties in existing namespaces </li></ul></ul><ul><ul><li>Example: the namespace http://xmlns.com/foaf/0.1/ is used by the FOAF project. Try not to introduce a new term without contacting the owner, i.e. the membership of the FOAF mailing list. </li></ul></ul>
  72. Advanced topic: creating a vocabulary <ul><li>Get advice on methodology </li></ul><ul><ul><li>vocamp.org and semanticweb.org </li></ul></ul><ul><li>Choose a namespace and a prefix </li></ul><ul><ul><li>Give sensible names, e.g. name it after your site, but don't call it searchmonkey </li></ul></ul><ul><ul><li>Namespace ends either with a slash or a hash </li></ul></ul><ul><li>Create an RDF or OWL document describing your classes and properties </li></ul><ul><ul><li>Use an ontology editor such as Protégé 4.0 </li></ul></ul><ul><ul><li>Follow naming conventions </li></ul></ul><ul><li>Publish your vocabulary </li></ul><ul><ul><li>Make sure the URIs of your properties and classes are resolvable </li></ul></ul><ul><ul><ul><li>E.g. myvocab:digicam should resolve to a document containing the definition of myvocab:digicam </li></ul></ul></ul><ul><li>Convince others to adopt your vocabulary </li></ul><ul><ul><li>If you are in fishing, convince other fishing businesses </li></ul></ul>
  73. Exercise <ul><li>Explore data on the Web </li></ul><ul><ul><li>Microformats </li></ul></ul><ul><ul><ul><li>Search for pages on Yahoo using searchmonkey:com.yahoo.page.uf.hcard </li></ul></ul></ul><ul><ul><ul><li>Try Operator Firefox Plug-in </li></ul></ul></ul><ul><ul><ul><li>Try Optimus </li></ul></ul></ul><ul><ul><li>RDFa </li></ul></ul><ul><ul><ul><li>Search for pages on Yahoo using searchmonkey:com.yahoo.page.rdf.rdfa </li></ul></ul></ul><ul><ul><ul><li>Try RDFa bookmarklet </li></ul></ul></ul><ul><ul><ul><li>Try RDFa Distiller </li></ul></ul></ul><ul><li>Mark up your webpage using RDFa </li></ul><ul><ul><li>See process on previous slides </li></ul></ul>
  74. Semantic Web in Use
  75. Microsearch <ul><li>Metadata is out there </li></ul><ul><ul><li>But how much information is out there? </li></ul></ul><ul><ul><li>What is the quality? </li></ul></ul><ul><li>Idea: bring metadata to the surface of search </li></ul><ul><li>How does it work? </li></ul><ul><ul><li>User enters query </li></ul></ul><ul><ul><li>Metadata is extracted dynamically </li></ul></ul><ul><ul><li>Entity reconciliation </li></ul></ul><ul><ul><li>Metadata is used to display </li></ul></ul><ul><ul><ul><li>rich abstracts, </li></ul></ul></ul><ul><ul><ul><li>related pages </li></ul></ul></ul><ul><ul><ul><li>spatial, temporal visualization </li></ul></ul></ul><ul><li>Microsearch prototype </li></ul>
  76. Example: ivan herman Related pages based on metadata Events from personal agenda, Conferences, and bio from LinkedIn Geolocation Rich abstract
  77. Example: peter site:flickr.com Flickr users named "Peter" by geography
  78. Example: san francisco conference Conferences in San Francisco by date
  79. Example: greater st. peter Save to address book Call phone number (other actions)
  80. Lessons <ul><li>More metadata than we expected </li></ul><ul><ul><li>53% of unique queries have at least one metadata-enabled page in top 10 (n=7848) </li></ul></ul><ul><li>Performance is poor </li></ul><ul><ul><li>Metadata needs to come from the index for performance </li></ul></ul><ul><li>'Metacrap' does exist </li></ul><ul><ul><li>Users have to see metadata to spot mistakes in their markup, warn others </li></ul></ul><ul><li>RDF templating (Fresnel) adds complexity </li></ul><ul><ul><li>Abstract needs to be customized to the particular site, query </li></ul></ul>
  81. Applications <ul><li>Yahoo's SearchMonkey and Google's Rich Snippets </li></ul><ul><li>BOSS and YQL </li></ul><ul><li>Semantic search and navigation </li></ul>
  82. <ul><li>Creating an ecosystem of publishers, developers and end-users </li></ul><ul><ul><li>Motivating and helping publishers to implement semantic annotation </li></ul></ul><ul><ul><li>Providing tools for developers to create compelling applications </li></ul></ul><ul><ul><li>Focusing on end-user experience </li></ul></ul><ul><li>Rich abstracts as a first application </li></ul><ul><li>Addressing the long tail of query and content production </li></ul><ul><li>Standard Semantic Web technology </li></ul><ul><ul><li>dataRSS = Atom + RDFa </li></ul></ul><ul><ul><li>Industry standard vocabularies </li></ul></ul><ul><li>http://developer.yahoo.com/searchmonkey/ </li></ul>SearchMonkey
  83. Before After an open platform for using structured data to build more useful and relevant search results What is SearchMonkey?
  84. image deep links name/value pairs or abstract Enhanced Result
  85. YAHOO! CONFIDENTIAL | Infobar
  86. SearchMonkey Acme.com's database Index RDF/Microformat Markup site owners/publishers share structured data with Yahoo!. 1 consumers customize their search experience with Enhanced Results or Infobars 3 site owners & third-party developers build SearchMonkey apps. 2 DataRSS feed Web Services Page Extraction Acme.com's Web Pages
  87. Standard enhanced results Embed markup in your page, get enhanced results without any programming
  88. Documentation <ul><li>Simple and advanced, examples, copy-paste code, validator </li></ul>
  89. DataRSS <ul><li>An Atom extension for structured data </li></ul><ul><li>Why a new format? </li></ul><ul><ul><li>A feed format is required by publishers </li></ul></ul><ul><ul><ul><li>Exclusive content (e.g. partnerships, paid inclusion) </li></ul></ul></ul><ul><ul><ul><li>No changes necessary to the web page </li></ul></ul></ul><ul><ul><ul><li>No standard named graph format for the Semantic Web </li></ul></ul></ul><ul><ul><ul><ul><li>Needed to capture meta-metadata such as source and timestamp of information </li></ul></ul></ul></ul><ul><ul><li>Not actually a new format </li></ul></ul><ul><ul><ul><li>An Atom extension </li></ul></ul></ul><ul><ul><ul><li>Use any RDFa parser to get the triples out </li></ul></ul></ul><ul><ul><ul><li>cf. Google Base feeds </li></ul></ul></ul>
  90. DataRSS <ul><li><?profile http://search.yahoo.com/searchmonkey-profile ?> </li></ul><ul><li><feed xmlns:xsi=&quot;http://www.w3.org/2001/XMLSchema-instance&quot; </li></ul><ul><li>xsi:schemaLocation=&quot;http://www.w3.org/2005/Atom ../latest/xsd/datarss.xsd&quot;> </li></ul><ul><li><id>http://www.linkedin.com/datarss/</id> </li></ul><ul><li><author> </li></ul><ul><li><name>Peter Mika (pmika@yahoo-inc.com)</name> </li></ul><ul><li></author> </li></ul><ul><li><title>Example data feed for social</title> </li></ul><ul><li><updated>2007-11-14T04:05:06+07:00</updated> </li></ul><ul><li><entry> </li></ul><ul><li><!-- title field of entry is not used for anything --> </li></ul><ul><li><title>Peter Mika</title> </li></ul><ul><li><!--URL of the webpage extracted from --> </li></ul><ul><li><id>http://www.linkedin.com/ppl/webprofile?id=5054019</id> </li></ul><ul><li><updated>2007-11-14T04:05:06+07:00</updated> </li></ul><ul><li><content type=&quot;application/xml&quot;> </li></ul><ul><ul><ul><li><y:adjunct version=&quot;1.0&quot; name=&quot;social-simple&quot; xmlns:y=&quot;http://search.yahoo.com/datarss/&quot;> </li></ul></ul></ul><ul><ul><ul><li><y:item rel=&quot;dc:subject&quot;> </li></ul></ul></ul><ul><ul><ul><li><y:type typeof=&quot;foaf:Person&quot;> </li></ul></ul></ul><ul><ul><ul><li><y:meta property=&quot;foaf:name&quot;>John Doe</y:meta> </li></ul></ul></ul><ul><ul><ul><li><y:meta property=&quot;foaf:gender&quot;>male</y:meta> </li></ul></ul></ul><ul><ul><ul><li><y:item rel=&quot;foaf:homepage&quot; resource=&quot;http://www.joeisageek.com&quot;/> </li></ul></ul></ul><ul><ul><ul><li><y:item rel=&quot;foaf:mbox&quot; resource=&quot;mailto:johndoe@example.org&quot;/> </li></ul></ul></ul><ul><ul><ul><li><y:item rel=&quot;foaf:weblog&quot; resource=&quot;http://johnblog.example.org&quot;/> </li></ul></ul></ul><ul><ul><ul><li><y:item rel=&quot;foaf:knows&quot;> </li></ul></ul></ul><ul><ul><ul><li><y:type 
typeof=&quot;foaf:Person&quot;> </li></ul></ul></ul><ul><ul><ul><li><y:meta property=&quot;foaf:name&quot;>Jane Doe</y:meta> </li></ul></ul></ul><ul><ul><ul><li><y:meta property=&quot;foaf:gender&quot;>female</y:meta> </li></ul></ul></ul><ul><ul><ul><li><y:item rel=&quot;foaf:mbox&quot; resource=&quot;mailto:janedoe@example.org&quot;/> </li></ul></ul></ul><ul><ul><ul><li></y:type> </li></ul></ul></ul><ul><ul><ul><li></y:item> </li></ul></ul></ul><ul><ul><ul><li></y:type> </li></ul></ul></ul><ul><ul><ul><li></y:item> </li></ul></ul></ul><ul><ul><ul><li></y:adjunct> </li></ul></ul></ul><ul><ul><li></entry> </li></ul></ul><ul><li></feed> </li></ul>Atom 1.0 XML + RDFa
  91. The data part <ul><li><adjunct version=&quot;1.0&quot; id=&quot;com.yahoo.page.rdfa&quot; xmlns=&quot;http://search.yahoo.com/datarss/&quot; </li></ul><ul><li>updated="2007-11-14T04:05:06+07:00"> </li></ul><ul><li><item rel=&quot;dc:subject&quot;> </li></ul><ul><li><type typeof =&quot;foaf:Person&quot;> </li></ul><ul><li><meta property =&quot;foaf:name&quot;>John Doe</meta> </li></ul><ul><li><meta property=&quot;foaf:gender&quot;>male</meta> </li></ul><ul><li><item rel =&quot;foaf:homepage&quot; resource =&quot;http://www.joeisageek.com&quot;/> </li></ul><ul><li><item rel=&quot;foaf:mbox&quot; resource=&quot;mailto:johndoe@example.org&quot;/> </li></ul><ul><li><item rel=&quot;foaf:blog&quot; resource=&quot;http://johnblog.example.org&quot;/> </li></ul><ul><li><item rel=&quot;foaf:knows&quot;> </li></ul><ul><li><type typeof=&quot;foaf:Person&quot;> </li></ul><ul><li><meta property=&quot;foaf:name&quot;>Jane Doe</meta> </li></ul><ul><li><meta property=&quot;foaf:gender&quot;>female</meta> </li></ul><ul><li><item rel=&quot;foaf:mbox&quot; resource=&quot;mailto:janedoe@example.org&quot;/> </li></ul><ul><li></type> </li></ul><ul><li></item> </li></ul><ul><li></type> </li></ul><ul><li></item> </li></ul><ul><li></adjunct> </li></ul>
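To make the nesting of the adjunct concrete, here is a hand-rolled sketch in Python that walks a shortened copy of the data part and lists the statements it appears to carry. It is not an RDFa parser and not Yahoo's implementation. It assumes a particular reading of the format: `item` introduces a relation to a `resource` (or to a fresh blank node when no `resource` is given), `type` assigns an `rdf:type` to the current subject, and `meta` attaches a literal. The names `walk`, `extract`, `<page>` and the `_:bN` labels are invented for the example.

```python
import xml.etree.ElementTree as ET

NS = "{http://search.yahoo.com/datarss/}"

# Shortened copy of the adjunct from the slide.
ADJUNCT = """<adjunct version="1.0" id="com.yahoo.page.rdfa"
    xmlns="http://search.yahoo.com/datarss/">
  <item rel="dc:subject">
    <type typeof="foaf:Person">
      <meta property="foaf:name">John Doe</meta>
      <item rel="foaf:homepage" resource="http://www.joeisageek.com"/>
      <item rel="foaf:knows">
        <type typeof="foaf:Person">
          <meta property="foaf:name">Jane Doe</meta>
        </type>
      </item>
    </type>
  </item>
</adjunct>"""


def walk(node, subject, triples, counter):
    """Collect (subject, predicate, object) tuples under the assumed reading."""
    for child in node:
        tag = child.tag.replace(NS, "")
        if tag == "meta":
            # Literal-valued property of the current subject.
            triples.append((subject, child.get("property"), (child.text or "").strip()))
        elif tag == "item":
            # Relation to a named resource, or to a new blank node.
            target = child.get("resource")
            if target is None:
                counter[0] += 1
                target = f"_:b{counter[0]}"
            triples.append((subject, child.get("rel"), target))
            walk(child, target, triples, counter)
        elif tag == "type":
            # Types the current subject; nested statements keep that subject.
            triples.append((subject, "rdf:type", child.get("typeof")))
            walk(child, subject, triples, counter)


def extract(xml_text, page="<page>"):
    triples = []
    walk(ET.fromstring(xml_text), page, triples, [0])
    return triples
```

Calling `extract(ADJUNCT)` on the shortened sample yields seven tuples, among them one linking John Doe's node to Jane Doe's node via `foaf:knows`. For real feeds the slides' own advice applies: use an RDFa parser rather than a custom walk like this one.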
  92. Developer tool: create custom presentations
  93. Developer tool
  94. Developer tool
  95. Developer tool
  96. Developer tool
  97. Gallery
  98. Example apps <ul><li>LinkedIn </li></ul><ul><ul><li>hCard plus feed data </li></ul></ul><ul><li>Creative Commons by Ben Adida </li></ul><ul><ul><li>CC in RDFa </li></ul></ul>
  99. Example apps. II. <ul><li>Other me by Dan Brickley </li></ul><ul><ul><li>Google Social Graph API wrapped using a Web Service </li></ul></ul>
  100. Google's Rich Snippets <ul><li>Shares a subset of the features of SearchMonkey </li></ul><ul><ul><li>Encourages publishers to embed certain microformats and RDFa into webpages </li></ul></ul><ul><ul><ul><li>Currently reviews, people, products, business & organizations </li></ul></ul></ul><ul><ul><li>These are used to generate richer search results </li></ul></ul><ul><li>SearchMonkey is customizable </li></ul><ul><ul><li>Developers can develop applications themselves </li></ul></ul><ul><li>SearchMonkey is open </li></ul><ul><ul><li>Wide support for standard vocabularies </li></ul></ul><ul><ul><li>API access </li></ul></ul>
  101. API access to metadata Yahoo BOSS & YQL
  102. BOSS: Build your Own Search Service <ul><li>Ability to re-order results and blend in additional content </li></ul><ul><li>No restrictions on presentation </li></ul><ul><li>No branding or attribution </li></ul><ul><li>Access to multiple verticals (web search, image, news) </li></ul><ul><li>40+ supported language and region pairs </li></ul><ul><li>Pricing (BOSS) </li></ul><ul><ul><li>Pay-by-usage </li></ul></ul><ul><ul><li>10,000 queries a day still free </li></ul></ul><ul><ul><li>Serve any ads you want </li></ul></ul><ul><li>For more info, http://developer.yahoo.com/search/boss/ </li></ul>
  103. BOSS API to structured data <ul><li>Simple HTTP GET calls, no authentication </li></ul><ul><ul><li>You need an Application ID: register at developer.yahoo.com/search/boss/ </li></ul></ul><ul><li>http://boss.yahooapis.com/ysearch/web/v1/{query}?appid={appid}&format=xml&view=searchmonkey_feed </li></ul><ul><li>Restrict your query using special words </li></ul><ul><ul><li>searchmonkey:com.yahoo.page.uf.{format} </li></ul></ul><ul><ul><ul><li>{format} is one of hcard, hcalendar, tag, adr, hresume etc. </li></ul></ul></ul><ul><ul><li>searchmonkey:com.yahoo.page.rdf.rdfa </li></ul></ul>
  104. Demo: resume search <ul><li>Search pages with resume data and given keywords </li></ul><ul><ul><li> {keyword} searchmonkey:com.yahoo.page.uf.hresume </li></ul></ul><ul><li>Parse the results as DataRSS (XML) </li></ul><ul><li>Extract information and display using YUI </li></ul>
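The first step of the demo, forming the request, can be sketched as follows. The URL template and the `searchmonkey:` restriction come from the slides; the function name `boss_url`, its parameters and the placeholder application ID are assumptions for illustration. The sketch only builds the URL string and sends no request.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the URL template on the BOSS API slide.
BOSS_WEB_V1 = "http://boss.yahooapis.com/ysearch/web/v1/"


def boss_url(keywords, appid, restrict=None):
    """Build a BOSS web-search URL that asks for the SearchMonkey feed view.

    restrict is an optional identifier such as "com.yahoo.page.uf.hresume";
    it is appended to the query as the special word "searchmonkey:<id>".
    """
    query = keywords if restrict is None else f"{keywords} searchmonkey:{restrict}"
    params = urlencode({"appid": appid, "format": "xml", "view": "searchmonkey_feed"})
    # The query is a path segment, so spaces and colons are percent-encoded.
    return f"{BOSS_WEB_V1}{quote(query, safe='')}?{params}"
```

For the resume demo, a call such as `boss_url("java", "YOUR_APP_ID", "com.yahoo.page.uf.hresume")` restricts results to pages carrying hResume data; the XML that comes back would then be parsed as DataRSS.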
  105. Demo
  106. Yahoo Query Language (YQL) <ul><li>Query web APIs as virtual tables </li></ul><ul><ul><li>Mash up data by joining tables </li></ul></ul><ul><ul><li>Add an API by adding a table definition </li></ul></ul><ul><li>Example: select my friends and sort by nickname </li></ul>
  107. PHP example : select the last 100 photos from Flickr with the word Austin <ul><li><?php </li></ul><ul><li>$url = &quot;http://query.yahooapis.com/v1/public/yql?q=&quot;; </li></ul><ul><li>$q = &quot;select * from flickr.photos.search(100) where text='Austin'&quot;; </li></ul><ul><li>$fmt = &quot;xml&quot;; </li></ul><ul><li>$x = simplexml_load_file($url.urlencode($q).&quot;&format=$fmt&quot;); </li></ul><ul><li>foreach($x->attributes('http://www.yahooapis.com/v1/base.rng') as $k=>$v) { </li></ul><ul><li>$$k=(string)$v; </li></ul><ul><li>} </li></ul><ul><li>echo <<<EOB </li></ul><ul><li>$count photos fetched from </li></ul><ul><li>{$x->diagnostics->url} in </li></ul><ul><li>{$x->diagnostics->url['execution-time']} seconds<br> </li></ul><ul><li>EOB; </li></ul><ul><li>$flickr = &quot;http://static.flickr.com/&quot;; </li></ul><ul><li>foreach($x->results->photo as $p) { </li></ul><ul><li>echo &quot;<img src=\&quot;$flickr{$p['server']}/{$p['id']}_{$p['secret']}_s.jpg\&quot;/>&quot;; </li></ul><ul><li>} </li></ul><ul><li>?> </li></ul>
  108. YQL example ( source )
  109. That's all there is to it! <ul><li><?php </li></ul><ul><li>$root = 'http://query.yahooapis.com/v1/public/yql?q='; </li></ul><ul><li>$city = 'Barcelona'; </li></ul><ul><li>$loc = 'Barcelona'; </li></ul><ul><li>$yql = 'select * from html where url = \'http://en.wikipedia.org/wiki/'.$city.'\' and xpath=&quot;//div[@id=\'bodyContent\']/p&quot; limit 3'; </li></ul><ul><li>$url = $root . urlencode($yql) . '&format=xml'; </li></ul><ul><li>$info = getstuff($url); </li></ul><ul><li>$info = preg_replace(&quot;/.*<results>|<results>.*/&quot;,'',$info); </li></ul><ul><li>$info = preg_replace(&quot;/<xml version=\&quot;10\&quot;&quot;. </li></ul><ul><li>&quot; encoding=\&quot;UTF-8\&quot;>/&quot;,'',$info); </li></ul><ul><li>$info = preg_replace(&quot;//&quot;,'',$info); </li></ul><ul><li>$info = preg_replace(&quot;/\&quot;wiki/&quot;,'&quot;http://en.wikipedia.org/wiki',$info); </li></ul><ul><li>$yql = 'select * from upcoming.events.bestinplace(5) where woeid in '. </li></ul><ul><li>'(select woeid from geo.places where text=&quot;'.$loc.'&quot;)'. </li></ul><ul><li>' | unique(field=&quot;description&quot;)'; </li></ul><ul><li>$url = $root . urlencode($yql) . '&format=json'; </li></ul><ul><li>$events = getstuff($url); </li></ul><ul><li>$events = json_decode($events); </li></ul><ul><li>foreach($events->query->results->result as $e){ </li></ul><ul><li>$evHTML.='<li><h3><a href=&quot;'.$e->ticket_url.'&quot;>'.$e->name.'</a></h3><p>'. </li></ul><ul><li>substr($e->description,0,100).'&hellip;</p></li>'; </li></ul><ul><li>} </li></ul><ul><li>$yql = 'select * from flickr.photos.info where photo_id in '. </li></ul><ul><li>'(select id from flickr.photos.search where woe_id in '. </li></ul><ul><li>'(select woeid from geo.places where text=&quot;'.$loc.'&quot;)) limit 16'; </li></ul><ul><li>$url = $root . urlencode($yql) . 
'&format=json'; </li></ul><ul><li>$photos = getstuff($url); </li></ul><ul><li>$photos = json_decode($photos); </li></ul><ul><li>foreach($photos->query->results->photo as $s){ </li></ul><ul><li>$src = &quot;http://farm{$s->farm}.static.flickr.com/{$s->server}/&quot;. </li></ul><ul><li>&quot;{$s->id}_{$s->secret}_s.jpg&quot;; </li></ul><ul><li>$phHTML.='<li><a href=&quot;'.$s->urls->url->content.'&quot;><img alt=&quot;'. </li></ul><ul><li>$s->title.'&quot; src=&quot;'.$src.'&quot;></a></li>'; </li></ul><ul><li>} </li></ul><ul><li>$yql='select description from rss where '. </li></ul><ul><li>' url=&quot;http://weather.yahooapis.com/forecastrss?p=SPXX0015&u=c&quot;'; </li></ul><ul><li>$url = $root . urlencode($yql) . '&format=json'; </li></ul><ul><li>$weather = getstuff($url); </li></ul><ul><li>$weather = json_decode($weather); </li></ul><ul><li>$weHTML = $weather->query->results->item->description; </li></ul><ul><li>function getstuff($url){ </li></ul><ul><li>$curl_handle = curl_init(); </li></ul><ul><li>curl_setopt($curl_handle, CURLOPT_URL, $url); </li></ul><ul><li>curl_setopt($curl_handle, CURLOPT_CONNECTTIMEOUT, 2); </li></ul><ul><li>curl_setopt($curl_handle, CURLOPT_RETURNTRANSFER, 1); </li></ul><ul><li>$buffer = curl_exec($curl_handle); </li></ul><ul><li>curl_close($curl_handle); </li></ul><ul><li>if (empty($buffer)){ </li></ul><ul><li>return 'Error retrieving data, please try later.'; </li></ul><ul><li>} else { </li></ul><ul><li>return $buffer; </li></ul><ul><li>} </li></ul><ul><li>}?> </li></ul>
  110. Semantic Search and Navigation
  111. Browsing Linked Data <ul><li>Browse the Linked Data graph by going from one URI to the next </li></ul><ul><ul><li>OpenLink's Linked Data browser </li></ul></ul><ul><ul><li>OpenLink's Data Explorer </li></ul></ul><ul><ul><li>Tabulator </li></ul></ul><ul><ul><li>Disco </li></ul></ul><ul><ul><li>Marbles </li></ul></ul>
  112. Semantic Search Engines <ul><li>Natural language search engines </li></ul><ul><ul><li>Hakia </li></ul></ul><ul><ul><li>Powerset (now built into Bing) </li></ul></ul><ul><ul><li>TrueKnowledge </li></ul></ul><ul><li>Structured data search engines </li></ul><ul><ul><li>Searching open web data </li></ul></ul><ul><ul><ul><li>Sindice </li></ul></ul></ul><ul><ul><ul><li>Sigma </li></ul></ul></ul><ul><ul><li>Searching closed world data </li></ul></ul><ul><ul><ul><li>Wolfram Alpha (closed structured data + computation) </li></ul></ul></ul>
  113. Exercise <ul><li>Build a SearchMonkey application </li></ul><ul><li>Try a few YQL queries </li></ul><ul><li>Try Zemanta online </li></ul><ul><ul><li>You don't need to install the plugin </li></ul></ul><ul><li>Find data on the Web using search or navigation </li></ul>
  114. Research on Semantic Search
  115. Semantic Search <ul><li>Def. matching the user's query with the Web's content at a conceptual level, often with the help of world knowledge </li></ul><ul><li>Related disciplines </li></ul><ul><ul><li>Semantic Web, IR, Databases, NLP, IE </li></ul></ul><ul><li>As a field </li></ul><ul><ul><li>ISWC/ESWC/ASWC, WWW, SIGIR </li></ul></ul><ul><ul><li>Exploring Semantic Annotations in Information Retrieval (ECIR08, WSDM09) </li></ul></ul><ul><ul><li>Semantic Search Workshop (ESWC08, WWW09) </li></ul></ul><ul><ul><li>Future of Web Search (FoWS09) </li></ul></ul>
  116. Hard searches <ul><li>Ambiguous searches </li></ul><ul><ul><li>Paris Hilton </li></ul></ul><ul><li>Multimedia search </li></ul><ul><ul><li>Images of Paris Hilton </li></ul></ul><ul><li>Imprecise or overly precise searches </li></ul><ul><ul><li>Publications by Jim Hendler </li></ul></ul><ul><ul><li>Find images of strong and adventurous people (Lenat) </li></ul></ul><ul><li>Searches for descriptions </li></ul><ul><ul><li>Search for yourself without using your name </li></ul></ul><ul><ul><li>Product search (ads!) </li></ul></ul><ul><li>Searches that require aggregation </li></ul><ul><ul><li>Size of the Eiffel Tower (Lenat) </li></ul></ul><ul><ul><li>Public opinion on Britney Spears </li></ul></ul><ul><li>Queries that require a deeper understanding of the query, the content and/or the world at large </li></ul><ul><ul><li>Note: some of these are so hard that users don't even try them any more </li></ul></ul>
  117. Not just search…
  118. In the best of cases… <ul><li>Matching the query intent with the document metadata can be trivial: </li></ul><adjunct id=&quot;com.yahoo.query.intent&quot; version=&quot;0.5&quot;> <type typeof=&quot; fb:music.artist foaf:Person &quot;> <meta property=&quot; foaf:name &quot;> Madonna </meta> </type> </adjunct> <adjunct id=&quot;com.yahoo.page.hcard&quot; version=&quot;0.5&quot;> <type typeof=&quot; foaf:Person &quot;> <meta property=&quot; foaf:name &quot;> Madonna </meta> </type> </adjunct> Query: Document metadata: dna_checksum:AF514FE45DD33BB7CD8DCCC89AA dna_checksum:AF514FE45DD33BB7CD8DCCC89AA
  119. Semantics at every step of the IR process bla bla bla? q="bla" * 3 Document processing bla bla bla Ranking Query processing Result presentation The IR engine The Web bla bla bla bla bla bla " bla" θ (q,d)
  120. Study: improving text analysis using structured data <ul><li>Problems </li></ul><ul><ul><li>Creating training data manually is expensive </li></ul></ul><ul><ul><li>Existing taggers trained on financial-political news </li></ul></ul><ul><li>Idea: </li></ul><ul><ul><li>Learn the correspondence between entities in text and metadata </li></ul></ul><ul><ul><li>Extend knowledge base and generate in-domain training data </li></ul></ul>Learning to Tag and Tagging to Learn: A Case Study on Wikipedia Peter Mika; Massimiliano Ciaramita; Hugo Zaragoza; Jordi Atserias, IEEE Intelligent Systems, 2008, 5
  121. Study: processing metadata using cloud computing <ul><li>Question: Can we use Pig to effectively query and reason with large amounts of RDF data? </li></ul><ul><ul><li>Mapping SPARQL to PigLatin </li></ul></ul><ul><ul><li>Forward-chaining RDF(S) reasoning </li></ul></ul><ul><ul><li>Acknowledgement: Ben Reed </li></ul></ul><ul><li>Experimental results (LUBM) </li></ul><ul><ul><li>Useful for long running queries and reasoning </li></ul></ul><ul><ul><li>Not useful for interactive queries (< 100 s) </li></ul></ul><ul><li>Co-organizing: </li></ul><ul><ul><li>Billion Triples Challenge at ISWC 2008, Oct 26-28, Karlsruhe, Germany </li></ul></ul>Web Semantics in the Clouds Peter Mika; Giovanni Tummarello, IEEE Intelligent Systems, 2008, 5
  122. Study: metadata analysis <ul><li>What vocabularies are being used? </li></ul><ul><li>What microformats should we support? </li></ul><ul><li>How much vocabulary reuse/extension is there? </li></ul><ul><ul><li>Is there a convergence? </li></ul></ul><ul><li>What is the quality of metadata? </li></ul><ul><ul><li>Datatype conformance </li></ul></ul><ul><ul><li>Logical consistency </li></ul></ul><ul><ul><li>Conformance to common use wrt common attributes </li></ul></ul><ul><li>How much spam is there? </li></ul><ul><ul><li>Distribution of spamicity scores </li></ul></ul><ul><ul><li>Do spamicity scores transfer to metadata? </li></ul></ul><ul><li>Are there new schemas emerging through the combination of existing vocabularies? </li></ul><ul><li>What is the metadata coverage in terms of queries? </li></ul><ul><ul><li>What percentage of queries from query logs would result in metadata? </li></ul></ul><ul><ul><li>How many would result in metadata that could answer the query? (by some approximation) </li></ul></ul>
  123. Study: Semantic Search Assist <ul><li>Observation: the same type of objects often have the same query context </li></ul><ul><ul><li>Users asking for the same aspect of the type </li></ul></ul><ul><li>Could we make query suggestions based on the type of the entity? </li></ul><ul><ul><li>Improvement for rare queries </li></ul></ul>apple ipod nano review; sony plasma tv review; jerry yang biography; biography tim berners lee; tim berners lee weblog; peter mika yahoo; britney spears shaves her head
  124. Study: evaluation of semantic search <ul><li>Analysis of user needs </li></ul><ul><ul><li>How are these needs aligned with information on the Web? </li></ul></ul><ul><ul><li>How do the vocabularies differ? </li></ul></ul><ul><li>Analysis of query types </li></ul><ul><ul><li>Object queries? Object-attribute queries? Relationship queries? </li></ul></ul><ul><li>What does it mean for an object or a set of triples to be relevant to a query? </li></ul><ul><ul><li>Show me the answer and only the answer </li></ul></ul><ul><ul><li>Put me near the answer in the graph </li></ul></ul><ul><ul><li>Show me the justification (or at least the source) of the answer </li></ul></ul><ul><ul><li>… </li></ul></ul><ul><li>Semantic Search evaluation campaign planned for 2010 </li></ul>
  125. Challenges <ul><li>Future work in Semantic Web </li></ul><ul><ul><li>(Semi-)automated means of metadata creation </li></ul></ul><ul><ul><ul><li>How do we go from 5% to 95%? </li></ul></ul></ul><ul><ul><li>Data quality </li></ul></ul><ul><ul><ul><li>We allow providing metadata for other people's sites! </li></ul></ul></ul><ul><ul><li>Reasoning </li></ul></ul><ul><ul><ul><li>To what extent is reasoning useful? </li></ul></ul></ul><ul><ul><ul><li>For example, how much would entity resolution or taxonomic reasoning help? </li></ul></ul></ul><ul><ul><li>Scale </li></ul></ul><ul><ul><ul><li>How do we exploit cluster computing techniques? </li></ul></ul></ul><ul><ul><ul><li>What is between databases and IR engines? </li></ul></ul></ul><ul><ul><li>Fostering social agreements </li></ul></ul><ul><ul><ul><li>How do we get people to reuse vocabularies? </li></ul></ul></ul>
  126. Challenges <ul><li>Future work in IR </li></ul><ul><ul><li>Query interpretation </li></ul></ul><ul><ul><li>Ranking with metadata </li></ul></ul><ul><ul><li>Evaluation of semantic search </li></ul></ul><ul><ul><li>Personalization </li></ul></ul><ul><ul><li>Semantic ads </li></ul></ul><ul><li>Constraints </li></ul><ul><ul><li>Users still want to see a document </li></ul></ul><ul><ul><ul><li>Keyword-based search cannot suffer </li></ul></ul></ul><ul><ul><ul><li>Whole page relevance, monetization can only increase </li></ul></ul></ul><ul><ul><li>Established expectations </li></ul></ul><ul><ul><ul><li>Query entry </li></ul></ul></ul><ul><ul><ul><li>Result presentation </li></ul></ul></ul>
  127. Contact <ul><li>Peter Mika </li></ul><ul><ul><li>[email_address] </li></ul></ul><ul><ul><li>Come to Barcelona and stop by </li></ul></ul><ul><li>SearchMonkey </li></ul><ul><ul><li>developer.yahoo.com/searchmonkey/ </li></ul></ul><ul><ul><li>mailing lists </li></ul></ul><ul><ul><ul><li>[email_address] </li></ul></ul></ul><ul><ul><ul><li>[email_address] </li></ul></ul></ul><ul><ul><li>forums </li></ul></ul><ul><ul><ul><li>http://suggestions.yahoo.com/searchmonkey </li></ul></ul></ul><ul><ul><li>Semantic Web FAQ </li></ul></ul><ul><ul><ul><li>http://devel.yahoo.com/searchmonkey/smguide/faq.html </li></ul></ul></ul>
  128. the monkey is out!
  129. Application: query intent Paris Hilton is a person!
  130. Application: query intent #2 Hugo is a person!
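
The forward-chaining RDF(S) reasoning mentioned in the cloud computing study (slide 121) can be illustrated on a single machine. The following is only a minimal sketch and not the Pig-based implementation from the study: it applies just two RDFS entailment rules (rdfs11, subclass transitivity, and rdfs9, type inheritance) until no new triples appear. The function name `forward_chain` and all `ex:` identifiers are hypothetical and chosen for illustration.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def forward_chain(triples):
    """Return the closure of `triples` under RDFS rules rdfs9 and rdfs11."""
    closure = set(triples)
    while True:
        new = set()
        subclass_pairs = [(s, o) for s, p, o in closure if p == SUBCLASS]
        for a, b in subclass_pairs:
            # rdfs11: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
            for b2, c in subclass_pairs:
                if b == b2:
                    new.add((a, SUBCLASS, c))
            # rdfs9: (x type a), (a subClassOf b) => (x type b)
            for s, p, o in closure:
                if p == TYPE and o == a:
                    new.add((s, TYPE, b))
        if new <= closure:
            # Fixpoint reached: this pass derived nothing new.
            return closure
        closure |= new


# Hypothetical example data (not taken from the study).
facts = {
    ("ex:Researcher", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:peter", TYPE, "ex:Researcher"),
}
inferred = forward_chain(facts) - facts
```

Here `inferred` holds the three derived triples: `ex:Researcher` is a subclass of `ex:Agent`, and `ex:peter` has the types `ex:Person` and `ex:Agent`. Each pass rescans the whole triple set, which is consistent with the slide's point that this style of processing suits long-running batch jobs rather than interactive queries.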

  • Designed for humans first and machines second, microformats are a set of simple, open data formats built upon existing and widely adopted standards. Instead of throwing away what works today, microformats intend to solve simpler problems first by adapting to current behaviors and usage patterns.
  • HTML allows us to place metadata in the head of the document. The metadata can be both properties (as a string) and relationships to other documents.
  • HTML also allows us to put metadata in the body of the document, using @rel and @rev on anchors.
  • RDFa extends the @rel/@href technique to allow licenses to be attached to images. Say we have a list of images -- perhaps from a Flickr search -- here we see that we can attach a license to each of them.
  • HTML allows relationships (the @rel/@href combination) to be used in both the head and the body, but text properties can only be added in the head (via @content on &lt;meta&gt;).
  • RDFa extends the use of @content to the body. Note a minor twist -- we have to use @property instead of @name, since the latter attribute is already used for other things. The key thing here is that we've moved the machine-readable information closer to its human-readable version, which makes it a lot easier to publish.
  • Why would we do this? Well, first of all it's much easier to control the generation of the machine-readable information if it's close to the human-readable information. But second, once you put it close to the human-readable data, there are many situations where the human-readable version will also suffice for the machine-readable one, and so we can avoid duplication. Note that using @content for the date illustrates a different point; in that case we preserve the distinction between the human- and machine-readable forms, because the machine-readable version is very precise.
  • Actually I cheated a little in the last slide. There is no such property as 'author' or 'created'; they just happen to have been used in &lt;head&gt; over the years by a sort of convention. @rel="license" does exist, however, and there are a few other relationship values ('next', 'prev', and so on). But essentially, for other relationship values, and all property values, we need to use CURIEs. The advantage of this is that there are many pre-existing vocabularies that can immediately be used. Also, anyone can create a new vocabulary without having to ask anyone. Commontags was devised a few weeks ago, for example, and they didn't have to ask anyone's permission.
  • Recall that we added the relationship attributes to an image, so that we can specify license information...
  • ...we can also add properties to the image.
  • HTML already supported relationships and properties that apply to the document, and we've seen how RDFa adds relationships and properties for images. Now let's look at how RDFa lets us add relationships and properties for anything. Let's say we have a link to a SlideShare presentation.
  • We know that if we put the @rel attribute onto the &lt;a&gt; tag as normal, it implies that the current document has a license, and that the presentation itself is the license. So this is no good.
  • The answer is first to create a link to the desired license...
  • ...and then to indicate that this license is attached to the presentation. We still use @rel, but now we're using it with the new attribute that RDFa adds -- @about.
  • And of course, we can also add properties.
  • Using @about sets the context for any further RDFa, not just on the current element.
  • Once you are in the new context, everything works exactly as normal, so compare this to the previous slide; the only difference is that the previous slide uses @about to set the context, whilst this example has the 'current document' to set the context.
  • We've gone into a lot of detail on the basics of RDFa to show how it builds upon HTML's existing semantic features, but there are many more features. The main thing to emphasise is that HTML already had some useful semantic features, but what they meant was never formalised; RDFa did that. RDFa also adds to these features, but does so by applying the same approach.
  • There is much more we could have said, but we suggest that interested readers look at the RDFa Primer, and other tutorials and articles. In passing, we would say though that RDFa supports all of RDF's more advanced features too, such as datatypes of literals, rdf:type, bnodes, XML literals, and so on. Advanced RDFa also allows quite elaborate chaining of statements, allowing people to be connected to companies, reviews to businesses, and so on.
  • As Vish discussed, SearchMonkey is all about building richer, more useful search results. Here are a few example Enhanced Results.
  • And it allows the user to add the movie directly to their online movie rental queue.
  • [will be animated]
  • SW: Representing and reasoning with structured data on the Web; both a relational and a graph view on information. IR: Aggregating data at the document level based on ad-hoc information needs. DB: Representing and querying data in a relational model. NLP: From text to information.
  • Results are good, but consider the ads: First ad says: Virgins. Looking for virgins? Find exactly what you want today. Ebay.com Second ad: Virgins. …Find cheap tickets for Virgins. Third ad: Adspam… these people buy Yahoo! traffic and sell it to Google.
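
The RDFa notes above describe three mechanisms: @rel with @href for relationships, @property (optionally with @content) for text properties, and @about for changing the subject that statements apply to. The following sketch shows how those attributes could be turned into subject/predicate/object triples. It is a deliberately simplified illustration built on Python's standard `html.parser`, not a conforming RDFa processor: it does not resolve CURIE prefixes or relative URLs and ignores every other RDFa attribute. The class name `MiniRDFa` and all `example.org` URLs are hypothetical.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they never open a new nesting level.
VOID = {"img", "meta", "link", "br", "hr", "input"}


class MiniRDFa(HTMLParser):
    """Collect (subject, predicate, object) triples from a small RDFa subset."""

    def __init__(self, base):
        super().__init__()
        self.subjects = [base]  # stack; the bottom entry is the current document
        self.pending = None     # (subject, property) waiting for element text
        self.triples = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # @about overrides the inherited subject for this element and its children.
        subject = a.get("about", self.subjects[-1])
        if "rel" in a and "href" in a:
            self.triples.append((subject, a["rel"], a["href"]))
        if "property" in a:
            if "content" in a:
                # The machine-readable value is given explicitly.
                self.triples.append((subject, a["property"], a["content"]))
            else:
                # The human-readable text doubles as the value.
                self.pending = (subject, a["property"])
        if tag not in VOID:
            self.subjects.append(subject)

    def handle_data(self, data):
        if self.pending and data.strip():
            self.triples.append((*self.pending, data.strip()))
            self.pending = None

    def handle_endtag(self, tag):
        self.pending = None
        if tag not in VOID and len(self.subjects) > 1:
            self.subjects.pop()


# Hypothetical markup: the first license applies to the presentation named
# by @about, the second to the current document.
html = """
<div about="http://example.org/slides/1">
  <a rel="license" href="http://example.org/licenses/by">License</a>
  <span property="dc:title">An example talk</span>
  <span property="dc:date" content="2009-05-10">May 10th 2009</span>
</div>
<a rel="license" href="http://example.org/terms">Terms</a>
"""
parser = MiniRDFa("http://example.org/page")
parser.feed(html)
```

After `feed`, `parser.triples` contains four statements: three whose subject is `http://example.org/slides/1` (license, title, and the precise date taken from @content rather than from the visible text) and one license statement whose subject is the page itself, because no @about was in scope.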

    Source: https://www.slideshare.net/pmika/semantic-web-austin-yahoo/42-Microdata_example_div_item_pMy
