Mida
Mida is a Microdata extractor/parser library for Ruby.
Installation
Mida keeps RubyGems up-to-date with its latest version, so installing is as easy as:
$ gem install mida
Requirements:
- Nokogiri
Command Line Usage
To use the command line tool, supply it with the URLs or filenames that you would like to be parsed (by default each item is output as YAML):
$ mida http://lawrencewoodman.github.com/mida/news/
To search for specific itemtypes, use the -t switch followed by a Regular Expression:
$ mida -t /person/i http://lawrencewoodman.github.com/mida/news/
For more information, see mida's help:
$ mida -h
Library Usage
The following examples assume that you have required mida and open-uri.
Extracting Microdata from a page
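All the Microdata is extracted from a page when a new Mida::Document instance is created.
To extract all the Microdata from a webpage:
url = 'http://example.com'
# Assign the block's result so that doc is still in scope afterwards.
# On Ruby 3.0 and later, call URI.open(url) rather than open(url).
doc = open(url) {|f| Mida::Document.new(f, url)}
The top-level Items are then available via doc.items.
To simply list all the top-level Items that have been found:
puts doc.items
Searching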
If you want to search for an Item that has a specific itemtype/vocabulary, this can be done with the search method.
To return all the Items that use one of Google’s Review vocabularies:
doc.search(%r{http://data-vocabulary\.org.*?review.*?}i)
Inspecting an Item
Each Item is a Mida::Item instance and has four main methods of interest: type, vocabulary, properties and id.
To find out the itemtype of the Item:
puts doc.items.first.type
To find out the id of the Item:
puts doc.items.first.id
Property values are String or Mida::Item instances.
To see the properties of the Item:
puts doc.items.first.properties
Working with Vocabularies
Mida allows you to define vocabularies, so that input data can be constrained to match expected patterns. By default a generic vocabulary (Mida::GenericVocabulary) is registered, which will match against any itemtype with any number of properties.
If you want to specify a vocabulary, you create a class derived from Mida::Vocabulary and use itemtype, has_one, has_many and extract to describe the vocabulary.
As an example the following describes a subset of Google’s Review vocabulary:
class Rating < Mida::Vocabulary
  itemtype %r{http://data-vocabulary.org/rating}i
  has_one 'best'
  has_one 'worst'
  has_one 'value'
end
class Review < Mida::Vocabulary
  itemtype %r{http://data-vocabulary.org/review}i
  has_one 'itemreviewed'
  has_one 'rating' do
    extract Rating, Mida::DataType::Text
  end
end
When you create a class derived from Mida::Vocabulary it automatically registers the Vocabulary.
Now if Mida is parsing some input and manages to match against the Review Vocabulary, it will only allow the specified properties and will reject any that don't have the correct number. It will also set Item#vocabulary accordingly, e.g.
doc.items.first.vocabulary      # => Review
To include the properties of another vocabulary, use include_vocabulary:
class Thing < Mida::Vocabulary
  itemtype %r{http://example.com/vocab/thing}i
  has_one 'name', 'description'
end
class Book < Mida::Vocabulary
  itemtype %r{http://example.com/vocab/book}i
  include_vocabulary Thing
  has_one 'title', 'author'
end
class Collection < Mida::Vocabulary
  itemtype %r{http://example.com/vocab/collection}i
  has_many 'item' do
    extract Thing
  end
end
If you use Book as an item of Collection this would be accepted because it includes the Thing vocabulary. When examining the item you would find #vocabulary set to Book and you would have access to all the properties of Thing and all the properties of Book.
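The itemtype patterns used above are ordinary Ruby regular expressions, so they can be tried out on their own before being put into a vocabulary. The following is a minimal sketch in plain Ruby that does not load Mida: the PATTERNS hash, the matching_patterns helper and the sample itemtypes are made up for illustration and are not part of Mida's API. It only shows which strings each pattern accepts, and makes no claim about how Mida chooses between vocabularies when more than one pattern matches.

```ruby
# Plain-Ruby check of the itemtype patterns; Mida itself is not loaded here.
# PATTERNS and matching_patterns are illustrative names, not Mida API.
PATTERNS = {
  'Thing'      => %r{http://example.com/vocab/thing}i,
  'Book'       => %r{http://example.com/vocab/book}i,
  'Collection' => %r{http://example.com/vocab/collection}i
}

# Return the names of every pattern that matches the given itemtype string.
def matching_patterns(itemtype)
  PATTERNS.select { |_name, pattern| pattern =~ itemtype }.keys
end

SAMPLES = [
  'http://example.com/vocab/book',
  'HTTP://EXAMPLE.COM/VOCAB/BOOK',       # the i flag makes the match case-insensitive
  'http://example.com/vocab/bookshelf',  # the pattern is unanchored, so this matches too
  'http://example.com/vocab/film'        # hypothetical itemtype that matches nothing
]

SAMPLES.each do |itemtype|
  puts "#{itemtype} -> #{matching_patterns(itemtype).inspect}"
end
```

Because the patterns are unanchored, an itemtype such as http://example.com/vocab/bookshelf is also accepted by the Book pattern. If that is too loose for your data, anchoring the expression with \A and \z and escaping the dots is the usual Regexp remedy.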