Mida

Mida is a Microdata extractor/parser library for Ruby.
Installation

Mida keeps RubyGems up to date with its latest version, so installing is as easy as running gem install mida.
Command Line Usage

To use the command line tool, supply it with the URLs or filenames that you would like to be parsed. By default each item is output as YAML.
If you want to search for specific types, you can use the -t switch followed by a regular expression.

For more information, look at mida's help.
Library Usage

The following examples assume that you have required mida.
Extracting Microdata from a page

All the Microdata is extracted from a page when a new Mida::Document instance is created.
To extract all the Microdata from a webpage, create a Mida::Document from the page's content.

The top-level Items are held in an array on the document.

To list all the top-level Items that have been found, print that array.
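The extraction and listing steps could be sketched roughly as follows. This is an illustration under assumptions rather than a verbatim copy of the library's documentation: it assumes that Mida::Document.new accepts the page content together with the page URL, and that the array of top-level Items is exposed through a reader named items. The helper name top_level_items and the sample markup are made up for this example.

```ruby
# Sample page content; any HTML string containing Microdata would do.
SAMPLE_HTML = <<~HTML
  <div itemscope itemtype="http://data-vocabulary.org/Review">
    <span itemprop="itemreviewed">Romeo Pizza</span>
    <span itemprop="rating">4.5</span>
  </div>
HTML

# Hypothetical helper: returns the top-level Items found in +html+.
# Assumptions: Mida::Document.new takes the page content and the page URL,
# and the top-level Items are exposed through an +items+ reader.
def top_level_items(html, page_url)
  require 'mida' # needs the mida gem to be installed
  Mida::Document.new(html, page_url).items
end

# Usage, once the gem is installed:
#   puts top_level_items(SAMPLE_HTML, 'http://example.com/')
```

The gem is loaded inside the helper so that the definitions can be read and loaded on their own; in a real script the require would normally sit at the top of the file.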
Searching

If you want to search for an Item that has a specific itemtype/vocabulary, this can be done with the search method.
To return all the Items that use one of Google’s Review vocabularies, search with a regular expression that matches their itemtypes.
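To make the idea of matching itemtypes against a regular expression concrete, here is a small stand-alone sketch. It does not use Mida at all: SketchItem and search_items are hypothetical stand-ins, and descending into nested items is a choice made for this sketch, not a statement about how the library's search method behaves.

```ruby
# Stand-in for an Item; Mida's own Item class is richer than this.
SketchItem = Struct.new(:type, :properties)

# Collect every item, top-level or nested, whose itemtype matches +pattern+.
def search_items(items, pattern)
  items.each_with_object([]) do |item, found|
    found << item if item.type =~ pattern
    # Property values are arrays; pick out any nested items among them.
    nested = item.properties.values.flatten.grep(SketchItem)
    found.concat(search_items(nested, pattern))
  end
end

RATING = SketchItem.new('http://data-vocabulary.org/Rating', { 'value' => ['4'] })
REVIEW = SketchItem.new('http://data-vocabulary.org/Review', { 'rating' => [RATING] })
PERSON = SketchItem.new('http://data-vocabulary.org/Person', { 'name' => ['Ann'] })

# Case-insensitive match on any data-vocabulary.org itemtype mentioning "review".
REVIEW_TYPES = %r{http://data-vocabulary\.org.*?review}i

MATCHES = search_items([REVIEW, PERSON], REVIEW_TYPES)
puts MATCHES.map(&:type)
```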
Inspecting an Item

Each Item is a Mida::Item instance and has four main methods of interest, covering its itemtype, itemid, properties and vocabulary.
The itemtype and the itemid of the Item are each available through their own method.
Properties are returned as a hash containing name/value pairs. Each value is an array of either strings or Mida::Item instances.

Printing that hash shows the properties of the Item.
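The shape of that properties hash is easier to see with a small example. The following sketch uses a plain Struct as a stand-in for Mida::Item (PlainItem and describe are hypothetical names, not part of the library) and walks a hash whose values are arrays of strings or nested items.

```ruby
# Stand-in for Mida::Item, holding only the data discussed above.
PlainItem = Struct.new(:type, :id, :properties)

# Render an item's properties as indented lines, descending into nested items.
def describe(item, indent = 0)
  pad = ' ' * indent
  item.properties.flat_map do |name, values|
    values.flat_map do |value|
      if value.is_a?(PlainItem)
        ["#{pad}#{name}: (#{value.type})"] + describe(value, indent + 2)
      else
        ["#{pad}#{name}: #{value}"]
      end
    end
  end
end

REVIEWER = PlainItem.new('http://data-vocabulary.org/Person', nil,
                         { 'name' => ['Ann'] })
REVIEW_ITEM = PlainItem.new('http://data-vocabulary.org/Review', nil,
                            { 'itemreviewed' => ['Romeo Pizza'],
                              'reviewer'     => [REVIEWER] })

puts describe(REVIEW_ITEM)
```

Because every value is an array, a property that appears once and a property that appears several times can be handled by the same loop.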
Working with Vocabularies

Mida allows you to define vocabularies, so that input data can be constrained to match expected patterns. By default a generic vocabulary (Mida::GenericVocabulary) is registered, which will match against any itemtype with any number of properties.
If you want to specify a vocabulary, you create a class derived from Mida::Vocabulary and use methods such as extract to describe the vocabulary.
As an example the following describes a subset of Google’s Review vocabulary:
When you create a subclass of Mida::Vocabulary, it automatically registers the vocabulary.
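Registration on subclassing is something Ruby makes possible through the inherited hook. The sketch below shows the general mechanism with a hypothetical SketchVocabulary base class; it is not Mida's implementation, only one common way such automatic registration can be written.

```ruby
# Hypothetical base class; not Mida's actual implementation.
class SketchVocabulary
  # Every subclass defined so far, in definition order.
  REGISTRY = []

  # Ruby calls this hook each time a class inherits from this one.
  def self.inherited(subclass)
    super
    REGISTRY << subclass
  end
end

# Defining the subclasses is all that is needed to register them.
class Review < SketchVocabulary; end
class Rating < SketchVocabulary; end

p SketchVocabulary::REGISTRY
```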
Now if Mida is parsing some input and manages to match against such a vocabulary, it will only allow the specified properties and will reject any that don't have the correct number. It will also set Item#vocabulary accordingly.

If you want to include the properties of another vocabulary, you can use include_vocabulary.

For example, given a Book vocabulary that includes a Thing vocabulary, and a Collection vocabulary whose items are expected to be Things, a Book given as an item of Collection would be accepted because it includes the Thing vocabulary. When examining the item you would find that it is a Book, and you would have access to all the properties of Thing and all the properties of Book.
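The two behaviours described above, rejecting properties that are unknown or have the wrong number of values, and accepting a vocabulary wherever one it includes is expected, can be illustrated without the library. Everything in this sketch is hypothetical: SketchVocab, its :one and :many markers, and the method names are assumptions made for the example and are not Mida's API.

```ruby
# Hypothetical stand-in for a vocabulary; names and structure are
# illustrative only and are not Mida's API.
class SketchVocab
  attr_reader :name, :counts, :included

  # +counts+ maps a property name to :one or :many.
  def initialize(name, counts, included = [])
    @name = name
    @included = included
    # Properties of included vocabularies are merged in first.
    @counts = included.map(&:counts).reduce({}, :merge).merge(counts)
  end

  # True when this vocabulary is +other+ or includes it, directly or not.
  def kind_of_vocab?(other)
    equal?(other) || included.any? { |v| v.kind_of_vocab?(other) }
  end

  # Keep only known properties that have an allowed number of values.
  def filter(properties)
    properties.select do |prop, values|
      case counts[prop]
      when :one  then values.size == 1
      when :many then !values.empty?
      else false
      end
    end
  end
end

THING = SketchVocab.new('Thing', { 'description' => :one })
BOOK  = SketchVocab.new('Book', { 'title' => :one, 'author' => :many }, [THING])

RAW = {
  'title'       => ['First title', 'Second title'], # two values where one is allowed
  'author'      => ['Ann'],
  'description' => ['A novel'],                     # comes from the included Thing
  'shelf'       => ['B2']                           # not part of the vocabulary
}

p BOOK.kind_of_vocab?(THING) # is a Book acceptable where a Thing is expected?
p BOOK.filter(RAW)           # properties that survive the vocabulary's rules
```

In this sketch the title is dropped because it has two values where one is allowed, and shelf is dropped because the vocabulary does not define it, while description is kept because Book includes Thing.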