Page Locations

This class is an intermediary between the Element class and the MiningEngine class. It holds every Element you want to mine data from, so that the MiningEngine knows where to locate each one.

It defines no methods, only class attributes (declared in the class body rather than inside an __init__ method), each holding an Element.
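To make the distinction concrete, here is a minimal plain-Python sketch of a class attribute next to an instance attribute. The `Locations` class and its fields are hypothetical and not part of centaurminer; the `__init__` is included only for contrast.

```python
class Locations:
    # Class attribute: declared in the class body, outside __init__.
    title = "citation_title"

    def __init__(self):
        # Instance attribute: exists only once an object has been created.
        self.note = "per-instance"


# The class attribute is readable on the class itself, with no instance.
class_level_title = Locations.title

# The instance attribute is not visible on the class.
has_note_on_class = hasattr(Locations, "note")
```

Because the values live on the class, code that receives the class can read them without instantiating it.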

class centaurminer.PageLocations

The base class for locating article elements on a site.

Some fields are gathered in a default way, based on standard metadata:

  • title : MetaData("citation_title")

  • authors : MetaData("citation_author")

  • doi : MetaData("citation_doi")

  • abstract : MetaData("citation_abstract")

  • date_publication : MetaData("citation_date")

These defaults can be overridden in a subclass if required. See centaurminer.MiningEngine for how the authors field in particular is handled.

To include an Element for another piece of information, just subclass this class and add a static variable that stores an Element.
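The sketch below illustrates both overriding a default and adding a new field by subclassing. To stay self-contained it uses small stand-in classes rather than importing centaurminer: the constructor arguments of `Element` and `MetaData`, the `JournalLocations` name, the `"css_selector"` locator, and the `keywords` field are all assumptions for illustration, and the real library's signatures may differ.

```python
# Stand-ins for illustration only; not the real centaurminer classes.
class Element:
    def __init__(self, method, query):
        self.method = method  # hypothetical: how to locate the element
        self.query = query    # hypothetical: selector or metadata name


class MetaData(Element):
    def __init__(self, name):
        super().__init__("meta_name", name)


class PageLocations:
    # Defaults as listed in the documentation above.
    title = MetaData("citation_title")
    authors = MetaData("citation_author")
    doi = MetaData("citation_doi")
    abstract = MetaData("citation_abstract")
    date_publication = MetaData("citation_date")


class JournalLocations(PageLocations):
    # Override a default: take the abstract from the page body instead.
    abstract = Element("css_selector", "div.abstract")
    # Add a new field: one more class attribute holding an Element.
    keywords = MetaData("citation_keywords")
```

Fields that the subclass does not redefine, such as `title`, are inherited unchanged, and the base class itself is left untouched.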