Helper Functions

There are currently two helper functions, listed below:

centaurminer.TagList(str_list, tag='item')

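Wraps each string in a list with a tag and joins the results into a single string. Using a tag instead of a simple delimiter (such as a comma) avoids data corruption when an item itself contains the delimiter.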

Parameters
  • str_list (list) – List of strings to be joined with tags.

  • tag (str, optional) – Tag name to use for each item in the list. The name is wrapped as <…> to open and </…> to close.
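To illustrate the tagging idea, here is a minimal standalone sketch. The function tag_list below is a hypothetical stand-in written for this example, not the library's implementation; in particular, the exact output format of centaurminer.TagList (for example, whether anything separates the tagged items) is an assumption.

```python
def tag_list(str_list, tag="item"):
    """Hypothetical stand-in for centaurminer.TagList.

    Wraps every string in <tag>...</tag> and concatenates the results.
    Assumption: no separator is placed between tagged items.
    """
    return "".join(f"<{tag}>{s}</{tag}>" for s in str_list)


# Author names contain commas, so a comma-joined string would be ambiguous.
authors = ["Smith, J.", "Doe, A."]
tagged = tag_list(authors, tag="author")
print(tagged)
# <author>Smith, J.</author><author>Doe, A.</author>
```

Because each item is delimited by its own opening and closing tag, the original list can be recovered even when items contain commas or other common separators.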

centaurminer.CollectURLs(start_url, link_elem, next_elem=None, limit=10000, **kwargs)

Collects a list of URLs from a search of the site.

Parameters
  • start_url (str) – URL for the first page of a site search.

  • link_elem (centaurminer.Element) – An Element indicating where individual URL links can be found on the search page.

  • next_elem (centaurminer.Element, optional) – An Element indicating where the “next page” button is on the page, used to navigate through search pages.

  • limit (int, optional) – Maximum number of URLs to collect. Once the number of collected URLs exceeds this value, the search stops and the list of URLs is returned.

  • kwargs – Additional arguments are passed directly into the centaurminer.Engine constructor.
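The interaction between link_elem, next_elem, and limit can be sketched without a browser. The code below is a simplified simulation, not the library's implementation: collect_urls, links_on, next_page, and the in-memory PAGES table are all hypothetical names invented for this example, and the example.org URLs are placeholders. It assumes the limit is checked after each page is processed and that the list is returned without truncation, which the description above does not specify.

```python
def collect_urls(start_url, get_links, get_next=None, limit=10000):
    """Hypothetical simulation of the paging logic described above.

    get_links(page) plays the role of link_elem: it returns the links on a page.
    get_next(page) plays the role of next_elem: it returns the next page or None.
    """
    urls = []
    seen_pages = set()
    page = start_url
    while page is not None and page not in seen_pages:
        seen_pages.add(page)  # guard against "next" links that loop back
        urls.extend(get_links(page))
        if len(urls) > limit:  # stop once the limit is exceeded
            break
        # Without a "next page" locator, only the first page is searched.
        page = get_next(page) if get_next is not None else None
    return urls


# Fake search results: each page maps to (links on the page, next page).
BASE = "https://example.org"
PAGES = {
    f"{BASE}/search?p=1": ([f"{BASE}/a", f"{BASE}/b"], f"{BASE}/search?p=2"),
    f"{BASE}/search?p=2": ([f"{BASE}/c", f"{BASE}/d"], f"{BASE}/search?p=3"),
    f"{BASE}/search?p=3": ([f"{BASE}/e", f"{BASE}/f"], None),
}


def links_on(page):
    return PAGES[page][0]


def next_page(page):
    return PAGES[page][1]


start = f"{BASE}/search?p=1"
all_urls = collect_urls(start, links_on, next_page)
limited = collect_urls(start, links_on, next_page, limit=3)
first_page_only = collect_urls(start, links_on)
print(len(all_urls), len(limited), len(first_page_only))
```

In this simulation the limited run stops after the second page, because that is the first point at which the number of collected URLs exceeds the limit of 3.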