Develop for Haystax

At NewsHack Day in San Francisco, the team behind Haystax started with a wish list outlining several common types of public databases users might want to scrape. Here's an outline of our next steps. In each case, the desired outcome is to download a complete copy of the database, preferably in CSV format.

Want to contribute to Haystax? Fork the code from GitHub or join our open discussion forum.

WISH LIST:

BRITISH COLUMBIA TEACHER DATABASE

Structure

Null searches aren't allowed, and there's no way to list all the records at once, so Haystax should be able to search using an array of terms entered through the search form, not through the URL. Results are returned only 10 at a time and are formatted in a table. The program must then navigate through multiple levels, triggering JavaScript along the way, to return data from both the main pages and the detail pages.
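The term-array approach could be sketched roughly as follows. This is only an illustration, not Haystax code: `search_page(term, page)` is a hypothetical callable standing in for whatever submits the search form and reads one page of the results table, and the `"id"` key used to drop duplicates is an assumption about what uniquely identifies a record.

```python
from itertools import product
from string import ascii_lowercase


def generate_terms(length=2, alphabet=ascii_lowercase):
    """Yield every string of `length` letters: 'aa', 'ab', ..., 'zz'."""
    for letters in product(alphabet, repeat=length):
        yield "".join(letters)


def collect_records(search_page, terms, page_size=10):
    """Run every term through the search form and merge the results.

    `search_page(term, page)` is a hypothetical callable that submits the
    form and returns the rows (dicts) on the given zero-based results page.
    Rows are de-duplicated on an assumed unique "id" field, because the
    same record can match more than one search term.
    """
    seen = {}
    for term in terms:
        page = 0
        while True:
            rows = search_page(term, page)
            for row in rows:
                seen.setdefault(row["id"], row)
            # A short (or empty) page means this term is exhausted.
            if len(rows) < page_size:
                break
            page += 1
    return list(seen.values())
```

Because the site returns 10 rows at a time, a page with fewer than 10 rows is treated as the last one for that term. When a term matches an exact multiple of 10 records, this costs one extra request that comes back empty.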

Process


CALIFORNIA DEPARTMENT OF CONSUMER AFFAIRS FIDUCIARY LICENSEES

Structure

Accepts a null search, returning a paginated table of all results. Although the results appear as a simple table, users must navigate through to detail pages. Up to 100 results are returned at a time, depending on a parameter the user sets from a drop-down box.
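Since the end goal is a CSV file, here is one possible sketch of combining each table row with the fields from its detail page. `detail_for` is a hypothetical callable that would fetch and parse the detail page for a row; all field names are placeholders.

```python
import csv
import io


def rows_to_csv(rows, detail_for):
    """Merge each table row with its detail-page fields and return CSV text.

    `rows` is a list of dicts scraped from the results table.
    `detail_for(row)` is a hypothetical callable that returns a dict of
    extra fields taken from that row's detail page.
    """
    merged = []
    for row in rows:
        record = dict(row)
        record.update(detail_for(row))
        merged.append(record)

    # Build the header from every key seen, in first-seen order, since
    # detail pages may not all carry the same fields.
    fieldnames = []
    for record in merged:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(merged)
    return buffer.getvalue()
```

Records that lack a field get an empty cell rather than raising an error, which keeps the output rectangular even when detail pages differ.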

Process


ADVANCED VERSION USE CASES

British Columbia physicians and surgeons database

Search results are limited to 200.

State Bar of CA

Search results are limited to 500. Contains detail pages.
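One way to work around a result cap such as the 200 or 500 limits above is to narrow any search that hits the cap. The sketch below assumes the site matches on name prefixes, which may not hold for these databases. `search(prefix)` is a hypothetical callable returning at most `cap` rows, each with an assumed unique `"id"` field.

```python
from string import ascii_lowercase


def crawl_prefixes(search, cap, alphabet=ascii_lowercase, max_depth=4):
    """Collect records from a capped search by refining prefixes.

    `search(prefix)` is a hypothetical callable returning at most `cap`
    rows whose names start with `prefix`. When a search comes back full,
    the result may be truncated, so each longer prefix is queued too.
    """
    found = {}
    pending = list(alphabet)
    while pending:
        prefix = pending.pop()
        rows = search(prefix)
        for row in rows:
            found.setdefault(row["id"], row)
        # A full result set may hide more records: split the prefix.
        if len(rows) >= cap and len(prefix) < max_depth:
            pending.extend(prefix + letter for letter in alphabet)
    return list(found.values())
```

A search that returns exactly `cap` rows without being truncated still triggers a refinement, which wastes some requests but does not lose data. `max_depth` bounds the crawl, so a prefix that is still capped at that depth will be incomplete.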

Obama-Biden transition team memos

Uses div tags instead of HTML tables. Requires file downloads.
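For pages built from div tags rather than tables, a scraper could key on class names instead of rows and cells. The following is a minimal sketch using Python's standard `html.parser`. The class name passed as `record_class` and the field classes it finds are assumptions about the markup, not taken from the actual site.

```python
from html.parser import HTMLParser


class DivRecordParser(HTMLParser):
    """Collect one dict per <div class="RECORD_CLASS"> block.

    Each nested div becomes a field named after its first class, and any
    link inside the record is kept under "links" for later downloading.
    """

    def __init__(self, record_class):
        super().__init__()
        self.record_class = record_class
        self.records = []
        self._current = None  # record being built, or None
        self._depth = 0       # div nesting depth inside the record
        self._field = None    # class of the innermost open field div

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and self._current is not None and attrs.get("href"):
            self._current.setdefault("links", []).append(attrs["href"])
        if tag != "div":
            return
        classes = (attrs.get("class") or "").split()
        if self._current is None:
            if self.record_class in classes:
                self._current = {}
                self._depth = 1
        else:
            self._depth += 1
            self._field = classes[0] if classes else None

    def handle_endtag(self, tag):
        if tag != "div" or self._current is None:
            return
        self._depth -= 1
        self._field = None
        if self._depth == 0:
            self.records.append(self._current)
            self._current = None

    def handle_data(self, data):
        if self._current is not None and self._field and data.strip():
            self._current[self._field] = data.strip()
```

After calling `parser.feed(html)`, `parser.records` holds the parsed records. The collected `links` would still need to be resolved against the page URL and fetched separately to satisfy the file-download requirement.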