sql - Best method to scrape a large number of Wikipedia tables to a MySQL database
What is the best programmatic way to grab the HTML tables of Wikipedia main article pages whose titles match certain keywords? I would like to take the column names and table data and put them into a database.
I would also grab the URL and page name for attribution.
I don't need specifics, just recommended methods, or perhaps links to tutorials.
The easy approach is to not scrape the Wikipedia website at all. All of the data, metadata, and associated media that form Wikipedia are available in structured formats, which precludes the need to scrape the web pages.
To get the data from Wikipedia into your database (which you may then search, slice, and dice 'til your heart's content):
- Download the data files.
- Run the SQLize tool of your choice.
- Run mysqlimport.
- Drink a coffee.
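As a rough sketch of the download and import steps, the Python below only builds the dump URL and the mysqlimport command line; it does not download or import anything. The function names, the `wikidb` database, the `wiki` user, and the `page.txt`/`revision.txt` file names are all hypothetical placeholders, and the conversion step is left out because it depends on which SQLize tool you pick. The URL pattern assumes the usual layout of the public dump server for the English Wikipedia.

```python
import shlex

# Public Wikimedia dump server; "latest" points at the most recent dump.
DUMP_BASE = "https://dumps.wikimedia.org"


def dump_url(wiki="enwiki", name="pages-articles.xml.bz2"):
    """Build the URL of a 'latest' dump file (assumed naming scheme)."""
    return f"{DUMP_BASE}/{wiki}/latest/{wiki}-latest-{name}"


def mysqlimport_command(database, data_files, user="wiki"):
    """Build (but do not run) a mysqlimport invocation.

    mysqlimport loads each file into the table named after the file's
    base name, so page.txt would go into a table called `page`.
    """
    return ["mysqlimport", "--local", f"--user={user}", database, *data_files]


if __name__ == "__main__":
    print(dump_url())
    # Hypothetical output files from the conversion tool.
    print(shlex.join(mysqlimport_command("wikidb", ["page.txt", "revision.txt"])))
```

The command is returned as a list so it can be passed to `subprocess.run` once the data files actually exist and the target tables have been created.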
The URL of the original article should be able to be reconstructed from the page title pretty easily.
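A minimal sketch of that reconstruction, assuming the standard `https://<lang>.wikipedia.org/wiki/<Title>` layout, where spaces in the title become underscores and other special characters are percent-encoded. The function name `article_url` and the `lang` parameter are hypothetical.

```python
from urllib.parse import quote


def article_url(title, lang="en"):
    """Rebuild an article URL from its page title (assumed URL layout)."""
    # Wikipedia uses underscores in place of spaces in page URLs.
    slug = title.strip().replace(" ", "_")
    # Percent-encode everything else, keeping '/' and ':' readable.
    return f"https://{lang}.wikipedia.org/wiki/{quote(slug, safe='/:')}"


if __name__ == "__main__":
    print(article_url("Albert Einstein"))
```

Storing the page title alongside each imported table is then enough to produce the attribution link later.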