sql - Best method to scrape a large number of Wikipedia tables into a MySQL database

April 15, 2013

What is the best programmatic way to grab the HTML tables from Wikipedia main-article pages whose titles match certain keywords? I would then take the column names and table data and put them into a database.

I would also grab the URL and page name for attribution.

I don't need specifics, just recommended methods or perhaps links to tutorials.

The easy approach is not to scrape the Wikipedia website at all. All of the data, metadata, and associated media that make up Wikipedia are available in structured formats (the database dumps), which removes any need to scrape web pages.
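Because the dumps include plain-text lists of page titles, the "titles match keywords" part of the question needs no scraping either. A minimal Python sketch of that filter, assuming a title list with one title per line and underscores in place of spaces (the format used by the title-list dump files); the function name `matching_titles` is hypothetical:

```python
from typing import Iterable, Iterator


def matching_titles(lines: Iterable[str], keywords: Iterable[str]) -> Iterator[str]:
    """Yield titles (underscores turned back into spaces) containing any keyword.

    Matching is a case-insensitive substring test; blank lines are skipped.
    """
    needles = [k.lower() for k in keywords]
    for line in lines:
        title = line.strip().replace("_", " ")
        if not title:
            continue
        lowered = title.lower()
        if any(needle in lowered for needle in needles):
            yield title
```

For example, open the downloaded titles file with `gzip.open(path, "rt", encoding="utf-8")` and pass the file object as `lines`; the generator streams, so the whole list never has to fit in memory.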

To get the data from Wikipedia into a database (which you may search, slice and dice 'til your heart's content):

Download the data files.
Run the SQL-ize tool of your choice.
Run mysqlimport.
Drink coffee.
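The download, convert, and import steps above can be sketched with only the Python standard library, standing in for a dedicated converter. This assumes the XML `pages-articles` dump layout (each `<page>` holds a `<title>` and a revision `<text>` with the wikitext) and MySQL's default LOAD DATA text format that mysqlimport reads (tab-separated fields, newline-terminated rows, backslash escapes). The helper names are hypothetical, and the wikitext still has to be parsed for its tables afterward:

```python
import xml.etree.ElementTree as ET
from typing import BinaryIO, Iterable, Iterator, TextIO, Tuple


def _local(tag: str) -> str:
    # ElementTree prefixes tags with "{namespace}"; keep only the local name.
    return tag.rsplit("}", 1)[-1]


def iter_pages(xml_stream: BinaryIO) -> Iterator[Tuple[str, str]]:
    """Yield (title, wikitext) per <page> without loading the whole dump."""
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        yield title, text
        elem.clear()  # release the finished page's children to save memory


def _escape(field: str) -> str:
    # Default LOAD DATA format: backslash is the escape character, so escape
    # it first, then the tab/newline/carriage-return characters in the data.
    return (field.replace("\\", "\\\\").replace("\t", "\\t")
                 .replace("\n", "\\n").replace("\r", "\\r"))


def write_tsv(rows: Iterable[Tuple[str, ...]], out: TextIO) -> int:
    """Write rows as tab-separated lines; return how many rows were written."""
    count = 0
    for row in rows:
        out.write("\t".join(_escape(f) for f in row) + "\n")
        count += 1
    return count
```

For example, open the compressed dump with `bz2.open(path, "rb")`, filter `iter_pages(...)` by title, write the result to `pages.tsv`, then run `mysqlimport --local -u USER -p DBNAME pages.tsv`. mysqlimport strips the extension and loads the file into the table of the same name (here `pages`), so that table has to exist first.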

The URL of each original article should be easy to reconstruct from the page title.
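A minimal sketch of that reconstruction, assuming the usual `https://<lang>.wikipedia.org/wiki/<Title>` form with spaces turned into underscores. The name `title_to_url` and its `lang` parameter are hypothetical. Percent-encoding any character yields a working link, so which characters are left literal (here `/ : ( ) ,`) only affects how the URL looks:

```python
from urllib.parse import quote


def title_to_url(title: str, lang: str = "en") -> str:
    # Spaces become underscores; other reserved or non-ASCII characters are
    # percent-encoded as UTF-8 by urllib.parse.quote.
    path = quote(title.replace(" ", "_"), safe="/:(),")
    return f"https://{lang}.wikipedia.org/wiki/{path}"
```

For example, `title_to_url("Albert Einstein")` gives `https://en.wikipedia.org/wiki/Albert_Einstein`, which can be stored next to the page name for attribution.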
