by zulko
649 days ago
> reduce it to only sections that are likely to be relevant (eg. "Life and career")

True, but I also managed to do this from the HTML. I tried getting the pages' wikitext through the API but couldn't find out how. Just querying the HTML page was less friction, and fast enough that I didn't need a dump (although when AI becomes cheap enough, there are probably a lot of things to do with a Wikipedia dump!). One advantage of using online Wikipedia instead of a dump is that I have a pipeline on GitHub Actions where I just enter a composer name and it automagically scrapes the web and adds the composer to the database (takes exactly one minute from the click of the button!).
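
For reference, the MediaWiki Action API can return a page's wikitext through `action=parse` with `prop=wikitext`. Below is a minimal sketch of that call, not the pipeline described above. The helper names (`build_wikitext_url`, `extract_wikitext`, `fetch_wikitext`) and the User-Agent string are hypothetical, and it assumes the English Wikipedia endpoint and the `formatversion=2` JSON layout.

```python
import json
import urllib.parse
import urllib.request

# Assumption: English Wikipedia; other language editions use their own host.
API_URL = "https://en.wikipedia.org/w/api.php"


def build_wikitext_url(title, section=None):
    """Build an action=parse URL asking for wikitext (hypothetical helper)."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # makes "wikitext" a plain string in the response
    }
    if section is not None:
        # Optional: restrict the result to one section by its index.
        params["section"] = str(section)
    return API_URL + "?" + urllib.parse.urlencode(params)


def extract_wikitext(payload):
    """Pull the wikitext string out of a decoded API response."""
    if "error" in payload:
        raise ValueError(payload["error"].get("info", "unknown API error"))
    return payload["parse"]["wikitext"]


def fetch_wikitext(title, section=None):
    """Fetch wikitext over the network (requires internet access)."""
    request = urllib.request.Request(
        build_wikitext_url(title, section),
        # Placeholder User-Agent; replace with real contact details.
        headers={"User-Agent": "composer-scraper-example/0.1"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return extract_wikitext(json.load(response))
```

Calling `fetch_wikitext("Erik Satie")` would request the whole page, while passing `section=1` asks only for the section with that index, which is one way to narrow the text down before sending it to a model.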