by zulko 209 days ago
Total plug, but this year I scraped 400,000 Wikipedia pages with Gemini to create landnotes.org, an atlas where you can ask "what happened in Japan in 1923":

https://landnotes.org/?location=xnd284b0-6&date=1923&strictD...

https://github.com/Zulko/landnotes

My plan has been to overlay historical map borders on top of it, like the Geacron one from this post, but they all seem to be protected by copyright - and understandably so, given the amount of work involved.

6 comments

Very cool. I made something with a similar idea, but using timelines instead of maps. I wonder if the two could be combined in some way: https://timeline-of-everything.milst.dev/
Nice, how does your timeline work under the hood? Does it read from Wikipedia? It could be interesting for your project to support comparing timelines. See for instance this website, built specifically for comparing composers' works (with timelines pre-extracted from Wikipedia):

https://zulko.github.io/composer-timelines/?selectedComposer...

Under the hood, messages are sent to Tambo, which knows how to use the timeline component (sort of like an LLM tool) and can fill the component's props with whatever data it decides. The actual data at this point is completely generated by the LLM (which seems to be OK for historical data on popular topics). I should add a tool to allow Tambo to fetch data from Wikipedia before trying to generate timeline data.
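
Since the props are filled entirely by the LLM, one cheap safeguard is to validate them before rendering. Here is a minimal Python sketch of that idea. The `events` / `year` / `label` shape and the names `TimelineEvent` and `parse_timeline_props` are hypothetical, chosen for illustration; they are not Tambo's API or the project's real prop schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimelineEvent:
    """One timeline entry (hypothetical shape, not the project's real props)."""
    year: int
    label: str


def parse_timeline_props(raw: dict) -> list[TimelineEvent]:
    """Validate LLM-generated props and return the events sorted by year.

    Entries that are not dicts, lack an integer year, or have an empty label
    are dropped instead of being rendered.
    """
    events = []
    for item in raw.get("events") or []:
        if not isinstance(item, dict):
            continue
        year = item.get("year")
        label = item.get("label")
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(year, bool) or not isinstance(year, int):
            continue
        if not isinstance(label, str) or not label.strip():
            continue
        events.append(TimelineEvent(year=year, label=label.strip()))
    return sorted(events, key=lambda e: e.year)
```

This only checks shape, not truth: a well-formed but wrong date passes, which is where fetching from Wikipedia first would help.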

Comparing timelines is an awesome idea; understanding when certain events happened in relation to others is really interesting. Maybe they could even be overlaid in different colors instead of shown as separate timelines.

Here's the repo btw: https://github.com/MichaelMilstead/timeline-of-everything

This is very very cool! I went right to the month and year of my birth; kind of the same vibe as finding a newspaper published on the day you were born but all over the world. Thanks for sharing!
This looks pretty cool actually, nice job!
Wikipedia doesn't have an API?
It does, why?
If they have an API, why scrape when you can collect their data the way they want it to be collected?
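
For reference, a minimal Python sketch of what going through the API could look like, assuming the MediaWiki Action API with `prop=extracts` (the TextExtracts extension) on English Wikipedia. The helper names and the User-Agent string are placeholders made up for this example, and nothing here is taken from how landnotes actually collects its data.

```python
import json
import urllib.request
from typing import Optional
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title: str) -> str:
    """Build an Action API URL asking for the plain-text extract of one page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": 1,
        "titles": title,
        "format": "json",
        "formatversion": 2,
    }
    return f"{API_URL}?{urlencode(params)}"


def extract_text(response: dict) -> Optional[str]:
    """Pull the extract out of a formatversion=2 response, or None if absent."""
    pages = response.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return None
    return pages[0].get("extract")


def fetch_extract(title: str) -> Optional[str]:
    """Perform the request (needs network access)."""
    request = urllib.request.Request(
        build_extract_url(title),
        # Placeholder value: replace with a descriptive User-Agent of your own.
        headers={"User-Agent": "example-bot/0.1 (contact: you@example.org)"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return extract_text(json.load(resp))
```

Calling `fetch_extract("Japan")` should return the article text as a string. For hundreds of thousands of pages, the database dumps Wikimedia publishes are probably the friendlier route than per-page requests.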
Wow, this is actually so cool. Fantastic idea, I would LOVE something like this in Wikipedia. Nicely done!
Yeah, it would be nice if Wikipedia would host it, but it would probably require some more serious groundwork so the project fits in the Wikipedia ecosystem. Could be a pipeline Wikipedia -> Wikidata -> Atlas.

There are many projects that could be done with Wikipedia and LLMs, for instance "equalizing" all languages by translating all pages into all other languages where they are missing. Or, more surgically, finding which facts are reported in some languages of a page but not others, and adding these facts to all languages.
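
The first step of the "equalizing" idea, finding which languages lack a given page, needs no LLM. A rough Python sketch, assuming the MediaWiki Action API's `prop=langlinks` query with `formatversion=2`; the function names are made up for this example, and it ignores API continuation for pages with very many language links.

```python
def langlinks_params(title: str) -> dict:
    """Query parameters asking for all interlanguage links of one page."""
    return {
        "action": "query",
        "prop": "langlinks",
        "titles": title,
        "lllimit": "max",
        "format": "json",
        "formatversion": 2,
    }


def missing_languages(response: dict, wanted: set, source_lang: str = "en") -> set:
    """Return the wanted language codes that have no version of the page.

    `response` is the decoded JSON of a langlinks query. The wiki being
    queried never lists itself in langlinks, so `source_lang` is counted
    as present explicitly.
    """
    pages = response.get("query", {}).get("pages", [])
    present = {source_lang}
    if pages:
        present |= {link["lang"] for link in pages[0].get("langlinks", [])}
    return set(wanted) - present
```

The more surgical version, comparing individual facts across languages, is the part that would actually need a model, and this sketch does not attempt it.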

For now, it seems that Wikipedia doesn't want to use generative AI to produce Wikipedia pages, and that's understandable, but there may come a point where model quality is too good to ignore.

It's understandable not to use it for writing net-new content from outside sources, but I agree that at some point translation will become good enough to bridge all language gaps, and it will simply be an obvious call that a translation of the more fully written English article is better than relying on a local writer.
Cool project. Seems like your link to "wiki-dump-extractor" is broken.
Thank you for reporting this!