| HN Mirror


by zulko 209 days ago

Total plug, but this year I scraped 400,000 Wikipedia pages with Gemini to create landnotes.org, an atlas where you can ask "what happened in Japan in 1923":

https://landnotes.org/?location=xnd284b0-6&date=1923&strictD...

https://github.com/Zulko/landnotes

My plan has been to overlay historical map borders on top of it, like the Geacron one from this post, but they all seem to be protected by copyright - and understandably so, given the amount of work involved.

6 comments

milst 209 days ago

Very cool. I made something with a similar idea, but using timelines instead of maps. I wonder if the two could be combined in some way: https://timeline-of-everything.milst.dev/

zulko 209 days ago

Nice, how does your timeline work under the hood? Does it read from Wikipedia? One thing that could be interesting in your project is being able to compare timelines. See for instance this website specifically for comparing composer works (with timelines pre-extracted from Wikipedia):

https://zulko.github.io/composer-timelines/?selectedComposer...

milst 209 days ago

Under the hood, messages are sent to Tambo, which knows how to use the timeline component (sort of like an LLM tool) and can fill the component's props with whatever data it decides. The actual data at this point is completely generated by the LLM (which seems to be ok for historical data on popular topics). I should add a tool to allow Tambo to fetch data from Wikipedia before trying to generate timeline data.

Comparing timelines is an awesome idea, understanding when certain events happened in relation to others is really interesting. Maybe even overlaid in different colors or something instead of separate timelines.

Here's the repo btw: https://github.com/MichaelMilstead/timeline-of-everything
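For anyone curious what a "fetch from Wikipedia" tool like the one mentioned above might look like, here is a minimal sketch in Python. It is not code from either repo. It assumes the MediaWiki Action API with `prop=extracts` (the TextExtracts extension, which Wikipedia runs), and the function names `build_extract_url`, `parse_extract` and `fetch_extract`, as well as the User-Agent string, are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_extract_url(title: str, lang: str = "en") -> str:
    """Build an Action API URL asking for a page's plain-text extract."""
    params = {
        "action": "query",
        "prop": "extracts",   # provided by the TextExtracts extension
        "explaintext": 1,     # plain text instead of HTML
        "redirects": 1,       # follow redirects to the target page
        "titles": title,
        "format": "json",
        "formatversion": 2,   # "pages" comes back as a list
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def parse_extract(response_text: str) -> str:
    """Return the extract from a formatversion=2 response, or "" if absent."""
    data = json.loads(response_text)
    pages = data.get("query", {}).get("pages", [])
    if not pages or pages[0].get("missing"):
        return ""
    return pages[0].get("extract", "")


def fetch_extract(title: str, lang: str = "en") -> str:
    """Hypothetical tool entry point: fetch a page's text over HTTP."""
    # Placeholder User-Agent; replace it with real contact details.
    request = Request(
        build_extract_url(title, lang),
        headers={"User-Agent": "timeline-tool-sketch/0.1 (example only)"},
    )
    with urlopen(request, timeout=10) as response:
        return parse_extract(response.read().decode("utf-8"))
```

The returned text could then be handed to the LLM as context before it fills the timeline props, so the dates come from the article rather than from the model's memory.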

llbbdd 209 days ago

This is very very cool! I went right to the month and year of my birth; kind of the same vibe as finding a newspaper published on the day you were born but all over the world. Thanks for sharing!

pu_pe 209 days ago

This looks pretty cool actually, nice job!

lippihom 208 days ago

Wikipedia doesn't have an API?

zulko 207 days ago

It does, why?

lippihom 198 days ago

If they have an API, why scrape if you can collect their data in the way they want it to be collected?

annodomini2019 209 days ago

Wow, this is actually so cool. Fantastic idea, I would LOVE something like this in Wikipedia. Nicely done!

zulko 209 days ago

Yeah, it would be nice if Wikipedia hosted it, but it would probably require some more serious groundwork so the project fits in the Wikipedia ecosystem. Could be a pipeline Wikipedia -> Wikidata -> Atlas.

There are many projects that could be done with Wikipedia and LLMs, for instance "equalizing" all languages by translating all pages into all other languages where they are missing. Or, more surgically, finding which facts are reported in some languages of a page but not others, and adding these facts to all languages.

For now, it seems that Wikipedia doesn't want to use generative AI to produce Wikipedia pages, and that's understandable, but there may be a point where model quality will be too good to ignore.
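To make the "more surgical" idea above concrete, here is a rough Python sketch of the comparison step only. It assumes the hard part is already done, namely that an LLM (or anything else) has extracted the facts from each language edition and normalized them to shared, language-independent identifiers. The function name `missing_facts` and the fact IDs in the usage example are invented for illustration.

```python
def missing_facts(facts_by_lang: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each language, list the facts that other languages have but it lacks."""
    # Union of every fact reported in at least one language edition.
    all_facts = set().union(*facts_by_lang.values())
    # Whatever a given edition does not report is a candidate to add there.
    return {lang: all_facts - facts for lang, facts in facts_by_lang.items()}


# Usage with made-up fact identifiers for a single page:
page_facts = {
    "en": {"born_1685", "died_1750", "worked_in_leipzig"},
    "fr": {"born_1685", "died_1750"},
    "ja": {"born_1685"},
}
gaps = missing_facts(page_facts)
```

Here `gaps["en"]` is empty, while `gaps["ja"]` holds the two facts the Japanese page would need. Deciding that two sentences in different languages state the same fact is the real work, and this sketch does not attempt it.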

annodomini2019 208 days ago

Understandable for not using it to write net-new content from outside sources, but agreed that at some point the translation becomes good enough to bridge all language gaps, where it's simply an obvious call that a translation of the more fully written English article is better than relying on a local writer.

qq66 209 days ago

Cool project. Seems like your link to "wiki-dump-extractor" is broken.

zulko 209 days ago

Thank you for reporting this!
