by scblock 302 days ago
September 30 is a pretty small window to migrate. Hopefully it's enough.

Coincidentally, and fortunately, I was already in the process of migrating a blog with 10 years’ worth of content from TypePad. It’s not enough.

TypePad’s export process is awful: the output is a poorly formatted .txt file which is hard to parse reliably. The export process itself fails at least half the time. There’s no export process for images or style files. Automated crawl processes fail.

I LOL’ed at this part of the post:

>> If you have any questions, please refer to our Frequently Asked Questions page here.

The “here” link 404s.

The link has a typo. It should point to: https://help.typepad.com/shutdown-faq.html
What's the URL? ArchiveTeam is planning on saving all the blogs to archive.org.
That's unfortunate. Good luck.
At least with LLMs, we can just write a query to migrate the export to whatever target format we want. The main issue is just breaking 20 years' worth of inbound links.
Was this somehow not doable without LLMs? It's trivial data massaging.
I mean with LLMs you just copy-paste in the input format of your new CMS and then upload your export file. Even if it only took 15 minutes before, it now takes 0 minutes.
Old solutions scale to 1000+ posts, but this does not.
It's too bad the LLMs will hallucinate and mess up the content while doing that migration.
I think the best way would be to get an LLM to write a script that would export/import the data, so it only writes the code, doesn't touch the data.
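For what it's worth, a deterministic script of that kind can be quite small. The following is only a minimal sketch, and it assumes the export resembles the Movable Type import format: `KEY: value` metadata lines, a line of five dashes between sections, and a line of eight dashes between posts. The thread doesn't confirm the actual layout, so treat the separators and the `parse_export` name as assumptions to adjust against a real file.

```python
import re


def parse_export(text):
    """Parse an export file's text into a list of dicts, one per post.

    Assumed layout (not confirmed by the thread): posts are separated by a
    line of eight dashes, sections within a post by a line of five dashes.
    """
    posts = []
    for raw_entry in re.split(r"^-{8}\s*$", text, flags=re.MULTILINE):
        if not raw_entry.strip():
            continue
        post = {}
        for section in re.split(r"^-{5}\s*$", raw_entry, flags=re.MULTILINE):
            section = section.strip("\n")
            if not section.strip():
                continue
            first, _, rest = section.partition("\n")
            header = re.match(r"^([A-Z ]+):\s*$", first)
            if header:
                # Multi-line section such as "BODY:" followed by content.
                post[header.group(1)] = rest.strip()
            else:
                # Metadata section: one "KEY: value" pair per line.
                for line in section.splitlines():
                    key, sep, value = line.partition(":")
                    if sep and key.isupper():
                        post[key.strip()] = value.strip()
        posts.append(post)
    return posts
```

The content is only ever copied, never regenerated, so nothing can be hallucinated into a post. A body that itself contains a bare line of dashes would confuse this sketch, which is the kind of edge case that makes the real file hard to parse reliably.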
> The main issue is just breaking 20 years' worth of inbound links.

That is annoying indeed - it would be nice if the Web had some way to keep links eternally valid. But people didn't even manage to make Digital Object Identifiers (DOIs) work beyond the death of the publishing companies that issued them...

Perhaps links could be auto-replaced to archive.org links if they ceased to exist?
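On the site owner's side, at least, that replacement could be done mechanically. Here is a rough sketch, assuming that prefixing a URL with `https://web.archive.org/web/` leads to a Wayback Machine capture of it (if one exists). The `is_dead` callback is hypothetical: the caller supplies whatever liveness check they trust, and this sketch never touches the network itself.

```python
import re

# Assumed prefix for Wayback Machine lookups of an archived URL.
WAYBACK_PREFIX = "https://web.archive.org/web/"


def rewrite_dead_links(html, is_dead):
    """Point href targets at the Wayback Machine when is_dead(url) is true.

    is_dead is a caller-supplied function (hypothetical here) that takes a
    URL string and returns True if the link should be replaced.
    """
    def replace(match):
        quote, url = match.group(1), match.group(2)
        # Only absolute http(s) links are candidates for rewriting.
        if url.startswith(("http://", "https://")) and is_dead(url):
            url = WAYBACK_PREFIX + url
        return f"href={quote}{url}{quote}"

    # Match quoted href attributes; unquoted ones are left untouched.
    return re.sub(r'href=(["\'])(.*?)\1', replace, html)
```

This only helps for pages you control. Inbound links from other people's sites would still break unless those sites ran something similar.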