| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mihaip 4736 days ago

The tool caches the API responses (in the _raw_data directory), so if you're re-running it, most of the initial requests will be served from the cache.

If the XML parse errors are listing any item IDs, feel free to email them to me (mihai at persistent dot info) and I'll see if there's any workaround from my side.

Edit: If it's "XML parse error when fetching items, retrying with high-fidelity turned off" messages that you're seeing, then those are harmless (assuming no follow-up exceptions). The retry must have succeeded.

1 comments

ivank 4736 days ago

Have you tried the JSON API? (See the requests that Google Reader itself makes.) It requires no cookies and supports getting up 1000 items per continuation.

link

mihaip 4736 days ago

I wrote most of Reader's JSON API in 2006-2007 :)

The tool uses the "high-fidelity" Atom output mode for getting at item bodies. That preserves namespaced XML elements and other extra data from the feed. It uses JSON for everything else, and will fall back to regular Atom output if the high fidelity mode is not well-formed (it was added in late 2010, as things were winding down, and thus never got a lot of testing).

link