Hacker News
by lijogdfljk 1592 days ago
Hah, I am working on a nearly identical app. How did you deal with pulling content from the web without reinventing wheels?

I was thinking about reusing an existing store of scraping instructions, InstantView by Telegram (iirc that's the name). It looks fairly straightforward to write a parser for that spec. However, I also wanted to be able to store some of my own in a repo, and I fear DMCA strikes on a repo that stores instructions to scrape pages.

How did you solve this?

2 comments

I used Mozilla's Readability.js as the base. So I didn't reinvent anything really :D
Since I'm working on a similar project, this is how I am planning to pull content from the web: using percollate[1] to get the HTML content. I haven't written any implementation for this in Python yet.

If you don't mind me asking, how were you going to implement spaced repetition? As far as I know, the Incremental Reading algorithm has never been published.

[1]: https://github.com/danburzo/percollate
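
For the percollate plan above, a minimal Python sketch could simply shell out to the percollate CLI. This assumes percollate is installed and on PATH, and that its `html` command and `--output` flag behave as described in its README. The function names `percollate_command` and `fetch_article_html` are made up for illustration and are not part of any project in this thread.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def percollate_command(url: str, out_path: Path) -> List[str]:
    """Build the argv for a percollate run that writes cleaned-up HTML.

    Assumption: `percollate html --output <file> <url>` is the right
    invocation; check `percollate --help` for your installed version.
    """
    return ["percollate", "html", "--output", str(out_path), url]


def fetch_article_html(url: str, out_path: Path) -> str:
    """Run percollate on `url` and return the HTML it wrote to `out_path`."""
    if shutil.which("percollate") is None:
        raise RuntimeError("percollate is not installed or not on PATH")
    # check=True raises CalledProcessError if percollate exits non-zero.
    subprocess.run(percollate_command(url, out_path), check=True)
    return out_path.read_text(encoding="utf-8")
```

Keeping the command construction separate from the subprocess call makes it easy to swap in another extractor later without touching the rest of the pipeline.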

I wanted to dogfood and experiment on most of the things, from design to SR algo.

My primary goal was to make something I wanted, which means lots of experimentation across the board. I also want to make a very general-purpose, flexible system where you can tweak the underlying SR algo based on the type of knowledge. I.e., I want to store music scores as well as units of fact, and the SR algo or algorithms should support either.
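
One way to picture "tweak the SR algo per type of knowledge" is a registry of scheduler functions keyed by knowledge type. The sketch below is only an illustration under stated assumptions: the `Card` fields, the `"fact"` and `"score"` kinds, and both interval rules are hypothetical. It is neither the algorithm used in this app nor the unpublished Incremental Reading algorithm; the fact rule is just loosely inspired by SM-2-style multiplicative growth.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Callable, Dict


@dataclass
class Card:
    kind: str               # hypothetical knowledge type: "fact" or "score"
    interval_days: int = 1  # current gap between reviews
    ease: float = 2.5       # growth factor used by the fact scheduler


# A scheduler maps (card, recall grade 0-5) to the next interval in days.
Scheduler = Callable[[Card, int], int]


def fact_scheduler(card: Card, grade: int) -> int:
    # Failed recall resets to one day; otherwise grow multiplicatively.
    if grade < 3:
        return 1
    return max(1, int(card.interval_days * card.ease))


def score_scheduler(card: Card, grade: int) -> int:
    # Assumption: music scores need steadier practice, so grow linearly.
    if grade < 3:
        return 1
    return card.interval_days + 2


SCHEDULERS: Dict[str, Scheduler] = {
    "fact": fact_scheduler,
    "score": score_scheduler,
}


def review(card: Card, grade: int, today: date) -> date:
    """Update the card's interval with its kind's scheduler; return the due date."""
    card.interval_days = SCHEDULERS[card.kind](card, grade)
    return today + timedelta(days=card.interval_days)
```

Supporting a new type of knowledge then means registering one more function in `SCHEDULERS`, and the review loop stays the same.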

Pulling content (Reader) seemed the hard part to me, mostly because of legality concerns.