Hacker News
Show HN: NLP Flashcards for Most of the Internet
29 points by samjgorman 1751 days ago
Hello HN! We're Sam and Kanyes. We're building an extension to help you remember what you read online. We're calling it Ferret [1].

When you open Ferret on an HTML page, it generates recall-based questions + answers to reinforce key concepts with NLP. Consider the following toy example where we open Ferret on an explanation of Bayesian statistics. [2]

Q: What does the frequentist interpretation view probability as? A: the limit of the relative frequency of an event after many trials

Q: What is often computed in Bayesian statistics using mathematical optimization methods? A: The maximum a posteriori

We do this by (1) parsing the DOM tree of an HTML page for <p> tags on the client and segmenting these into preprocessed chunks, (2) generating a question for each chunk with a T5-base model fine-tuned on SQuAD, and (3) running extractive question answering over the chunk and the generated question with RoBERTa, also fine-tuned on SQuAD.
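
Step (1) above can be sketched in a few lines. This is only an illustration in Python using the standard library; the extension itself does this in the browser, and the names `ParagraphExtractor`, `chunk_paragraphs`, `extract_chunks` and the 120-word budget are assumptions for the example, not taken from the Ferret repo.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collects the whitespace-normalized text of every <p> element."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def flush_paragraph(self):
        # Store whatever text has been buffered for the current paragraph.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A new <p> implicitly closes an unclosed previous one.
            self.flush_paragraph()
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.flush_paragraph()
            self._in_p = False

    def handle_data(self, data):
        # Text inside inline children such as <b> is kept as well.
        if self._in_p:
            self._buffer.append(data)


def chunk_paragraphs(paragraphs, max_words=120):
    """Greedily pack whole paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for paragraph in paragraphs:
        n = len(paragraph.split())
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(paragraph)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def extract_chunks(html, max_words=120):
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    parser.flush_paragraph()  # pick up a trailing unclosed <p>
    return chunk_paragraphs(parser.paragraphs, max_words)
```

Each chunk would then be sent to the question-generation model in step (2), and the chunk plus the generated question to the answering model in step (3).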

No GPT-3 here. Where's the fun in an API call when you can do it yourself? Ferret is built as a React.js app deployed as a Chrome extension, with models hosted on AWS SageMaker.

Finally, why could this be helpful? Human memory is lossy. Psychologists have long shown that memory can be modeled with a forgetting curve: if you don't attempt to retain knowledge, you'll likely lose it. But most of the content we read online (technical blog posts, documentation, course notes, articles) gets ingested and quickly forgotten. We're interested in low-friction approaches to helping people better remember this content, starting with fellow engineers who depend on their ability to remember key concepts to do their best work.

We've open-sourced the full repo and are actively responding to PRs + issues [3]. You can also read more about the technical + product challenges we faced, if that interests you [4].

We appreciate all feedback and suggestions!

[1] https://chrome.google.com/webstore/detail/ferret/mjnmolplinickaigofdpejfgfoehnlbh

[2] https://en.wikipedia.org/wiki/Bayesian_statistics

[3] https://github.com/kanyesthaker/qgqa-flashcards

[4] https://samgorman.notion.site/Ferret-c7508ec65df841859d1f84e518fcf21d

5 comments

Hi, Kanyes here from Ferret. Starting the discussion by sharing an unsolved technical hurdle that may be of interest. We decided early in development to perform all inference on CPU, to avoid unfriendly production costs and the inefficiency of processing single inputs instead of batches.

Sequence-to-sequence models like T5 tend to be large (> 300 MB), and we observed high latency of approximately 8 s per inference. We've masked this latency on the frontend, mainly by sending concurrent requests with async code (4 at a time) and preloading content early. However, this is kind of hacky, and ideally we'd want to reduce the inference time itself.
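
The "4 at a time" pattern described above can be sketched with a semaphore. This is an illustration in Python's `asyncio` rather than the extension's frontend code; `fake_inference` is a hypothetical stand-in for the slow model call, and `run_limited` is a name made up for the example.

```python
import asyncio


async def fake_inference(chunk, delay=0.01):
    """Hypothetical stand-in for one slow model request."""
    await asyncio.sleep(delay)
    return f"question for: {chunk}"


async def run_limited(chunks, worker, limit=4):
    """Run worker over all chunks with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(chunk):
        # Each call waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(chunk)

    # gather returns results in the same order as the input chunks.
    return await asyncio.gather(*(guarded(chunk) for chunk in chunks))


if __name__ == "__main__":
    results = asyncio.run(run_limited(["a", "b", "c", "d", "e"], fake_inference))
    print(results)
```

Capping concurrency hides latency from the user without flooding the endpoint, but it does not make any single inference faster, which is why it only masks the problem.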

To this end, we've demonstrated a roughly 1.7x speedup by converting our PyTorch model weights to a quantized ONNX graph. However, we've found a lot of friction in trying to deploy ONNX graphs to AWS. We understand there are a variety of potential solutions (training smaller distilled models, deploying ONNX, contesting our rationale for using CPU, etc.), so we're looking for suggestions for the best way to make inference faster!
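
For readers unfamiliar with quantization, the core idea is to store weights as 8-bit integers plus a scale factor instead of 32-bit floats. The toy sketch below shows symmetric per-tensor int8 quantization in plain Python. It is not ONNX Runtime's actual scheme or API, and `quantize_int8` and `dequantize` are names invented for the example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.27]
    quantized, scale = quantize_int8(weights)
    restored = dequantize(quantized, scale)
    worst = max(abs(w - r) for w, r in zip(weights, restored))
    print(quantized, scale, worst)
```

Each weight takes a quarter of the space, and integer arithmetic is typically cheaper on CPU, at the cost of a rounding error of at most half the scale per weight.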

Aside from the per-inference latency, were there any other unique challenges you guys faced when deploying NLP models to the web? It's pretty cool to see ML being applied more actively in day-to-day web browsing.
One thing that we're trying to hit is the actual quality of the generated questions/answers; there's an immense amount of variance in internet content at large and it's difficult to ensure high-quality content without some kind of supervised QA (aside from some kind of hyperparam grid search). We were able to achieve some control over quality through rule-based heuristic filters on our generated questions, but we're trying to make this more robust.
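
To make "rule-based heuristic filters" concrete, here is a minimal sketch of what such a filter could look like. The specific rules and thresholds are assumptions chosen for illustration, not the filters Ferret actually ships, and `keep_pair` and `filter_pairs` are hypothetical names.

```python
def keep_pair(question, answer, chunk, min_question_words=4, max_answer_words=25):
    """Return True if a generated (question, answer) pair passes every rule."""
    q = question.strip()
    a = answer.strip()
    if not q.endswith("?"):
        return False  # generated text is not phrased as a question
    if len(q.split()) < min_question_words:
        return False  # too short to be a meaningful question
    if not a or len(a.split()) > max_answer_words:
        return False  # empty answer, or a span too long for a flashcard
    if a.lower() not in chunk.lower():
        return False  # an extractive answer must appear in the source chunk
    if a.lower() in q.lower():
        return False  # the question gives away its own answer
    return True


def filter_pairs(pairs, chunk):
    """Keep pairs that pass keep_pair, dropping repeated questions."""
    seen = set()
    kept = []
    for question, answer in pairs:
        key = question.strip().lower()
        if key in seen or not keep_pair(question, answer, chunk):
            continue
        seen.add(key)
        kept.append((question, answer))
    return kept
```

Rules like these are cheap to run on every generated pair, but they only catch surface-level failures; they cannot tell whether a question is actually worth asking.
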
Really could’ve used this during my time in school, good work. What’s the GTM, and who are the ideal users you’re building for here?
Ideal users: engineers, at least to start with. We know that the typical engineer has to learn a lot from online content such as technical documentation, blog posts, and notes, so we're seeking to be helpful there first. Plus, we'd like to solve problems we've faced ourselves. Down the line, if people like it, we can imagine this being pretty helpful for students of most ages.

Go to market: We're hoping that sharing here on HN is a good starting point. We've gotten modest traction on Twitter among the ML community [1] and on r/MachineLearning as well [2]. Next steps are monitoring word of mouth and iterating on the product until we hit a point where people enjoy sharing Ferret with their friends.

[1] https://twitter.com/KanyesThaker/status/1431378692912025604

[2] https://www.reddit.com/r/MachineLearning/comments/pc3hyx/p_g...

Is there any way to export flashcards to Anki?
Wow this is cool!
Thanks for taking the time to check it out!