by Uehreka 820 days ago
When I’ve tried listening to YouTube videos explaining, say, Attention Is All You Need, I find that I cannot do it passively at all. The first 10 or so minutes I’m nodding along, folding laundry or doing dishes, then the presenter says something like “by reifying this tensor against the priors I was just talking about, we’re able to—“ and I have to pause, rewind a couple minutes, grab a piece of paper and actually engage with what’s going on.

I have to imagine listening to raw papers (not even someone like Andrej Karpathy interpreting and presenting them) would be even more difficult. I don’t know if there’s an easy way to passively consume academic literature at all. If it’s important stuff, it will usually be pretty challenging.

6 comments

There is definitely a way to make this happen, though. A little bit o' Whisper, Mixtral in some RAG, and you've got yourself a buddy to talk about the paper with while it reads it to you.
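For the RAG part, here is a rough sketch of the retrieval step, assuming the paper is already plain text and the spoken question has already been transcribed (e.g. by Whisper). It splits the paper into chunks, picks the chunks that share the most words with the question, and pastes them into a prompt. The word-overlap scoring, the stopword list, and every function name here are hypothetical stand-ins; a real setup would usually rank chunks with embeddings, and the actual Mixtral call is left out.

```python
import re
from collections import Counter

# Hypothetical minimal stopword list so common words don't dominate scoring.
STOPWORDS = {"the", "is", "what", "of", "a", "an", "and", "to", "in", "why", "how"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens minus stopwords; a crude stand-in for a tokenizer."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def chunk_paper(text: str, max_words: int = 120) -> list[str]:
    """Split the paper into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def retrieve(question: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks with the largest word overlap with the question.

    Bag-of-words overlap is an assumption here; embedding similarity is the
    more common choice in real RAG systems.
    """
    q = Counter(tokenize(question))

    def score(chunk: str) -> int:
        c = Counter(tokenize(chunk))
        return sum(min(q[w], c[w]) for w in q)

    # sorted() is stable, so ties keep the paper's original chunk order.
    return sorted(chunks, key=score, reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble the text that would be sent to the LLM."""
    joined = "\n---\n".join(context)
    return f"Answer using only these excerpts of the paper:\n{joined}\n\nQuestion: {question}"
```

The returned prompt would then go to whatever model you run, and the answer back out through TTS.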

Of course, everyone will immediately say this is dangerous and that it may mislead you with wrong explanations, etc., and then others will counter with 'it will definitely get better over time' (for example, the best models available as products are ~3 years behind the improvements being shown in academic work). However, ultimately this is just a neat product to make, even if it has some bugs. Listening to TTS right now means spending about half the time on jumbled numbers from tables and lists of author names. So just tackling that alone (which this would do much better) would be valuable.
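Even without an LLM, a crude filter gets part of the way on the table/author-name problem. A rough sketch, assuming plain text extracted from the PDF: drop everything above an "Abstract" heading (where author lists usually sit) and skip lines that are mostly numbers. The 0.5 threshold, the number regex, and the "Abstract" convention are all assumptions, not a tested pipeline.

```python
import re

# A token counts as "numeric" if it is only digits and number punctuation (assumption).
NUMBER = re.compile(r"^[\d.,%±()+\-]+$")


def is_table_row(line: str, threshold: float = 0.5) -> bool:
    """Heuristic: treat a line as table residue if most of its tokens are numbers."""
    tokens = line.split()
    if not tokens:
        return False
    numeric = sum(1 for t in tokens if NUMBER.match(t))
    return numeric / len(tokens) >= threshold


def prepare_for_tts(text: str) -> str:
    """Skip front matter before an 'Abstract' heading, then drop numeric table rows."""
    lines = text.splitlines()
    # Hypothetical convention: author names and affiliations sit above 'Abstract'.
    for i, line in enumerate(lines):
        if line.strip().lower() == "abstract":
            lines = lines[i:]
            break
    kept = [ln for ln in lines if ln.strip() and not is_table_row(ln)]
    return "\n".join(kept)
```

An LLM pass could then summarize the dropped tables in a sentence instead of silently skipping them.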

But listening to a paper passively is not the same thing as being mentally prepared to converse with an LLM about a dense topic. I feel the use cases are quite different, and I doubt that there is a middle ground between listening passively and learning a complex topic. But maybe I am missing something.
This is a bit different than the "read a paper" TTS app. I mentioned the idea just to say it's possible and coming. The blend of the two isn't out of the question though.

Think of asking for a reading of a paper wherein you could interject at any time.

System: "This work comes mainly from 3 groups: Deepmind, University of Pennsylvania, and ETH Zurich. The authors are Matthew Botvinick, Dani Bassett, and Bastian Rieck. They uncover a useful meta-learning program that relies on an AT methodology rooted in the bifiltration of the Ricci curvature of the embeddings and training step, wherein ..."

You: "Wait a second, the algebraic topology method: what are the prior works in that area, and why would that be the starting point for this paper?"

System: "It appears that the relevant citations point to Ann Sizemore's work while in Bassett's lab, with a few other key authors such as Giusti. The titles suggest that..."

(...) System: "Now that we've cleared that up a bit (and added it to a research list for further exploration later), to continue on the paper ..."

And so on.
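A minimal sketch of the interjection flow described above, assuming the paper has already been split into chunks that some TTS engine narrates one at a time, and that `answer` wraps whatever LLM/RAG call you use. `ReadingSession`, `next_chunk`, and `interject` are hypothetical names: an interjection gets answered, logged to a research list, and reading resumes from the same spot.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class ReadingSession:
    """Hypothetical interruptible reader: narrates chunks and handles interjections."""
    chunks: List[str]
    answer: Callable[[str], str]  # stand-in for an LLM/RAG call
    position: int = 0
    research_list: List[str] = field(default_factory=list)

    def next_chunk(self) -> Optional[str]:
        """Return the next chunk to send to TTS, or None when the paper is done."""
        if self.position >= len(self.chunks):
            return None
        chunk = self.chunks[self.position]
        self.position += 1
        return chunk

    def interject(self, question: str) -> str:
        """Answer a listener question, log it for later, and keep the read position."""
        self.research_list.append(question)
        reply = self.answer(question)
        return f"{reply} Now, to continue with the paper..."
```

Because `interject` never touches `position`, the next call to `next_chunk` picks up exactly where the listener cut in.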

This is very achievable today with a little bit of work. Perhaps it's not easy to get it working _super well_, but it's likely easy enough to get working to _some degree_. A well-polished product that works super well certainly isn't out of the question, though.

How much would this product be worth to you?
I came to post essentially this. I could listen to review articles in an area I'm familiar with, but listening to primary papers could never work for me.
It also really depends on the type of paper. Psychology papers are very accessible in audio format, as they are, in general, quite comprehensible.
Just listening while doing nothing else is soporific, but I can imagine finding this invaluable if I had a long commute to work.
Sometimes I use a combined approach: listening first, then reading the technical writing later.
Reading dense papers visually also leads to misunderstood concepts or distraction.
This is a great point. People will complain if LMs are applied to anything, but ultimately it improves accessibility and allows someone to dive deeper when needed.

There will always be ways to misinterpret some academic work, and there are plenty of opportunities in the path of understanding a work to do that.

Allowing someone to engage with a work _at all_ by lifting some barriers (for visually impaired people, for example) should be acknowledged as an improvement, not continually discouraged for having some bugs.