by hapanin 916 days ago
Since nobody is actually recommending papers, here's an incomplete reading list that I sent out to some master's students I work with so they can understand the current (academic) research my little team is doing:

Paper reference / main takeaways / link

InstructGPT / main concepts of instruction tuning / https://proceedings.neurips.cc/paper_files/paper/2022/hash/b...

self-instruct / bootstrap off the model's own generations / https://arxiv.org/pdf/2212.10560.pdf

Alpaca / how Alpaca was trained / https://crfm.stanford.edu/2023/03/13/alpaca.html

Llama 2 / probably the best chat model we can train on; focus on the training method / https://arxiv.org/abs/2307.09288

LongAlpaca / One of many ways to extend context, and a useful dataset / https://arxiv.org/abs/2309.12307

PPO / important training method / idk just watch a youtube video

Obviously these are specific to my work and are out of date by ~3-4 months but I think they do capture the spirit of "how do we train LLMs on a single GPU and no annotation team" and are frequently referenced simply by what I put in the "paper reference" column.

3 comments

Mamba: Linear-Time Sequence Modeling with Selective State Spaces / https://arxiv.org/abs/2312.00752
I would say that the Chinchilla paper is a prerequisite to all of the ones mentioned above:

https://arxiv.org/abs/2203.15556

DPO should be listed as well: https://arxiv.org/abs/2305.18290

It's extremely zeitgeisty atm