by youngprogrammer 916 days ago
Little late to this thread but from my list:

LLM (foundational papers)

* Attention is all you need - transformers + self attention

* BERT - first masked LM using transformers + self attention

* GPT-3 - large decoder-only LLM (basis of GPT-4 and most LLMs)

* InstructGPT or Tk-Instruct (instruction tuning enables improved zero-shot learning)

* Chain of Thought (improve performance via prompting)
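
For anyone who wants to see what "self attention" from the first paper boils down to, here is a minimal sketch in plain Python. It assumes a single head and uses the inputs directly as queries, keys, and values (the real model learns separate projection matrices for each), and the names `softmax`, `self_attention`, and `tokens` are just illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head scaled dot-product self-attention.

    Simplification (assumption): Q = K = V = x, i.e. no learned projections.
    x is a list of token vectors, all of the same dimension d.
    """
    d = len(x[0])
    out = []
    for q in x:
        # Similarity of this token's query to every token's key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out.append([sum(w * v[j] for w, v in zip(weights, x)) for j in range(d)])
    return out

tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(tokens)
```

Each output row is a weighted average of all the input rows, which is the whole trick: every token gets to mix in information from every other token.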

Some other papers which have become trendy, depending on your interest:

* RLHF - RL using human feedback

* LoRA - make fine-tuning cheaper by training small low-rank adapters on a frozen model

* MoE - route each token to a few expert sub-networks (kind of ensembling)

* Self-Instruct - model generates its own instruction-tuning data

* constitutional ai - self alignment

* tree of thought - like CoT but a tree

* FlashAttention, Longformer - optimized attention mechanisms

* ReAct - agents (interleaves reasoning steps with tool actions)
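
Since LoRA is the one on this list that is easiest to show in a few lines, here is a rough sketch of the idea in plain Python, not the reference implementation. It assumes a row-vector convention (`h = x W + (alpha / r) * (x A) B`), a frozen weight matrix `W`, and two small trainable matrices `A` and `B`. The shapes, the `alpha` value, and the helper names are made up for illustration.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_forward(x, W, A, B, alpha):
    """h = x W + (alpha / r) * (x A) B, with W frozen and only A, B trainable."""
    r = len(B)  # adapter rank
    base = matmul(x, W)               # frozen pretrained path
    update = matmul(matmul(x, A), B)  # low-rank trainable path
    scale = alpha / r
    return [[b + scale * u for b, u in zip(brow, urow)]
            for brow, urow in zip(base, update)]

random.seed(0)
d_in, d_out, r = 8, 8, 2  # hypothetical sizes
W = [[random.gauss(0, 1) for _ in range(d_out)] for _ in range(d_in)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(r)] for _ in range(d_in)]   # trainable, random init
B = [[0.0] * d_out for _ in range(r)]                                   # trainable, zero init

x = [[1.0] * d_in]
h = lora_forward(x, W, A, B, alpha=4)

full_params = d_in * d_out        # what full fine-tuning would update
lora_params = r * (d_in + d_out)  # what the adapter updates
```

With `B` starting at zero the adapter is a no-op, so training begins from the pretrained model's behavior. The saving is in trainable parameters (`r * (d_in + d_out)` instead of `d_in * d_out`), which gets dramatic at real layer sizes.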