| Little late to this thread, but from my list: LLM (foundational papers) * Attention Is All You Need - transformers + self-attention * BERT - first masked LM using transformers + self-attention * GPT-3 - big decoder-only LLM (basis of GPT-4 and most LLMs) * InstructGPT or Tk-Instruct (instruction tuning enables improved zero-shot learning) * Chain of Thought (improves performance via prompting) Some other papers that have become trendy, depending on your interest: * RLHF - RL using human feedback * LoRA - low-rank adapters that make fine-tuning cheaper * MoE - sparse routing to expert sub-networks, kind of like ensembling * Self-Instruct - the model generates its own instruction data * Constitutional AI - self-alignment * Tree of Thoughts - like CoT but a tree * FlashAttention, Longformer - optimized attention mechanisms * ReAct - agents |