Hacker News
by RockyMcNuts 1162 days ago
not sure what 'mixing LLM models' entails, but these might be some good starting points:

- karpathy - https://www.youtube.com/watch?v=kCc8FmEb1nY

- https://towardsdatascience.com/beautifully-illustrated-nlp-m...

- https://dzone.com/articles/a-deep-dive-into-the-transformer-...

- https://peterbloem.nl/blog/transformers

- http://nlp.seas.harvard.edu/2018/04/03/attention.html

- https://lilianweng.github.io/posts/2023-01-27-the-transforme...

- https://blog.quickchat.ai/post/tokens-entropy-question/

- https://dugas.ch/artificial_curiosity/GPT_architecture.html

- https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...

- https://d4mucfpksywv.cloudfront.net/better-language-models/l...

- https://arxiv.org/pdf/2005.14165.pdf

- https://arxiv.org/pdf/2303.08774.pdf

- https://arxiv.org/pdf/2303.17564.pdf