Not sure what 'mixing LLM models' entails, but these may be good starting points:

- Karpathy: https://www.youtube.com/watch?v=kCc8FmEb1nY
- https://towardsdatascience.com/beautifully-illustrated-nlp-m...
- https://dzone.com/articles/a-deep-dive-into-the-transformer-...
- https://peterbloem.nl/blog/transformers
- http://nlp.seas.harvard.edu/2018/04/03/attention.html
- https://lilianweng.github.io/posts/2023-01-27-the-transforme...
- https://blog.quickchat.ai/post/tokens-entropy-question/
- https://dugas.ch/artificial_curiosity/GPT_architecture.html
- https://writings.stephenwolfram.com/2023/02/what-is-chatgpt-...
- https://d4mucfpksywv.cloudfront.net/better-language-models/l...
- https://arxiv.org/pdf/2005.14165.pdf
- https://arxiv.org/pdf/2303.08774.pdf
- https://arxiv.org/pdf/2303.17564.pdf