by dheera
414 days ago
> employs a single-layer vector quantizer (VQ) codec and a single Transformer architecture to fully align

I really wish that when new models were released, the authors would draw a diagram of all the layers and the tensor input and output sizes at each layer, with zoom in/out capabilities if needed, using D3.js or whatever visualization framework. Every single layer should be on there with its input and output sizes. These one-sentence descriptions and approximate block diagrams with arrows pointing at each other are never enough to understand how something is actually implemented.
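To make the ask concrete, here is a rough sketch in plain Python of the kind of per-layer listing I mean. The layer names and sizes are made up for illustration and are not taken from this model or any real one; the `linear` and `embedding` helpers are hypothetical stand-ins that only do shape arithmetic, not actual computation.

```python
from typing import Callable, List, Tuple

Shape = Tuple[int, ...]
Layer = Tuple[str, Callable[[Shape], Shape]]


def embedding(dim: int) -> Callable[[Shape], Shape]:
    # An embedding lookup appends a feature dimension to the token-id shape.
    return lambda shape: shape + (dim,)


def linear(out_features: int) -> Callable[[Shape], Shape]:
    # A linear layer replaces the last dimension with out_features.
    return lambda shape: shape[:-1] + (out_features,)


def trace_shapes(layers: List[Layer], input_shape: Shape) -> List[Tuple[str, Shape, Shape]]:
    # Walk the layers in order, recording (name, input shape, output shape).
    rows = []
    shape = input_shape
    for name, fn in layers:
        out = fn(shape)
        rows.append((name, shape, out))
        shape = out
    return rows


# Hypothetical toy stack; every name and size here is an assumption.
toy_model: List[Layer] = [
    ("token_embedding", embedding(512)),
    ("ffn_up", linear(2048)),
    ("ffn_down", linear(512)),
    ("lm_head", linear(32000)),
]

if __name__ == "__main__":
    # Input is (batch, sequence_length) of token ids.
    for name, inp, out in trace_shapes(toy_model, (1, 128)):
        print(f"{name:16s} {inp} -> {out}")
```

Even a plain text table like this, one row per layer, would already tell you more than the usual block diagram.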
|
You can also build a custom version of llama.cpp that writes out the ggml compute graph. What's irritating is that Hugging Face didn't add that to their GGUF file viewer.