The LLaVA multimodal models are fun. I find that requesting JSON-formatted output lets you get around the limited response length that seems to be baked in. My favorite is https://huggingface.co/mys/ggml_bakllava-1 (CLIP + Mistral-7B instead of CLIP + Llama-2-7B).
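For anyone who wants to try the JSON trick, here is a rough sketch of what I mean. It only builds the prompt text and parses the reply; how you actually send the prompt and image to the model depends on your runner, so that part is left out. The function names `build_json_prompt` and `extract_json`, and the example keys, are made up for illustration and are not part of any library.

```python
import json


def build_json_prompt(question, fields):
    """Build a prompt asking the model to answer as a JSON object with the given keys."""
    keys = ", ".join(f'"{name}"' for name in fields)
    return (
        f"{question}\n"
        f"Respond only with a JSON object containing the keys: {keys}."
    )


def extract_json(reply):
    """Parse the outermost {...} span in a model reply; return None if missing or invalid."""
    start = reply.find("{")
    end = reply.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(reply[start:end + 1])
    except json.JSONDecodeError:
        # Small models often emit truncated or malformed JSON; treat that as no result.
        return None


# Hypothetical usage: the keys here are just examples.
prompt = build_json_prompt("Describe this image.", ["objects", "setting", "mood"])
print(prompt)
```

Asking for several named keys seems to nudge the model into filling in each one, which is where the longer output comes from in my experience. Your mileage may vary.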