by csdvrx
407 days ago
Is anyone here using phi-4 multimodal for image-to-text tasks? The phi models often punch above their weight, and I got curious about the vision models after reading the finetuning stories at https://unsloth.ai/blog/phi4. Since lmarena.ai only has the phi-4 text model, I tried "phi-4 multimodal instruct" from openrouter.ai. However, the results I get are far below what I expected. Is there any "Microsoft validated" source (like https://chat.qwen.ai/c/guest for qwen) to easily try phi4 vision?