Y
Hacker News
new
|
ask
|
show
|
jobs
by
dartharva
852 days ago
Does it support other visual-input-accepting language models? GPT-V is paywalled.
2 comments
Deverauxi
852 days ago
The code isn’t very complicated. You could edit in any vision model you want:
https://github.com/microsoft/UFO/blob/main/ufo/llm/llm_call....
link
dartharva
852 days ago
I don't have the resources currently but would love to test something like LLaVA out for this, but I won't keep my hopes up.
link
dartharva
852 days ago
Update: no, it doesn't as of now.
link
https://github.com/microsoft/UFO/blob/main/ufo/llm/llm_call....