by jerpint 834 days ago
Interesting that it’s not vision-based. I suspect you’ll get much better performance once vision is incorporated, e.g. using LLaVA-style models.