Hacker News
Video-LLaMA: Instruction-Tuned Audio-Visual Lang Model for Video Understanding (github.com)
1 point by rhogar 1107 days ago