by baptiste1 972 days ago
Thanks for your question. You are right that current vision-language foundation models are quite heavy. In NLP, however, there is already some work on smaller foundation models. In addition, you could use a foundation model to help train your smaller model, or to label more data for it.
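To make the "label more data" idea a bit more concrete, here is a rough sketch of pseudo-labeling in plain Python. Everything in it is an assumption for illustration: `pseudo_label`, `toy_teacher`, and the 0.9 confidence threshold are made-up names and values, and `toy_teacher` is only a stand-in for a real (large) foundation model that returns class probabilities.

```python
from typing import Callable, List, Sequence, Tuple


def pseudo_label(
    teacher: Callable[[str], Sequence[float]],
    unlabeled: Sequence[str],
    threshold: float = 0.9,  # assumed cutoff, tune for your data
) -> List[Tuple[str, int]]:
    """Label examples with the teacher, keeping only confident predictions."""
    labeled: List[Tuple[str, int]] = []
    for x in unlabeled:
        probs = teacher(x)
        # Index of the class with the highest probability.
        best = max(range(len(probs)), key=lambda i: probs[i])
        if probs[best] >= threshold:
            labeled.append((x, best))
    return labeled


def toy_teacher(text: str) -> List[float]:
    """Hypothetical stand-in for a large model: returns [p(cat), p(dog)]."""
    if "cat" in text:
        return [0.95, 0.05]
    if "dog" in text:
        return [0.03, 0.97]
    return [0.55, 0.45]  # unsure, so this example gets filtered out


if __name__ == "__main__":
    pool = ["a cat photo", "a dog photo", "a blurry photo"]
    for example, label in pseudo_label(toy_teacher, pool):
        print(label, example)
```

The confident pairs that come back can then be added to the training set of the smaller model. The threshold trades label quality against how much extra data you get.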