Hacker News
by baptiste1 972 days ago
Thank you for your comment. Indeed, you are right: not every company has terabytes of data to train its model. I like your example: "Another company needs to detect loose screw heads in engine blocks".

I actually got the idea for Datasaurus from a similar problem. My brother wanted to check whether sheets of metal were bent and needed to be rejected on a production line. However, he did not have any data; he could annotate a couple of images manually but not create a full dataset. We tested the fine-tuning approach and he got good results in a couple of minutes.

That’s why I think this could be quite valuable and I decided to package it into an open-source application.

1 comments

Which foundation model did you fine-tune with so few images? Just curious. I personally believe a language interface or language conditioning is not very relevant, or is even harmful, for many downstream CV applications. In your case, you don't need to ask through a language interface whether the metal is bent; there could be a hundred ways to ask that question, and the output would be slightly different for each one. That's unwanted instability. I feel that conditioning on inputs built from a few examples would be much more relevant, i.e., instead of conditioning on text embeddings, why not condition on the embeddings of these 3 images and their labels?
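To make the idea concrete, here is a minimal sketch of conditioning on example embeddings instead of text: average the embeddings of the few labeled images into one prototype per label, then assign a new image to the nearest prototype by cosine similarity. This is a nearest-prototype classifier, which is only one possible reading of the suggestion. The function names and the toy 3-d vectors are assumptions for illustration; in practice the vectors would come from whatever image encoder you use.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_prototypes(examples):
    """Average the embeddings of each label's example images."""
    grouped = defaultdict(list)
    for embedding, label in examples:
        grouped[label].append(embedding)
    return {
        label: [sum(dim) / len(vectors) for dim in zip(*vectors)]
        for label, vectors in grouped.items()
    }


def classify(query, prototypes):
    """Return the label whose prototype is most similar to the query."""
    return max(prototypes, key=lambda label: cosine(query, prototypes[label]))


# Toy vectors standing in for real image-encoder outputs (hypothetical values).
support = [
    ([0.9, 0.1, 0.0], "flat"),
    ([0.8, 0.2, 0.1], "flat"),
    ([0.1, 0.9, 0.3], "bent"),
]
prototypes = build_prototypes(support)
prediction = classify([0.2, 0.8, 0.2], prototypes)
print(prediction)
```

No prompt is involved here, so there is no wording to vary: the only inputs are the 3 example embeddings, their labels, and the query embedding.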
I used LLaVA. Unfortunately, I signed an NDA :( so I cannot share the code, and the data is private. We fine-tuned it with example images, labels, and text prompts. We also tried in-context learning. Indeed, the prompt was static, but we could do data augmentation and provide a series of equivalent prompts. We just used the prompt that gave us the best performance during initial model testing with in-context learning. I am unsure whether the existence of equivalent prompts creates instability, because sentences with the same meaning should be quite close in the latent space of the foundation model, so it should interpret them similarly.
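One way to check the instability question empirically, rather than rely on the latent-space argument, is to run the same image through several equivalent prompts and measure how often the answers agree. The sketch below is not the NDA'd code: `prompt_agreement` and `fake_predict` are hypothetical names, and `fake_predict` is a stub standing in for a real model call (for example a fine-tuned LLaVA), hard-coded to disagree on one wording so the metric has something to show.

```python
from collections import Counter


def prompt_agreement(predict, image, prompts):
    """Return the majority answer and the fraction of prompts that produced it."""
    answers = [predict(image, prompt) for prompt in prompts]
    majority, count = Counter(answers).most_common(1)[0]
    return majority, count / len(answers)


def fake_predict(image, prompt):
    """Hypothetical stub for a model call; answers differently for one wording."""
    return "unsure" if "defect" in prompt else "bent"


# Paraphrases of the same question (illustrative wording, not from the project).
prompts = [
    "Is this metal sheet bent?",
    "Does this sheet show any bending?",
    "Is there a bending defect on this sheet?",
    "Is the sheet deformed?",
]
majority, agreement = prompt_agreement(fake_predict, "sheet_001.png", prompts)
print(majority, agreement)
```

An agreement close to 1.0 across a validation set would support the claim that equivalent prompts are treated alike; a noticeably lower value would point to the instability raised above.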