by baptiste1 972 days ago
Foundation models are generally trained on internet-scale data. They have seen billions of images, so they have likely seen some medical images, for example ones extracted from public datasets or textbooks. However, they may not be specialized to your use case. You could still fine-tune the model on a handful of examples to tailor it to what you need. Having a foundation model does not rule out training, and your data could still be valuable. In fact, you could achieve better performance by fine-tuning the larger model than by training a model from scratch on your data alone.
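
To make the fine-tuning idea concrete, here is a toy sketch in plain Python. Everything in it is an assumption for illustration: `frozen_features` is a hypothetical stand-in for a pretrained backbone (a real one would be a large network), and `fine_tune_head` trains only a small logistic-regression head on four labeled examples. It is not how any particular foundation model is fine-tuned, just the general shape of "keep the pretrained part fixed, fit a small part on your own data".

```python
import math


def frozen_features(x):
    """Hypothetical stand-in for a pretrained backbone; it is never updated."""
    # Fixed feature map chosen for illustration, plus a constant bias feature.
    return [x[0], x[1], x[0] * x[1], 1.0]


def predict(weights, x):
    """Probability of the positive class from the head applied to frozen features."""
    z = sum(w * f for w, f in zip(weights, frozen_features(x)))
    return 1.0 / (1.0 + math.exp(-z))


def fine_tune_head(examples, steps=500, lr=0.5):
    """Fit only the head weights with gradient descent on the logistic loss."""
    weights = [0.0] * 4
    for _ in range(steps):
        grad = [0.0] * 4
        for x, y in examples:
            err = predict(weights, x) - y
            for i, f in enumerate(frozen_features(x)):
                grad[i] += err * f / len(examples)
        weights = [w - lr * g for w, g in zip(weights, grad)]
    return weights


# Four labeled examples (XOR-like labels, made up for this sketch).
examples = [((1, 1), 1), ((-1, -1), 1), ((1, -1), 0), ((-1, 1), 0)]
head = fine_tune_head(examples)
for x, y in examples:
    print(x, y, round(predict(head, x), 3))
```

These labels cannot be separated by a linear model on the raw inputs, but the fixed features already contain what is needed, so a handful of examples is enough to fit the head. That is the intuition behind reusing a pretrained model instead of starting from scratch.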

Also, for the medical domain, I think vision-text segmentation models like SEEM (https://github.com/UX-Decoder/Segment-Everything-Everywhere-...) are really cool. You could, for example, ask “Where is the tumor located in this image?” and the model would highlight the tumor in the picture.