by Serginusa 115 days ago
[flagged]
1 comments

Thanks! The quantization tradeoffs have been a grind. We do not have an exact number, but a few thousand images was not enough once you account for on-farm variance: lighting changes throughout the day, water clarity shifts between feedings, and fish density varies by tank. Early on our calibration sets were too homogeneous, and the INT8 models would work great in testing and then fall apart when conditions shifted.

We also found that segmentation required significantly fewer images than keypoint pose detection. Segmentation generalizes faster since you are just finding body boundaries. Keypoints are more finicky because anatomical landmarks vary a lot more across species, life stages, and body deformation while swimming, so we had to be much more intentional about diversity in the keypoint training data.

What made the difference overall was building calibration sets that intentionally captured edge cases: low light, high turbidity, dense occlusion, different life stages. We also started stratifying by time of day and tank conditions rather than just grabbing random frames. It is still not perfect but the models are much more stable now.
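
For anyone curious what stratifying by time of day and tank conditions could look like, here is a minimal sketch. It assumes each frame carries metadata tags; the field names (`time_of_day`, `turbidity`) and the helper `stratified_calibration_sample` are hypothetical, and this is not the pipeline described above, just one simple way to cap how much any single condition contributes to a calibration set.

```python
import random
from collections import defaultdict


def stratified_calibration_sample(frames, keys, per_stratum, seed=0):
    """Pick up to `per_stratum` frames from every combination of `keys`.

    `frames` is a list of dicts; `keys` names the (hypothetical) metadata
    fields that define a stratum, e.g. time of day and turbidity.
    """
    rng = random.Random(seed)

    # Group frames by their combination of condition tags.
    strata = defaultdict(list)
    for frame in frames:
        strata[tuple(frame[k] for k in keys)].append(frame)

    # Draw the same quota from each stratum so rare conditions are not
    # drowned out by the most common one.
    sample = []
    for label in sorted(strata):
        group = strata[label]
        sample.extend(rng.sample(group, min(per_stratum, len(group))))
    return sample


# Toy data: 100 easy frames, only 3 hard ones (all values are made up).
frames = [
    {"path": f"noon_{i}.jpg", "time_of_day": "noon", "turbidity": "low"}
    for i in range(100)
] + [
    {"path": f"dusk_{i}.jpg", "time_of_day": "dusk", "turbidity": "high"}
    for i in range(3)
]

calib = stratified_calibration_sample(
    frames, keys=("time_of_day", "turbidity"), per_stratum=5
)
```

With purely random sampling the three dusk frames would rarely show up at all; here every stratum contributes up to its quota, and small strata contribute everything they have.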

Don't you find transformer-based models perform better? Are they too heavy?
They're heavy, but we have some post-processing tasks that depend on the smoothness of the contours created by the labels, and CNNs sometimes do not produce smooth segmentations. We run various calculations to determine deformities or welfare indicators on a fish, and for those we need smoother contours.
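
To make "smooth contour" a bit more concrete, here is a rough sketch of one way to quantify and reduce jaggedness on a closed contour. Both the roughness metric (mean absolute turning angle per vertex) and the circular moving-average smoother are illustrative assumptions, not the calculations used for the welfare indicators mentioned above.

```python
import math


def roughness(contour):
    """Mean absolute turning angle (radians) over a closed contour."""
    n = len(contour)
    total = 0.0
    for i in range(n):
        x0, y0 = contour[i - 1]
        x1, y1 = contour[i]
        x2, y2 = contour[(i + 1) % n]
        heading_in = math.atan2(y1 - y0, x1 - x0)
        heading_out = math.atan2(y2 - y1, x2 - x1)
        # Wrap the heading change into [-pi, pi) before taking its size.
        delta = (heading_out - heading_in + math.pi) % (2 * math.pi) - math.pi
        total += abs(delta)
    return total / n


def smooth_closed(contour, window=5):
    """Circular moving average over the points of a closed contour."""
    n = len(contour)
    half = window // 2
    smoothed = []
    for i in range(n):
        neighbours = [contour[(i + k) % n] for k in range(-half, half + 1)]
        smoothed.append((
            sum(p[0] for p in neighbours) / len(neighbours),
            sum(p[1] for p in neighbours) / len(neighbours),
        ))
    return smoothed


# Toy contour: a circle whose radius zig-zags between 1.0 and 1.1,
# mimicking a jagged mask boundary.
N = 64
jagged = [
    ((1.0 + 0.1 * (i % 2)) * math.cos(2 * math.pi * i / N),
     (1.0 + 0.1 * (i % 2)) * math.sin(2 * math.pi * i / N))
    for i in range(N)
]
circle = [
    (math.cos(2 * math.pi * i / N), math.sin(2 * math.pi * i / N))
    for i in range(N)
]

rough_before = roughness(jagged)
rough_after = roughness(smooth_closed(jagged))
```

A convex contour like `circle` turns by a total of 2π, so its roughness is 2π divided by the number of points; anything well above that baseline points to zig-zagging along the boundary. Post-hoc smoothing like this also shifts the boundary itself, which matters if the downstream measurements depend on exact body outlines.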

Given that, we have trained a variety of models and are still experimenting to see what works best. We are even considering using VLMs for certain tasks if we can fine-tune them well enough.