by MichaelRazum
337 days ago
Technical question: can someone explain how the vision backbone can be replaced after training? I think this is what they mentioned in the video. I'm just wondering how that would work, since I would expect the visual embeddings to change substantially. PS: Is the approach something like LoRA, or a complete retrain of the visual part?

It was giving bounding-box coordinates and likelihood scores for matches against generic classifications for each:
…