by visarga
611 days ago
Why not label a fine-tuning dataset with human descriptions based on video recordings? We explain in human language what the birds are doing, and then fine-tune the model. It doesn't need to be a very large dataset, but it would allow models to translate directly from bird calls to human language.
As I mentioned before, only the most obvious examples of behaviors and context can be translated into anything meaningful.