| HN Mirror



	by visarga 611 days ago
	Why not label a fine-tuning dataset with human descriptions based on video recordings? We describe in plain language what the birds are doing, and then fine-tune the model on that. It doesn't need to be a very large dataset, but it would let models translate directly from bird calls to human language.
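
To make the idea concrete, here is a rough sketch of what one record in such a dataset might look like, written out as JSONL (one JSON object per line). Everything here is an assumption for illustration: the `LabeledCall` type, its field names, the file paths, and the two sample descriptions are made up, not taken from any real dataset or tool.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class LabeledCall:
    """One training example: a recorded call plus a human description.

    All field names are hypothetical, chosen only for this sketch.
    """
    audio_path: str   # placeholder path to the recorded bird call
    video_path: str   # placeholder path to the matching video clip
    description: str  # what a human annotator saw the birds doing


def to_jsonl(examples):
    """Serialize the examples as one JSON object per line."""
    return "\n".join(json.dumps(asdict(e)) for e in examples)


# Illustrative placeholder records, not real observations.
examples = [
    LabeledCall("calls/0001.wav", "video/0001.mp4",
                "Adult gives a short call, then the flock takes off."),
    LabeledCall("calls/0002.wav", "video/0002.mp4",
                "Two birds sit on a branch and call back and forth."),
]

jsonl = to_jsonl(examples)
print(jsonl)
```

The second record is the awkward case: the annotator can only write down what is visible, so the description carries very little about what the calls might mean.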

2 comments

lossolo 611 days ago

What if they just sit and talk? How would you describe that? What if only part of the communication is relevant? What if it's not relevant at all because they reacted to atmospheric changes? Or to electromagnetic signals that can't be observed on video? Or smell? Or sound outside the range of human hearing? What if the decision based on the communication is deferred? Etc.

As I mentioned before, only the most obvious examples of behaviors and context can be translated into anything meaningful.


amelius 611 days ago

But then it's not a translation of the bird tweets, but more like a predictive mapping from tweets to behaviors.
