by buran77
640 days ago
The "Mistral Pixtral multimodal model" really rolls off the tongue.

> It’s unclear which image data Mistral might have used to develop Pixtral 12B.

The days of free web scraping especially for the richer sources of material are almost gone, with anything between technical (API restrictions) and legal (copyright) measures building deep moats. I also wonder what they trained it on. They're not Meta or Google with endless supplies of user content, or exclusive contracts with the Reddits of the internet.

My hunch is that most AI labs are already sitting on a pretty sizable collection of scraped image data - and that data from two years ago will be almost as effective as data scraped today, at least as far as image training goes.