by shnkr
808 days ago
GenAI novice here. What is the training data made of, and how is it collected? I guess no one will share details on it. Otherwise, a good technical blog post with lots of insights!
> At Databricks, we believe that every enterprise should have the ability to control its data and its destiny in the emerging world of GenAI.
> The main process of building DBRX - including pretraining, post-training, evaluation, red-teaming, and refining - took place over the course of three months.
Llama 2 was much more opaque about the training data, presumably because they were already being sued at that point (by Sarah Silverman!) over the training data that went into the first Llama!
A couple of things I've written about this:
- https://simonwillison.net/2023/Aug/27/wordcamp-llms/#how-the...
- https://simonwillison.net/2023/Apr/17/redpajama-data/