by barneso
3707 days ago
Most teams I have seen have either template scripts or boilerplate code that generates datasets, and they share both the generated data and the scripts the usual ways people share data and code: shared disk, S3, GitHub, emailing notebooks around, etc. It requires a fair amount of set-up, but works surprisingly well once there is a core team and an established set of problems. We are building mldb.ai to help bring the data and the algorithms for ML together in a less ad-hoc manner, and to help move things out of research and into prod once they are ready. Many of the hosted ML solutions (Azure ML, Amazon ML, Google Cloud Datalab, etc.) and other toolkits (e.g. GraphLab) are working on similar ML workflow and organizational structure problems.