> inaccurate due to accidental or intentional test data leaking into training data and other ways of training to the test.
Even if you assume no intentional data leakage it is fairly easy to accidentally do it. Defining good test data is hard. Your test data should be disjoint from training, which even exact deduplication is hard. But your test data should belong to the same target distribution BUT be sufficiently distant from your training data in order to measure generalization. This is ill-defined in the best of cases, and ideally you want to maximize the distance between training data and test data. But high dimensional settings mean distance is essentially meaningless (you cannot distinguish the nearest from the furthest).Plus there's standard procedures that are explicit data leakage. Commonly people will update hyperparameters to increase test results. While the model doesn't have access to the data, you are passing along information. You are the data (information) leakage. Meta information is still useful to machine learning models and they will exploit it. That's why there's things like optimal hyper-parameters, initialization schemes that lead to better solutions (or mode collapse), and even is part of the lottery ticket hypothesis. Measuring is pretty messy stuff, even in the best of situations. Intentional data leakage removes all sense of good faith. Unintentional data leakage stresses the importance of learning domain depth, and is one of the key reasons learning math is so helpful to machine learning. Even the intuition can provide critical insights. Ignoring this fact of life is myopic. > smaller academic, student or community teams working part-time will tend to score lower than they would on a level playing field.
It is rare for academics and students to work "part-time". I'm about to defend my PhD (in ML) and I rarely take vacations and rarely work less than 50hrs/wk. This is also pretty common among my peers.But a big problem is that the "GPU Poor" notion is ill-founded. It ignores a critical aspect of the research and development cycle: basic research. You might see this in something like NASA TRL[0]. Classically academics work predominantly in the low level TRLs, but there's been this weird push in ML (and not too uncommon in CS in general) for placing a focus on products rather than expanding knowledge/foundations. While TRL 1-4 have extremely high failure rates (even between steps), they lay the foundation that allows us to build higher TRL things (i.e. products). This notion that you can't do small scale (data or compute) experiments and contribute to the field is damaging. It sets us back. It breeds stagnation as it necessitates narrowing of research directions. You can't be as risky! The consequence of this can only lead to a Wiley Coyote type moment, where we're running and suddenly find there is no ground beneath us. We had a good thing going. Gov money funds low level research which has higher risks and longer rates of return, but the research becomes public and thus provides foundations for others to build on top of. [0] https://www.nasa.gov/directorates/somd/space-communications-... |
Sorry, that phrasing didn't properly convey my intent, which was more that most academics, students and community/hobbyists have other simultaneous responsibilities which they must balance.