|
Well, it doesn't just require them to click a menu item and hit a button -- then they have to fix all the problems that arose because RStudio encouraged a hackish development style. There's a workaround, but RStudio still actively encourages your workspace to get out of sync with your script. Compare that to DrRacket, where the code pane is labeled the "Definitions" window, and every time you reload the definitions, your workspace starts from scratch. You can't accidentally interact with deleted code. This would be a problem in a straight-line script where you're doing a bunch of data munging and analysis, since reloading from scratch might mean redoing expensive computations. But if you're building clean, reusable functions to implement interesting algorithms -- building a package and not a script -- then it's exactly the behavior you want. Our course very much focuses on software engineering. Major topics include writing modular code, object-oriented design, thorough testing, and version control. We don't cover statistical concepts in the class -- it's computing for statisticians, not computational methods in statistics. We believe that teaching statisticians to compute like software engineers will, in the long term, dramatically improve their work, since they'll have a stable base of robust, modular, well-tested, reusable code. One recent project, for example, required students to write a pipeline of scripts: the first script takes the name of a CSV file as a command-line argument, processes and filters the data, and dumps it on STDOUT; the next script reads it from STDIN and loads it into PostgreSQL; and a final script (an R Markdown document) runs some queries and generates an automated report on the new batch of data. The processing and analysis stages have to be written as functions, not just top-level scripts, so they can be thoroughly tested.
A future project will involve using dual k-d trees for fast approximate kernel density estimation, or building R-trees to efficiently query spatial data. These are definitely more like packages than scripts. |
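To make the "functions, not top-level scripts" point concrete, here is a minimal sketch of what the first pipeline stage could look like. It is written in Python purely for illustration; the language, the `value` column, the threshold, and every function name are assumptions, not the course's actual assignment.

```python
import csv
import sys


def keep_row(row, column="value", minimum=0.0):
    """Return True if the row's numeric column parses and is >= minimum.

    The column name and threshold are hypothetical placeholders.
    """
    try:
        return float(row[column]) >= minimum
    except (KeyError, TypeError, ValueError):
        # Missing column, missing field, or non-numeric text: drop the row.
        return False


def filter_rows(rows, column="value", minimum=0.0):
    """Lazily yield only the rows that pass keep_row."""
    return (row for row in rows if keep_row(row, column, minimum))


def run(in_stream, out_stream, column="value", minimum=0.0):
    """Read CSV from in_stream and write the filtered CSV to out_stream."""
    reader = csv.DictReader(in_stream)
    if reader.fieldnames is None:
        return  # empty input, nothing to write
    writer = csv.DictWriter(
        out_stream,
        fieldnames=reader.fieldnames,
        extrasaction="ignore",  # tolerate rows with surplus fields
        lineterminator="\n",
    )
    writer.writeheader()
    for row in filter_rows(reader, column, minimum):
        writer.writerow(row)


def main(argv):
    """Entry point: expects the CSV file name as the only argument."""
    if len(argv) != 2:
        sys.stderr.write("usage: filter_stage.py INPUT.csv\n")
        return 2
    with open(argv[1], newline="") as handle:
        run(handle, sys.stdout)
    return 0

# As a script, this file would end by calling sys.exit(main(sys.argv))
# under the usual __name__ == "__main__" guard.
```

Because `run` takes streams rather than opening files itself, a test can feed it an in-memory string and inspect the output without touching the disk or a database, which is the whole reason for keeping the logic out of the top level.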
We take reproducibility very seriously. RStudio's Knit button compiles R Markdown documents in a new R session, instead of the current one, and that was a deliberate choice to make sure your output is produced from a clean R session. But if you are doing EDA, it may not be very pleasant to click this button every time you update your code (although you can if you want).
If your course is focused on software engineering, everything you said makes perfect sense. Statisticians can learn good principles from CS, but they are statisticians after all. There must be tradeoffs.