|
|
|
|
|
by fiatmoney
4694 days ago
|
|
It's not a terrible idea to support the absolute basics like mean & variance, but anything beyond that (particularly things like models or tests) is not a good idea for a standard library. Once you hit even something simple like a linear regression you have issues of how to represent missing or discrete variables, handling colinearity, or whether to do online or batch modes which can give different results. Tests in particular are fraught because if you're going to make them available for general consumption they need a good explanation of when they're appropriate, which is basically a semester course in statistics and well out of scope for standard library docs. Basically, the idea of "batteries included" should also mean that if something looks like you can put a D-cell in there, you're unlikely to blow your arm off. |
|
Similarly, Excel/etc. support these functions without a "semester course in statistics." Instead, you'll find that there are many web pages from semester courses in statistics which end up teaching how to use Excel. The same would no doubt happen with Python.
I don't why a statistics standard library module needs to provide a "good explanation of when they're appropriate" to a higher standard than any other module. Python provides trigonometric and hyperbolic functions without teaching trigonometry. It provides complex numbers and cmath without teaching people about complex numbers. It provides several different random distribution functions without teaching anything about Pareto, Weibull, or von Mises distributions.
For that matter, data structures is a semester course as well, but the Python documentation doesn't teach those differences in its documentation of deque, stack, hash table, etc., nor describe algorithms like heapq and bisect.
"whether to do online or batch modes which can give different results". The PEP says it will prefer batch mode: