|
I don't like the code. My start point would be that import pandas as pd
import numpy as np
if __name__ == "__main__":
n_samples = 10000
samples_np = pd.DataFrame(np.random.randint(1, 7, n_samples), columns=["face_value"])
print(samples_np.face_value.mean())
Speaking about abstraction, I don't know math, so first thought would be to look for *existing* abstractions. When I work with relational data, my first option to check is SQL. For math looks like DataFrame is a *standard* abstraction. To be fair, maybe first I would be using build-in `random.randin` I am not very familiar with `numpy`, but I would definitely google "pandas random sample", that would bring https://pandas.pydata.org/docs/reference/api/pandas.DataFram... if __name__ == "__main__":
n_samples = 10000
sample_pd = pd.DataFrame({'face_value': [1, 2, 3, 4, 5, 6]})
print(sample_pd.sample(
n=n_samples,
replace=True,
random_state=np.random.bit_generator.randbits(20)).face_value.mean())
code uses lambda functions in some examples, it probably kills advantages of `numpy` performance. Using DataFrame API at least helps to avoid those pitfalls.Type annotation, I like the idea, but in the end code looks like Java, but doesn't performs like Java. It is very hard to make it right in Python, also some of them wrong. (
@dataclass(frozen=True): - don't need ":"
Gaussian.sample - missing return ) when return added it doesn't return `-> Sequence[float]:` Gaussian().sample(90).dtype
>>> dtype('float64')
-> Sequence[Union(numpy.float64, numpy.float32, numpy.float16)]: # ?I don't believe "scientific code" is fundamentally different from any other code, I would go with following normal development practices 1) review design ("don't reinvent wheel") 2) add tests 3) make code review 4) version control etc. |
That depends on the field. In the part of bioinformatics I work (mostly combinatorial algorithms; floating point numbers are rare) normal software development practices often cease to be relevant the moment someone mentions design.
Writing research code is a part of doing research, and a key feature of research is that you often don't know what you are supposed to do. When I start writing code, I tend to expect that the code will solve the wrong problem in the wrong way. Once I have something that runs, I start experimenting with data to learn more about the problem domain. Eventually I have a better idea what the code is supposed to do (and maybe even how it should do that), and then it's time to rewrite and iterate. Design only becomes relevant in late stages of the project when I'm confident I know the problem I'm supposed to solve.
There are some similarities to prototyping, but it's prototyping over problems rather than over solutions to a particular problem.