by nooorofe 1578 days ago
I don't like the code. My starting point would be this:

    import pandas as pd
    import numpy as np

    if __name__ == "__main__":
        n_samples = 10000
        samples_np = pd.DataFrame(np.random.randint(1, 7, n_samples), columns=["face_value"])
        print(samples_np.face_value.mean())

Speaking of abstraction: I don't know the math, so my first thought would be to look for *existing* abstractions. When I work with relational data, the first option I check is SQL. For math, it looks like DataFrame is a *standard* abstraction. To be fair, I might first reach for the built-in `random.randint`, since I am not very familiar with `numpy`, but I would definitely google "pandas random sample", which would bring up https://pandas.pydata.org/docs/reference/api/pandas.DataFram...

    import secrets

    import pandas as pd

    if __name__ == "__main__":
        n_samples = 10000
        # One row per face of the die; sampling with replacement simulates rolls.
        sample_pd = pd.DataFrame({'face_value': [1, 2, 3, 4, 5, 6]})
        print(sample_pd.sample(
            n=n_samples,
            replace=True,
            random_state=secrets.randbits(20)).face_value.mean())

The article's code uses lambda functions in some examples, which probably kills the performance advantage of `numpy`. Using the DataFrame API at least helps to avoid those pitfalls.
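
To illustrate the lambda point, here is a minimal sketch (not taken from the article; the `rolls` series and the squaring operation are made up for illustration). It computes the same result twice, once through a per-element lambda and once through a vectorized expression:

```python
import numpy as np
import pandas as pd

# Fixed seed only so that the run is repeatable.
rng = np.random.default_rng(0)
rolls = pd.Series(rng.integers(1, 7, size=10_000), name="face_value")

# Per-element path: the lambda is called once per row by the Python interpreter.
squared_slow = rolls.apply(lambda x: x * x)

# Vectorized path: one expression, with the loop running in compiled code.
squared_fast = rolls * rolls

# Both paths produce the same values; only the cost differs.
assert (squared_slow == squared_fast).all()
```

Timing the two lines (for example with `timeit`) should show the gap, though the exact ratio depends on the machine and library versions.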

Type annotations: I like the idea, but in the end the code looks like Java without performing like Java. It is very hard to get them right in Python, and some of the ones here are wrong.

(`@dataclass(frozen=True):` doesn't need the ":", and `Gaussian.sample` is missing a `return`.)

Once the return is added, it still doesn't return what `-> Sequence[float]:` promises:

    >>> Gaussian().sample(90).dtype
    dtype('float64')

Should it be `-> Sequence[Union[numpy.float64, numpy.float32, numpy.float16]]:`?
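
One possible way to annotate this is sketched below. It assumes NumPy 1.21 or later (for `numpy.typing.NDArray`), and the `Gaussian` class here is a hypothetical stand-in, since the article's version is not reproduced in this thread. The field names `mu` and `sigma` are assumptions.

```python
from collections.abc import Sequence
from dataclasses import dataclass

import numpy as np
from numpy.typing import NDArray


@dataclass(frozen=True)
class Gaussian:
    # Hypothetical stand-in for the article's class; field names are assumed.
    mu: float = 0.0
    sigma: float = 1.0

    def sample(self, n: int) -> NDArray[np.float64]:
        # Annotate the array type that is actually returned.
        return np.random.default_rng().normal(self.mu, self.sigma, size=n)


draws = Gaussian().sample(90)
is_sequence = isinstance(draws, Sequence)  # False: ndarray is not a registered Sequence
is_float = isinstance(draws[0], float)     # True: np.float64 subclasses Python's float
```

So the elements do pass as `float`; it is the container type that does not match `Sequence`.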

I don't believe "scientific code" is fundamentally different from any other code. I would follow normal development practices:

1) review design ("don't reinvent the wheel")

2) add tests

3) do code review

4) version control

etc.
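
As a rough sketch of what "add tests" could look like for the dice example: `mean_face_value` is a hypothetical helper, not something from the article, and the tolerance is an assumption chosen to sit well outside the sampling noise.

```python
import numpy as np


def mean_face_value(n_samples: int, seed: int) -> float:
    """Mean of n_samples rolls of a fair six-sided die (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    return float(rng.integers(1, 7, size=n_samples).mean())


def test_mean_is_close_to_expected() -> None:
    # A fair die has expected value 3.5; the standard error at n=10000 is about 0.017.
    assert abs(mean_face_value(10_000, seed=0) - 3.5) < 0.1


def test_same_seed_is_reproducible() -> None:
    # Seeding makes the "random" result repeatable, so the test cannot flake.
    assert mean_face_value(1_000, seed=42) == mean_face_value(1_000, seed=42)


# Run directly here; a test runner such as pytest would also collect these by name.
test_mean_is_close_to_expected()
test_same_seed_is_reproducible()
```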

2 comments

> I don't believe "scientific code" is fundamentally different from any other code. I would follow normal development practices

That depends on the field. In the part of bioinformatics I work in (mostly combinatorial algorithms; floating point numbers are rare), normal software development practices often cease to be relevant the moment someone mentions design.

Writing research code is a part of doing research, and a key feature of research is that you often don't know what you are supposed to do. When I start writing code, I tend to expect that the code will solve the wrong problem in the wrong way. Once I have something that runs, I start experimenting with data to learn more about the problem domain. Eventually I have a better idea what the code is supposed to do (and maybe even how it should do that), and then it's time to rewrite and iterate. Design only becomes relevant in late stages of the project when I'm confident I know the problem I'm supposed to solve.

There are some similarities to prototyping, but it's prototyping over problems rather than over solutions to a particular problem.

> I don't believe "scientific code" is fundamentally different from any other code. I would follow normal development practices:

> 1) review design ("don't reinvent the wheel")

> 2) add tests

> 3) do code review

> 4) version control

> etc.

In my experience, this is pretty quixotic and will lead to your failure as a scientist. Basically nobody is writing tests. Code review is pretty much unheard of. The only "design review" comes from your journal's peer review process and generally has nothing to do with the code. You can jump through all those hoops while your colleagues keep cranking out papers.

I just got reminded of this scene from Big Bang Theory.

Sheldon: (laughing at his own joke)

Howard: I haven't seen him laugh that hard since the day Leonard made that multiplication error.

Sheldon: (laughing hysterically) Oh. Oh, Lord! That multiplication error. He thought he carried a 1. But he didn't!

Leonard: It's not funny. That mistake got published.

Sheldon: Stop! I'm gonna wet myself.

Yes, that part is different: research code is not expected to be as reusable as normal software. What is not different is that the code starts from requirements (which are themselves different) and that it should be correct, which is basically what "following normal development practices" means.