by screye 754 days ago
No offense taken.

My tasks aren't usually bottlenecked by the df creation operation. To me, the convenience offered by dfs outstrips the compute hit. However, if this is an order of magnitude difference, then it would push me to adopt the more-itertools formulation.

1 comment

> However, if this is an order of magnitude difference, then it would push me to adopt the more-itertools formulation.

My friend, it's much worse than a single order of magnitude for small inputs:

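    import time
    import pandas as pd

    ls = list(range(10))

    # Plain list comprehension
    b = time.monotonic_ns()
    odds = [v for v in ls if v % 2]
    e = time.monotonic_ns() - b
    print(f"{e=}")

    # Same filter via pandas, DataFrame construction included in the timing
    bb = time.monotonic_ns()
    df = pd.DataFrame(ls)
    # Note: masking with a DataFrame keeps the shape and leaves NaN for the evens;
    # df[df[0] % 2 == 1] would drop those rows instead.
    odds = df[df % 2 == 1]
    ee = time.monotonic_ns() - bb
    print(f"{ee=}")
    print("ratio", ee/e)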

    >>> e=1166
    >>> ee=656792
    >>> ratio 563.2864493996569
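
A single-shot measurement like the one above is noisy, so here is a rough sketch of the same comparison with `timeit`, taking the best of several repeats. The names `best_ns` and `compare` are made up for illustration, and the pandas side filters rows with `df[df[0] % 2 == 1]` rather than masking the whole frame, so it is not identical to the snippet above.

```python
import timeit


def best_ns(fn, number=1000, repeat=5):
    """Best-of-`repeat` average time per call of `fn`, in nanoseconds."""
    timer = timeit.Timer(fn)
    return min(timer.repeat(repeat=repeat, number=number)) / number * 1e9


def compare(n, number=100):
    """Ratio of pandas time to list-comprehension time for n integers."""
    import pandas as pd  # imported lazily so best_ns works without pandas

    ls = list(range(n))

    def with_list():
        return [v for v in ls if v % 2]

    def with_pandas():
        # DataFrame construction is timed too, as in the original snippet
        df = pd.DataFrame(ls)
        return df[df[0] % 2 == 1]

    return best_ns(with_pandas, number=number) / best_ns(with_list, number=number)
```

Calling `compare(10)` and then something like `compare(100_000)` should show how the ratio changes with input size; the exact numbers will depend on the machine and the pandas version.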

My experience is also that numpy and pandas can add 1-2 seconds to Python startup time (which is terrible for the testing experience).
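
If you want to check the startup cost on your own machine, one rough way is to time a fresh interpreter with and without the import. This is only a sketch: `run_seconds` and `import_overhead` are hypothetical helper names, and the result includes process-spawn noise, so run it a few times.

```python
import subprocess
import sys
import time


def run_seconds(code):
    """Wall-clock seconds for a fresh interpreter to run `code` and exit."""
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", code], check=True)
    return time.perf_counter() - start


def import_overhead(module):
    """Approximate import cost of `module` above bare interpreter startup."""
    baseline = run_seconds("pass")
    return run_seconds(f"import {module}") - baseline
```

For example, `import_overhead("pandas")` gives a ballpark figure in seconds, assuming pandas is installed. For a per-module breakdown, CPython also has a built-in option: `python -X importtime -c "import pandas"`.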