Hacker News
by Stasis5001 3266 days ago
This comparison isn't so great because of the following:

    for product_id in unique_products:
        product_items = [x for x in dataset_python if x[0]==product_id ]
This is O(unique_products * observations), and it looks like the number of unique products grows in proportion to the number of observations. So this is a quadratic scan where a linear one would suffice. You'll get the best performance from whichever tool lets you get this down to linear time most quickly. E.g. in pure Python, build a dict from product_id to its observations and iterate over that; in pandas, use groupby.
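A minimal sketch of the dict approach, assuming each row is a tuple whose first element is the product id (as the x[0] in the snippet above suggests). The sample rows and the name items_by_product are made up for illustration:

```python
from collections import defaultdict

# Hypothetical sample rows; as in the quoted snippet, row[0] is the product id.
dataset_python = [
    ("p1", 10.0),
    ("p2", 3.5),
    ("p1", 12.0),
    ("p3", 7.25),
    ("p2", 4.0),
]

# One pass over the data: append each row to its product's bucket.
items_by_product = defaultdict(list)
for row in dataset_python:
    items_by_product[row[0]].append(row)

# Iterating the buckets touches each row once more, so total work stays linear.
for product_id, product_items in items_by_product.items():
    print(product_id, len(product_items))
```

Each product's rows end up in product_items exactly as the list comprehension would produce them, but the dataset is scanned once instead of once per product.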