by vtuulos 478 days ago
if you want to see similar tricks applied in Python (with a JIT compiler for query-time optimization), take a look at this fun deck that I presented a long time ago: https://tuulos.github.io/sf-python-meetup-sep-2013

we were able to handle over a trillion data points on relatively modest machines. definitely a useful approach if you are ready to do some bit twiddling

1 comment

Just use `s = sys.intern(s)` for every string and be done with it. Or something like:

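    _my_intern_dict = {}
    def my_intern(x):
        # Return the stored object equal to x, adding x if it is new.
        return _my_intern_dict.setdefault(x, x)

    s = my_intern(s)

Just make sure to delete `_my_intern_dict` once it's no longer needed, since it keeps a reference to every string it has seen.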
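
For anyone who wants to check that both approaches really collapse equal strings into one object, here is a rough sketch. The names `make_interner`, `intern_value`, and `pool` are made up for illustration, and the strings are built with `join` at runtime only so that they start out as separate objects in CPython.

```python
import sys


def make_interner():
    """Return (intern_value, pool). pool maps each distinct value to itself."""
    pool = {}

    def intern_value(x):
        # setdefault returns the object already stored under an equal key,
        # or stores x and returns it if no equal key exists yet.
        return pool.setdefault(x, x)

    return intern_value, pool


# Two equal strings built at runtime (hypothetical sample data).
a = "".join(["user", "_", "42"])
b = "".join(["user", "_", "42"])

# Dict-based interning: both calls hand back the same stored object.
intern_value, pool = make_interner()
a2 = intern_value(a)
b2 = intern_value(b)
print(a2 is b2)

# Built-in interning: same idea, but the table is managed by the interpreter.
c = sys.intern(a)
d = sys.intern(b)
print(c is d)

# The pool holds a strong reference to every value it has seen,
# so empty it (or drop it) once the deduplication phase is over.
pool.clear()
```

One difference worth knowing: `sys.intern` only accepts `str`, while the dict version works for any hashable value such as tuples or `bytes`.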