Hacker News new | ask | show | jobs
by shepik 1598 days ago
1) DuckDB - like the performance and that it's tightly integrated into python. Clickhouse local - i saw the announcement when it came out, but i just don't see the usecase for it in analytics. DataFusion is new for me. I'd say that the ecosystem is moving towards snowflake/bigquery/redshift/...

2) Why would you want to do that? You'd use bitmap index because it's quite compact and you can process the data at the speed of memory bandwidth, using ascii defeats that, no?

3) Not really. I just can't imagine costs of running any large website or app with serverless.

4) I think it's the same in the US - everyone competes for the same talent with FAANG (MAANG?). The salary gap between big companies and startups is even lower in Russia. Also, we humans want to do meaningful things, some of us struggle to find meaning in being another bigco employee.

5) Py

1 comments

Thanks.

> Why would you want to do that?

We store user preferences (200+ yes/no knobs) in a bitmap (well, a bitmap-index like the one in Hash Mapped-Array Tries). We want to capture those prefs in a single sub-domain (limited to 63 lower-case alphanumeric chars) or a URL (limited to 200 mixed-case alphanumerics). Today, we simply convert the bitmap into url-b64 (or, b32 to store it in the subdomain), but we will soon run out the 63-char limit if we introduce more knobs.

A demonstration of it is here, in case the above didn't explain it well: https://rethinkdns.com/configure (choose blocklists, and see the selection generate a path appended to the base-url shown in the search-bar).

Oh. No, i don't think it's possible - i'd suggest to just use multiple subdomains.

Technically, you can squeeze out some bits: there are 36-37 possible characters of which you are using only 32, so with arithmetic coding you would be looking at about 1 extra bit for 5 characters, but it's a nightmare to code. And after those extra bits run out, you will get the same problem anyway.

Does another party need to decode the url? What about using a dictionary for the top 10k seen starting combinations and then encode the rest?

What about run length encoding? 1-9 for positive sequences. a-i for negative sequences (max means pattern continues) and the rest for frequent patterns like alternating sequences, etc

9967b would be 24 yes, 1 no, 7 yes, 3 nos, 1 yes etc

Thanks. RLE is a hit-and-a-miss (101010101010 etc); top-k is heuristic-based. Those are viable solutions, nevertheless (esp, top-k). Puny Code, which DNS uses, I thought was pretty neat for fitting in a state-machine in printable characters. Takes a bunch of CPU resources to restore it, though.

I was wondering if there are other techniques that I may not be aware of.