Hacker News
by keepamovin 146 days ago
You got me on the ZIP/LZW mix-up -- that was a mistake in the readme drafting. I'll fix that.

Regarding 'Me or Claude': The core concept (applying bioinformatics edit-distance/alignment to compression rather than just exact prefix matching) is something I worked on back in 2013. The implementation in this repo was heavily assisted by Claude, yes.

You're right that DEFLATE and modern algos (Zstd, Brotli) are the production standard. This project isn't trying to replace Zstd tomorrow; it's a research prototype testing the hypothesis that fuzzy matching + edit scripts can squeeze out entropy that exact-match dictionaries miss. The 8-10x slowdown means it's definitely experimental, but as a starting point for further exploration? That's what I want.
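To make the "fuzzy matching + edit scripts" idea concrete, here is a minimal sketch of the general technique, not the code in the repo: it encodes a target string as a short list of edits against a near-matching reference, using textbook Levenshtein dynamic programming. The names `edit_script` and `apply_script` and the `(kind, position, char)` op format are made up for illustration.

```python
def edit_script(ref: str, target: str):
    """Return (distance, ops) where ops turn ref into target.

    Plain Levenshtein DP, O(len(ref) * len(target)) time and memory.
    The op format is hypothetical, chosen only for this sketch.
    """
    m, n = len(ref), len(target)
    # dist[i][j] = minimum edits to turn ref[:i] into target[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i
    for j in range(n + 1):
        dist[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if ref[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete ref[i-1]
                dist[i][j - 1] + 1,         # insert target[j-1]
                dist[i - 1][j - 1] + cost,  # match or substitute
            )

    # Walk back from the bottom-right corner. Ops come out ordered from the
    # end of ref toward the start, so applying them in order never shifts
    # the positions of ops that are still pending.
    ops = []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and ref[i - 1] == target[j - 1] \
                and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1  # characters match, nothing to record
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            ops.append(("sub", i - 1, target[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", i - 1, ""))
            i -= 1
        else:
            ops.append(("ins", i, target[j - 1]))
            j -= 1
    return dist[m][n], ops


def apply_script(ref: str, ops) -> str:
    """Rebuild the target from the reference plus the edit script."""
    out = list(ref)
    for kind, pos, ch in ops:
        if kind == "sub":
            out[pos] = ch
        elif kind == "del":
            del out[pos]
        else:  # "ins"
            out.insert(pos, ch)
    return "".join(out)
```

For example, `edit_script("kitten", "sitting")` finds a distance of 3, and `apply_script("kitten", ops)` gives back `"sitting"`. The quadratic table per candidate match is also a fair illustration of why this kind of scheme costs more time than exact-match lookup.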


This is a better presentation than the README, which is currently marketing-heavy and technically weak. The project is acceptable and interesting as an experiment, but it certainly isn't "next-generation" when it has (assuming the benchmarks are valid) a <0.2% ratio improvement over an outdated algorithm, at the expense (assuming the description is valid) of much worse compression/decompression speed. Note that such a slowdown isn't an implementation detail but expected by design: the neighbor graph, Levenshtein distance, edit scripts, etc. kill speed. In the end, compression is a trade-off between ratio and speed, and methods are benchmarked on both rather than just one.
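As a rough illustration of benchmarking on both axes, here is a minimal sketch that reports ratio plus compression and decompression time for the codecs in Python's standard library. It is not the project's benchmark harness; `CODECS` and `benchmark` are names made up for this example, and Zstd or Brotli would need third-party packages, so they are left out.

```python
import bz2
import lzma
import time
import zlib

# Standard-library codecs only; this selection is an assumption of the sketch.
CODECS = {
    "zlib (DEFLATE)": (zlib.compress, zlib.decompress),
    "bz2": (bz2.compress, bz2.decompress),
    "lzma": (lzma.compress, lzma.decompress),
}


def benchmark(data: bytes) -> dict:
    """Measure ratio and wall-clock time for each codec on one input."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        t0 = time.perf_counter()
        packed = compress(data)
        t1 = time.perf_counter()
        restored = decompress(packed)
        t2 = time.perf_counter()
        # A benchmark number is meaningless if the round trip is lossy.
        if restored != data:
            raise ValueError(f"{name} failed to round-trip")
        results[name] = {
            "ratio": len(data) / len(packed),
            "compress_s": t1 - t0,
            "decompress_s": t2 - t1,
        }
    return results
```

Running `benchmark(open(path, "rb").read())` on a few representative files gives one row per codec. Single-shot timings are noisy, so repeating each run and keeping the median would be a sensible next step.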

As an overall note, AIs, when you prompt "apply concept X in Y" (or anything really), will tell you what a great idea it is and then output something that, without domain knowledge, you have no idea whether it's correct or whether it even makes sense at all. If you don't want to do a literature search or study, I recommend at least throwing the design back at the machine and asking for a critique.

> an outdated algorithm

Sorry, not my area. Which are the current best algorithms? (Bonus points if they are open source so the OP can add them to the benchmark.)

There was a post on HN a few years back on Zstandard which at the time was one of the better compression algorithms out there:

https://news.ycombinator.com/item?id=25455314

I launched with the hype version of the README. What can I say? Rolled the dice, didn't really care that much. Because the code worked. Spent a few hours iterating on it from the first version to get the speed to where it is, and the gains over LZW. That's what I wanted, that's how it happened.