Hacker News
I deleted 78% of my Redis container and it still works (medium.com)
41 points by codervinod 1435 days ago
8 comments

The last time this was submitted, the comments were filled with obvious sockpuppet accounts praising the author and write-up: https://news.ycombinator.com/item?id=31768241

Are you a spammer, Vinod Gupta?

Obviously not. There are better ways for spammers to spend their time than writing articles on Redis and containers.
You say that but very obviously did it last time, so why would this be any different?

Last time this was posted it went full reddit with the sockpuppets. You didn't even try and hide it.

It was my first time posting an article, and my real friends posted those comments. I realized it was a bad idea to ask my “real” friends to read the article.
> Unused code => lots of CVEs

I have a problem with that. Yeah, there's the potential that having some extra binary available could result in the main app calling out to it in a way that passes along the data triggering the vulnerability, which can then reach back to the main service/data. People achieve crazier things in practice.

It's of course great to strip those if possible, but I feel like that heading and the "inherent security trade-offs" in cloud-native apps are overplaying it quite a bit.

At least one benefit of doing something like this isn't necessarily that you automatically gain better security by removing stuff you don't need, but that you remove the security-scanner positives tripped by CVEs in packages you're not using. Not having your developers slog through all of them, justifying whether each one is a false positive before they can release to production, means they can get back to doing their actual jobs.

And, it's important to remember that when an attacker does gain a foothold, often the very first thing they're going to do is run a scan of every single binary on your system to see if there is anything they can use to escalate privilege, so having these around does present a true risk even if your program never calls out to any of them.
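A minimal sketch of that kind of scan, assuming the target is setuid/setgid binaries (a common privilege-escalation foothold); it reports roughly what `find <root> -type f -perm /6000` would. The function names are hypothetical, and the same check is useful for auditing your own image before shipping it.

```python
import os
import stat


def has_privileged_bits(mode):
    """True if a file mode carries the setuid or setgid bit."""
    return bool(mode & (stat.S_ISUID | stat.S_ISGID))


def find_privileged_files(root):
    """Walk `root` and return regular files with setuid/setgid bits set,
    roughly what `find <root> -type f -perm /6000` reports."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # lstat: don't follow symlinks
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode) and has_privileged_bits(st.st_mode):
                hits.append(path)
    return sorted(hits)
```

Running `find_privileged_files("/")` inside an image lists every setuid/setgid binary that shipped with it; each one removed is one fewer candidate for escalation.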

I should've phrased my comment better. I don't disagree with anything you mentioned. The post is correct. I just think they're overselling that aspect.
Leaving unused components in your workloads is indeed a security issue. A data breach takes over seven months to detect on average. During that time, the attacker is mainly squatting on your infra and finding deeper attack vectors to move laterally. This of course is a broad statement. Some things are a lot harder to exploit than others.
This is a valid point, but definitely a misleading title.
How is it misleading? The entire post reads like a PR piece for their company, but the title seems faithful.
They didn't delete a part of the container, they deleted a part of the Dockerfile. Actually removing data from an already built container image, as the title suggests, would be much more impressive.
Yeah, I guess that's true.

The final result is a 78% smaller Docker image, not container. The way they achieve it: run a container from the image, run the functional test suite, then remove everything that was unused when the test suite ran, and create a new image from the result.

I don't think the title is purposefully misleading, it's just incorrect by mistake (confusing image/container).

TBH, I thought that they would delete a part of a running container, something to do with de-allocating memory pages at runtime and all that low-level C magic, but now I see that it was just a really weird way to read this on my part.
No I read it that way too.
> then remove everything that was unused when the test suite ran

What tooling does one use to determine this? How can you see what files are being hit or not? Ptrace?

Edit: I RTFA'd and think that’s the product they sell. But how would one go about this without their product?

My guess would be a combination of ptrace/strace/overriding libc and/or eBPF.
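Without the product, a rough strace-based approach could look like this. It assumes each syscall appears complete on one line (for example, per-process logs from `strace -ff -e trace=file -o trace <command>` captured while the test suite runs) and keeps only paths from calls that returned a non-negative value. It does not handle `<unfinished ...>`/`<... resumed>` splits, escaped quotes inside paths, or `*at()` calls relative to a directory fd other than `AT_FDCWD`, so treat it as illustrative; `accessed_paths` is a hypothetical name.

```python
import re

# One complete syscall per line, e.g.
#   openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
#   [pid  42] execve("/usr/bin/redis-server", ["redis-server"], ...) = 0
LINE_RE = re.compile(
    r'^(?:\[pid\s+\d+\]\s+|\d+\s+)?'  # optional pid prefix added by -f
    r'(?P<call>\w+)\('                 # syscall name
    r'(?:AT_FDCWD,\s*)?'               # dirfd argument of *at() calls
    r'"(?P<path>[^"]*)"'               # first quoted argument: the path
    r'.*\)\s*=\s*(?P<ret>-?\d+)'       # return value
)


def accessed_paths(trace_lines):
    """Paths that file-related syscalls in an strace log used successfully."""
    paths = set()
    for line in trace_lines:
        m = LINE_RE.match(line.strip())
        if m and int(m.group("ret")) >= 0:  # drop -1 ENOENT, -1 EACCES, ...
            paths.add(m.group("path"))
    return paths
```

Usage would be along the lines of `with open("trace.1234") as f: used = accessed_paths(f)`, taking the union over every per-process log.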
Maybe this would help in that regard: https://github.com/docker-slim/docker-slim
Docker slim would help in limited ways. eBPF filtering fails at scale and has inherent limitations.
We deleted files from a pre-built container, not from the Dockerfile.
I would like to understand what’s misleading. You could download the container and compare sizes and binary with source container.
I have updated the title to remove confusion. Thanks for calling it out. There was no intention to mislead the community.
As stated initially, that goes back to how Bitnami builds its Docker images, based on a set of Debian packages (minideb). There's also an embedded shell library/framework that does useful things, but it means more code to read when you go check how the sausage is made. That minideb base is the source of the higher CVE count compared to scratch or Alpine images.

> it’s a well-kept secret that no one wants to talk about

I'd rephrase that as "the maintainer side that most casual Docker image users aren't aware of", but Bitnami at least documents the issue:

https://github.com/bitnami/minideb#security

https://docs.bitnami.com/kubernetes/open-cve-policy/

It looks like the approach this takes is to use instrumentation to record which files in the container are used at runtime when run under a test harness, and then build a new image omitting any files/packages that weren't used during the instrumented run.
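The core of that approach can be sketched as a set difference: every file in the image minus the files the instrumented run touched. This is not RapidFort's implementation, just an illustration assuming you already have both sets (for example, a file listing of the image and a syscall trace). It also adds a hypothetical keep-list so that license files and locale data survive even if the test run never opened them, two of the concerns raised below.

```python
import fnmatch

# Hypothetical keep-list: paths that must survive even if the instrumented run
# never opened them (license texts, locale/error-message data).
ALWAYS_KEEP = [
    "/usr/share/doc/*/copyright",
    "/usr/share/licenses/*",
    "*/LICENSE*",
    "/usr/share/locale/*",
]


def plan_removals(all_files, accessed, keep_patterns=ALWAYS_KEEP):
    """Files never touched during the instrumented run and not protected
    by a keep pattern. fnmatch's `*` also matches '/' characters."""
    removals = set()
    for path in all_files:
        if path in accessed:
            continue  # used at runtime: keep
        if any(fnmatch.fnmatch(path, pat) for pat in keep_patterns):
            continue  # explicitly protected: keep
        removals.add(path)
    return removals
```

The keep-list only covers what someone thought to list; it cannot protect a code path the test suite never exercised.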

I think there are several serious problems with this approach.

First, I would be very wary about trusting an image modified like this in production: it would be very hard to be certain I've exercised every code path I care about--including rarely hit error paths--when running the instrumented build. Perhaps a localization file with error messages is only loaded when an error condition is hit, and removing that file converts a non-fatal logged error into a fatal file-not-found?

Removing files also makes it very easy for your CVE scanner to report false negatives. For example, running docker scan on bitnami/redis returns a long scary list, including:

  Low severity vulnerability found in coreutils/coreutils
    Description: Race Condition
    Info: https://snyk.io/vuln/SNYK-DEBIAN11-COREUTILS-527269
    Introduced through: coreutils/coreutils@8.32-4+b1
    From: coreutils/coreutils@8.32-4+b1
The docker scan output on rapidfort/redis is empty, so great, we have no vulnerabilities, right?

  Tested rapidfort/redis for known vulnerabilities, no vulnerable paths found.
This particular CVE is present in chown and chgrp, according to Snyk's info link. The same version of chown that Snyk thinks is vulnerable in bitnami/redis is also present in the rapidfort image:

  docker run --entrypoint=/bin/chown rapidfort/redis --version
  chown (GNU coreutils) 8.32

  docker run --entrypoint=/bin/chown bitnami/redis --version
  chown (GNU coreutils) 8.32
In this particular case, it looks like the original "vulnerability" is a false positive, but that doesn't change the wider point: by trying to clean up an image by removing files, it's really easy to remove whatever signatures a CVE scanner is looking for without actually removing the vulnerable code. Here, it looks like you removed /var/lib/dpkg/info/coreutils*, so Snyk doesn't think coreutils is installed, but some of the binaries are still present.
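One way to catch this kind of false negative, sketched under the assumption that you kept the original image's `/var/lib/dpkg/info/<package>.list` manifests (or `dpkg-query -L` output) as a package-to-files map: flag any package that the slimmed image's package database no longer lists but whose files are still on disk. The function name and data shapes are hypothetical.

```python
def unaccounted_packages(pkg_files, registered_pkgs, present_files):
    """Packages whose files still ship in the slimmed image even though the
    package database no longer lists them, i.e. code a metadata-driven CVE
    scanner will silently skip.

    pkg_files:       {package: set of paths} taken from the ORIGINAL image
    registered_pkgs: packages the slimmed image's dpkg database still lists
    present_files:   paths that actually exist in the slimmed image
    """
    report = {}
    for pkg, files in pkg_files.items():
        if pkg in registered_pkgs:
            continue  # scanner still sees this package; nothing hidden
        leftover = sorted(f for f in files if f in present_files)
        if leftover:
            report[pkg] = leftover
    return report
```

In the coreutils case above, a check like this would report `/bin/chown` under `coreutils` instead of letting the scan come back empty.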

In my mind, false negatives are far scarier than false positives.

Finally, publishing an image modified like this without further cleanup is being a poor community participant.

For example, Redis is distributed under the 3-clause BSD license, requiring the license conditions to be distributed alongside any binary distribution. Your image removes all of the license files, and your Dockerhub page simply says "free to use and has no license limitations". You're quite likely violating Redis' license, and that of other software still present in the image.

You've also left Bitnami's welcome banner in place:

  docker run rapidfort/redis
  redis 12:06:41.40
  redis 12:06:41.43 Welcome to the Bitnami redis container
  redis 12:06:41.44 Subscribe to project updates by watching https://github.com/bitnami/containers
  redis 12:06:41.46 Submit issues and feature requests at https://github.com/bitnami/containers/issues
If a user does encounter an issue with your modified image, you're directing the support burden to Bitnami, who will then have to spend time in triage determining that a modified image was in use and that their code may not actually be at fault.

I think trying to reduce attack surface by removing unnecessary parts of an image is a noble goal, but I don't think a mostly-automated approach is a safe way to do so. I would much prefer to see the output of the instrumented run being used by a human to guide slimming down a Dockerfile manually, which would produce safer images without the risks of automated post-processing.
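As a sketch of that human-in-the-loop workflow (hypothetical names, assuming a package-to-files map and the set of files touched during the instrumented run), the trace can be rolled up per package. Packages with zero touched files become candidates for a person to review and drop from the Dockerfile, rather than files to delete automatically.

```python
def package_usage(pkg_files, accessed):
    """For each package, count how many of its files the instrumented run touched.

    pkg_files: {package: set of paths}; accessed: set of paths.
    """
    return {pkg: len(files & accessed) for pkg, files in pkg_files.items()}


def removal_candidates(pkg_files, accessed):
    """Packages with zero touched files: items for a human to review and
    possibly remove from the Dockerfile, not something to delete blindly."""
    usage = package_usage(pkg_files, accessed)
    return sorted(pkg for pkg, hits in usage.items() if hits == 0)
```

Working at package granularity keeps the package database consistent with what is installed, which also avoids the scanner blind spot described above.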

I have taken care of the welcome banner. Thanks for pointing this out.

  docker run rapidfort/redis
  redis 01:21:19.08
  redis 01:21:19.08 Welcome to the RapidFort optimized, hardened image for Bitnami redis container
  redis 01:21:19.08 Subscribe to project updates by watching https://github.com/rapidfort/community-images
  redis 01:21:19.09 Submit issues and feature requests at https://github.com/rapidfort/community-images/issues/new/choose
Missing licenses was a bug in my scripts. I have updated the container images to include all licenses. Thanks for pointing this out.

I have also included metadata in the container, which allows any scanning tool to generate a scan report.

I have also added links to the scan report on Docker Hub, so it's easy to see which CVEs still exist in the Docker image. There is no intention to mislead anyone in the community.

The RapidFort system, which I use for the open source community images, allows the user to select files and integrate them into CI/CD manually. The community images project on GitHub uses GitHub Actions CI/CD to achieve the same manual curation.
You can direct the hardening process based on the instrumentation data, then bake it into your CI/CD and automate it. That’s how the community images are produced.
The issue goes beyond careful package selection at the Dockerfile level. Even after carefully building your images, you’ll pull in a good number of unused dependencies.
Thanks for pointing out the licensing issue. I will fix it; this is open source and I am happy to accept contributions.
Regarding support, our open source project has an issues page. I welcome the community to file issues, and we will prioritize them.
See: Distroless images[0]

This is one of the huge benefits of recent systems languages like Go and Rust: they compile to single binaries, so you can use things like scratch[1] containers. You may have to fiddle with GNU libc/musl libc (usually when getaddrinfo/DNS is involved), but once you're done with that, packaging is so easy.

Even languages like Node (IMO the most progressive of the scripting languages) have packages like vercel/pkg[2] which produce native binaries.

BTW, if you're considering running Redis these days, check out KeyDB[3]; it's impressive. There are so many Redis alternatives with interesting features these days that I wonder if running vanilla Redis is even a good idea anymore (outside of ensuring complete feature-set compatibility).

[0]: https://github.com/GoogleContainerTools/distroless

[1]: https://hub.docker.com/_/scratch/

[2]: https://github.com/vercel/pkg

[3]: https://docs.keydb.dev

Mind sharing these other Redis alternatives you are talking about?

There's also Redis Streams. Do any of these alternatives have similar streaming features or are there any other databases that are lightweight (instead of going full on Kafka)?

I thought you'd never ask:

- KeyDB (https://keydb.dev)

- Pelikan Cache (https://www.pelikan.io/)

- Tendis (https://github.com/Tencent/Tendis)

- SSDB (https://github.com/ideawu/ssdb)

- Dynomite (https://github.com/Netflix/dynomite/)

- Dragonfly (https://github.com/dragonflydb/dragonfly)

- Skytable (https://github.com/skytable/skytable)

- Tidis (https://github.com/yongman/tidis)

- Anna (https://github.com/hydro-project/anna)

- Skyhook (https://github.com/aerospike/skyhook)

And some which are kinda dead but still interesting -- redis is the kind of workload that does actually become feature complete so these are still usable in my mind though maybe not first choice:

- ledisdb (https://github.com/ledisdb/ledisdb)

- Codis (https://github.com/CodisLabs/codis)

- xcodis (https://github.com/ledisdb/xcodis)

I'm planning on doing a comparison with these at some point, because they're fascinating (all these projects go off in subtly different directions, I'll spare you the details), but here's a recent comparison someone else did:

https://news.ycombinator.com/item?id=31796311

Basically, Redis compatibility is like step 1 for any KVS that wants to seem at least a little usable/real-world-focused, so you get so many cool entrants.
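At the wire level, "Redis compatibility" mostly starts with speaking RESP, the Redis serialization protocol, where a client sends each command as an array of bulk strings. A minimal encoder, as an illustration rather than any listed project's code (`encode_command` is a hypothetical name):

```python
def encode_command(*args):
    """Encode a command as a RESP array of bulk strings, the request format
    Redis clients send (e.g. SET key value)."""
    out = [f"*{len(args)}\r\n".encode()]  # array header: number of elements
    for arg in args:
        data = arg.encode("utf-8")
        # bulk string: $<byte length>\r\n<bytes>\r\n
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)
```

Writing these bytes to a TCP socket on a Redis-compatible server (default port 6379) should get a `+OK\r\n` reply for `SET`; matching that behavior is the baseline every project in the list above has to meet.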

I don't really personally keep up with redis for the stream use-case -- it's a great use for redis but that doesn't really make/break for me usually.

Is this related to this [1] HN discussion?

[1] - https://news.ycombinator.com/item?id=21755871

Docker slim tries to solve the same problem with a very different technology. eBPF packet filtering doesn’t work for many cases, and hence the results are suboptimal.
My account was new at that time and HN didn’t allow me to post the article, so I asked my friend. But obviously someone from the community flagged it. I am happy that the community is active, but it was a genuine mistake on my end.
your friend posting instead of you was not the problem with that submission...
It was a genuine mix-up between me and my friend. He used my account and then switched our accounts. Anyway, I understand how it looks from the community's point of view. Please accept my sincere apologies for the past post.