I deleted 78% of my Redis container and it still works

	I deleted 78% of my Redis container and it still works (medium.com)
	41 points by codervinod 1435 days ago

8 comments

ramphastidae 1434 days ago

The last time this was submitted, the comments were filled with obvious sockpuppet accounts praising the author and write up: https://news.ycombinator.com/item?id=31768241

Are you a spammer, Vinod Gupta?

codervinod 1434 days ago

Obviously not. There are better ways for spammers to spend their time than writing articles on Redis and containers.

effingwewt 1434 days ago

You say that but very obviously did it last time, so why would this be any different?

Last time this was posted it went full reddit with the sockpuppets. You didn't even try and hide it.

codervinod 1434 days ago

It was my first time posting an article, and my real friends posted those comments. I realized it was a bad idea to ask my “real” friends to read the article.

viraptor 1435 days ago

> Unused code => lots of CVEs

I have a problem with that. Yeah, there's the potential that having some extra binary available could result in the main app calling out to it, in a way that passes the data triggering the vulnerability, that can then reach back to the main service/data. People achieve more crazy things in practice.

It's of course great to strip those if possible, but I feel like that heading and "inherent security trade-offs" in cloud native apps is overplaying it quite a bit.

nonameiguess 1435 days ago

At least one benefit of doing something like this isn't necessarily that you automatically gain better security by removing stuff you don't need, but you instead remove security scanning positives tripped by finding CVEs in packages you're not using. Not having your developers slog through justifying all of them to see if they're false positives or not to release to production means they can get back to doing their actual jobs.

And, it's important to remember that when an attacker does gain a foothold, often the very first thing they're going to do is run a scan of every single binary on your system to see if there is anything they can use to escalate privilege, so having these around does present a true risk even if your program never calls out to any of them.

viraptor 1435 days ago

I should've phrased my comment better. I don't disagree with anything you mentioned. The post is correct. I just think they're overselling that aspect.

farimani 1434 days ago

Leaving unused components in your workloads is indeed a security issue. A data breach takes over seven months to detect on average. During that time, the attacker is mainly squatting on your infra and finding deeper attack vectors to move laterally. This of course is a broad statement. Some things are a lot harder to exploit than others.

cycomachead 1435 days ago

This is a valid point, but definitely a misleading title.

capableweb 1435 days ago

How is it misleading? The entire post reads like a PR piece for their company, but the title seems faithful.

golergka 1435 days ago

They didn't delete a part of a container, they deleted a part of the Dockerfile. Actually removing data from an already built container image, as the title suggests, would be much more impressive.

capableweb 1435 days ago

Yeah, I guess that's true.

The final result is a 78% smaller Docker image, not container. But the way they achieve it is by running a container from the image, running the functional test suite, then removing everything that was unused when the test suite ran, and creating a new image from the result.

I don't think the title is purposefully misleading, it's just incorrect by mistake (confusing image/container).

golergka 1435 days ago

TBH, I thought that they would delete a part of a running container, something to do with de-allocating memory pages in runtime and all that low-level C magic, but now I see that it was just a really weird way to read this on my part.

cycomachead 1435 days ago

No I read it that way too.

tluyben2 1435 days ago

> then remove everything that was unused when the test suite ran

What tooling does one use to determine this? How can you see what files are being hit or not? Ptrace?

Edit: I RTFA and think that’s the product they sell. But how would one go about this without their product?

capableweb 1435 days ago

My guess would be a combination of ptrace/strace/overriding libc and/or eBPF.
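
As a rough illustration of the strace route only: the sketch below assumes the test suite was run under something like `strace -f -e trace=open,openat,execve -o trace.log <command>` and then collects the paths that were opened or executed successfully. The function name `accessed_paths` and the regex are hypothetical, and nothing here is claimed to be how the article's product works. It ignores `openat` calls relative to a directory file descriptor and any paths containing quotes.

```python
import re

# Matches open/openat/execve lines in strace output and captures the first
# quoted path argument plus the numeric return value.
SYSCALL_RE = re.compile(
    r'\b(?:open|openat|execve)\((?:AT_FDCWD, )?"(?P<path>[^"]+)".*\)\s+=\s+(?P<ret>-?\d+)'
)


def accessed_paths(strace_lines):
    """Return the set of paths that were successfully opened or executed."""
    paths = set()
    for line in strace_lines:
        match = SYSCALL_RE.search(line)
        # A negative return value means the call failed (e.g. ENOENT), so the
        # file was probed but never actually used.
        if match and int(match.group("ret")) >= 0:
            paths.add(match.group("path"))
    return paths
```

Reading the log would then be `accessed_paths(open("trace.log"))`, where `trace.log` is the assumed output file from the command above.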

abdusco 1435 days ago

Maybe this would help in that regard: https://github.com/docker-slim/docker-slim

codervinod 1434 days ago

Docker slim would help in limited ways. eBPF filtering fails at scale and has inherent limitations.

codervinod 1434 days ago

We deleted files from a pre-built container, not from a Dockerfile.

codervinod 1434 days ago

I would like to understand what’s misleading. You could download the container and compare its size and binaries with the source container.

codervinod 1434 days ago

I have updated the title to remove confusion. Thanks for calling it out. There was no intention to mislead the community.

flas9sd 1435 days ago

As is stated initially, that goes back to how Bitnami builds its Docker images, based on a set of Debian packages (minideb). There's also a shell library/framework embedded that does useful things, but that makes you read more code when you go check how the sausage is made. That minideb base is the reason for the higher CVE count compared to scratch or Alpine images.

> it’s a well-kept secret that no one wants to talk about

I'd rephrase that: the maintainer side is something most casual Docker image users aren't aware of, but Bitnami at least documents the issue:

https://github.com/bitnami/minideb#security

https://docs.bitnami.com/kubernetes/open-cve-policy/

sveiss 1435 days ago

It looks like the approach this takes is to use instrumentation to record which files in the container are used at runtime when run under a test harness, and then build a new image omitting any files/packages that weren't used during the instrumented run.
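
To make that concrete, here is a minimal sketch of the "omit what wasn't used" step, assuming the image's filesystem has been unpacked to a local directory and that a set of accessed paths is already available from some instrumented run. The names `list_files`, `removal_candidates` and `keep_prefixes` are hypothetical, and this is not claimed to be the tool's actual algorithm.

```python
import os


def list_files(root):
    """Return every non-directory entry under root as an absolute-style path."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add("/" + os.path.relpath(full, root))
    return found


def removal_candidates(all_files, accessed, keep_prefixes=()):
    """Files never seen during the instrumented run, minus protected prefixes."""
    return sorted(
        path
        for path in all_files - accessed
        if not any(path.startswith(prefix) for prefix in keep_prefixes)
    )
```

The `keep_prefixes` argument is where a human would pin things the test run cannot prove are needed, such as license or locale directories.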

I think there are several serious problems with this approach.

First, I would be very wary about trusting an image modified like this in production: it would be very hard to be certain I've exercised every code path I care about--including rarely hit error paths--when running the instrumented build. Perhaps a localization file with error messages is only loaded when an error condition is hit, and removing that file converts a non-fatal logged error into a fatal file-not-found?

Removing files also makes it very easy for your CVE scanner to report false negatives. For example, running docker scan on bitnami/redis returns a long scary list, including:

   Low severity vulnerability found in coreutils/coreutils
    Description: Race Condition
    Info: https://snyk.io/vuln/SNYK-DEBIAN11-COREUTILS-527269
    Introduced through: coreutils/coreutils@8.32-4+b1
    From: coreutils/coreutils@8.32-4+b1

The docker scan output on rapidfort/redis is empty, so great, we have no vulnerabilities, right?

   Tested rapidfort/redis for known vulnerabilities, no vulnerable paths found.

This particular CVE is present in chown and chgrp, according to Snyk's info link. The same version of chown that Snyk thinks is vulnerable in bitnami/redis is also present in the rapidfort image:

   docker run --entrypoint=/bin/chown rapidfort/redis --version
  chown (GNU coreutils) 8.32

   docker run --entrypoint=/bin/chown bitnami/redis --version
  chown (GNU coreutils) 8.32

In this particular case, it looks like the original "vulnerability" is a false positive, but that doesn't change the wider point -- by trying to clean up an image by removing files, it's really easy to remove whatever signatures a CVE scanner is looking for without actually removing the vulnerable code. Here, it looks like you removed /var/lib/dpkg/info/coreutils*, so Snyk doesn't think coreutils is installed, but some of the binaries are still present.
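
One way to check for this kind of mismatch is sketched below, under some assumptions: both images have been unpacked to local directories, the original image still has its dpkg `.list` files (one installed path per line), and the function name `orphaned_package_files` is made up for illustration. It reports files that survive in the slimmed tree even though their package's `.list` metadata is gone. It does not model what any particular scanner actually reads.

```python
import os


def orphaned_package_files(original_info_dir, slim_root):
    """Map package name -> files still present in slim_root whose dpkg
    .list metadata no longer exists in the slimmed image."""
    slim_info_dir = os.path.join(slim_root, "var/lib/dpkg/info")
    orphans = {}
    for entry in sorted(os.listdir(original_info_dir)):
        if not entry.endswith(".list"):
            continue
        if os.path.exists(os.path.join(slim_info_dir, entry)):
            continue  # metadata survived, so the package is still visible
        package = entry[: -len(".list")]
        with open(os.path.join(original_info_dir, entry)) as handle:
            listed = [line.strip() for line in handle if line.strip()]
        survivors = []
        for path in listed:
            full = os.path.join(slim_root, path.lstrip("/"))
            # .list files also name directories; only count files and symlinks.
            if os.path.islink(full) or os.path.isfile(full):
                survivors.append(path)
        if survivors:
            orphans[package] = survivors
    return orphans
```

A non-empty result is a hint that a scanner keyed on package metadata may under-report for the slimmed image.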

In my mind, false negatives are far scarier than false positives.

Finally, publishing an image modified like this without further cleanup is being a poor community participant.

For example, Redis is distributed under the 3-clause BSD license, requiring the license conditions to be distributed alongside any binary distribution. Your image removes all of the license files, and your Dockerhub page simply says "free to use and has no license limitations". You're quite likely violating Redis' license, and that of other software still present in the image.

You've also left Bitnami's welcome banner in place:

   docker run rapidfort/redis
  redis 12:06:41.40
  redis 12:06:41.43 Welcome to the Bitnami redis container
  redis 12:06:41.44 Subscribe to project updates by watching https://github.com/bitnami/containers
  redis 12:06:41.46 Submit issues and feature requests at https://github.com/bitnami/containers/issues

If a user does encounter an issue with your modified image, you're directing the support burden to Bitnami, who will then have to spend time in triage determining that a modified image was in use and that their code may not actually be at fault.

I think trying to reduce attack surface by removing unnecessary parts of an image is a noble goal, but I don't think a mostly-automated approach is a safe way to do so. I would much prefer to see the output of the instrumented run being used by a human to guide slimming down a Dockerfile manually, which would produce safer images without the risks of automated post-processing.

codervinod 1431 days ago

I have taken care of the welcome banner. Thanks for pointing this out.

  docker run rapidfort/redis
  redis 01:21:19.08
  redis 01:21:19.08 Welcome to the RapidFort optimized, hardened image for Bitnami redis container
  redis 01:21:19.08 Subscribe to project updates by watching https://github.com/rapidfort/community-images
  redis 01:21:19.09 Submit issues and feature requests at https://github.com/rapidfort/community-images/issues/new/choose

codervinod 1434 days ago

Missing licenses was a bug in my scripts. I have updated the container images to include all licenses. Thanks for pointing this out.

I have also included metadata in the container. This will allow any scanner to generate the scan report from any tool.

I have also added links to the scan report on the docker hub, so it's easy to see what all CVEs still exist in the docker image. There is no intention to mislead anyone in the community.

codervinod 1434 days ago

The RapidFort system, which I am using in the open source community images, allows the user to select files and integrate them into CI/CD manually. The community images project on GitHub uses GitHub Actions CI/CD to achieve the same manual curation.

farimani 1434 days ago

You can direct the hardening process based on the instrumentation data, then bake it into your CI/CD and automate it. That’s how the community images are produced.

farimani 1434 days ago

The issue goes beyond careful package selection at the Dockerfile level. Even after carefully building your images, you’ll pull in a good number of unused dependencies.

codervinod 1434 days ago

Thanks for pointing out the licensing issue. I will fix it; this is open source and I am happy to accept any contributions.

codervinod 1434 days ago

Regarding support, our open source project has an issues page. I welcome the community to add issues and we will prioritize them.

hardwaresofton 1435 days ago

See: Distroless images[0]

This is one of the huge benefits of recent systems languages like Go and Rust -- they compile to single binaries so you can use things like scratch[1] containers. You may have to fiddle with gnu libc/musl libc (usually when getaddrinfo is involved/dns etc), but once you're done with it, packaging is so easy.

Even languages like Node (IMO the most progressive of the scripting languages) have packages like vercel/pkg[2] which produce native binaries.

BTW if you're considering running redis these days... Check out KeyDB[3], it's impressive. There are a lot of redis alternatives with interesting features these days that I wonder if running vanilla redis is even a good idea anymore (outside of ensuring complete feature-set compatibility).

[0]: https://github.com/GoogleContainerTools/distroless

[1]: https://hub.docker.com/_/scratch/

[2]: https://github.com/vercel/pkg

[3]: https://docs.keydb.dev

DistrictFun7572 1435 days ago

Mind sharing these other alternatives of Redis you are talking about?

There's also Redis Streams. Do any of these alternatives have similar streaming features or are there any other databases that are lightweight (instead of going full on Kafka)?

hardwaresofton 1435 days ago

I thought you'd never ask:

- KeyDB (https://keydb.dev)

- Pelikan Cache (https://www.pelikan.io/)

- Tendis (https://github.com/Tencent/Tendis)

- SSDB (https://github.com/ideawu/ssdb)

- Dynomite (https://github.com/Netflix/dynomite/)

- Dragonfly (https://github.com/dragonflydb/dragonfly)

- Skytable (https://github.com/skytable/skytable)

- Tidis (https://github.com/yongman/tidis)

- Anna (https://github.com/hydro-project/anna)

- Skyhook (https://github.com/aerospike/skyhook)

And some which are kinda dead but still interesting -- redis is the kind of workload that does actually become feature complete so these are still usable in my mind though maybe not first choice:

- ledisdb (https://github.com/ledisdb/ledisdb)

- Codis (https://github.com/CodisLabs/codis)

- xcodis (https://github.com/ledisdb/xcodis)

I'm planning on doing a comparison with these at some point, because they're fascinating (all these projects go off in subtly different directions, I'll spare you the details), but here's a recent comparison someone else did:

https://news.ycombinator.com/item?id=31796311

Basically, redis compatibility is like step 1 for any KVS that wants to seem at least a little usable/real-world-focused so you get so many cool entrants.

I don't really personally keep up with redis for the stream use-case -- it's a great use for redis but that doesn't really make/break for me usually.

LinuxBender 1435 days ago

Is this related to this [1] HN discussion?

[1] - https://news.ycombinator.com/item?id=21755871

codervinod 1434 days ago

Docker slim tries to solve the same problem with a very different technology. eBPF packet filtering doesn’t work for many cases and hence the results are suboptimal.

detaro 1434 days ago

previously: https://news.ycombinator.com/item?id=31768241

codervinod 1434 days ago

My account was new at that time and HN didn’t allow me to post the article, so I asked my friend to post it. But obviously someone from the community flagged it. I am happy that the community is active, but it was a genuine mistake on my end.

detaro 1434 days ago

your friend posting instead of you was not the problem with that submission...

codervinod 1434 days ago

It was a genuine mess between me and my friend. He used my account and then we switched accounts. Anyway, I understand how it looks from the community's point of view. Please accept my sincere apologies for the past post.
