
by dominiek 1891 days ago

Amazing.

I never thought I would say this, but I actually implemented an FTP server in 2020. This was needed to support firmware updates to specific hardware (Electric Vehicle charging stations). Apparently embedded software developers choose FTP whenever a spec doesn't specify how binary file transfers should work.

It was kind of amusing getting FTP to work in a modern cloud environment. I run a single Kubernetes pod with a Node.js based FTP server optimized for one thing: transferring files between FTP and Google Cloud Storage. A range of ports is specified in the Dockerfile to enable passive FTP transfers.
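For readers who have not dealt with passive mode: the server answers PASV with a 227 reply that encodes the address and data port the client should connect to, which is why a fixed port range has to be exposed. A minimal Python sketch of that reply (not the Node.js code described here; the function name, the IP, and the 30000-30009 range are made up for illustration):

```python
import ipaddress

# Hypothetical passive range; it would have to match the ports exposed
# by the container and any load balancer in front of it.
PASSIVE_PORTS = range(30000, 30010)


def pasv_reply(public_ip: str, port: int) -> str:
    """Format the RFC 959 '227' reply for a passive-mode data connection."""
    if port not in PASSIVE_PORTS:
        raise ValueError("port is outside the exposed passive range")
    # IPv4Address validates the input; PASV can only express IPv4.
    octets = str(ipaddress.IPv4Address(public_ip)).split(".")
    high, low = divmod(port, 256)  # port is sent as two decimal bytes
    return "227 Entering Passive Mode ({},{},{})".format(
        ",".join(octets), high, low
    )


print(pasv_reply("203.0.113.7", 30000))
# 227 Entering Passive Mode (203,0,113,7,117,48)
```

The address in the reply needs to be one the client can actually reach, which in a cluster is usually not the pod's own IP.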

Even more amusing was the number of different ways in which FTP had been implemented by different hardware manufacturers. I regularly had to dive into the FTP libraries to add support for crazy edge cases (tcpflow in kubectl exec -it is your friend!). Example: one device added a newline in the middle of a command (USER\n myusername).
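To illustrate the kind of tolerance this needs, here is a rough Python sketch (not the actual fix in the Node.js library) of a command parser that accepts stray line breaks between the verb and its argument. It assumes the whole message reaches the parser in one buffer; a reader that frames strictly on "\n" would see "USER" on its own and would need to handle that one level up. The function name is hypothetical.

```python
from typing import Tuple


def parse_command(raw: bytes) -> Tuple[str, str]:
    """Split one control-channel message into (VERB, argument).

    Any run of whitespace, including CR and LF, is accepted as the
    separator between the verb and its argument.
    """
    text = raw.decode("latin-1")  # never fails, whatever bytes arrive
    parts = text.split(None, 1)   # split on the first whitespace run
    if not parts:
        return "", ""
    verb = parts[0].upper()       # verbs are case-insensitive
    arg = parts[1].strip() if len(parts) > 1 else ""
    return verb, arg


print(parse_command(b"USER\n myusername\r\n"))
# ('USER', 'myusername')
```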

The latest curve ball I received this week is that a certain firmware version of a Qualcomm modem chip cannot deal with the size of the IP packets coming from our FTP server... Fun stuff!

9 comments

tpmx 1891 days ago

Implementing an FTP server from scratch that had to be compatible with lots of clients in 2020 was an interesting choice. Just to have it in javascript? Perhaps security-motivated? There are probably battle tested implementations in e.g. Python, Java or other safe-ish languages to build on?

I learned this lesson in the mid 90s when fixing client compatibility bugs in an FTP server module we had built in an interpreted language, because, how hard could it be...

> The latest curve ball I received this week is that a certain firmware version of a Qualcomm modem chip cannot deal with the size of the IP packets coming from our FTP server... Fun stuff!

Right.

klodolph 1890 days ago

> There are probably battle tested implementations in e.g. Python, Java or other safe-ish languages to build on?

My experience... there aren't a ton of choices in this space. There are a few FTP servers designed to power B2B backend services. Many of the options are designed only to provide access to the local filesystem.

axiolite 1890 days ago

> My experience... there aren't a ton of choices in this space. There are a few FTP servers designed to power B2B backend services. Many of the options are designed only to provide access to the local filesystem.

The "local filesystem" doesn't have to be a local file system. It's just a good, common abstraction useful for interoperability. Why not "rclone mount" your Google Drive, or use some other FUSE based file system to get easy interoperability between legacy FTP servers and modern storage options?

klodolph 1890 days ago

I don't think I agree that the local filesystem is a good, common abstraction. It's a serviceable abstraction most of the time, and a terrible abstraction at other times. This isn't just some contrarian stance I'm taking--I've just spent too much time fighting with filesystem semantics on too many OSs.

Going through FUSE is, in my mind, a last resort.

rectang 1890 days ago

How much better have we gotten at specifying protocols? Have we learned how to make protocols less ambiguous and less susceptible to crazy edge cases which make it burdensome to implement support in practice once there are lots of sloppy implementations in the field?

aidenn0 1890 days ago

1. In many cases the misbehaving clients and servers are obviously wrong, but Postel's law means they "worked for me" when the developer tested it 10 years ago before abandoning it.

2. The FTP protocol got a lot of cruft added to it that modern clients don't implement (e.g. any transfer mode other than stream).

3. FTP over TCP predates NATs and firewalls, which caused a lot of problems as well.

4. FTP was designed for human-readable, not machine-readable output. In particular the output of a LIST command is woefully underspecified.

I think #1 is the biggest issue for long-term viability of protocols. Not following Postel's law is a recipe for death (same reason why it's suicide for a browser to unilaterally untrust a major CA; any site that doesn't work in browser X is assumed to be browser X's fault), but following Postel's law is a recipe for undocumented de-facto standards with crazy edge cases.

rectang 1890 days ago

Thanks, wonderful reply!

I see that there's been a debate over applying Postel's Law, a.k.a. the Robustness Principle, for some time:

https://en.wikipedia.org/wiki/Robustness_principle#Criticism

> a defective implementation that sends non-conforming messages might be used only with implementations that tolerate those deviations from the specification until, possibly several years later, it is connected with a less tolerant application that rejects its messages.

That Wikipedia page led me on to this IETF draft from 2019 on protocol maintenance

https://tools.ietf.org/html/draft-iab-protocol-maintenance-0...

> Abstract

> The robustness principle, often phrased as "be conservative in what you send, and liberal in what you accept", has long guided the design and implementation of Internet protocols. The posture this statement advocates promotes interoperability in the short term, but can negatively affect the protocol ecosystem over time. For a protocol that is actively maintained, the robustness principle can, and should, be avoided.

It seems that you need both for the protocol to be unambiguously, fully specified, and for popular implementations to avoid applying Postel's Law! But we've seen how market forces conspire to work against that.

Brainstorming opportunities for improvement beyond the suggestions in the IETF doc:

• Accompany the protocol with a validation test suite.

• Provide a validation service.

• Treat non-validating messages skeptically ("quirks mode").

winrid 1891 days ago

Just curious on the motivation.

Why not run a regular FTP server and have your application periodically look for new files to process? For horizontal scaling, you just take a distributed lock on the file name.

admax88q 1891 days ago

That honestly sounds more complicated. FTP isn't that difficult of a protocol, especially if you only need to support one known client; you can take all sorts of shortcuts.

If you deploy an existing FTP server, and _then_ integrate with it at the filesystem level you now have two components, and your sysadmin requirements grow. Now you gotta administrate an FTP server that's probably written for classic UNIX single server usage, gotta handle filesystem permissions, gotta somehow hook up your distributed locks to the filesystem, sanitize filenames for your chosen filesystem.

Honestly filesystems suck, there's so many gotchas from a security perspective, when all you really want is to pipe binary data in this side, and out the other side.

I implemented an IRC bot in a few hours in javascript one day. Those classic IETF text based protocols are actually really fun and easy to implement, especially in a language that makes strings safe and easy (i.e. not C).

I could easily see figuring out all the deployment concerns around integrating with an existing FTP server end up taking way longer than just integrating the subset needed for this use case.

stjohnswarts 1890 days ago

Strings in C aren't that bad with stuff like bstring or glib. Just gotta be careful with deallocations so you don't get leaks. Much better than all the security issues with std strings. I mean not anywhere close to python/ruby/javascript easy but it's not as bad as HN likes to declare it.

recursive 1891 days ago

Active mode is pretty weird. Coordinating a single client across two ports sounds difficult to me, but I've never implemented it. If that's not a difficult protocol, then what is?
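For context, the two-port coordination in active mode comes down to this: the client sends PORT with six comma-separated decimal bytes (four for its IPv4 address, two for the port), and the server then opens the data connection back to that address. A small Python sketch of the parsing step, with a hypothetical function name:

```python
from typing import Tuple


def parse_port_argument(arg: str) -> Tuple[str, int]:
    """Decode the 'h1,h2,h3,h4,p1,p2' argument of an FTP PORT command."""
    fields = [int(part) for part in arg.strip().split(",")]
    if len(fields) != 6 or any(not 0 <= f <= 255 for f in fields):
        raise ValueError("PORT expects six comma-separated bytes")
    host = ".".join(str(f) for f in fields[:4])
    port = fields[4] * 256 + fields[5]  # high byte first
    return host, port


print(parse_port_argument("192,168,1,20,195,80"))
# ('192.168.1.20', 50000)
```

The parsing is the easy part; the connection back to the client is what NATs and firewalls tend to break.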

tuwtuwtuwtuw 1891 days ago

HTTP/3, IMAP, CalDAV and MAPI?

Coordinating a client across two ports sounds trivial compared to, for example, properly implementing client and server versions of IMAP search commands when no client or server follows the specification.

pixl97 1890 days ago

Until you realize that the two ports commonly involve dealing with middleware that can't handle it properly.

tuwtuwtuwtuw 1890 days ago

You will have similar issues with other protocols as well. I have seen physical bandwidth limiters, Windows drivers, corrupt Winsock LSPs etc. mess with IMAP traffic. I have seen middleware replacing SMTP commands sent from a client to server for no good reason, such as replacing "EHLO hostname" with "EHLO **".

winrid 1891 days ago

The lock would be handled by something like Redis or a DB.

But yeah, if you only need to support one client, I can see the reasoning. It would never have flown at any of the places I've worked, though, having to support tons of clients.

admax88q 1890 days ago

Yeah totally I hear you. If you have to support many external clients, using an existing FTP implementation makes a lot more sense.

dominiek 1891 days ago

I had considered this, but decided against this for a couple of reasons:

- Scaling requirements are relatively low. Even though we're dealing with tens of thousands of devices, the number of firmware updates going out to those devices at any given time is minimal. Our main scaling challenges are around OCPP over websockets. Story for another day.

- I have bad memories of ProFTPd etc buffer overflow exploits.

- I wanted something simple that could bridge between FTP and our cloud persistence (MongoDB and Cloud Storage).

- I found this Node.js library that I since then forked: https://github.com/autovance/ftp-srv - The great thing about this library is that it allows a quick implementation of a custom filesystem.

- For Kubernetes pods the file system should really be treated as a /tmp - which we are doing.

- When a charge station connects, the FTP username/password is a temporary generated set of tokens that is checked against our MongoDB.
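The temporary-credential idea in the last point can be sketched roughly as follows. This is Python rather than the Node.js used here, a plain dict stands in for the MongoDB collection, and every name and the 15-minute lifetime are assumptions for illustration only:

```python
import hmac
import secrets
import time
from typing import Optional, Tuple

# In-memory stand-in for the database collection (assumption).
_credentials = {}


def issue_credentials(station_id: str, ttl_seconds: int = 900,
                      now: Optional[float] = None) -> Tuple[str, str]:
    """Create a one-off FTP username/password for a single firmware update."""
    now = time.time() if now is None else now
    username = "cs-" + secrets.token_hex(8)
    password = secrets.token_urlsafe(24)
    _credentials[username] = {
        "password": password,
        "station_id": station_id,
        "expires_at": now + ttl_seconds,
    }
    return username, password


def check_login(username: str, password: str,
                now: Optional[float] = None) -> Optional[str]:
    """Return the station id if the login is valid and unexpired, else None."""
    now = time.time() if now is None else now
    record = _credentials.get(username)
    if record is None or now > record["expires_at"]:
        return None
    # Constant-time comparison avoids leaking how much of the password matched.
    if not hmac.compare_digest(record["password"].encode(), password.encode()):
        return None
    return record["station_id"]
```

Because each login maps back to one station, the FTP session can be limited to that station's firmware file and nothing else.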

Essentially, I'm using FTP as a throwaway here.

If you think through this you can imagine it would be quite a lift to accomplish this with an existing FTP server.

stjohnswarts 1890 days ago

They describe why: tons of edge cases from poorly implemented FTP clients. They needed full control over the protocol to handle weird edge cases because of multiple embedded clients.

meritt 1891 days ago

Because nodejs kubernetes modern cloud.

dominiek 1891 days ago

The main reason for using Node.js is because the rest of our stack is Node.js: https://bedrock.io

We use MongoDB as persistence and have existing wrappers for dealing with Google Cloud Storage.

Since it's an isolated service we could've used a different implementation language.

In our case, Node.js in our existing Kubernetes environment was the path of least friction.

dvfjsdhgfv 1891 days ago

> Because nodejs kubernetes modern cloud.

Not necessarily so. The history of FTP servers is riddled with bugs, with practically no exceptions. At some point some folks decided they would finally write a bug-free implementation and even dared to call it "Very Secure FTPd." Needless to say, it turned out to have bugs, too.

As most of these bugs were related to buffer overflows and similar issues, implementing a new FTP server in a safer language is not such a bad idea, and today's JavaScript is efficient enough to make it a reasonably well-working implementation. I pity the author though for the bugs they encounter and workarounds that will need to be implemented.

tyingq 1891 days ago

I agree, but I don't see why we would then assume that forking some ftp server library from npm would fare any better, security wise.

I see a fairly alarming open issue: https://github.com/autovance/ftp-srv/issues/167

dominiek 1891 days ago

Exactly my reasoning. See my comment above for more info.

I'm making the maintenance of this less painful by doing a hacking/debugging session with manufacturers once a month where we hook up many devices and fix issues. After addressing most edge cases, fewer are coming up now (despite a relentless stream of new cheaply manufactured devices).

gogopuppygogo 1891 days ago

I’m just glad they used FTP over TFTP. Maybe someday they’ll use FTPS but I have my doubts it’ll ever catch on with the popularity of SFTP.

lightdot 1891 days ago

Just to note for those who don't know, FTPS and SFTP are completely different protocols.

The similarity of the names often causes confusion, but SFTP has nothing to do with the venerable FTP.

SFTP stands for "SSH file transfer protocol" and it's a completely different beast. IMHO, a somewhat unfortunate naming choice, but that's water under the bridge.

(...while FTPS stands for "FTP over SSL", and that actually uses plain old FTP with an additional SSL/TLS layer...)

chasil 1891 days ago

These standards are so very different, and they don't scale well.

TFTP is actually over UDP, guarantees only one data packet on the wire at any one time (no sliding window), does not support listing a remote directory, and is extreme in simplicity.
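To make the "one data packet on the wire" point concrete, here is a small Python sketch of the RFC 1350 packet layout: each DATA packet carries a 2-byte opcode, a 2-byte block number and up to 512 bytes, and the sender waits for the matching ACK before sending the next block. The helper names are made up, and the UDP socket handling and retransmission timers are left out.

```python
import struct

OP_DATA = 3       # RFC 1350 opcode for DATA
OP_ACK = 4        # RFC 1350 opcode for ACK
BLOCK_SIZE = 512  # fixed block size in the base protocol


def data_packets(payload: bytes):
    """Yield DATA packets in order; a short final block ends the transfer."""
    block, offset = 1, 0
    while True:
        chunk = payload[offset:offset + BLOCK_SIZE]
        yield struct.pack("!HH", OP_DATA, block) + chunk
        if len(chunk) < BLOCK_SIZE:
            return  # fewer than 512 bytes (possibly zero) marks the end
        offset += BLOCK_SIZE
        # Block numbers are 16 bits; wrapping is this sketch's assumption,
        # RFC 1350 itself does not define what happens past 65535.
        block = (block + 1) & 0xFFFF


def ack_packet(block: int) -> bytes:
    return struct.pack("!HH", OP_ACK, block)


def is_ack_for(packet: bytes, block: int) -> bool:
    """True if packet is the ACK the sender must see before the next block."""
    return len(packet) == 4 and struct.unpack("!HH", packet) == (OP_ACK, block)
```

With one round trip per 512-byte block, throughput is bounded by latency rather than bandwidth, which is the scaling problem described above.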

FTPS has such arbitrary controls for TLS optional versus required status over control and data channels that it is easy to misconfigure.

SFTP lacks two key features (amidst jump host and other scope creep frenzy), anonymous mode and URL support in a browser.

A new file transfer protocol, restricted to DJB ciphers a la Wireguard, able to run over TCP or UDP would likely be best. If Chrome and Safari both added browser clients, the server world would likely dump most FTP the next day.

https://mywiki.wooledge.org/FtpMustDie

LinuxBender 1891 days ago

SFTP supports anonymous access. I actually just shut down my sftp server to move it or I would be able to show you, but it's super easy on CentOS. Just set up chroot and set a null pw for the usernames of your choice. You can use posix permissions to hide subdirs or files if you wish. You can use chattr or mount permissions to make it read-only or write-only. The only thing missing is browser support. I might have time to put it back online later today and will update this thread.

chasil 1891 days ago

Ideally, an FTP emulation of any password for FTP/anonymous, recorded to /var/log/secure, would be within SFTP (maybe checking for an "@" character followed by some dots, hoping for an email).

Forcing the null password up the stack to /etc/shadow (or other credential sources) potentially compromises PAM and other applications that may depend upon it.

It sounds like you've implemented a separate SSH server within a chroot for this to protect the base OS; I've done the same for tinyssh with nspawn for an internal project. This is not easy.

Anonymous access for SFTP doesn't scale to the extent used in FTP, even omitting browser access.

LinuxBender 1891 days ago

FTP is certainly more flexible and virtual users are far more secure than adding folks to /etc/passwd. PureFTPd [1] was my favorite for that very reason. There have been a few FTP daemons that supported the SFTP protocol and had virtual users, but they had too many bugs for me. I believe ProFTPd was one of them.

Regarding SFTP and null passwords, I do not use a separate sshd. I just use the "Match" stanza in OpenSSH. Any SFTP users I add are in the sftpusers group and don't have a shell. SELinux will block some nonsense. For a few years, I had a cron job that was dynamically adding any account that bots would try. I think I was up to about 23k SFTP accounts. I will fire it back up either today or tomorrow and you are welcome to do a pen-test on it. I will also post the sshd_config.

[1] - https://www.pureftpd.org/project/pure-ftpd/

chasil 1891 days ago

I was forced to implement chroot() for SFTP users under Oracle/RedHat Linux 5. We are, alas, still running it.

The OpenSSH 4.3 release on this platform does not support the "Match" keyword, but I was able to coerce it to run a separate SFTP-only instance on port 24, where I constrained the SFTP-specific accounts. I find that I prefer this approach.

My wily users then discovered that the working passwd entry also let them login with FTP on port 21, so careful control of allowed groups for both protocols was eventually required. Afterwards there is always the nagging suspicion that something was missed.

OpenSSH would also be much better with localized SFTP accounts that were not defined in /etc/passwd. Add that to the wishlist.

layoutIfNeeded 1890 days ago

>SFTP lacks two key features (amidst jump host and other scope creep frenzy), anonymous mode and URL support in a browser.

You forgot the abysmal performance compared to FTPS due to the SSH flow control conflicting with the underlying TCP flow control.

dominiek 1891 days ago

Keep in mind that I have no control over the client. No manufacturer has implemented TLS, let alone SFTP.

If I had any influence on the protocol it would be HTTPS. This is why I wasn't expecting to build an FTP server in 2020.

dheera 1891 days ago

The good old days ...

Username: anonymous

Anonymous login accepted, enter e-mail address as password.

Password: aoeu@aoeu.com

I bet aoeu.com and asdf.com got a good amount of unwanted mail back then.

tdeck 1890 days ago

I get ASDF, but what is AOEU based on?

dirkt 1890 days ago

https://www.aoeu.com/

> It resulted from a typo. I was writing a domain registration system for an ISP and, during some debugging, forgot to use the --no-act command line flag and ended up registering aoeu.com for real. Since it was so easy to type, I kept it.

The letters A O E U are the first four characters of the left hand on a Dvorak keyboard, which is the layout I use.

alricb 1890 days ago

Dvorak

gpvos 1890 days ago

This is why example.com exists.

1vuio0pswjnm7 1890 days ago

Will Kubernetes or Node.js make it to 50?

a-dub 1891 days ago

i sincerely hope that insecure ftp is either running over tls or a vpn...

dominiek 1891 days ago

Yep VPN. The devices don't support TLS. (We have Cloudflare in front of other services)

stjohnswarts 1890 days ago

Doesn't that open up like a ton of man-in-the-middle attacks? I realize it may just be one of those "no other option" things, but dang gritted-teeth-emoji

waynesonfire 1891 days ago

> I never thought I would say this, but I actually implemented an FTP server in 2020.

If you did this for work, junior engineer move in my opinion. This practice is called not invented here syndrome.

tooltower 1891 days ago

This very much depends on the ecosystem they had in their specific embedded environment. Some have a few kilobytes of working memory: so they'd have to download the file and write it straight to flash (yes, security is a problem but manageable). There are many common cases where, for instance, malloc is disabled.

You don't always have off-the-shelf packages for every conceivable environment you work in.

tinus_hn 1890 days ago

It’s a 50-year-old protocol; chances are you’re not the first one who has this problem.