by tcarn 2587 days ago
I work in the enterprise payments space, which runs on FTP transfers of XML, EDI and text files, and that's not necessarily a bad thing, as long as it works. It's much easier to troubleshoot why you got bad data if you have an FTP site to look at with a daily XML file that gets posted.

Except that the failure modes of FTP (technically not FTP-the-protocol itself, but what the server chooses to do with the file) are not well defined. What happens if the connection dies halfway through? Is the partial file processed? None of it? Does the file get moved after the upload is done or after it has been fully processed? Does every single company's FTP site in this space behave the same in the face of errors? Also, is this literally FTP and not anything more recent that includes encryption? And if it includes encryption, what ciphers are supported? Never mind that the files may be in EBCDIC or something else wonderfully obscure...

(I also work in payments. Banks' SFTP sites have under-defined failure modes.)

As do I (bank file transmissions representing!). To compensate for this partial file fiasco, we tend to rely on the .done file methodology (i.e. we won't pick up your file from your server until your script writes out a dummy file we can locate). Or we'll allow you to just push the file to us. Our system will of course notice a partial file send, where the transmission stopped. But we can't determine if the file was partial to begin with. So we rely on balance reports that come via alternate FTP or email transmissions.
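
For anyone who hasn't seen the .done approach, here is a minimal sketch of the receiving side in Python. The naming convention (`payments.xml` plus an empty `payments.xml.done`) and the function name `ready_files` are assumptions for illustration, not what any particular bank does.

```python
from pathlib import Path


def ready_files(inbox: Path, marker_suffix: str = ".done") -> list[Path]:
    """Return data files in `inbox` whose sender has written a marker file.

    Assumed convention: the marker is the data file name plus `.done`.
    Files without a marker are treated as possibly still uploading and skipped.
    """
    ready = []
    for marker in sorted(inbox.glob(f"*{marker_suffix}")):
        data_name = marker.name[: -len(marker_suffix)]
        if not data_name:
            continue  # a bare ".done" file names no data file
        data_file = marker.with_name(data_name)
        if data_file.is_file():
            ready.append(data_file)
    return ready
```

Note this only tells you the sender believes it finished. As said above, it can't tell you whether the file was short to begin with, which is why the balance reports still matter.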

Don't get me started on sending ASCII as binary to the mainframe to compensate for the EBCDIC formatting. Or the lack of carriage return and line feed characters, which causes so many fun issues.

All of that to say that none of it's pretty, but all of it works. The balancing is key, as are the extra staff needed to verify the reports against each other.

Can this be solved by sending two files, one with the data and one with a checksum of the data?
That's a reasonable idea! There are a wide variety of ways to solve the problem, using FTP uploads as the primitive, but ultimately that's... kind of the problem. Everyone's solution is different from everyone else's, and those different solutions have different ramifications, so when they fail, they have to be handled very differently. That is to say, OOP, classes and inheritance, only gets you so far.

(An issue w/ whole-file checksums is that there are cases where partial file processing is desirable, but that's not to say there's no use for checksums.)
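
The two-file idea could look roughly like this in Python. The `.sha256` sidecar name and the helper names are made up for the example, and SHA-256 is just one reasonable choice of checksum.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_sidecar(data_file: Path) -> Path:
    """Sender side: write `<name>.sha256` (hypothetical convention) next to the data."""
    sidecar = data_file.with_name(data_file.name + ".sha256")
    sidecar.write_text(sha256_of(data_file) + "\n")
    return sidecar


def verify(data_file: Path) -> bool:
    """Receiver side: True only if a sidecar exists and matches the data file."""
    sidecar = data_file.with_name(data_file.name + ".sha256")
    if not sidecar.is_file():
        return False
    parts = sidecar.read_text().split()
    return bool(parts) and parts[0] == sha256_of(data_file)
```

A truncated upload fails `verify`, but so does any file you would have been happy to process partially, which is the trade-off mentioned above.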

That's what we do. Have a summary file and a detail file and they need to match.
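
A rough Python sketch of that kind of summary/detail match, assuming a made-up layout: detail lines are `record_id,amount` and the summary is a single `count,total` line. Real formats will differ.

```python
from decimal import Decimal


def detail_totals(detail_lines):
    """Count records and sum amounts from `record_id,amount` lines (assumed layout)."""
    count = 0
    total = Decimal("0")
    for line in detail_lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        _record_id, amount = line.split(",")
        count += 1
        total += Decimal(amount)  # Decimal avoids float rounding on money
    return count, total


def matches_summary(detail_lines, summary_line):
    """True if the detail file agrees with a `count,total` summary line."""
    expected_count, expected_total = summary_line.strip().split(",")
    return detail_totals(detail_lines) == (int(expected_count), Decimal(expected_total))
```

Unlike a whole-file checksum, this catches a file that was complete on the wire but short at the source, as long as the summary was produced independently.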
It's also bad when your trading partner's sender process crashes and they build up a backlog of a million messages, and then once they fix it, they want to dump their whole backlog on you at once.

Or their process gets stuck in an infinite loop and resends you the same message millions of times, etc.

Yeah, I've dealt with SFTP + CSV workflows and it's not so bad. I ended up writing a virtual SFTP server which was not backed by a filesystem, and which would prevent malformed data from being written, and make closed/authoritative files immutable.

It's obviously not ideal, compared to a well-thought-out purpose-built API and a complete set of tools, but that takes work, and isn't always better or faster than the refined hack.
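
To give a feel for the virtual-server idea, here is a toy in-memory store in Python with the two properties described: malformed rows are rejected on write, and closed files become immutable. It has no SFTP layer at all, and the class and method names are invented for illustration; it is not the actual server.

```python
import csv


class MalformedDataError(Exception):
    """Raised when a written line does not parse as the expected CSV row."""


class ClosedFileError(Exception):
    """Raised when a write targets a file that has already been closed."""


class VirtualStore:
    """In-memory file store; nothing here touches a real filesystem."""

    def __init__(self, expected_columns: int):
        self.expected_columns = expected_columns
        self._files: dict[str, list[str]] = {}
        self._closed: set[str] = set()

    def append_line(self, name: str, line: str) -> None:
        if name in self._closed:
            raise ClosedFileError(name)
        row = next(csv.reader([line]), None)
        if row is None or len(row) != self.expected_columns:
            raise MalformedDataError(f"{name}: expected {self.expected_columns} columns")
        self._files.setdefault(name, []).append(line)

    def close(self, name: str) -> None:
        """Mark the file authoritative; later writes are refused."""
        self._closed.add(name)

    def read(self, name: str) -> str:
        return "\n".join(self._files.get(name, []))
```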

Wouldn't it be simpler to upload a file with a tmp name and then at the end rename the file if there was no error? Renaming is pretty much an atomic operation.
It would be, but they might not have the latitude to dictate the connection AND it might be multiple-file batches.
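
For reference, the tmp-name-then-rename pattern looks like this on a local filesystem in Python. The `.tmp` suffix and the function name are assumptions. Over FTP or SFTP the equivalent is an upload followed by a rename command, and whether that rename is atomic depends on the server.

```python
import os
from pathlib import Path


def publish_atomically(final_path: Path, payload: bytes) -> None:
    """Write to `<name>.tmp`, then rename into place once the write succeeded.

    os.replace is atomic on POSIX when both paths are on the same filesystem,
    so a reader sees either no file or the complete file, never a partial one.
    """
    tmp_path = final_path.with_name(final_path.name + ".tmp")
    try:
        with tmp_path.open("wb") as fh:
            fh.write(payload)
            fh.flush()
            os.fsync(fh.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_path, final_path)
    except BaseException:
        tmp_path.unlink(missing_ok=True)  # leave no half-written temp file behind
        raise
```

This covers one file at a time. A multiple-file batch still needs something on top, such as a marker or manifest written last.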