| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by isoprophlex 363 days ago

I swear I'm not making this up; a guy at my current client needed to join two CSV files. A one off thing for some business request. He wrote a REST api in Java, where you get the merged csv after POSTing your inputs.

I must scream but I'm in a vacuum. Everyone is fine with this.

(Also it takes a few seconds to process a 500 line test file and runs for ten minutes on the real 20k line input.)

4 comments

fredrikholm 362 days ago

The worst part of stories like this is how much potential there is in gaslighting you, the negative person, on just how professional and wonderful this solution is:

  * Information hiding by exposing a closed interface via the API
  * Isolated, scalable, fault tolerant service
  * Iterable, understandable and super agile

You should be a team player isophrophlex, but its ok, I didn't understand these things either at some point. Here, you can borrow my copy of Clean Code, I suggest you give it a read, I'm sure you'll find it helpful.

link

cfiggers 363 days ago

I'm really dumb, genuinely asking the question—when people do such things, where are they generally running the actual code? Would it be in a VM on generally available infra that their company provides...? Or like... On a spare laptop under their desk? I have use cases for similar things (more valid use cases than this one, at least my smooth brain likes to think) but I literally don't know how to deploy it once it's written. I've never been shown or done it before.

link

marifjeren 363 days ago

Typically you run both the client program and the server program on your computer during development. Even though they're running on the same machine they can talk with one another using http as if they were both on the world wide web.

Then you deploy the server program, and then you deploy the client program, to another machine, or machines, where they continue to talk to one another over http, maybe over the public Internet or maybe not.

Deploying can mean any one of umpteen possible things. In general, you (use automations that) copy your programs over to dedicated machines that then run your programs.

link

pbohun 362 days ago

Was it joining on some columns or just concatenating the files?

I'm going to laugh pretty hard if it could just be done with: cat file1.csv file2.csv > combined.csv

link

xnx 353 days ago

There are also a lot of command line options for joining by column like csvkit

link

Xenoamorphous 362 days ago

You need to account for the headers, which many (most?) csv files I've encountered have.

So I guess something like this to skip the headers in the second file (this also assumes that headers don't have line breaks):

  cp file1.csv combined.csv && tail -n+2 file2.csv >> combined.csv

link

withinboredom 363 days ago

I mean, it would be faster to just import them into an in-memory sqlite database, run a `union all` query and then dump it to a csv...

That's still probably the wrong way to do it, but 10 minutes for a 20k line file? That seems like poor engineering in the most basic sense.

link

isoprophlex 363 days ago

It's a twenty line bash script. Pipe some shit into sqlite, done.

But the guy 'is known to get the job done' apparently.

link

bee_rider 363 days ago

Maybe he’s recognized something brilliant. Management doesn’t know that the program he wrote was just a reimplementation of the Unix “cut” and “paste” commands, so he might as well get rewarded for their ignorance.

And to be fair, if folks didn’t get paid for reinventing basic Unix utilities with extra steps, the economy would probably collapse.

link

isoprophlex 363 days ago

Clearly I'm the dumbass in this story, as we're all paid by the hour...

link

bee_rider 363 days ago

Clearly! He’s found a magic portal to the good old days when the fruit was all low hanging, and you keep showing up with a ladder.

link

xnx 353 days ago

csvkit and duckdb would also be good options. Any llm will spit out a one-liner for any type of join you can describe.

link

strken 363 days ago

I'd probably think of xsv, go to its github repo, remember it's unmaintained and got replaced by qsv, and then use qsv.

link