by michaelochurch 4217 days ago
Now if you are a programmer, how do you get into the industry? You need to know stats, machine learning, and programming really well.

My experience being a quant and being around quants is that, sadly, they don't get to use much machine learning. Some do, but it seems like 97-98% of the work is much more mundane.

Am I correct in perceiving that many hiring managers expect more in the way of interesting experience than they really need (or have to offer in the work that needs to be done)?


> Am I correct in perceiving that many hiring managers expect more in the way of interesting experience than they really need (or have to offer in the work that needs to be done)?

Yes, I think you are, Michael. Especially for quant jobs 2 and 3 that I enumerated.

The only reason I listed machine learning is that it is the Trojan horse that gets programmers to learn stats. I can't overemphasize how much you need to know stats. Half of trading is knowing stats, since all good traders trade only when they have the advantage. 100% upside for 10% downside is a good rule of thumb.

I mentioned the deep dive into the machine learning techniques because, unfortunately, most of the programmers I meet who call themselves data scientists just aren't very good at stats. I'll ask them what methods they use and they'll tell me the library they used.
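A quick way to see why an asymmetric payoff like that is attractive, assuming "100% upside for 10% downside" means a trade that either gains 100% of the stake or loses 10% of it (my reading, not spelled out above): set the expected value to zero and solve for the win probability. The Python sketch below just does that arithmetic; the function names are made up for illustration.

```python
def expected_value(p_win: float, upside: float = 1.0, downside: float = 0.1) -> float:
    """Expected return per unit staked: gain `upside` with probability p_win, else lose `downside`.

    Assumes the rule of thumb means +100% / -10% of the stake (an interpretation).
    """
    return p_win * upside - (1.0 - p_win) * downside


def breakeven_probability(upside: float = 1.0, downside: float = 0.1) -> float:
    """Win probability at which the expected value is exactly zero."""
    # Solve p * upside - (1 - p) * downside = 0  =>  p = downside / (upside + downside)
    return downside / (upside + downside)


if __name__ == "__main__":
    p_star = breakeven_probability()
    print(f"break-even win probability: {p_star:.4f}")  # 0.1 / 1.1, roughly 0.0909
    for p in (0.05, 0.10, 0.30):
        print(f"p={p:.2f}  EV per unit staked={expected_value(p):+.3f}")
```

With those numbers the break-even win probability is 0.1 / 1.1, about 9.1%, so such a trade only has to work out roughly one time in eleven to have positive expectation.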

Unfortunately most interviews go like this:

me: I see you did some K-means clustering, tell me about it.

them: Well, we used R to cluster related books into groups.

me: How did you choose the initial partition?

them: (blank stare) Um, the library does that.

me: Yes, well, I mean did you use Forgy or Random Partition?

them: I don't know what those are. That's just random trivia that I would look up when working. Do you find asking random trivia helps you hire people?

me:...So do you have any questions for me?
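For anyone who, like the candidate, hasn't met the terms: Forgy and Random Partition are two classic ways to seed k-means. Forgy picks k observations at random and uses them as the starting centroids; Random Partition assigns every point to a random cluster and starts from those clusters' means. Below is a minimal NumPy sketch of both, feeding plain Lloyd iterations. The function names and the toy two-blob data are hypothetical, and this is not meant to describe what any particular R package does internally.

```python
import numpy as np


def forgy_init(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Forgy: choose k distinct observations at random as the initial centroids."""
    idx = rng.choice(len(X), size=k, replace=False)
    return X[idx].copy()


def random_partition_init(X: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Random Partition: give each point a random cluster label, then use each cluster's mean."""
    labels = rng.integers(0, k, size=len(X))
    # Small tweak (not part of the textbook method): force every cluster to get at least
    # one point so no starting mean is undefined. Assumes len(X) >= k.
    labels[rng.choice(len(X), size=k, replace=False)] = np.arange(k)
    return np.array([X[labels == j].mean(axis=0) for j in range(k)])


def kmeans(X: np.ndarray, init_centroids: np.ndarray, n_iter: int = 100):
    """Plain Lloyd iterations starting from the given centroids; returns (centroids, labels)."""
    centroids = init_centroids.copy()
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: move each centroid to the mean of its points (keep it if its cluster is empty).
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy data: two well-separated 2-D blobs.
    X = np.vstack([rng.normal(0.0, 0.5, (50, 2)), rng.normal(5.0, 0.5, (50, 2))])
    for name, init in [("Forgy", forgy_init), ("Random Partition", random_partition_init)]:
        start = init(X, 2, rng)
        centroids, _ = kmeans(X, start)
        print(f"{name}: start={np.round(start, 2).tolist()} end={np.round(centroids, 2).tolist()}")
```

One practical difference: Random Partition tends to start all centroids near the overall mean of the data, while Forgy spreads them across actual observations, which is part of why the choice can change the clustering you end up with.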

Hedge funds have lots of mundane programming jobs. Most are back-office jobs stitching together WPF and accounting systems. Even at the HFT firms I've seen, only a small portion of the programmers actually do anything on the main trading system.

I'd love to hear about your Jane Street experience here :)

Like most things, the closer you are to the money, the more you'll make and the more exciting your job will be :)

Most quant jobs that I've seen were very mundane; category 3 covers most of them. Though most will tell you that they are in category 1 :)

Me: ...So do you have any questions for me?

Instead of trying to "expose" these people for supposedly over-representing their competence in some skill area X mentioned in an ancillary way on their resume (relating to some job they had N years in the past), you might want to try just asking them a simple, normal, human question first: "So was this something you got to delve into the internal workings of? Or did you just use an API for some small task?"

If it's the former answer, then fine -- drill away into the internals. If it's the latter (which is by far the more typical case -- the way things go in most dev environments), then you can just say "That's fine", and look for some other bullet point to zoom in on. Or if in fact X is a major requirement for you -- you can say "That's fine, but we're really looking for people who have done at least a fair amount of X here. But we thank you very much for taking the time to talk with us."

Either way -- there's no reason to go into snark mode and assume these people are frauds or idiots just because they don't have a lot of detail to offer you. The bottom line is that unless they mention X as some major skill area, you pretty much have to assume that it was just another random thing they were forced to learn that day (you know, to keep their jobs). And unless it's mentioned in a way that would suggest they spent a significant amount of time on it, of course they aren't going to remember very much beyond the most superficial aspects of how it works.

"The bottom line is that unless they mention X as some major skill area, you pretty much have to assume that it was just another random thing they were forced to learn that day"

If you call yourself a data scientist but don't know a lick of statistics, then you are the one being misleading. You can't just pick this stuff up overnight. It is a bit like claiming you are a systems developer but then in the interview you don't know a thing about pointers or what the heap is. If your actual knowledge of systems development is "I wrote a Python wrapper around a C library once", then representing yourself as a systems dev is intentionally misleading your interviewer.

If you want to find out if a candidate knows "a lick of" statistics, try asking about something generic, like confidence intervals. But he didn't do that; instead he grilled them on the default partition scheme for algorithm X.

Which (unless the candidate advertises themselves as having significant experience in algorithm X) is about as useful as asking them if they know the airspeed velocity of an unladen swallow.
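As a concrete version of that "generic" question, here is roughly what a 95% confidence interval for a mean looks like in code. This is a minimal sketch using only the Python standard library and the normal approximation xbar ± z * s / sqrt(n); for small samples a Student's t critical value (e.g. from `scipy.stats.t.ppf`) is the more appropriate choice. The function name and the sample numbers are made up.

```python
import math
import statistics
from statistics import NormalDist


def mean_confidence_interval(sample, level=0.95):
    """Normal-approximation CI for the population mean: xbar ± z * s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    xbar = statistics.fmean(sample)
    s = statistics.stdev(sample)  # sample standard deviation (n - 1 denominator)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-sided critical value, about 1.96 for 95%
    half_width = z * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width


if __name__ == "__main__":
    # Hypothetical daily returns, in percent.
    returns = [0.12, -0.35, 0.48, 0.05, -0.10, 0.22, 0.31, -0.18, 0.09, 0.14]
    lo, hi = mean_confidence_interval(returns)
    print(f"95% CI for the mean daily return: [{lo:.3f}, {hi:.3f}]")
```

The usual follow-up is the interpretation: the procedure, repeated over many samples, would cover the true mean about 95% of the time; it does not mean there is a 95% probability that this particular interval contains it.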

You are seriously underestimating the importance of deep statistical knowledge in some of these jobs.

In the example, the candidate wasn't grilled on the default partition scheme; he or she was grilled on the partition scheme that was used. If someone thinks that's an unimportant implementation detail, why would you ever want to hire them?

You are seriously underestimating the importance of deep statistical knowledge in some of these jobs.

In some of these jobs, maybe. But the person I was initially responding to wasn't looking for someone with "deep statistical knowledge"; rather, he was looking for that species which goes by the trendy, loosey-goosey catch-all moniker, "data scientist".

And if you're going to put "data scientist" in a job title -- with no further qualifications -- then you had better understand that it's nearly useless as a description of anything (beyond a few colloquial definitions floating about, which there's still no real consensus on).

If, on the other hand, you want an "MS/PhD background in Machine Learning and/or Statistics, or equivalent work experience", then that's fine too, of course -- just put it in the job description. It's really quite easy to do, and it will save everyone tons and tons of time (and grief).

In the example, the candidate wasn't grilled on the default partition scheme; he or she was grilled on the partition scheme that was used. If someone thinks that's an unimportant implementation detail, why would you ever want to hire them?

Again, it matters to the extent that they, themselves, emphasize it as a core skill area. When someone simply says "did $foo using X" in some project description, I personally don't read anything more into it than that.

No regrets... if you are going to suck up to traders to get in on their pile of gold, either commit to it or don't. I used to complain about code exams until I realized "this is how you prove you want/deserve the job... if someone else can pass where I fail, that is not necessarily the hiring manager's fault". If you can't answer their questions, don't expect the job... you either want it enough to fortify your defenses or you don't. Even if they are going to ask a bunch of redundant, irrelevant, easily Google-able crap -- not having an answer is going to raise the question "how come they never, out of curiosity or to impress people in my position, Googled this easily Google-able crap?"

I just don't think these environments are made for you or me... I used to win many math comps and was at the top of my class for years & years. In college I went into media programming & psychology because I didn't really want to study stuff I thought was boring all day just to be a market kiss-ass.

As a result, I have a pretty boring job, but I let my mind wander creatively & still work on a ton of music & creative projects. I've designed some pretty sophisticated machine learning algorithms and done research in audio signal processing, but I don't get to study it all day because there is not much market demand.

I know I have more raw talent than most people I interact with (in fact I'm confident I can teach myself various technical disciplines to near-expertise), but I've also come to understand that employers don't care. Beyond entry level, they are just looking for someone who says the right things and has already done work that resembles the job they are hiring for.

Sometimes I consider teaching myself toward a goal like being a quant, but then I remember that that world is not for me. I'm not insanely greedy, so it's not a big deal. And though my enterprise work now is boring, there are several things on the horizon that seem much more promising & appealing than finance. I am accumulating skills & enjoying having room in my head for other thoughts. I don't work in the startup world because I haven't encountered anyone with compelling ideas, but I suppose that's where people like me are destined to end up... I stay sharp & am confident that I could code a mini-enterprise in a year, once I find the right idea.

What areas of statistics would you say are the important ones? Obviously, having an excellent grasp of probability is a must, but what about other things? If I were a statistics major, what courses would you say are most important? It seems like things like regression analysis would be much more important than experimental design, no?

The only reason I listed machine learning is that it is the Trojan horse that gets programmers to learn stats.

Ah, that makes perfect sense.

To me, the appeal of machine learning is that it's challenging and respected enough that programmers doing it get the autonomy to work in any subfield of computer science, from the high level to the low. If you're a data scientist and you say you want to use Clojure or Haskell (a high-level concern) or that you want to do GPU programming or dive deep on assembly (low-level work) you can. Machine learning, 10 years ago, was extremely appealing because software managers were figuring out that they needed it, but most admitted they knew little about it, so they gave a lot of autonomy to individual contributors. (That may change, and "data science" may become thoroughly commoditized.)

It's the Fundamental Theorem of Employment: you're usually hired either to do (a) something your boss can't do for himself or (b) something he doesn't want to do. With (a) you get respect and autonomy and high pay; with (b) you get treated like a commodity. "Data scientist" (or software "architect" vs. "engineer") is, often, a programmer who's managed to learn enough of "the hard stuff" to move himself over to (a). It's the (b) category of engineers who get stuck on "Scrum teams".

I feel like some of the crowding of "data science" (and, as you noted, not all of the "data scientists" know what they're talking about) comes from the way that "Agile"-style micromanagement has made the rest of programming so braindead. There are people like me who enjoy the hard mathematical aspect, but others who've just learned that if they call themselves "data scientists" they get more interesting projects and don't have to munch on Scrum tickets. For them, the math is an impediment rather than a challenge and an attraction.

I mentioned the deep dive into the machine learning techniques because, unfortunately, most of the programmers I meet who call themselves data scientists just aren't very good at stats.

There's a depth vs. breadth problem, because machine learning is a much, much bigger field than many people think. I've gone pretty deep on penalized regressions (e.g. ridge and Lasso with large numbers of features) but know only the basics about tree-based models. I can read the papers on neural network architectures (e.g. convolutional nets) and implement them, and I understand the theory that led to them, but I still lack some of the intuition (like why rectified linear units are more useful in image processing than regular logistic units).
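Since ridge and Lasso come up here, a minimal NumPy sketch of both penalties may help anyone following along. It assumes centered data with no intercept (for brevity) and uses the common (1/(2n)) scaling of the Lasso objective, which is one convention among several. Ridge has a closed form; Lasso does not, so this uses cyclic coordinate descent with soft-thresholding, one standard approach (it is the method behind glmnet). The function names are hypothetical.

```python
import numpy as np


def ridge(X: np.ndarray, y: np.ndarray, lam: float) -> np.ndarray:
    """Ridge: minimise ||y - X b||^2 + lam * ||b||^2 via (X^T X + lam I)^-1 X^T y."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def soft_threshold(rho: float, lam: float) -> float:
    """Shrink rho toward zero by lam, clipping small values to exactly zero."""
    return np.sign(rho) * max(abs(rho) - lam, 0.0)


def lasso_cd(X: np.ndarray, y: np.ndarray, lam: float, n_iter: int = 1000, tol: float = 1e-8) -> np.ndarray:
    """Lasso via cyclic coordinate descent on (1/(2n)) ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n  # per-feature curvature of the squared-error term
    residual = y - X @ b
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j's own contribution.
            rho = X[:, j] @ (residual + X[:, j] * b[j]) / n
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            residual += X[:, j] * (b[j] - new_bj)  # keep residual equal to y - X b
            max_change = max(max_change, abs(new_bj - b[j]))
            b[j] = new_bj
        if max_change < tol:
            break
    return b


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical sparse problem: only the first 3 of 50 features matter.
    X = rng.normal(size=(200, 50))
    true_b = np.zeros(50)
    true_b[:3] = [3.0, -2.0, 1.5]
    y = X @ true_b + rng.normal(scale=0.5, size=200)
    print("ridge nonzeros:", int(np.sum(np.abs(ridge(X, y, 10.0)) > 1e-6)))
    print("lasso nonzeros:", int(np.sum(np.abs(lasso_cd(X, y, 0.1)) > 1e-6)))
```

The design difference shows up in the output: ridge shrinks every coefficient but leaves them nonzero, while the soft-thresholding step lets Lasso set many coefficients exactly to zero, which is why it is popular when the number of features is large.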

I feel like there are some people who pick up a lot of vocabulary and interview well (but not with you, because you actually know the field) but are really just playing around with parameters. I like the math, but "data science" is mostly bullshit and I hope the term will die; I want to see more machine learning and less business pomposity.