by webspiderus 5353 days ago
having taken the 224W, 229, and 246 courses, I will say that they provide a good introduction to a lot of the data mining algorithms (with 224W emphasizing graph-based algorithms, 246 emphasizing things like decision trees and association rules, and 229 emphasizing regression and SVMs).

however, the main issue for me is bridging the gap between completing those courses and actually being able to apply the skills learned to real-world problems. i hoped to do that with my internship this past summer, but it proved to be more difficult than i thought, particularly without having much guidance from anyone with similar specializations.

how much value does a certificate like this signal then? i feel like there's not much it shows beyond commitment and a focused interest, since it seems to me that the true test of anyone in data mining comes in the form of projects, not classes.


Given a real-world problem, the algorithms or the mathematics behind them do not provide insight on their own. The only value that knowledge of the algorithms provides is a general understanding of the ways in which you can approach a problem. If you can do the mathematics, you can look at how the way you structure your question influences the results you are seeing.

The process of exploratory data analysis is best done in (imho) the style of a Lakatos research programme.

- prepare and clean the data

- explore using many fast methods and charts, come up with some working hypotheses about the data that are important to your client

- select method(s) to test those hypotheses

- perform the analyses

- determine what your results mean in terms of your research goal.

- alter your list of working hypotheses

- repeat [possibly collecting more data]

Obviously the only hard part about this is step five. And unfortunately this is the step that isn't really taught, in my experience. A simple case: let's say you had a linear regression and you ran it once with 2 variables, got some parameter estimates (a, b), and ran it again with 3 variables and got some more parameter estimates (a1, b1, c). If b != b1, what does this mean? If you are using a custom link function (e.g. cloglog or logit), how should you interpret this now? This is where having a deep understanding of the mathematics behind the techniques starts to pay off. And this is the simplest example of a basic regression.
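To make the b != b1 case concrete, here is a minimal sketch on simulated data. It assumes that (a, b) are an intercept and one slope, that the added variable is correlated with the first one, and that the model is plain OLS with an identity link; the data, the coefficient values, and the helper name `ols` are all made up for illustration. Under those assumptions the shift in b is the usual omitted-variable effect: b equals b1 plus c times the slope you get from regressing the new variable on the old one.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X must already hold an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Simulated data (illustrative values only): x2 is correlated with x1.
rng = np.random.default_rng(0)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

ones = np.ones(n)

# Short model: y ~ 1 + x1, giving (a, b).
a, b = ols(np.column_stack([ones, x1]), y)

# Long model: y ~ 1 + x1 + x2, giving (a1, b1, c).
a1, b1, c = ols(np.column_stack([ones, x1, x2]), y)

# Auxiliary regression: x2 ~ 1 + x1, whose slope is delta.
_, delta = ols(np.column_stack([ones, x1]), x2)

# In-sample OLS identity: b = b1 + c * delta.
# So b differs from b1 whenever c != 0 and x2 is correlated with x1.
print(f"b  (short model)  = {b:.3f}")
print(f"b1 (long model)   = {b1:.3f}")
print(f"b1 + c * delta    = {b1 + c * delta:.3f}")
```

In the short model, b absorbs the part of x2's effect that travels with x1; in the long model, b1 is the effect of x1 with x2 held fixed. This exact decomposition is specific to linear least squares. With a logit or cloglog link, coefficients can change when a variable is added even if it is uncorrelated with the others, so the comparison needs more care.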

So, as someone in the market to hire people with data mining and AI skills, this: "i feel like there's not much it shows beyond commitment and a focused interest" is very valuable. Obviously there are other ways to show it that may be cheaper, but this is not a bad way.

The other things I look for are evidence of raw smarts, a track record of accomplishment, CS fundamentals, and, of course, personality. (Also, I prefer someone with a poor understanding of football so that they don't upset me in the fantasy league.)

Well, if their data mining skills are good enough, they still might be a threat in your fantasy league.
touche
I've used SPSS and SAS for the following tasks:

churn analysis, market basket analysis, sequence analysis, segmentation, anomaly detection, risk management, forecasting