by webspiderus
5353 days ago
Having taken the 224W, 229, and 246 courses, I will say that they provide a good introduction to a lot of data mining algorithms (with 224W emphasizing graph-based algorithms, 246 emphasizing things like decision trees and association rules, and 229 emphasizing regression and SVMs). However, the main issue for me is bridging the gap between completing those courses and actually being able to apply the skills learned to real-world problems. I hoped to do that with my internship this past summer, but it proved more difficult than I thought, particularly without much guidance from anyone with a similar specialization. How much value does a certificate like this signal, then? I feel like it doesn't show much beyond commitment and a focused interest, since it seems to me that the true test of anyone in data mining comes in the form of projects, not classes.
The process of exploratory data analysis is best done in (imho) the style of a Lakatos research programme.
- prepare and clean the data
- explore the data using many fast methods and charts, and come up with some working hypotheses that matter to your client
- select method(s) to test those hypotheses
- perform the analyses
- determine what your results mean in terms of your research goal
- alter your list of working hypotheses
- repeat [possibly collecting more data]
Obviously the only hard part about this is step five. Unfortunately, in my experience, this is the step that isn't really taught. A simple case: say you ran a linear regression once with 2 parameters and got estimates (a, b), then ran it again with 3 parameters and got estimates (a1, b1, c). If b != b1, what does this mean? If you are using a link function other than the identity (e.g. cloglog or logit), how should you interpret it now? This is where a deep understanding of the mathematics behind the techniques starts to pay off, and this is the simplest example, from a basic regression.
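For the plain least-squares case, one standard way to read b != b1 is the omitted-variable identity: b = b1 + c * delta, where delta is the slope from regressing the added variable x2 on x1. Below is a minimal sketch with NumPy, assuming simulated data in which x2 is correlated with x1; the data-generating coefficients (2.0, 3.0, 0.8) and the helper name `ols` are made up for illustration, not taken from any real analysis.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y ~ X (X already includes an intercept column)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical simulated data: x2 is deliberately correlated with x1.
rng = np.random.default_rng(0)
n = 1_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

ones = np.ones(n)
a, b = ols(np.column_stack([ones, x1]), y)            # short model: y ~ a + b*x1
a1, b1, c = ols(np.column_stack([ones, x1, x2]), y)   # long model: y ~ a1 + b1*x1 + c*x2
_, delta = ols(np.column_stack([ones, x1]), x2)       # auxiliary model: x2 ~ x1

print(f"b = {b:.3f}, b1 = {b1:.3f}, c = {c:.3f}, delta = {delta:.3f}")
# Exact in-sample OLS identity: the gap b - b1 equals c * delta.
print(np.isclose(b, b1 + c * delta))  # prints True
```

Under OLS with an intercept this identity holds exactly in-sample, so the difference between b and b1 is c times how strongly x2 tracks x1. With a logit link there is no such clean decomposition: the coefficient on x1 can shift even when the added variable is uncorrelated with x1 (non-collapsibility of the odds ratio), which is part of why interpreting b versus b1 gets harder there.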