Hacker News
by olooney 2483 days ago
Terminology note: data like images and voice, which have strong spatial or temporal patterns, are actually referred to as "unstructured" data, while data you get from running "SELECT * FROM some_table" or the carefully designed variables of a clinical trial are referred to as "structured" data.

If this seems backwards to you (as it did to me at first), note that unstructured data can be captured raw from instruments like cameras and microphones, while structured data usually involves a programmer coding exactly what ends up in each variable.

As you say, deep neural networks based on CNNs are SOTA on unstructured image data, RNNs are SOTA on unstructured voice and text data, while tree models like random forests and boosted trees are usually SOTA on problems involving structured data. The reason seems to be that the inductive biases inherent to CNNs and RNNs, such as translation invariance, are a good fit for the natural structure of such data, while the strong ability of trees to find rules is well suited to data where every variable is cleanly and unambiguously coded.
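
To make the "translation invariance" point concrete, here is a rough sketch in plain Python. It is only an illustration under my own assumptions: the helper names (`circular_conv1d`, `roll`), the three-tap kernel, and the wrap-around boundary handling are all made up for the example and are not taken from any particular library. It shows that sliding one shared filter over a signal commutes with shifting the signal (equivariance), and that a global max over positions then discards the shift entirely (invariance).

```python
import math
import random

def roll(xs, shift):
    """Circularly shift a list to the right by `shift` positions."""
    shift %= len(xs)
    return xs[-shift:] + xs[:-shift] if shift else list(xs)

def circular_conv1d(signal, kernel):
    """Slide one shared kernel over the signal, wrapping around at the ends."""
    n = len(signal)
    return [
        sum(w * signal[(i + j) % n] for j, w in enumerate(kernel))
        for i in range(n)
    ]

random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(16)]
kernel = [1.0, -2.0, 1.0]  # hypothetical filter, chosen only for illustration
shift = 5

shifted_then_filtered = circular_conv1d(roll(signal, shift), kernel)
filtered_then_shifted = roll(circular_conv1d(signal, kernel), shift)

# Equivariance: shifting the input just shifts the feature map.
equivariant = all(
    math.isclose(a, b)
    for a, b in zip(shifted_then_filtered, filtered_then_shifted)
)

# Invariance: a global max over positions no longer depends on the shift.
invariant = math.isclose(
    max(shifted_then_filtered),
    max(circular_conv1d(signal, kernel)),
)

print(equivariant, invariant)
```

The same weights are reused at every position, which is why the model does not have to relearn a pattern for each location. A row from a SQL table has no analogous notion of "shifting" columns, so that bias buys nothing there.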

1 comment

Yeah, that's right. Doing too little proofreading with HN comments...