Hacker News
by Jemaclus 3902 days ago
Clever. One of the more interesting challenges that I've run into in the last few years is just the sheer amount of raw data out there. It's mind-boggling how many problems could be solved if we could sift through that data quickly, from human trafficking down to weather. I'm particularly fascinated by her intuition that writing patterns and templates can identify pimps. I'm not sure how long it would have taken me to come to that conclusion... but now that it's out there, it's obvious.

I wonder what other problems we can solve with the same toolset.

2 comments

People have been trying to do textual analysis to divine all kinds of things about the people who wrote the text. You'll even find papers in psychology or psychiatry journals claiming to be able to distinguish mental illnesses based purely on textual analysis. Also, all manner of religions are essentially founded on textual analysis.

It is correct to call such analyses tools. They cannot give you answers that you can rely on. At best, they may give you hints of other places to look. However, one problem such analysis can run into is that when the signal (the patterns and templates that the analysis was looking for) disappears, the tool becomes rather worthless (in which case, you may want to consider how useful or correct for the job the tool really was).

In the case described in the article, it seemed like an appropriate use of textual analysis.

The USGS learns about earthquakes in regions without censors via Twitter. [1]

I'm of the opinion we have only scratched the surface of what is possible to predict by analyzing realtime data from social networks, user groups and message board communities.

[1] https://blog.twitter.com/2015/usgs-twitter-data-earthquake-d...

You mean sensors not censors. Sensors sense, censors censor.
Indeed I did!