by sergey-obukhov 3495 days ago
Re-pasting my comment here where it belongs as a reply:

It could be a great idea to plot the data on two axes (x: HTML length or size, y: processing time, if I understand this correctly). It's simple and elegant, and I'll try it. The chart might get messy with all these data points, though. One of the reasons I like the percentiles approach is that it makes the trade-off between message processing time and the number (or percentage) of messages we can process explicit.
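To make the percentile trade-off concrete, here is a minimal Python sketch (Python is used only for illustration). It assumes per-message processing times are available as a plain list and uses the nearest-rank definition of a percentile; the helper names `percentile` and `share_within_budget` and the sample timings are hypothetical.

```python
import math


def percentile(times, p):
    """Nearest-rank percentile: the smallest time t such that at least p% of
    the messages took <= t. (Hypothetical helper for illustration.)"""
    if not times:
        raise ValueError("times must be non-empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(times)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def share_within_budget(times, budget):
    """Fraction of messages whose processing time is at most `budget`."""
    if not times:
        raise ValueError("times must be non-empty")
    return sum(1 for t in times if t <= budget) / len(times)


if __name__ == "__main__":
    # Hypothetical per-message processing times in milliseconds.
    times_ms = [12, 15, 18, 22, 30, 45, 60, 120, 250, 900]
    for p in (50, 90, 99, 100):
        print(f"p{p}: {percentile(times_ms, p)} ms")
    print(f"within 100 ms: {share_within_budget(times_ms, 100):.0%}")
```

Reading down the percentiles shows the trade-off directly: a tighter time budget means a smaller share of messages gets processed.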


> It's simple and elegant.

The OP likely made that comment because plotting the data is often done before any fancy machine learning, as part of exploratory data analysis (especially with a small number of variables!).

IanCal's comment about setting the alpha is a good one. Another easy option is to make a heatmap of 2D bin counts. Since you're already using R, you can use ggplot2 for this:

http://docs.ggplot2.org/current/geom_bin2d.html
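For anyone not using R, the counting step behind a 2D-bin heatmap is short enough to sketch by hand. This minimal Python version assumes equal-width bins spanning the observed data range; the function name `bin2d`, its default bin counts, and the sample data are illustrative choices, not ggplot2's API or its exact binning rules.

```python
def bin2d(xs, ys, nx=30, ny=30):
    """Count (x, y) points in an nx-by-ny grid of equal-width bins.

    Returns ny rows of nx counts; row 0 holds the smallest y values and
    column 0 the smallest x values. (Hypothetical helper for illustration.)
    """
    if not xs or len(xs) != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    if nx < 1 or ny < 1:
        raise ValueError("nx and ny must be positive")
    x_min, y_min = min(xs), min(ys)
    # Fall back to width 1.0 when every point shares the same x (or y),
    # which would otherwise cause a division by zero below.
    x_width = (max(xs) - x_min) / nx or 1.0
    y_width = (max(ys) - y_min) / ny or 1.0
    grid = [[0] * nx for _ in range(ny)]
    for x, y in zip(xs, ys):
        # Clamp so the maximum value falls into the last bin, not past it.
        col = min(int((x - x_min) / x_width), nx - 1)
        row = min(int((y - y_min) / y_width), ny - 1)
        grid[row][col] += 1
    return grid


if __name__ == "__main__":
    # Hypothetical (HTML size in KB, processing time in ms) pairs.
    html_kb = [2, 3, 3, 4, 5, 40, 42, 80]
    time_ms = [10, 12, 11, 15, 14, 200, 210, 900]
    for row in reversed(bin2d(html_kb, time_ms, nx=4, ny=4)):  # largest y first
        print(" ".join(f"{c:2d}" for c in row))
```

The resulting grid can be handed to any heatmap routine; in R, ggplot2's `geom_bin2d()` does the counting and the drawing in one step.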

Set a low alpha on the points in the chart if there are loads of them.
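A little arithmetic shows why a low alpha helps. Assuming standard "over" compositing of same-coloured points, each point lets a fraction (1 - alpha) of whatever lies beneath show through, so n overlapping points reach an opacity of 1 - (1 - alpha)^n. With alpha = 1 a single point and a thousand stacked points look identical; with a small alpha, denser regions stay visibly darker for much longer. The function name below is hypothetical.

```python
def stacked_opacity(alpha, n):
    """Opacity where n same-coloured points of opacity `alpha` overlap.

    Assumes standard "over" alpha compositing: each layer lets (1 - alpha)
    of what lies beneath show through, so n layers leave (1 - alpha) ** n.
    (Hypothetical helper for illustration.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - alpha) ** n


if __name__ == "__main__":
    for alpha in (1.0, 0.05):
        shades = [round(stacked_opacity(alpha, n), 2) for n in (1, 10, 50, 100)]
        print(f"alpha={alpha}: {shades}")
```

With alpha = 0.05, 10, 50 and 100 overlapping points come out at roughly 40%, 92% and 99% opacity, so those densities remain distinguishable. In ggplot2 this is the `alpha` argument, e.g. `geom_point(alpha = 0.05)`.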