In general, a very nice thing, and thanks for sharing! But...

1. Are you sharing the raw data? (It would be great!)
2. It would be useful to see frequencies (e.g. as node sizes).
3. Why can't we see 'git', 'ls', or 'cd'?
4. At first glance, things like "pytho", "sourc", and "worko" look like a glitch. Also, when there is a cluster of commands starting with the same prefix, the subgraph is hardly informative. How about scaling the text (instead of truncating it)?
5. As a measure of co-occurrence, a nicer quantity than correlation is the following: http://stats.stackexchange.com/questions/6047/does-this-quan... It has a direct interpretation: "how does the observed coincidence rate compare to the one expected for independent variables?" I have used it a few times, after testing other measures of co-occurrence (including conditional probability) and being dissatisfied with the results, especially their tendency to favour edges for big or small nodes. Examples (with their recipes) below:

My StackExchange visualization:
https://github.com/stared/tag-graph-map-of-stackexchange/wik...

And my visualization of themes in books:
http://stared.github.io/wizualizacja-wolnych-lektur/polish_b...
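Read literally, the quantity described in point 5 is the ratio of the observed co-occurrence rate to the rate expected if two commands were independent, P(A,B) / (P(A) P(B)). In association-rule mining this is called lift, and its logarithm is pointwise mutual information. Below is a minimal Python sketch that computes both the per-command frequencies from point 2 and this ratio for each pair. It assumes the raw data can be reduced to one set of command names per history. The `sessions` input, the function names, and the `min_pair_count` threshold are all hypothetical, not taken from the original post.

```python
from collections import Counter
from itertools import combinations
import math


def cooccurrence_stats(sessions, min_pair_count=1):
    """Count commands and score command pairs by observed/expected co-occurrence.

    Assumption (hypothetical input): `sessions` is an iterable of collections
    of command names, one per history (e.g. per user); only presence matters.
    """
    sessions = [set(s) for s in sessions]
    n = len(sessions)
    counts = Counter()       # number of sessions containing each command
    pair_counts = Counter()  # number of sessions containing both commands
    for s in sessions:
        counts.update(s)
        # sorting gives each unordered pair a single canonical key (a, b)
        pair_counts.update(combinations(sorted(s), 2))

    ratios = {}
    for (a, b), n_ab in pair_counts.items():
        if n_ab < min_pair_count:
            continue  # skip pairs seen too rarely to score reliably
        # P(a, b) / (P(a) * P(b)) = (n_ab / n) / ((n_a / n) * (n_b / n))
        ratios[(a, b)] = n_ab * n / (counts[a] * counts[b])
    return counts, ratios


def node_radius(count, max_count, max_radius=30.0):
    """Scale a node so its area (not its radius) is proportional to frequency."""
    return max_radius * math.sqrt(count / max_count)


if __name__ == "__main__":
    sessions = [
        {"git", "ls", "cd"},
        {"git", "python"},
        {"ls", "cd"},
        {"python", "pip"},
    ]
    counts, ratios = cooccurrence_stats(sessions)
    top = max(counts.values())
    for cmd, c in counts.most_common():
        print(cmd, c, round(node_radius(c, top), 1))
    for pair, r in sorted(ratios.items(), key=lambda kv: -kv[1]):
        print(pair, round(r, 2))
```

A ratio of 1 means a pair co-occurs exactly as often as chance predicts, and values above 1 mean more often. Sizing by the square root keeps node area proportional to frequency, so the most common commands are not visually exaggerated. The ratio can be noisy for rare pairs: a single shared session between two rare commands can give a large value. For that reason, a minimum pair count may be worth setting.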
Thanks for the link in 5)! That's really useful.
I really like the two visualizations you've made -- I'll definitely look into incorporating some of those ideas when I have time.