by blowski 4200 days ago
Can someone explain how the author classified the topics? As I understood the article (and to be honest, I didn't understand it very well), he:

1. Takes a corpus of 'words often used in topic X'

2. Compares that corpus to the script, divided into 12 sections

3. Gives a value to how much the corpus corresponds to the script
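
To make those three steps concrete, here is a rough sketch of the word-list reading described above. It is only my guess at the idea, not the author's actual method, and every name in it (`split_into_sections`, `topic_scores`, the sample word list) is made up for illustration. It splits a script into 12 roughly equal runs of words and scores each run by the fraction of its words that appear in the topic list.

```python
import re


def split_into_sections(text, n_sections=12):
    """Split text into n_sections roughly equal runs of lowercase words."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division, so that every word lands in some section.
    size = max(1, -(-len(words) // n_sections))
    return [words[i * size:(i + 1) * size] for i in range(n_sections)]


def topic_scores(text, topic_words, n_sections=12):
    """Return, per section, the fraction of words found in topic_words."""
    topic = {w.lower() for w in topic_words}
    scores = []
    for section in split_into_sections(text, n_sections):
        hits = sum(1 for w in section if w in topic)
        # Sections left empty by a very short script score 0.0.
        scores.append(hits / len(section) if section else 0.0)
    return scores


# Hypothetical toy "script": romance words first, action words later.
script = "love kiss " * 6 + "gun fight " * 6
romance_arc = topic_scores(script, ["love", "kiss"])
```

On the toy script, `romance_arc` comes out high for the first half and zero for the second, which is the kind of per-section arc being plotted. A hand-picked word list is the crudest possible stand-in for a "topic".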

A couple of things which interested me:

* Finding original films - would it be possible to come up with a list of films which have been manually classified as 'romantic' but which don't follow the standard 'romance' plot arc?

* Unusual direction or editing - Are there films for which the dialogue can't be used to classify what's going on? Perhaps analysing the soundtrack (loudness, bpm, minor vs major keys) and the video (brightness, colouring, movement) and comparing it to the dialogue would show something interesting.

* Compare the 'deviation from the norm' to reviews, awards, box office takings, and press coverage.

Unfortunately, I have absolutely no idea how to do something like that. Just wondering if it's been done before.

2 comments

Answering your first question:

Latent Dirichlet Allocation (http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation)

He links to a good explanation about topic modeling and LDA: http://tedunderwood.com/2012/04/07/topic-modeling-made-just-...

Thank you both.

Answering your second question: "Quest for Fire".

Koyaanisqatsi