Also, I am going to go out on a limb here and guess that R's `read.csv` doesn't do what one hopes it would when fed this CSV record:

    10,3,Brian,"You mean like the time you had tea with
    Mohammad, the prophet of the Muslim faith?
    Peter:
    Come on, Mohammad, let's get some tea.
    Mr. T:
    Try my ""Mr. T. ...tea.""
    "
Well, it seems people don't understand the problem with this line. Here is a screenshot of the original script (http://imgur.com/pcu5N2U):

Brian: You mean like the time you had tea with Mohammad, the prophet of the Muslim faith? [flashback #3]
Peter: Come on, Mohammad, let's get some tea. [Mohammad is covered by a black box with the words "IMAGE CENSORED BY FOX" printed several times from top to bottom inside the box. They stop at a tea stand.]
Mr. T: Try my "Mr. T. ...tea." [squints]
There, three characters speak. However, R's `read.csv` assigns all three characters' speech to Brian (screenshot: http://imgur.com/gLpPKdl) and also seems to duplicate part of the conversation:

    > x[596, ]
        Season Episode Character
    596     10       3     Brian
        Line
    596 You mean like the time you had tea with Mohammad, the prophet of the Muslim faith? \nPeter:\nCome on, Mohammad, let's get some tea. \n
    > x[597,]
        Season Episode Character
    597     10       3     Brian
        Line
    597 You mean like the time you had tea with Mohammad, the prophet of the Muslim faith? \nPeter:\nCome on, Mohammad, let's get some tea. \nMr. T:\nTry my "Mr. T. ...tea." \n

Row 596 repeats the first two turns of row 597, so that part of the conversation appears twice.

PS: In addition, both "Muhammad" and "Mohammad" appear, so references to the prophet are presumably under-counted.
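A minimal sketch of counting both spellings, assuming the transcript text is already available as a plain string. It uses Python's `re` module; `count_prophet_mentions` and the sample sentence are hypothetical, not taken from the original analysis:

```python
import re

# Hypothetical helper, not taken from the original analysis. It matches the two
# spellings mentioned above as whole words, case-insensitively; other
# transliterations (e.g. "Muhammed") would need to be added to the pattern.
PROPHET = re.compile(r"\b(?:Muhammad|Mohammad)\b", re.IGNORECASE)


def count_prophet_mentions(text: str) -> int:
    """Count whole-word mentions of either spelling in text."""
    return len(PROPHET.findall(text))


# Made-up sample text mixing both spellings.
sample = "Mohammad went for tea. Muhammad stayed home. Come on, Mohammad!"

print(len(re.findall(r"\bMuhammad\b", sample)))  # 1: one spelling only
print(count_prophet_mentions(sample))            # 3: both spellings
```

The word boundaries keep longer words such as "Mohammadan" from being counted.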
The data sources are CSVs in this repository: https://github.com/BobAdamsEE/SouthParkData/
Looks like all the data is preprocessed, with each row mostly holding a single character's line. (Actually, it appears the line you point out in 10-3 is broken!) You could argue that the script isn't processed correctly, but that's beyond the scope of the analysis, although a note might be helpful.
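If a note isn't enough, the 10-3 record could be repaired after loading. Below is a minimal sketch in Python (standing in for R, which the thread uses) under one assumption: a speaker change inside a `Line` field is always marked by a physical line containing only `Name:`, as in the record quoted earlier. `split_embedded_speakers` is a hypothetical helper, not part of the SouthParkData repository, and the record is the thread's quoted text, not a file read from the repository:

```python
import csv
import io
import re
from typing import List, Tuple

# The 10-3 record as quoted earlier in the thread. The fourth field is
# quoted, spans several physical lines, and escapes inner quotes by
# doubling them, so a conforming CSV reader keeps it as ONE field.
RAW = '''10,3,Brian,"You mean like the time you had tea with
Mohammad, the prophet of the Muslim faith?
Peter:
Come on, Mohammad, let's get some tea.
Mr. T:
Try my ""Mr. T. ...tea.""
"
'''

# Hypothetical heuristic: a physical line consisting only of "Name:" marks
# the start of a new speaker's turn. It would misfire on dialogue that is
# a single capitalised phrase ending in a colon.
SPEAKER_LABEL = re.compile(r"^([A-Z][\w. ]*):$")


def split_embedded_speakers(character: str, line: str) -> List[Tuple[str, str]]:
    """Split one (character, line) record into one tuple per speaker turn."""
    turns: List[Tuple[str, List[str]]] = [(character, [])]
    for raw in line.splitlines():
        text = raw.strip()
        match = SPEAKER_LABEL.match(text)
        if match:
            # A bare "Name:" line opens a new turn for that speaker.
            turns.append((match.group(1), []))
        elif text:
            # Ordinary text continues the current speaker's turn.
            turns[-1][1].append(text)
    # Rejoin each turn's physical lines and drop turns with no text.
    return [(who, " ".join(parts)) for who, parts in turns if parts]


rows = list(csv.reader(io.StringIO(RAW)))
season, episode, character, line = rows[0]
print(len(rows), character)  # prints: 1 Brian

for who, said in split_embedded_speakers(character, line):
    print(f"{who}: {said}")
```

The first print shows the misattribution is already in the record: a conforming reader returns one row credited to Brian, just as `read.csv` did in the output above. The split only addresses that; the overlapping text in rows 596 and 597 would still need a separate de-duplication pass.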