HN Mirror


by glifchits 2472 days ago

Did I miss something or does this project include popularity measures as features?

In the section on dataset features, they include "popularity" (calculated by Spotify) as well as Billboard chart stats like weeks, rank, and a custom-made "score". To me it's not clear whether these features were hidden from the train/test sets or whether the popularity features were only used in their "artist past performance" measures.

If they included these popularity features, it's like asking "can we predict whether a song is a hit just by looking at how popular it is?" If it is the case that they peeked into the future and observed ex-post song popularity, obtaining just 89% accuracy hints at how unpredictable song success truly is. Check out [1] for a famous study that experimentally demonstrates this unpredictability.

[1] Salganik, M. J., Dodds, P. S., & Watts, D. J. (2006). Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Science, 311(5762), 854–856. https://doi.org/10.1126/science.1121066
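
To make the leakage concern concrete, here is a minimal sketch on purely synthetic data (nothing here comes from the paper). It assumes a hypothetical `audio` feature that is only weakly related to hit status and a hypothetical ex-post `popularity` feature that is measured after release and so mostly restates the label. The generator parameters and the nearest-class-mean classifier are my own illustrative choices, not the authors' method.

```python
import random
from statistics import mean


def make_songs(n, seed):
    """Generate synthetic songs; all distributions here are made-up assumptions."""
    rng = random.Random(seed)
    songs = []
    for _ in range(n):
        is_hit = rng.random() < 0.5
        songs.append({
            "is_hit": is_hit,
            # Content feature: class means differ only slightly.
            "audio": rng.gauss(0.2 if is_hit else 0.0, 1.0),
            # Ex-post feature: observed after release, so it tracks the label closely.
            "popularity": rng.gauss(70.0 if is_hit else 40.0, 10.0),
        })
    return songs


def nearest_mean_accuracy(train, test, feature):
    """Classify each test song by whichever class mean (from train) is closer."""
    hit_mean = mean(s[feature] for s in train if s["is_hit"])
    miss_mean = mean(s[feature] for s in train if not s["is_hit"])
    correct = 0
    for s in test:
        predicted_hit = abs(s[feature] - hit_mean) < abs(s[feature] - miss_mean)
        correct += predicted_hit == s["is_hit"]
    return correct / len(test)


train = make_songs(2000, seed=1)
test = make_songs(1000, seed=2)

audio_acc = nearest_mean_accuracy(train, test, "audio")
leaky_acc = nearest_mean_accuracy(train, test, "popularity")

print(f"audio feature only:      {audio_acc:.3f}")
print(f"ex-post popularity only: {leaky_acc:.3f}")
```

In this toy setup the leaked feature alone beats the content feature by a wide margin, which is why it matters whether such columns were held out of the train/test sets.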


Pils 2472 days ago

From the paper:

>To extend previous work, in addition to audio analysis features, we consider song duration and mine an additional artist past-performance feature. Artist past-performance for a given song represents how many prior Billboard hits the artist has released before that track’s release date

emphasis mine.

I wonder how accurate a model using this feature alone would be.
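
A rough sketch of how one might check that, assuming the definition quoted above (count of the artist's Billboard hits released strictly before the track's release date). The track records, the field names, and the "predict a hit if the artist has at least one prior hit" rule are all hypothetical placeholders, not data or methods from the paper.

```python
from datetime import date

# Hypothetical toy records; artists, dates, and labels are invented for illustration.
TRACKS = [
    {"artist": "A", "released": date(2001, 1, 1), "is_hit": True},
    {"artist": "A", "released": date(2002, 1, 1), "is_hit": True},
    {"artist": "A", "released": date(2003, 1, 1), "is_hit": False},
    {"artist": "B", "released": date(2001, 6, 1), "is_hit": False},
    {"artist": "B", "released": date(2002, 6, 1), "is_hit": False},
    {"artist": "B", "released": date(2003, 6, 1), "is_hit": True},
]


def past_hits(track, tracks):
    """Count the same artist's hits released strictly before this track.

    The strict '<' keeps the track's own outcome (and anything later)
    out of its feature, so the feature does not peek into the future.
    """
    return sum(
        1
        for other in tracks
        if other["artist"] == track["artist"]
        and other["is_hit"]
        and other["released"] < track["released"]
    )


def single_feature_accuracy(tracks, min_prior_hits=1):
    """Accuracy of the rule: predict 'hit' iff the artist has enough prior hits."""
    correct = 0
    for track in tracks:
        predicted_hit = past_hits(track, tracks) >= min_prior_hits
        correct += predicted_hit == track["is_hit"]
    return correct / len(tracks)


features = [past_hits(t, TRACKS) for t in TRACKS]
accuracy = single_feature_accuracy(TRACKS)
print(features, accuracy)
```

On real data you would compare this number against the majority-class rate, since most tracks are not hits and a trivial "never a hit" rule can already look accurate.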

glifchits 2472 days ago

Right, this sentence made it unclear to me whether they only used the popularity features to compute past-performance, or whether they included past-performance in addition to other popularity features.

To your question, other work on success prediction of tweets [1, 2] demonstrates that past-performance is indeed much more predictive than the typical content features. This way of looking at success of "cultural products" assumes it depends to varying extents on both inherent "quality" (measured by content features), and the social processes of sharing (which are much harder to understand ahead of time, as the paper I referenced in my parent post shows).

[1] Martin, T., Hofman, J. M., Sharma, A., Anderson, A., & Watts, D. J. (2016). Exploring Limits to Prediction in Complex Social Systems. Proceedings of the 25th International Conference on World Wide Web - WWW ’16, 683–694. https://doi.org/10.1145/2872427.2883001

[2] Bakshy, E., Hofman, J. M., Mason, W. A., & Watts, D. J. (2011). Everyone’s an Influencer: Quantifying Influence on Twitter. Proceedings of the Fourth ACM International Conference on Web Search and Data Mining - WSDM ’11, 65. https://doi.org/10.1145/1935826.1935845