Martin O’Leary, 3rd Prize Winner – EMI Music Data Science Hackathon

What was your background prior to entering this competition?

I’m a Research Fellow in glaciology at the University of Michigan. Before this, I took part in a number of Kaggle competitions, including ones on mapping dark matter, automated essay scoring, and predicting shopper behaviour.

What made you decide to enter?

As I said, I’m a regular Kaggle competitor, and I enjoyed the opportunity to do something quick and fun, rather than getting bogged down in a multi-month project.

What preprocessing and machine learning methods did you use?

I did some very minimal pre-processing to get the data into a usable form, then threw a wide variety of machine learning algorithms at it. I had the most success with random forests.

What was your most important insight into the data?

It’s really important to look at individual user biases. A 50 from one person could be the equivalent of a 70 from someone else. It’s also really effective to divide things up by artist. You can get away with very simple models if you have a separate one for each artist.
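To make the two ideas above concrete, here is a minimal Python sketch. It is not the code from the competition (that was written in R and leaned on random forests). The toy ratings, the `fit` and `predict` helpers, and the choice of a plain per-artist mean as the "very simple model" are all assumptions made for illustration. Each user's bias is estimated as their average rating minus the global average, the bias is removed before fitting one mean per artist, and it is added back at prediction time.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: (user, artist, rating on a 0-100 scale).
# u1 rates everything 20 points lower than u2 does.
ratings = [
    ("u1", "a1", 50), ("u1", "a2", 30),
    ("u2", "a1", 70), ("u2", "a2", 50),
]

def fit(ratings):
    """Estimate per-user biases, then one simple model (a mean) per artist."""
    global_mean = mean(r for _, _, r in ratings)

    # User bias: how far a user's average rating sits from the global average.
    by_user = defaultdict(list)
    for user, _, r in ratings:
        by_user[user].append(r)
    user_bias = {u: mean(rs) - global_mean for u, rs in by_user.items()}

    # Per-artist model: the mean of that artist's bias-corrected ratings.
    by_artist = defaultdict(list)
    for user, artist, r in ratings:
        by_artist[artist].append(r - user_bias[user])
    artist_mean = {a: mean(rs) for a, rs in by_artist.items()}

    return global_mean, user_bias, artist_mean

def predict(model, user, artist):
    """Artist-level prediction shifted by the user's bias (0 for unseen users)."""
    global_mean, user_bias, artist_mean = model
    return artist_mean.get(artist, global_mean) + user_bias.get(user, 0.0)

model = fit(ratings)
print(predict(model, "u1", "a1"), predict(model, "u2", "a1"))
```

In this toy data, once the biases are removed both users agree on each artist, which is the sense in which a 50 from one person can be the equivalent of a 70 from someone else. A real entry would swap the per-artist mean for a richer per-artist model.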

Were you surprised by any of your insights?

I was really surprised how little demographic factors mattered. I assumed that age and gender would be really good predictors of musical taste, but they turned out to be not so great.

Which tools and programming language did you use?

I wrote all my code in R, except for some C extensions.

What have you taken away from the Music Data Science Hackathon?

I really need to read up on modern collaborative filtering techniques. I managed to do okay with just generic machine learning algorithms, and some basic stuff I picked up from the Netflix Prize papers, but there’s a lot more stuff out there nowadays which would have made things easier.

What did you think of the 24-hour hackathon format?

I really enjoyed it. Looking forward to the next one!

Here is my approach

And here is my code on GitHub

Data Science London © 2017