We introduce the Music Listening Histories Dataset (MLHD), a large-scale collection of music listening events assembled from more than 27 billion time-stamped logs extracted from Last.fm. The logs are organized in the form of listening histories per user, and have been conveniently preprocessed and cleaned.
Attractive features of the MLHD are:
We describe the process of assembling the dataset, its content, and its demographic characteristics, and discuss possible uses of this collection, which is currently the largest research dataset of its kind in the field.
The Music Listening Histories Dataset (MLHD) will be released at the 18th International Society for Music Information Retrieval Conference (ISMIR) on October 23rd, 2017.
We are extremely grateful to all the Last.fm users who have agreed to make their data available for non-commercial use, and to the Last.fm service, which has collected and offered this data without interruption since 2002, helping the field of music informatics research move forward.
This research has been supported by BecasChile Bicentenario, Comision Nacional de Ciencia y Tecnologia, Gobierno de Chile, and the Social Sciences and Humanities Research Council of Canada. Important parts of this work used ComputeCanada’s High Performance Computing resources.
For additional information, contact Gabriel Vigliensoni at gabriel.vigliensonimartin (at) mcgill.ca