UPDATED 2012-08-29: A few more improvements to the alignment for compound metres (9/8 and 12/8) in the mirex_triad.txt files can be found at:
burgoyne2011chords-2012-08-29.tar.bz2
All other files remain the same as the 27 August release.
UPDATED 2012-08-27: With a huge amount of help from Bas de Haas, a large number of errors have been corrected. All users of the Billboard data set should download the new archive below:
burgoyne2011chords-2012-08-27.tar.bz2
In addition to the corrections, each directory also now includes mirex_triad.txt, a file in the style that will be used to evaluate the MIREX 2012 competition. A technical report on the generation of these files is forthcoming.
UPDATED 2012-01-28: With special help from Greg Burlet, two quick-and-dirty CSV files are available to make using the annotations for some common tasks easier.
As with the original database, please use the ISMIR 2011 paper [1] to cite these data whenever you use them.
UPDATED 2011-11-12: A huge thank-you to Kazuyoshi Yoshii, who reported a number of files that were inconsistent with the format standard and helped identify a bug that had removed many leading-instrument annotations. Any researchers who downloaded the corpus prior to 12 November 2011 should download the new version.
Thank you for your interest in the McGill Billboard annotations! You can download the first (updated) release here:
This release contains the annotations and audio features corresponding to the first 1000 slots from the random sample, as presented at ISMIR 2011 [1], which constitute approximately half of the total data available. We will release the remaining data progressively over the next couple of years in order to ensure that there are unseen data available for evaluating algorithms at MIREX or related events.
The set includes annotations and features for 649 slots, because we were unable to acquire audio for every slot in the sample. It comprises 545 distinct songs, because the sampling algorithm can assign the same song to more than one slot. Training algorithms that assume independent, identically distributed data (as most do) should retain the duplicates.
Each slot appears in the archive as a numbered folder containing three files:
Each annotation begins with a header including the title of the song (prefixed by # title:), the name of the artist (prefixed by # artist:), the metre (prefixed by # metre:), and the tonic pitch class of the opening key (prefixed by # tonic:). Similar metre and tonic comments may also appear in the main body of the annotations, corresponding to changes of key or metre. When there is no obviously prevailing key, the tonic pitch class is denoted ?.
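As a rough illustration of the header layout described above, the following Python sketch collects the leading comment lines into a dictionary. The function name parse_header and the sample values are invented for this example; the only format details taken from the description are the # key: value prefixes.

```python
def parse_header(lines):
    """Collect the leading '# key: value' comment lines into a dict.

    Reading stops at the first non-blank line that is not a comment, so
    later metre or tonic changes in the body are not mixed into the header.
    """
    header = {}
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between header fields
        if not line.startswith("#"):
            break  # first body line: the header is over
        key, sep, value = line[1:].partition(":")
        if sep:  # ignore comment lines without a colon
            header[key.strip()] = value.strip()
    return header


# Hypothetical header, made up for this example.
sample = [
    "# title: Example Song",
    "# artist: Example Artist",
    "# metre: 4/4",
    "# tonic: C",
]
print(parse_header(sample))
```

Because the value is kept as a plain string, a tonic of ? (no obviously prevailing key) passes through unchanged and can be handled by the caller.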
The main body of each annotation consists of a single line for each musical phrase or other sonic element at a comparable level of musical structure. Each line begins with a floating-point number denoting the timestamp of the beginning of the phrase (in seconds) followed by a tab character. There are special lines for silence at the beginning and end of the audio file and a special line for the end of the piece. The other lines continue with a comma-separated list of elements among the following.
->), which is a musicological hint that the phrase is musically elided into the following phrase.
More detail on the structural annotations is available in [5]. Essentially, these annotations replace the lower level of structural annotations (lowercase letters) from this reference with chord annotations. Beware that the structural annotations have been vetted less rigorously than the chord annotations; if you find any errors, please contact Ashley Burgoyne.
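The basic layout of a body line (a floating-point timestamp, a tab, then a comma-separated list of elements) can be split with a few lines of Python. This is only a sketch: the function name parse_body_line is invented here, the sample elements are placeholders rather than real annotation content, and the code does not interpret what each element means.

```python
def parse_body_line(line):
    """Split one body line into (timestamp_in_seconds, list_of_elements)."""
    stamp, _, rest = line.rstrip("\n").partition("\t")
    # Elements are comma-separated; surrounding whitespace is dropped.
    elements = [part.strip() for part in rest.split(",") if part.strip()]
    return float(stamp), elements


# Placeholder line, not taken from the data set.
timestamp, elements = parse_body_line("12.5\tfirst element, second element\n")
print(timestamp, elements)
```

Lines that are metre or tonic comments do not start with a timestamp, so a caller would need to filter those out before using this function.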
The chord annotations are simplified to the beat level. All chord symbols follow the standard presented at ISMIR 2005 [4] and used in MIREX since, with a few additions to the shorthand to accommodate the richness of these annotations: 1 for unharmonised bass notes (bass notes with no chord on top), 5 for power chords, and sus2, maj11, 11, min11, maj13, 13, and min13 for the corresponding chords in traditional jazz notation. To save space, repeated chords are denoted with a dot instead of the full chord name. To save further space, bars containing a single chord on all beats list the chord symbol only once; likewise, in quadruple metres (4/4 or 12/8), bars with only two chords and the change on the third beat list those two chords with no dots. For brief changes of metre, the metre may appear in parentheses at the beginning of the bar rather than as a full metre comment.
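The space-saving rules above can be undone to recover one chord per beat. The Python sketch below shows one way to do so, under several assumptions that are not part of the format description: the bar has already been split into a list of tokens, any parenthesised metre has been removed, and the caller supplies the number of beats in the bar (4 for both 4/4 and 12/8). The function name expand_bar is invented for this example.

```python
def expand_bar(tokens, beats_per_bar):
    """Expand the chord tokens of one bar into one chord label per beat."""
    if len(tokens) == 1:
        # A single symbol stands for the same chord on every beat.
        return tokens * beats_per_bar
    if len(tokens) == 2 and beats_per_bar == 4 and "." not in tokens:
        # Quadruple-metre shorthand: two chords, change on the third beat.
        return [tokens[0], tokens[0], tokens[1], tokens[1]]
    if len(tokens) != beats_per_bar:
        raise ValueError(f"expected {beats_per_bar} tokens, got {len(tokens)}")
    beats = []
    for token in tokens:
        if token == ".":
            if not beats:
                raise ValueError("a bar cannot start with a repeat dot")
            beats.append(beats[-1])  # a dot repeats the previous chord
        else:
            beats.append(token)
    return beats


print(expand_bar(["C:maj"], 4))
print(expand_bar(["C:maj", "G:maj"], 4))
print(expand_bar(["C:maj", ".", "F:maj", "G:7"], 4))
```

Raising an error for bars whose token count matches neither shorthand nor the beat count is a design choice of this sketch; it makes unexpected input visible instead of silently guessing the alignment.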
Two non-chord symbols may appear within bars. For passages that were too musically elaborate to merit beat-level chord annotations, annotators sometimes filled the bar with an asterisk (*). For brief pauses of arbitrary length (often a single beat), annotators added a bar with the special annotation &pause.
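Code that consumes the bars will usually want to tell these two symbols apart from real chord labels. A minimal Python sketch follows; the function name classify_token and the category names it returns are invented for this example, and the repeat dot is included only for completeness.

```python
def classify_token(token):
    """Label a token found inside a bar (category names are illustrative)."""
    if token == "*":
        return "unannotated"  # bar left without beat-level chords
    if token == "&pause":
        return "pause"        # brief pause of arbitrary length
    if token == ".":
        return "repeat"       # same chord as the previous beat
    return "chord"


for token in ["C:maj", ".", "*", "&pause"]:
    print(token, classify_token(token))
```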
Please e-mail any questions or comments to Ashley Burgoyne.
[1] John Ashley Burgoyne, Jonathan Wild, and Ichiro Fujinaga, ‘An Expert Ground Truth Set for Audio Chord Recognition and Music Analysis’, in Proceedings of the 12th International Society for Music Information Retrieval Conference, ed. Anssi Klapuri and Colby Leider (Miami, FL, 2011), pp. 633–38.
[2] Thierry Bertin-Mahieux, Daniel P. W. Ellis, Brian Whitman, and Paul Lamere, ‘The Million Song Dataset’, in Proceedings of the 12th International Society for Music Information Retrieval Conference, ed. Anssi Klapuri and Colby Leider (Miami, FL, 2011), pp. 591–96.
[3] Matthias Mauch and Simon Dixon, ‘Approximate Note Transcription for the Improved Identification of Difficult Chords’, in Proceedings of the 11th International Society for Music Information Retrieval Conference, ed. J. Stephen Downie and Remco C. Veltkamp (Utrecht, the Netherlands, 2010), pp. 135–40.
[4] Christopher A. Harte, Mark B. Sandler, Samer A. Abdallah, and Emilia Gómez, ‘Symbolic Representation of Musical Chords: A Proposed Syntax for Text Annotations’, in Proceedings of the 6th International Conference on Music Information Retrieval, ed. Joshua D. Reiss and Geraint A. Wiggins (London, England, 2005), pp. 66–71.
[5] Jordan B. L. Smith, J. Ashley Burgoyne, Ichiro Fujinaga, David De Roure, and J. Stephen Downie, ‘Design and Creation of a Large-Scale Database of Structural Annotations’, in Proceedings of the 12th International Society for Music Information Retrieval Conference, ed. Anssi Klapuri and Colby Leider (Miami, FL, 2011), pp. 55–60.