May 23, 2019

Attendees: Ich, Glenn (guest), Laurent, Tim, Sylvain, Sam, Nestor, Yaolong

Ich: Let’s be very specific in the documentation of the dataset: What are we annotating? Why? What are we NOT annotating? Why? (The last two are even more important than what we are annotating.) It happens with code all the time: we document what we did but not why we got there, and then someone else in the future (including us) repeats the same mistakes that led us to those decisions, because we never documented them.

We have decided on the corpus, but not on the annotation process or the annotation format.

Ich: What would we say if people ask why we chose the Mendelssohn string quartets?

Sylvain: Easy-peasy: we were looking to create a small, novel, and complete corpus of a classical(ish) composer. The corpora of Beethoven, Haydn, and Mozart are too big to annotate in the time we have allotted for this project. Mendelssohn has only 6 string quartets, so it is a feasible amount of work. We don’t want to pick a few string quartets from different composers because, eventually, we want to do a study with other composers and see whether they deviate from Mendelssohn, and whether those deviations could be considered his “style”. We chose string quartets because form and musical structure are relevant to those of us working on music structure (e.g., sonata form, cadences), while the corpus remains a challenging and appropriate dataset for other features such as keys and harmony.

Nestor: We could look at what other datasets have done. I think we may also be one of the first symbolic music datasets with more than 2-3 annotators.

Tim: For the folk song dataset by Anja Volk et al., they put all the annotators together in the same room with the same scores until they reached consensus, and they did reach it.

How long would it take to annotate the Mendelssohn string quartets?

Sam: Some annotations will be easier than others; we can split into groups for specific annotations.

What is the game plan?

1. OMR the scores that Laurent got (the 1800s ones). We need to do some pre-processing on them first, then run them through PhotoScore.
2. Laurent will start with one movement as a test: OMR and correction.
3. We can divide the rest of the movements among the other folks.
4. After that, we need to agree on the annotations.
5. In the meantime, do a literature review of other datasets, and… start annotating!