Note: This tutorial uses CDN-Mlr 073.

Document Analysis

Image Layering

OMR requires the complicated mixture of information in a manuscript image to be separated into specific layers (e.g., staff lines, background, text, music symbols). Manual layer separation can be performed using Pixel.js or a desktop program such as Pixelmator; however, this process is time consuming. Layer separation can be sped up by preparing the layers with a model already trained on a similar manuscript, allowing the user to correct pre-classified pixels instead of starting from scratch. For this tutorial with CDN-Mlr 073, models from the Salzinnes manuscript (CDN-Hsmu M2149.L40) can be used.

In this example, we use pages 40, 230, and 176 from CDN-Mlr 073. The layers of these three images were created in Pixelmator, exported as RGBA PNG images with the parts of the image not belonging to each layer made transparent, and then loaded into Pixel.js to create the background and selected regions layers. The layers from Pixel.js were exported as masks for model training in Rodan by setting the “generate masks” setting to “true”.

The files for these three pages were then combined using ImageMagick to produce a single image and a single file for each layer, as required by the Patchwise Trainer. The only way to train on multiple pages is to combine them all into one image and do the same for each type of layer. The more layered pages provided to the Patchwise Trainer, the more data it has to train on for segmenting the document into the background, music symbol, staff line, and text layers.

To combine the images (and layers), we used ImageMagick. The dimensions of the files differed slightly, so a background color to fill the gaps had to be specified. The full images were photographed against a black background, so “black” was used as the background color for the gaps when those images were combined. This can be seen in the next code snippet, where the images of the three folios are combined into a single file (40-230-176.jpg).

convert -background black _A_15th_century_Italian_antiphonal___manuscript__0_40_0.jpg \
_A_15th_century_Italian_antiphonal___manuscript__0_230_0.jpg \
_A_15th_century_Italian_antiphonal___manuscript__0_176_0.jpg -append 40-230-176.jpg

For the masks, the keyword “none” was used to fill any gaps with transparent pixels. This is shown in the next code snippet, where the text layers of the three folios are combined into one file (40-230-176-Text.png).

convert -background none _A_15th_century_Italian_antiphonal___manuscript__0_40_Text.png \
_A_15th_century_Italian_antiphonal___manuscript__0_230_Text.png \
_A_15th_century_Italian_antiphonal___manuscript__0_176_Text.png -append 40-230-176-Text.png

Combined Image of Neume Layers

After experimenting with the training, it was determined that the images needed to be resized, because the models generated by the Patchwise Trainer yield better results on images of lower dimensions (e.g., 4938x6380 px). The images were resized to 52.2% of their original size using the following command:

convert 40-230-176.jpg -resize 52.2% 40-230-176-Resized.jpg
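
The mask layers are not shown being resized here, but they presumably need to be scaled by the same factor so that they stay pixel-aligned with the combined image. A minimal sketch, with output file names of our own choosing:

# Resize a mask layer by the same factor (assumed step; transparency is preserved)
convert 40-230-176-Text.png -resize 52.2% 40-230-176-Text-Resized.png

# Check that the resized image and mask have matching dimensions
identify -format "%f: %wx%h\n" 40-230-176-Resized.jpg 40-230-176-Text-Resized.png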

These combined and resized files were uploaded back to Rodan for the next step. To upload resources to Rodan, create or enter a project, click the “Resources” button, and then click “Upload Resource(s)” to select resources to upload.

Model Training

Once the images and layers to use have been combined, the Patchwise Trainer can be used to generate new models. For larger images (especially when combined), it may be necessary to use the HPC (High Performance Computing) job, which dispatches the files to the Cedar cluster of Compute Canada.

We will use this for our example with the following settings (a rough sketch of the corresponding scheduler request follows the list):

  • Maximum Time: 0-12:00
  • Maximum number of samples per label: 20000
  • CPUs: 6
  • Maximum number of training epochs: 15
  • Maximum memory (MB): 250000
  • Patch width: 256
  • Patch height: 256
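
Cedar is a SLURM-managed cluster, so the time, CPU, and memory settings above amount to a resource request of roughly the following shape. This is only an illustration; the actual job script is generated by the Rodan HPC job, not written by hand.

#!/bin/bash
# Illustrative SLURM resource request only (not the script Rodan generates)
#SBATCH --time=0-12:00        # Maximum Time
#SBATCH --cpus-per-task=6     # CPUs
#SBATCH --mem=250000M         # Maximum memory (MB)
# The remaining settings (samples per label, training epochs, patch width/height)
# are parameters of the Patchwise Trainer itself rather than of the scheduler.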

Layer Extraction of Another Page

After testing that the models work on pages they were trained on (e.g., page 176), it is time to use them on another page from the same manuscript. Here, page 123 of CDN-Mlr 073 is used. The Fast Pixelwise Classifier job is run with the default settings on the page scaled down to 52.2% of its original size. This reduction of page dimensions is necessary to obtain the best results, since the models used by the classifier were trained on images of the same reduced dimensions. Three of the generated layers (music symbols, staff lines, and text) are used in later steps.
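
The scaling step mirrors the earlier resize command. Assuming page 123's image follows the same naming pattern as the training pages (the file names below are hypothetical), it would look like this:

# Scale page 123 to 52.2% so its dimensions match what the models were trained on
# (input and output file names are assumed, not taken from the tutorial)
convert _A_15th_century_Italian_antiphonal___manuscript__0_123_0.jpg -resize 52.2% 123-Resized.jpg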