Darwin  1.10(beta)
Multi-class Image Segmentation

This project implements conditional random field (CRF) models for multi-class image segmentation (also known as pixel labeling). The documentation below describes the pipeline for learning and evaluating a multi-class image segmentation model. The instructions are general but give examples using the 21-class MSRC image segmentation dataset.

The 21-class MSRC dataset can be downloaded from http://research.microsoft.com/en-us/projects/ObjectClassRecognition/. It contains 23 object classes, but two of these appear rarely and are usually ignored. An additional void class marks regions to be ignored during training and testing.
See Also
drwnSegImageInstance, drwnPixelSegModel, drwnMultiSegConfig

Preparing the Training Data

Multi-class image segmentation requires labeled training data. Each training instance consists of an image and an integer matrix the same size as the image. The matrix entries indicate the class label for each pixel. These matrices can be stored as space-delimited text files (in row-major order) or 16-bit PNG files. The convertPixelLabels application can be used to generate the right file format from colour-annotated images (see below).
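As a concrete sketch of the text format, the label file for a hypothetical 2x3-pixel image (the filename img001.txt is illustrative) contains one line per image row, with -1 marking void pixels:

```shell
# write a hypothetical 2x3 label matrix in row-major order;
# each entry is the class id of the corresponding pixel (-1 = void)
printf '0 0 1\n1 1 -1\n' > img001.txt
cat img001.txt
```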

Images and label files should share the same basename (e.g., img001.jpg and img001.txt) and may be stored in the same or different directories. Training and evaluation lists are specified in terms of basenames, e.g., for the 21-class MSRC dataset the basename img001 would refer to both the image img001.jpg and the label file img001.txt.

The shell script prepareMSRCDemo.sh or Python script prepareMSRCDemo.py in the project directory will download and prepare the data for experimenting with the 21-class MSRC dataset. The shell script assumes that you have the wget, unzip, and convert utilities installed on your system.

Converting Pixel Labels

A standard method of annotating images for multi-class pixel labeling is to paint each pixel with a colour corresponding to a particular class label. For example, the 21-class MSRC dataset uses red to indicate building and blue to indicate cow. Using the XML configuration file (see Configuration) the convertPixelLabels application can convert these colour images into a format recognized by the other applications in this project—specifically, space-delimited text files. The application expects the colour images to be in the labels directory and will write to the same directory. E.g.,

${BIN_DIR}/convertPixelLabels -config $CONFIG -i "_GT.bmp" $ALL_LIST

where $ALL_LIST should be replaced with the filename of a file containing the basenames of both training and evaluation images as described above.
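The shell variables used throughout these commands are defined by the user; a hypothetical setup (all paths here are illustrative, not fixed by the project) might look like:

```shell
# illustrative settings only -- point these at your own installation
BIN_DIR=bin                # directory containing the compiled Darwin applications
CONFIG=msrcConfig.xml      # XML configuration file for the dataset
ALL_LIST=allList.txt       # basenames of all training and evaluation images
echo "${BIN_DIR}/convertPixelLabels -config ${CONFIG} -i _GT.bmp ${ALL_LIST}"
```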


Configuration

The multi-class image segmentation pipeline requires a number of configuration settings to be defined so that it can find the training images, labels, etc. It also needs to know the number of class labels and how to visualize them. The following XML shows an example configuration for the MSRC dataset.

<!-- data options -->
<option name="baseDir" value="./" />
<option name="imgDir" value="data/images/" />
<option name="lblDir" value="data/labels/" />
<option name="segDir" value="data/regions/" />
<option name="cacheDir" value="cached/" />
<option name="modelsDir" value="models/" />
<option name="outputDir" value="output/" />
<option name="imgExt" value=".jpg" />
<option name="lblExt" value=".txt" />
<option name="segExt" value=".sp" />
<option name="useCache" value="true" />
<!-- region definitions -->
<region id="-1" name="void" color="0 0 0"/>
<region id="0" name="building" color="128 0 0"/>
<region id="1" name="grass" color="0 128 0"/>
<region id="2" name="tree" color="128 128 0"/>
<region id="3" name="cow" color="0 0 128"/>
<region id="4" name="sheep" color="0 128 128"/>
<region id="5" name="sky" color="128 128 128"/>
<region id="6" name="airplane" color="192 0 0"/>
<region id="7" name="water" color="64 128 0"/>
<region id="8" name="face" color="192 128 0"/>
<region id="9" name="car" color="64 0 128"/>
<region id="10" name="bicycle" color="192 0 128"/>
<region id="11" name="flower" color="64 128 128"/>
<region id="12" name="sign" color="192 128 128"/>
<region id="13" name="bird" color="0 64 0"/>
<region id="14" name="book" color="128 64 0"/>
<region id="15" name="chair" color="0 192 0"/>
<region id="16" name="road" color="128 64 128"/>
<region id="17" name="cat" color="0 192 128"/>
<region id="18" name="dog" color="128 192 128"/>
<region id="19" name="body" color="64 64 0"/>
<region id="20" name="boat" color="192 64 0"/>
<!-- feature options -->
<option name="filterBandwidth" value="1" />
<option name="featureGridSpacing" value="5" />
<option name="includeRGB" value="true" />
<option name="includeHOG" value="true" />
<option name="includeLBP" value="true" />
<option name="includeRowCol" value="true" />
<option name="includeLocation" value="true" />
<drwnCodeProfiler enabled="true" />
<drwnLogger logLevel="VERBOSE" logFile="msrc.log" />
<drwnThreadPool threads="4" />
<drwnConfusionMatrix colSep=" || " rowBegin=" || " rowEnd=" \" />
<drwnHOGFeatures blockSize="1" normClippingLB="0.1" normClippingUB="0.5" />

The directories specified in the configuration file are relative to baseDir. The output directories modelsDir and outputDir must exist before running the learning and inference code. If feature caching is enabled (useCache set to true) then cacheDir must also exist. Different image formats are supported; for example, if you have kept your MSRC images in bmp format then you can change the imgExt option from ".jpg" to ".bmp".
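With the example configuration above, the required directories can be created relative to baseDir, e.g.:

```shell
# create the directories that must exist before learning and inference;
# the names match the modelsDir, outputDir, and cacheDir options above
mkdir -p models output cached
ls -d models output cached
```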

See Also
Configuration Manager

Learning Unary Potentials

The unary potentials encode a single pixel's preference for each label and are the heart of the model. The unary potentials are learned in two stages. The first stage learns a one-versus-all boosted decision tree classifier for each of the labels. The key features used for this stage are derived from a bank of 17 filters which are run over the image. In addition, we include the RGB color of the pixel, dense HOG features, LBP-like features, and averages over image rows and columns. These features can all be controlled within the drwnSegImagePixelFeatures section of the configuration file. Custom features can even be included via auxiliary feature settings (see drwnSegImageStdPixelFeatures for details).

The second stage calibrates the output of the boosted decision trees via a multi-class logistic regression classifier. These steps are performed by the following commands. One of the most important command-line arguments is -subSample, which determines how many pixels are used during training, and hence the amount of memory required. Specifically, "-subSample n^2" randomly samples one pixel out of every n-by-n pixel grid. With the settings below, the unary potentials can be trained using under 4GB of memory.

# train boosted classifiers
${BIN_DIR}/learnPixelSegModel -config $CONFIG -component BOOSTED \
-set drwnDecisionTree split MISCLASS \
-set drwnBoostedClassifier numRounds 200 \
-subSample 250 $TRAIN_LIST
# train unary potentials
${BIN_DIR}/learnPixelSegModel -config $CONFIG -component UNARY \
-subSample 25 $TRAIN_LIST
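As a rough illustration of the sampling rate (the image dimensions here are hypothetical), "-subSample 250" keeps about one pixel in every 250, i.e., one per n-by-n grid with n ≈ 16:

```shell
# approximate number of pixels sampled from a single 320x240 image when
# training with "-subSample 250" (integer arithmetic, so this is a floor)
W=320; H=240; RATE=250
echo $(( W * H / RATE ))
```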

Evaluating Unary Potentials

We can evaluate the learned model on some test images using the following commands.

# evaluate with unary terms only
${BIN_DIR}/inferPixelLabels -config $CONFIG -pairwise 0.0 -longrange 0.0 \
-outLabels .unary.txt -outImages .unary.png $TEST_LIST
# score results
${BIN_DIR}/scorePixelLabels -config $CONFIG \
-inLabels .unary.txt $TEST_LIST

Images visualizing the results are written to the output directory specified in the configuration file.

Learning Pairwise Potentials

The pairwise term encodes a contrast-dependent smoothness prior on the image labeling. The weight of the term is learned by direct search, i.e., a number of parameter values are tried and the one that gives the best results on a subset of training images is kept. The following command performs this step.

# train pairwise potentials
${BIN_DIR}/learnPixelSegModel -config $CONFIG -component CONTRAST $VAL_LIST

Note that $VAL_LIST and $TRAIN_LIST can be the same list; however, the code will only use up to 100 images from the list to learn the contrast weight.
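Direct search can be pictured as evaluating a fixed set of candidate weights and keeping the best; the candidate values and the scoring function below are made up for illustration (the real search is internal to learnPixelSegModel):

```shell
# illustrative direct search: score() stands in for running inference on the
# validation images with a given contrast weight and measuring accuracy
score() { echo "$1" | awk '{ print 1 - ($1 - 2) * ($1 - 2) / 16 }'; }

best_w=; best_s=-1
for w in 0.5 1 2 4 8; do
    s=$(score "$w")
    # keep the weight with the highest validation score so far
    if awk "BEGIN { exit !($s > $best_s) }"; then best_w=$w; best_s=$s; fi
done
echo "best contrast weight: $best_w"
```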


Learning Long-Range Pairwise Potentials

In addition to the contrast-dependent smoothness pairwise terms, which are defined on a local neighbourhood around each pixel, we can add long-range pairwise terms to encourage consistent labeling across the image.

The long-range edges are determined by finding similar pairs of patches within the image. A similar approach is taken in Gould, CVPR 2012. Like the contrast-dependent pairwise terms, the strength of the long-range edge constraints is determined by cross-validation on a subset of the training images as follows:

# train pairwise potentials
${BIN_DIR}/learnPixelSegModel -config $CONFIG -component LONGRANGE $VAL_LIST


Evaluating the Final Model

The final step in the pipeline evaluates the model on some test images and reports average performance.

# evaluate with unary and pairwise terms
${BIN_DIR}/inferPixelLabels -config $CONFIG -longrange 0.0 \
-outLabels .pairwise.txt -outImages .pairwise.png $TEST_LIST
# score results
${BIN_DIR}/scorePixelLabels -config $CONFIG -confusion \
-inLabels .pairwise.txt $TEST_LIST
# evaluate with unary and pairwise and long range terms
${BIN_DIR}/inferPixelLabels -config $CONFIG \
-outLabels .longrange.txt -outImages .longrange.png $TEST_LIST
# score results
${BIN_DIR}/scorePixelLabels -config $CONFIG -confusion \
-inLabels .longrange.txt $TEST_LIST

Unlike the unary-only evaluation, in this example we also generate a full confusion matrix (by passing the -confusion option to the scorePixelLabels application).
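For intuition on how a confusion matrix is scored (the awk one-liner below assumes a plain space-delimited matrix, not the exact output format of scorePixelLabels), global accuracy is the trace divided by the total count:

```shell
# hypothetical 3x3 confusion matrix: rows are true labels, columns predictions;
# accuracy = sum of diagonal entries / sum of all entries
printf '50 2 3\n4 40 1\n0 5 45\n' |
awk '{ for (i = 1; i <= NF; i++) { total += $i; if (i == NR) diag += $i } }
     END { printf "%.2f\n", diag / total }'
```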