Darwin  1.10(beta)

# Class List for drwnBase

• drwnBitArray : Implements an efficient packed array of bits.
• drwnCodeProfiler : Static class for providing profile information on functions.
• drwnCommandLine : Command line processing macros. Applications should use these macros to present a consistent interface.
• drwnCompatibility : Windows/linux compatibility layer.
• drwnConfigurableModule : Interface for a configurable module.
• drwnConfigurationManager : Configuration manager.
• drwnConstants : Provides useful constants and version information.
• drwnFactory : Templated factory for creating or cloning objects for a particular base class.
• drwnFileUtils : File and directory processing utilities.
• drwnIndexQueue : Provides a queue datastructure on a fixed number of indexes. At most one copy of each index can appear in the queue (a second enqueue is ignored). Membership of the queue can be queried.
• drwnLogger : Message and error logging. This class is not thread-safe in the interest of not having to flush the log on every message.
• drwnOrderedMap : Provides a datastructure that can be indexed by a KeyType (usually a string) or by an unsigned integer, i.e., the index.
• drwnProperties : Provides an abstract interface for dynamic properties.
• drwnSmartPointer : Implements a shared pointer interface to avoid the need to deep copy constant (shared) objects.
• drwnStatsUtils : Generic statistical utilities.
• drwnStdObjIface : Standard Darwin object interface (cloneable and writeable).
• drwnStrUtils : Generic string utilities.
• drwnThreadPool : Implements a pool of threads for running concurrent jobs.
• drwnTriplet : Basic datatype for holding three objects of arbitrary type. Similar to the STL pair<> class.
• drwnXMLParser : Provides XML parsing functionality for serializing and deserializing objects and containers of objects.
• drwnXMLUtils : Provides utility functions for XML parsing.
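To make the drwnIndexQueue entry in the list above concrete, here is a minimal sketch of a queue over a fixed number of indexes with the three properties it describes: at most one copy of each index, a second enqueue is ignored, and membership can be queried. The class name `IndexQueueSketch` and its member names are hypothetical and are not taken from the Darwin headers.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical stand-in for the behaviour described for drwnIndexQueue;
// this is not the Darwin implementation or its interface.
class IndexQueueSketch {
public:
    explicit IndexQueueSketch(std::size_t n) : inQueue_(n, false) { }

    // Adds index i to the back of the queue; a duplicate enqueue is ignored.
    void push(std::size_t i) {
        assert(i < inQueue_.size());
        if (inQueue_[i]) return;
        inQueue_[i] = true;
        order_.push_back(i);
    }

    // Removes and returns the index at the front of the queue.
    std::size_t pop() {
        assert(!order_.empty());
        const std::size_t i = order_.front();
        order_.pop_front();
        inQueue_[i] = false;
        return i;
    }

    // Membership query in constant time.
    bool contains(std::size_t i) const { return i < inQueue_.size() && inQueue_[i]; }
    bool empty() const { return order_.empty(); }
    std::size_t size() const { return order_.size(); }

private:
    std::vector<bool> inQueue_;      // membership flag per index
    std::deque<std::size_t> order_;  // FIFO order of queued indexes
};
```

Keeping a membership flag per index is what makes both the duplicate check and the membership query cheap, at the cost of memory proportional to the number of indexes.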

# Configuration Manager

For many research projects it is useful to have a standard configuration for running experiments with only a few parameters changing from one experiment to the next. Darwin supports this through two main mechanisms: XML configuration and command line options. The general strategy is to create an XML file with the standard configuration and then provide overrides for various settings on the command line. The system is lightweight while still catering for most configuration needs. An example XML configuration file is shown below.

    <drwn>
      <drwnCodeProfiler enabled="false" />
      <drwnLogger logLevel="message"
                  logFile="" />
      <drwnConfusionMatrix colSep="&#9;"
                           rowBegin="&#9;"
                           rowEnd="" />
      <myApplication attributeName="attributevalue">
        <myArbitraryData>
          1 2 3 4
        </myArbitraryData>
      </myApplication>
    </drwn>

The drwnConfigurationManager class handles configuration of static parameters for Darwin libraries and can be used for configuring individual applications or projects. The command line options -config and -set will automatically invoke the configuration manager. To invoke it manually you can simply call the drwnConfigurationManager::configure function.

Standard configuration parameters are defined by the triplet: module, name and value. In the XML configuration file shown above the tags "drwnCodeProfiler", "drwnConfusionMatrix", etc. define the module (i.e., configurable class) and the node's attributes define the name-value pairs. An application can define its own module (XML node) with arbitrary name-value pairs. The structure of the application-specific XML node can be arbitrary and it is up to the application developer to parse non-attribute content (such as the myArbitraryData node in the example above). The -set command line option can only be used for name-value pairs.
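The module, name and value triplet, together with the idea of XML defaults followed by command line overrides, can be sketched with a nested map. This is only an illustration of the data model. `SettingsSketch`, `setOption` and `getOption` are hypothetical names, and the source does not describe how drwnConfigurationManager stores settings internally.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical settings store keyed by module, then by name.
typedef std::map<std::string, std::map<std::string, std::string> > SettingsSketch;

// Records one (module, name, value) triplet; a later call for the same
// module and name replaces the earlier value.
inline void setOption(SettingsSketch& s, const std::string& module,
                      const std::string& name, const std::string& value) {
    s[module][name] = value;
}

// Returns the stored value, or fallback if the pair was never set.
inline std::string getOption(const SettingsSketch& s, const std::string& module,
                             const std::string& name, const std::string& fallback) {
    SettingsSketch::const_iterator m = s.find(module);
    if (m == s.end()) return fallback;
    std::map<std::string, std::string>::const_iterator n = m->second.find(name);
    return (n == m->second.end()) ? fallback : n->second;
}
```

Applying the values from the XML file first and the `-set` triplets afterwards gives the override behaviour described above, because the later assignment wins.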

To make a class configurable through the Darwin Configuration Manager, an application needs to derive a class from the drwnConfigurableModule base class and override the setConfiguration function. More control can be achieved by also overriding the readConfiguration function.

To register the class, the code simply needs to instantiate one global object of the derived class; the drwnConfigurableModule constructor will handle registration.

An example is the drwnMultiClassLogistic configuration module which is placed inside the drwnMultiClassLogistic.cpp file:

    class drwnMultiClassLogisticConfig : public drwnConfigurableModule {
    public:
        drwnMultiClassLogisticConfig() : drwnConfigurableModule("drwnMultiClassLogistic") { }

        // print online help for this module
        void usage(ostream &os) const {
            os << " lambda :: regularization strength (default: "
               << drwnMultiClassLogistic::REG_STRENGTH << ")\n";
            os << " maxIterations :: maximum number of training iterations (default: "
               << drwnMultiClassLogistic::MAX_ITERATIONS << ")\n";
        }

        // called for each name-value pair belonging to this module
        void setConfiguration(const char *name, const char *value) {
            if (!strcmp(name, "lambda")) {
                drwnMultiClassLogistic::REG_STRENGTH = std::max(0.0, atof(value));
            } else if (!strcmp(name, "maxIterations")) {
                drwnMultiClassLogistic::MAX_ITERATIONS = std::max(0, atoi(value));
            } else {
                DRWN_LOG_FATAL("unrecognized configuration option " << name << " for " << this->name());
            }
        }
    };

    // global instance; its constructor registers the module
    static drwnMultiClassLogisticConfig gMultiClassLogisticConfig;
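The registration mechanism used in the example above (a base class constructor that records each instance, plus one global object per module) can be sketched independently of Darwin. All names below (`ModuleSketch`, `registrySketch`, `SolverConfigSketch`) are hypothetical, and the sketch makes no claim about how drwnConfigurableModule is implemented.

```cpp
#include <cassert>
#include <map>
#include <string>

class ModuleSketch;  // hypothetical analogue of a configurable module

// Registry of modules by name. The function-local static avoids
// depending on the initialization order of other global objects.
inline std::map<std::string, ModuleSketch*>& registrySketch() {
    static std::map<std::string, ModuleSketch*> r;
    return r;
}

class ModuleSketch {
public:
    explicit ModuleSketch(const std::string& name) : name_(name) {
        registrySketch()[name_] = this;   // register on construction
    }
    virtual ~ModuleSketch() { registrySketch().erase(name_); }
    virtual void setConfiguration(const std::string& name, const std::string& value) = 0;
    const std::string& name() const { return name_; }
private:
    std::string name_;
};

// Example module with a single option.
class SolverConfigSketch : public ModuleSketch {
public:
    static int maxIterations;
    SolverConfigSketch() : ModuleSketch("solver") { }
    void setConfiguration(const std::string& name, const std::string& value) {
        if (name == "maxIterations") maxIterations = std::stoi(value);
    }
};
int SolverConfigSketch::maxIterations = 100;

// Instantiating one global object is all that registration requires.
static SolverConfigSketch gSolverConfigSketch;
```

A configuration front end can then look up `registrySketch()["solver"]` and forward each name-value pair to `setConfiguration` without knowing the concrete class.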

Use -config on the command line by itself to get online help for most configurable modules. You can also use -set by itself to get a list of registered modules, or "-set <module>" to get online help for a specific module.


# Messages, Warnings and Errors

Messages, warnings and errors are managed via the drwnLogger class. The DRWN_LOG macro will automatically write log messages to a file (if specified) and display them on the console. You can set the verbosity level to control which messages get displayed. The following macros generate messages at different verbosity levels:

| Macro | Display | Description |
|---|---|---|
| DRWN_LOG_FATAL | -*- | An unrecoverable error has occurred and the code will terminate. |
| DRWN_LOG_ERROR | -E- | A recoverable error has occurred, e.g., a missing file. |
| DRWN_LOG_WARNING | -W- | Something unexpected happened, e.g., a parameter is zero. |
| DRWN_LOG_MESSAGE | — | Standard messages, e.g., application-level progress information. |
| DRWN_LOG_STATUS | — | Status/progress messages, e.g., image names and sizes during loading. |
| DRWN_LOG_VERBOSE | — | Verbose messages, e.g., intermediate performance results. |
| DRWN_LOG_METRICS | — | Metrics messages, e.g., event statistics, etc. |
| DRWN_LOG_DEBUG | -D- | Debugging messages, e.g., matrix inversion results, etc. |

Other useful macros include DRWN_LOG_PROGRESS_SPINNER, which shows activity within a loop, and DRWN_LOG_<level>_ONCE, which logs a message only once (even if the code calling the macro is run multiple times). The following code gives an example of each:

    #include "drwnBase.h"

    int main(int argc, char *argv[]) {
        // standard command line processing
        DRWN_BEGIN_CMDLINE_PROCESSING(argc, argv)
        DRWN_END_CMDLINE_PROCESSING();

        // long running loop
        for (int i = 0; i < 1000; i++) {
            DRWN_LOG_VERBOSE_ONCE("inside loop");
            DRWN_LOG_PROGRESS_SPINNER("processing...");
            sleep(1); // on Windows use Sleep(1000) in <windows.h>
        }
        return 0;
    }

Applications can override the message displaying functions by registering callbacks with the drwnLogger class. This is useful for interfacing to Matlab or displaying errors in GUI dialog boxes. The following example registers a callback for capturing errors and terminates if too many errors occur.

    #include "drwnBase.h"

    void errorMessageCallback(const char *msg) {
        static int counter = 0;
        std::cerr << msg << std::endl;
        if (++counter > 5) {
            std::cerr << "too many error messages" << std::endl;
            exit(0);
        }
    }

    int main(int argc, char *argv[])
    {
        // set drwnLogger callbacks
        drwnLogger::showErrorCallback = errorMessageCallback;
        for (int i = 0; i < 10; i++) {
            DRWN_LOG_ERROR("error message number " << i);
        }
        return 0;
    }

The standard command line options -quiet, -verbose, and -debug allow you to filter which messages are produced (see Command Line Processing).
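The two ideas in this section, filtering by verbosity level and logging a message only once per call site, can be sketched without Darwin as follows. The names `LoggerSketch`, `LOG_SKETCH` and `LOG_SKETCH_ONCE` are hypothetical, the level ordering is an assumption, and this is not the drwnLogger implementation.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical verbosity levels, ordered from most to least severe.
enum LevelSketch { LS_ERROR = 0, LS_WARNING, LS_MESSAGE, LS_VERBOSE, LS_DEBUG };

struct LoggerSketch {
    static LevelSketch threshold;   // most detailed level that is shown
    static int shown;               // number of messages actually emitted
    static void log(LevelSketch level, const std::string& msg) {
        if (level > threshold) return;      // filtered out by verbosity
        ++shown;
        std::cerr << msg << std::endl;
    }
};
LevelSketch LoggerSketch::threshold = LS_MESSAGE;
int LoggerSketch::shown = 0;

// Streams the arguments into a message and logs it at the given level.
#define LOG_SKETCH(level, expr) \
    do { std::ostringstream s_; s_ << expr; LoggerSketch::log(level, s_.str()); } while (0)

// Logs only the first time this particular call site is reached.
#define LOG_SKETCH_ONCE(level, expr) \
    do { \
        static bool done_ = false; \
        if (!done_) { done_ = true; LOG_SKETCH(level, expr); } \
    } while (0)
```

In this sketch an option such as -verbose would simply raise `LoggerSketch::threshold`. The function-local `static` flag is what gives each call site its own "already logged" state.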

## Asserts

Code written for the Darwin framework should use the DRWN_ASSERT or DRWN_ASSERT_MSG macros rather than the standard assert function to allow GUI applications and external environments, such as Matlab, to trap errors.
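One common way to make assertion failures trappable by a host environment is to raise an exception instead of aborting the process. The sketch below illustrates that idea only. `ASSERT_MSG_SKETCH` and `safeDivideSketch` are hypothetical, and the source does not say how DRWN_ASSERT or DRWN_ASSERT_MSG report failures.

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical assertion macro: instead of calling abort(), a failed check
// throws an exception that a GUI or host environment can catch and report.
#define ASSERT_MSG_SKETCH(cond, msg) \
    do { \
        if (!(cond)) { \
            std::ostringstream s_; \
            s_ << "assertion failed: " #cond " (" << msg << ") at " \
               << __FILE__ << ":" << __LINE__; \
            throw std::runtime_error(s_.str()); \
        } \
    } while (0)

// Example use: validates an argument before using it.
inline double safeDivideSketch(double a, double b) {
    ASSERT_MSG_SKETCH(b != 0.0, "denominator is " << b);
    return a / b;
}
```

A caller embedded in a GUI can wrap the call in `try`/`catch` and show `e.what()` in a dialog box, which is not possible once the standard `assert` has terminated the process.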

# Code Profiling

drwnCodeProfiler is typically used to accumulate information on entire functions (or within subroutines). Wrap the function or code block in

    drwnCodeProfiler::tic(handle = drwnCodeProfiler::getHandle("functionName"));
    // ... code to be timed ...
    drwnCodeProfiler::toc(handle);

to accumulate timing and number of calls for a given function. The timer accumulates the amount of processor and real (wall clock) time used between tic and toc calls (child processes, such as file I/O, are not counted in this time). Processor times may be inaccurate for functions that take longer than about 1 hour.

By default profiling is turned off and must be enabled in main() with

    drwnCodeProfiler::enabled = true;

Most Darwin applications use the standard command line option -profile to enable profiling. Call the function drwnCodeProfiler::print() before exiting main() to log profiling information.

The code can also be used to set time limits or recursive call limits within instrumented functions. Use time and calls to get the total running time or total number of calls for a given handle.

The macros DRWN_FCN_TIC and DRWN_FCN_TOC can be used at the entry and exit of your functions to instrument the entire function. Make sure you put DRWN_FCN_TOC before all return statements within the function.
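The tic/toc accumulation described above, and the need to reach toc on every return path, can be sketched with standard C++ as follows. `ProfilerSketch` and `ScopedTicSketch` are hypothetical names, and the sketch measures wall clock time only. It is not the drwnCodeProfiler implementation, and the scope guard is an alternative technique rather than something the source says Darwin provides.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>
#include <vector>

// Hypothetical tic/toc accumulator (wall-clock time only).
class ProfilerSketch {
public:
    // Returns a stable integer handle for the given name.
    static int getHandle(const std::string& name) {
        std::map<std::string, int>::iterator it = handles().find(name);
        if (it != handles().end()) return it->second;
        const int h = static_cast<int>(entries().size());
        handles()[name] = h;
        entries().push_back(Entry());
        return h;
    }
    static void tic(int h) { entries()[h].start = Clock::now(); }
    static void toc(int h) {
        Entry& e = entries()[h];
        e.seconds += std::chrono::duration<double>(Clock::now() - e.start).count();
        e.calls += 1;
    }
    static double time(int h) { return entries()[h].seconds; }
    static int calls(int h) { return entries()[h].calls; }

private:
    typedef std::chrono::steady_clock Clock;
    struct Entry {
        Clock::time_point start;
        double seconds;
        int calls;
        Entry() : seconds(0.0), calls(0) { }
    };
    static std::vector<Entry>& entries() { static std::vector<Entry> e; return e; }
    static std::map<std::string, int>& handles() { static std::map<std::string, int> m; return m; }
};

// Scope guard: toc runs on every exit path, including early returns.
class ScopedTicSketch {
public:
    explicit ScopedTicSketch(int h) : h_(h) { ProfilerSketch::tic(h_); }
    ~ScopedTicSketch() { ProfilerSketch::toc(h_); }
private:
    int h_;
    ScopedTicSketch(const ScopedTicSketch&);             // non-copyable
    ScopedTicSketch& operator=(const ScopedTicSketch&);
};

// Instrumented function with two return statements.
inline int absSketch(int x) {
    static const int h = ProfilerSketch::getHandle("absSketch");
    ScopedTicSketch guard(h);
    if (x < 0) return -x;
    return x;
}
```

With explicit tic and toc calls, every `return` needs its own toc. The guard object removes that bookkeeping because its destructor runs whichever way the function exits.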

Warning
Profiling provided by drwnCodeProfiler should be used as an estimate only. Specifically it is not accurate for small functions. In those cases you are better off using gprof and the "gcc -pg" option or Microsoft's Visual C++ profiling software. Note also that instrumenting code for profiling will unavoidably add a small overhead to a function's running time, so do not use tic and toc within tight loops. Unlike compiling with -pg for gprof, functions tic and toc are always compiled into the code.

# Command Line Processing

Standard command line options for most Darwin applications are:

-help :: display application usage
-config <xml> :: configure Darwin from XML file (or without <xml> for configuration help)
-set <m> <n> <v> :: set (configuration) <m>::<n> to value <v> (or -set <m> for module help)
-profile :: profile code
-quiet :: only show warnings and errors
-verbose :: show verbose messages
-debug :: show debug messages
-log <filename> :: log filename
-randseed <n> :: seed random number generators rand and drand48

Command line options are processed from left to right. If there are multiple conflicting options, the rightmost one will be taken, e.g., "-threads 4 -threads 0" will result in multi-threading being turned off. Many Darwin features can be configured using the -config or -set command line options. The standard options provide shortcuts for these. For example, "-verbose" is equivalent to "-set drwnLogger logLevel VERBOSE".
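The left-to-right rule and the shortcut equivalence can be illustrated with a small stand-alone parser. `OptionsSketch` and `parseSketch` are hypothetical and handle only the options needed for the illustration. Darwin's own parsing is done by the DRWN_BEGIN_CMDLINE_PROCESSING macros.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical option state for the sketch.
struct OptionsSketch {
    int threads;
    std::string logLevel;
    OptionsSketch() : threads(0), logLevel("MESSAGE") { }
};

// Scans the arguments from left to right; each occurrence of an option
// overwrites the value set by any earlier occurrence.
inline OptionsSketch parseSketch(const std::vector<std::string>& args) {
    OptionsSketch opts;
    for (std::size_t i = 0; i < args.size(); i++) {
        if (args[i] == "-threads" && i + 1 < args.size()) {
            opts.threads = std::atoi(args[++i].c_str());
        } else if (args[i] == "-verbose") {
            opts.logLevel = "VERBOSE";      // shortcut for the long form below
        } else if (args[i] == "-set" && i + 3 < args.size()) {
            if (args[i + 1] == "drwnLogger" && args[i + 2] == "logLevel") {
                opts.logLevel = args[i + 3];
            }
            i += 3;                         // skip module, name and value
        }
    }
    return opts;
}
```

Because each option simply assigns to the same field, "rightmost wins" falls out of the scan order and needs no separate conflict handling.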

Applications should include a DRWN_BEGIN_CMDLINE_PROCESSING block to automatically handle these options, e.g.,

    int main(int argc, char* argv[])
    {
        DRWN_BEGIN_CMDLINE_PROCESSING(argc, argv)
        DRWN_END_CMDLINE_PROCESSING();

        // application code
        return 0;
    }
# Thread Pool

The drwnThreadPool class implements a pool of N threads that read and execute jobs from a queue of arbitrary size. As soon as one of the threads completes a job, it requests the next job in the queue. When the queue is empty, the thread waits for a new job. The maximum number of threads is controlled by the MAX_THREADS static member variable, which can be set from the command line using the -threads option. Setting MAX_THREADS to zero will result in all jobs being executed in the main thread.

Jobs are defined by deriving a class from drwnThreadJob and overriding the operator()() member function. For example, the following code creates a functor for computing the squared L2-norm of a vector:

    class MySumSquaredJob : public drwnThreadJob {
    public:
        vector<double> data;
        double result;

    public:
        MySumSquaredJob(const vector<double>& d) : data(d) { }

        void operator()() {
            result = 0.0;
            for (unsigned i = 0; i < data.size(); i++) {
                result += data[i] * data[i];
            }
        }
    };

The lock and unlock methods should be used for controlling access to shared resources across threads (e.g., graphical output windows).

The following code demonstrates typical thread pool usage:

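    // start the thread pool and add jobs
    // (assumed drwnThreadPool calls: start(), addJob() and finish();
    // confirm against the class reference)
    drwnThreadPool threadPool;
    vector<MySumSquaredJob *> jobs; // derived from drwnThreadJob
    threadPool.start();
    for (int i = 0; i < numJobs; i++) {
        vector<double> data = getDataForJob(i); // get the data for job i
        jobs.push_back(new MySumSquaredJob(data)); // create functor for the job
        threadPool.addJob(jobs.back()); // add the job to the queue
    }

    // wait for jobs to finish (and show progress)
    threadPool.finish(true);

    // extract results (and delete jobs)
    for (int i = 0; i < numJobs; i++) {
        cerr << "||" << toString(jobs[i]->data) << "|| = " << jobs[i]->result << endl;
        delete jobs[i];
    }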
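The queue-driven behaviour described in this section (workers take the next job as soon as they are free, wait when the queue is empty, and run jobs in the calling thread when the thread count is zero) can be sketched with the standard library. `PoolSketch` is a hypothetical class written for illustration. It is not drwnThreadPool and does not reproduce its interface.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical job-queue pool; with zero threads, jobs run in the caller.
class PoolSketch {
public:
    explicit PoolSketch(unsigned maxThreads) : stop_(false), pending_(0) {
        for (unsigned i = 0; i < maxThreads; i++) {
            workers_.push_back(std::thread(&PoolSketch::workerLoop, this));
        }
    }
    ~PoolSketch() {
        { std::lock_guard<std::mutex> lock(mutex_); stop_ = true; }
        wake_.notify_all();
        for (std::size_t i = 0; i < workers_.size(); i++) workers_[i].join();
    }
    void addJob(const std::function<void()>& job) {
        if (workers_.empty()) { job(); return; }    // no threads: run inline
        { std::lock_guard<std::mutex> lock(mutex_); jobs_.push(job); ++pending_; }
        wake_.notify_one();
    }
    // Blocks until every queued job has completed.
    void finish() {
        std::unique_lock<std::mutex> lock(mutex_);
        idle_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void workerLoop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                wake_.wait(lock, [this] { return stop_ || !jobs_.empty(); });
                if (jobs_.empty()) return;          // stopping and queue drained
                job = jobs_.front();
                jobs_.pop();
            }
            job();                                  // run outside the lock
            {
                std::lock_guard<std::mutex> lock(mutex_);
                if (--pending_ == 0) idle_.notify_all();
            }
        }
    }

    std::mutex mutex_;
    std::condition_variable wake_, idle_;
    std::queue<std::function<void()> > jobs_;
    std::vector<std::thread> workers_;
    bool stop_;
    std::size_t pending_;
};
```

Jobs are executed outside the mutex so that workers only contend while taking work from the queue. Any resource shared between jobs still needs its own protection, which is the role the lock and unlock methods play in the text above.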

# XML Utilities

Darwin makes extensive use of XML formatting for serialization (saving) and de-serialization (loading) of objects. All objects derived from drwnWriteable implement methods for saving state to, and loading state from, an XML object.

A number of helper functions are provided in drwnXMLUtils. The following code snippet shows an example:

    class MyObject {
    public:
        VectorXd x;

        MyObject() { /* do nothing */ }
        MyObject(const VectorXd& ix) : x(ix) { /* do nothing */ }
        ~MyObject() { /* do nothing */ }

        void save(drwnXMLNode& xml) const { drwnXMLUtils::serialize(xml, x); }
        void load(drwnXMLNode& xml) { drwnXMLUtils::deserialize(xml, x); }
    };

    int main()
    {
        // create a vector of MyObject
        vector<MyObject> container;
        for (int i = 0; i < 5; i++) {
            container.push_back(MyObject(VectorXd::Random(5)));
        }

        // save container to an XML file
        drwnXMLUtils::write("example.xml", "vector", "object", container);
        return 0;
    }

# Factories

Factories facilitate the creation of objects (derived from a specific base class) without having to know the specific object type at compile time. For example, Darwin makes use of factories for creating or loading classifiers and feature transforms in the drwnML library.

Consider, for example, training either a decision tree classifier or a logistic regression classifier:

    const int nFeatures = dataset.numFeatures();
    const int nClasses = dataset.maxTarget() + 1;

    // train and save the classifier
    // (bUseDecisionTree is a placeholder flag chosen by the application)
    if (bUseDecisionTree) {
        drwnDecisionTree model(nFeatures, nClasses);
        model.train(dataset);
        model.write("classifier.xml");
    } else {
        drwnMultiClassLogistic model(nFeatures, nClasses);
        model.train(dataset);
        model.write("classifier.xml");
    }

Now in some other code we may want to use the classifier, but we don't know which classifier we trained. Factories to the rescue:

    drwnClassifier *classifier = drwnClassifierFactory::get().createFromFile("classifier.xml");
    DRWN_LOG_VERBOSE("loaded classifier of type " << classifier->type());

    // do something with the classifier
    ...

    // free the classifier
    delete classifier;

You can create factories for your own objects using the drwnFactory template class. Here is a short example.

    class myBase : public drwnWriteable {
        ...
    };

    class myDerivedA : public myBase {
        // must define drwnWriteable interface
    };

    class myDerivedB : public myBase {
        // must define drwnWriteable interface
    };

    template <>
    struct drwnFactoryTraits<myBase> {
        static void staticRegistration();
    };

    typedef drwnFactory<myBase> myBaseFactory;

The staticRegistration function is used to automatically tell the factory which classes belong to it. Otherwise you will need to explicitly add classes each time you use the factory with the registerClass function.

    void drwnFactoryTraits<myBase>::staticRegistration()
    {
        DRWN_FACTORY_REGISTER(myBase, myDerivedA);
        DRWN_FACTORY_REGISTER(myBase, myDerivedB);
    }

We can now use the factory to create objects by name or from file.

    myBase *base;

    // create by name and save
    base = myBaseFactory::get().create("myDerivedA");
    base->write("file.xml");
    delete base;

    // create from file
    base = myBaseFactory::get().createFromFile("file.xml");
    delete base;
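For readers who want to see what a name-based factory does underneath, the following stand-alone sketch maps type names to creator functions and registers the known classes once, when the factory is first used. All names (`ShapeSketch`, `ShapeFactorySketch`, and so on) are hypothetical, and the sketch does not reproduce the drwnFactory template or its file-loading support.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Hypothetical base class and derived types for the sketch.
struct ShapeSketch {
    virtual ~ShapeSketch() { }
    virtual const char* type() const = 0;
};
struct CircleSketch : ShapeSketch { const char* type() const { return "CircleSketch"; } };
struct SquareSketch : ShapeSketch { const char* type() const { return "SquareSketch"; } };

// Maps a type name to a function that constructs that type.
class ShapeFactorySketch {
public:
    typedef ShapeSketch* (*Creator)();

    static ShapeFactorySketch& get() {
        static ShapeFactorySketch factory;   // single shared instance
        return factory;
    }
    void registerClass(const std::string& name, Creator fn) { creators_[name] = fn; }

    // Returns a new object of the named type, or null if the name is unknown.
    std::unique_ptr<ShapeSketch> create(const std::string& name) const {
        std::map<std::string, Creator>::const_iterator it = creators_.find(name);
        return std::unique_ptr<ShapeSketch>(it == creators_.end() ? nullptr : it->second());
    }

private:
    ShapeFactorySketch() {
        // Plays the role of a static registration step.
        registerClass("CircleSketch", &construct<CircleSketch>);
        registerClass("SquareSketch", &construct<SquareSketch>);
    }
    template <typename T> static ShapeSketch* construct() { return new T(); }
    std::map<std::string, Creator> creators_;
};
```

Loading from a file then reduces to reading the stored type name, calling `create` with it, and asking the new object to load its own state. The sketch returns `std::unique_ptr` so that callers do not need the explicit `delete` shown in the Darwin examples.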