Jandy

Jandy is a program for managing multiple runs of scientific software. It facilitates running a program a number of times, perhaps with different combinations of input parameters, and keeps track of the outputs produced from each run in a database. Runs may be computed in parallel on a cluster. Jandy can produce some plots from the collection of inputs and outputs.

Jandy is specifically designed to manage Java programs, but programs written in other languages can also be managed by writing a simple wrapper.

For Java programs, Jandy also solves the problem of distributing configuration parameters throughout a complex application, by injecting parameter values at runtime.

Hierarchical structure

The central concept in Jandy is the ParameterSet, which is simply a list of key-value pairs. ParameterSets are related to one another in a tree; child ParameterSets automatically inherit all the values from their parents, unless these are overridden.

Initially you will create new ParameterSets to describe the inputs to your computation. Once the computation is complete, the outputs will also be represented as ParameterSets, residing below the input ParameterSet in the tree. Some computations may produce multiple subsidiary output sets as children of the main one. This may seem convoluted, but it turns out to be very convenient, as you'll see below.

So the upshot is: all program inputs and outputs are stored as key-value pairs attached to nodes of a tree.
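The inheritance rule can be illustrated with a minimal sketch. The class name ParamNode and its methods are hypothetical stand-ins, not Jandy's actual ParameterSet API; the sketch only shows how a child node can resolve a key locally first and otherwise fall back to its ancestors.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a ParameterSet; not Jandy's real class.
class ParamNode {
    private final ParamNode parent;  // null for the root of the tree
    private final Map<String, String> values = new HashMap<>();

    ParamNode(ParamNode parent) {
        this.parent = parent;
    }

    // Sets a value on this node only; returns this node to allow chaining.
    ParamNode put(String key, String value) {
        values.put(key, value);
        return this;
    }

    // Looks up a key on this node first, then walks up through the ancestors.
    // A value set on a child therefore overrides the inherited one.
    String get(String key) {
        for (ParamNode n = this; n != null; n = n.parent) {
            if (n.values.containsKey(key)) {
                return n.values.get(key);
            }
        }
        return null;  // not defined anywhere on the path to the root
    }
}
```

With this structure, an output node created below an input node can answer queries for both its own outputs and the inputs that produced them.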

Set up a batch of jobs to run

  1. Create a new ParameterSet.
  2. Choose the computation you want to do from the Context menu. The input parameters allowed for that Context are automatically displayed.
  3. Enter the values for the parameters you want to run. To perform a parameter sweep, enter the values you want to use, separated by commas. If several different parameters have multiple values, Jandy will perform one run for each possible combination of those parameters. The total number of runs requested is shown at the bottom.
  4. Save the ParameterSet, if you want to just leave it there for later editing.
  5. To actually request the computation, Enqueue the ParameterSet. This has several effects:
    1. If the set includes parameter sweeps, a new child ParameterSet is generated for each possible combination.
    2. Each resulting ParameterSet is placed in the run queue.
    3. For each input ParameterSet, a new output ParameterSet is created as a child. It's initially empty, of course; it's just a container where the outputs will be written when the computation is performed.
    4. ParameterSets that are enqueued can no longer be edited.
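
The expansion of a parameter sweep into one run per combination amounts to a Cartesian product over the comma-separated values. The following is a sketch of that idea only; SweepExpander and expand are hypothetical names, and Jandy's own implementation may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating sweep expansion; not part of Jandy.
class SweepExpander {
    // Takes raw field contents such as {"k": "1,2,3", "rate": "0.1,0.2"}
    // and returns one map per combination of values (6 maps in this case).
    static List<Map<String, String>> expand(Map<String, String> raw) {
        List<Map<String, String>> combos = new ArrayList<>();
        combos.add(new LinkedHashMap<>());  // start with one empty combination
        for (Map.Entry<String, String> entry : raw.entrySet()) {
            List<Map<String, String>> next = new ArrayList<>();
            for (Map<String, String> partial : combos) {
                // Extend every partial combination with every value of this key.
                for (String value : entry.getValue().split(",")) {
                    Map<String, String> copy = new LinkedHashMap<>(partial);
                    copy.put(entry.getKey(), value.trim());
                    next.add(copy);
                }
            }
            combos = next;
        }
        return combos;
    }
}
```

The size of the returned list is the product of the number of values given for each parameter, which corresponds to the total number of runs shown at the bottom of the form.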

Parameter types

Parameters may be strings, integers, doubles, or booleans.

A parameter may also be a list, written 1,2,3,4,5, for populating a field that is itself a list or array.

Plugins

The program to be run is provided to Jandy as a plugin. In addition to its configurable parameters of basic types, it may have parameters that are themselves plugins. In this case, the parameters of the sub-plugin are requested as well, and of its sub-sub-plugins, etc.

Parameter Ranges

A set, written {1,2,3,4,5}, will be exploded into its individual values.

Ranges of plugins are currently not allowed, since the UI for that would be confusing. You can however configure a ParameterSet with one plugin, then copy it and change the plugin.

(docs pending for other range types)

Run jobs

To actually run the jobs you requested, run Jandy with no options (to run a single job from the queue and quit) or with the -all option, to continue running jobs indefinitely. You can run as many Jandy instances on different machines as you like; they will all connect to the central database to retrieve job requests and write back the results.

Note that the main program you want to run, and any subsidiary plugins, must be available in the classpath on the client.

Filtering and Plotting

Once you've done a bunch of computations, you can see results in the form of scatterplots. Each point on the plot represents one output set, i.e., a leaf of the ParameterSet tree. A single run of your program may produce many distinct output sets (for instance, your program may perform an all-vs-all comparison of some set of items, in which case each pairwise comparison may produce an output set). Each such output set inherits all the key-value pairs from above it in the tree, including the input parameters that gave rise to those outputs.

You can filter the output sets to be plotted by specifying allowable values for each parameter, again using comma-separated lists of values.

For the resulting set of points, you can plot any variable against any other variable: inputs against inputs (i.e., to visualize what regions of the parameter space you've explored), inputs against outputs, and outputs against outputs. At this stage they're all just key-value pairs in the tree; the plotting code doesn't know the difference.

The plot axes can be made logarithmic, and Gaussian noise with a given standard deviation can be added to points on each axis. This helps to visualize clouds of points that would otherwise overlap because they have the same value on one axis.
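
As a rough illustration of the noise option, the sketch below adds zero-mean Gaussian noise to a set of coordinates. The Jitter class and addNoise method are hypothetical names used for this example, not Jandy's plotting code.

```java
import java.util.Random;

// Hypothetical helper illustrating Gaussian jitter; not part of Jandy.
class Jitter {
    // Returns a copy of values with zero-mean Gaussian noise of the given
    // standard deviation added to each element. The input is not modified.
    static double[] addNoise(double[] values, double standardDeviation, Random rng) {
        double[] out = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = values[i] + standardDeviation * rng.nextGaussian();
        }
        return out;
    }
}
```

Points that share the same value on one axis are spread slightly apart after this step, so a stack of overlapping points becomes a visible cloud.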

In addition to specifying the variables to use for the X and Y axes, you can select any number of variables to use for grouping points together. Each unique combination of values for these variables is considered a group. All points in a group share a color that is distinct from the colors of other groups; and if connecting lines are requested, then each group will be connected independently.
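
Grouping by a combination of variables can be sketched as follows. Here each point is represented as a plain map of key-value pairs; PointGrouper and group are hypothetical names chosen for this example, not Jandy's API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper illustrating grouping; not part of Jandy.
class PointGrouper {
    // Groups points by the values they hold for the given keys. Each distinct
    // combination of values becomes one group, in order of first appearance.
    static Map<List<String>, List<Map<String, String>>> group(
            List<Map<String, String>> points, List<String> groupKeys) {
        Map<List<String>, List<Map<String, String>>> groups = new LinkedHashMap<>();
        for (Map<String, String> point : points) {
            List<String> combination = new ArrayList<>();
            for (String key : groupKeys) {
                combination.add(point.get(key));  // null if the point lacks the key
            }
            groups.computeIfAbsent(combination, c -> new ArrayList<>()).add(point);
        }
        return groups;
    }
}
```

Each entry of the resulting map would then be drawn in its own color, and connected by its own line if connecting lines are requested.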

The scatterplot can be normalized in one or both dimensions (the 2D histogram story).

Preparing your Java program for use with Jandy

A top-level Jandy plugin must extend com.davidsoergel.runutils.ResultsCollectingProgramRun, which mostly means that it implements a run() method (analogous to main()), and must be annotated with @PropertyConsumer(isprogram = true).

In order for parameter values to be injected into instances of other classes (i.e., subsidiary plugins), each such class must be annotated with @PropertyConsumer.

The fields to be injected must be annotated with @Property.

Subsidiary plugins are represented simply as fields of some interface type. These fields must also be annotated with @Property, of course. The plugins themselves implement the interface, and may or may not themselves be annotated with @PropertyConsumer.

If the class wishes to instantiate multiple instances of the subsidiary plugin (as opposed to having a single instance injected), it must have a field of type GenericFactory<MyPluginInterfaceType>. This field is annotated with @Property like any other. Configured instances of the plugin can then be created via the create() method on the factory. The "new" operator should not be used in this case, since the resulting instances will not be configured (and anyhow the class has no way of knowing which concrete type should be instantiated).

Nothing additional is required; these simple annotations marking configurable fields are sufficient for Jandy to build the UI and to populate the fields as needed at runtime.
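
To make the annotations concrete, here is a sketch of what an annotated program and plugin might look like. So that it compiles on its own, it declares stand-in versions of @PropertyConsumer, @Property, and GenericFactory; in a real project these come from Jandy's libraries, and their packages and exact signatures are not given here. DistanceMeasure, ScaledDistance, and CompareAll are hypothetical names, and the field visibility and the return type of run() are assumptions.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Stand-ins for the real annotations and factory type (assumed shapes).
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface PropertyConsumer {
    boolean isprogram() default false;
}

@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Property {
}

interface GenericFactory<T> {
    T create();
}

// Hypothetical plugin interface and one implementation with its own parameter.
interface DistanceMeasure {
    double distance(double a, double b);
}

@PropertyConsumer
class ScaledDistance implements DistanceMeasure {
    @Property
    public double scale;

    public double distance(double a, double b) {
        return scale * Math.abs(a - b);
    }
}

// Hypothetical top-level program. A real one would also extend
// com.davidsoergel.runutils.ResultsCollectingProgramRun (omitted here).
@PropertyConsumer(isprogram = true)
class CompareAll {
    @Property
    public int iterations;

    @Property
    public DistanceMeasure measure;  // a single injected plugin instance

    @Property
    public GenericFactory<DistanceMeasure> measureFactory;  // for creating many instances

    public double run() {
        double total = measure.distance(0, 1);
        for (int i = 0; i < iterations; i++) {
            // Use the factory rather than "new" so each instance is configured.
            total += measureFactory.create().distance(i, 0);
        }
        return total;
    }
}
```

In this sketch the fields would be populated by injection at runtime; nothing in the classes themselves refers to a concrete plugin type or reads a configuration file.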

Writing a new plotting plugin