In my previous post concerning interactive (e.g. during a
IPython session) parallelization of tasks with a MapReduce-like aproach across nodes in a cluster, I created an object which interfaces Slurm and the interactive session I’m working on, by splitting an input in pools and submitting each pool as a job that would be further processed in parallel.
Since the class was performing two distinct functions (handling jobs, splitting input in a task-dependent manner), I split it into two classes:
DivideAndSlurm - which takes care of job processing;
Task which is a meta-class for different tasks which can be parallelized this way.
Task subclasses should be created for a particular class, inheriting from the meta one, allowing little effort when writing new classes, since basically what changes between tasks is (1) the location of the script which actually will compute the output, and (2) how the output is reduced when collected, since different output objects should be reduced differently (e.g.
collections.Counter objects can be reduced by summation, but
dict ones no (although here I might get smarter and write general colllection methods for different output types).
The basic usage now looks like this:
The meta class
Task accepts args and kwargs, so inheriting sub-classes can use task-specific options.
I further included many more functions and attributes to handle tasks (
task.log attributes) faulty job executions (e.g. allowing collection of output even if some jobs would fail -
task.permissive attribute - off by default), status checking (
task.failed()) and handling tasks (
The small library is called divideAndSlurm and includes a setup.py to install.