obiwan.runmanager.status

Monitors an obiwan production run using qdo

Classes

QdoList(outdir[, que_name, skip_succeed, …]) Queries the qdo db and maps log files to tasks and task status
RunStatus(tasks, logs) Tallies which QDO_RESULTS actually finished, what errors occurred, etc.

Functions

get_checkpoint_fn(outdir, brick, rowstart)
get_deldirs(outdir, brick, rowstart[, …]) If a slurm job timed out or failed, the logfile will exist in the final dir but other outputs will be in the interm dir.
get_final_dir(outdir, brick, rowstart[, …]) Returns paths like outdir/replaceme/bri/brick/rs0
get_interm_dir(outdir, brick, rowstart[, …]) Returns paths like outdir/bri/brick/rs0
get_logdir(outdir, brick, rowstart[, …])
get_logfile(outdir, brick, rowstart[, …])
get_slurm_files(outdir)
class obiwan.runmanager.status.QdoList(outdir, que_name='obiwan', skip_succeed=False, rand_num=None, firstN=None)[source]

Queries the qdo db and maps log files to tasks and task status

Parameters:
  • outdir – obiwan output directory, where the slurm*.out files are located
  • que_name – name of the qdo queue, i.e. the name used in qdo create que_name
  • skip_succeed – the number of succeeded tasks can be very large for production runs, which slows the code down, so set this to skip those tasks
change_task_state(task_ids, to=None, modify=False, rm_files=False)[source]

Changes the qdo state of the tasks with the given task_ids, e.g. to pending or failed

Parameters:
  • to – the qdo state to change the tasks to, e.g. pending or failed
  • rm_files – delete the output files for that task
  • modify – actually perform the modifications (fail-safe option, False by default)
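
The fail-safe role of modify can be illustrated with a small standalone sketch. It does not call qdo or obiwan: the function name sketch_change_state and the in-memory states dict are hypothetical, and the real method may behave differently. The idea shown is only that a call reports what it would change and touches nothing unless modify=True is passed.

```python
def sketch_change_state(states, task_ids, to=None, modify=False):
    """Hypothetical stand-in for a dry-run-by-default state change.

    states: dict mapping task id -> current state string (an assumption,
    not the qdo data model). Returns the list of planned changes.
    """
    if to is None:
        raise ValueError("to must name a target state, e.g. 'pending' or 'failed'")
    # Work out what would change, without touching anything yet.
    planned = [(tid, states[tid], to) for tid in task_ids if states[tid] != to]
    if modify:
        # Only apply the changes when the caller explicitly opts in.
        for tid, _old, new in planned:
            states[tid] = new
    return planned


states = {1: "failed", 2: "running", 3: "pending"}
dry_run = sketch_change_state(states, [1, 2, 3], to="pending")
states_after_dry_run = dict(states)
applied = sketch_change_state(states, [1, 2, 3], to="pending", modify=True)
```

Running once with the default and inspecting the returned plan before repeating the call with modify=True is the usual way to use this kind of switch.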
get_tasks_logs()[source]

Gets the tasks and logs for the three types of qdo status

class obiwan.runmanager.status.RunStatus(tasks, logs)[source]

Tallies which QDO_RESULTS actually finished, what errors occurred, etc.

Parameters:
  • tasks – dict, each key maps to a list of qdo tasks
  • logs – dict, each key maps to a list of log files, one for each task

Defaults: regex_errs: list of regular expressions matching possible log file errors

get_logs_for_failed(regex='Other')[source]

Returns log and slurm filenames for failed tasks labeled as regex
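
As a rough illustration of tallying errors with a list of regular expressions, the sketch below labels each log by the first pattern that matches and falls back to 'Other', mirroring the default of get_logs_for_failed. It is not the RunStatus implementation: the patterns, the helper names (label_log, tally, logs_for_label) and the sample log texts are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical error patterns; the real regex_errs list is not shown here.
REGEX_ERRS = [r"MemoryError", r"DUE TO TIME LIMIT"]


def label_log(text, regex_errs=REGEX_ERRS):
    """Return the first pattern found in the log text, or 'Other'."""
    for pattern in regex_errs:
        if re.search(pattern, text):
            return pattern
    return "Other"


def tally(log_texts, regex_errs=REGEX_ERRS):
    """Count how many logs fall under each label."""
    return Counter(label_log(text, regex_errs) for text in log_texts.values())


def logs_for_label(log_texts, regex="Other", regex_errs=REGEX_ERRS):
    """List the log names whose label equals regex."""
    return [name for name, text in log_texts.items()
            if label_log(text, regex_errs) == regex]


# Made-up log contents keyed by a made-up log file name.
logs = {
    "log.a": "Traceback (most recent call last):\nMemoryError\n",
    "log.b": "slurmstepd: error: JOB CANCELLED DUE TO TIME LIMIT",
    "log.c": "ValueError: something unexpected",
}
counts = tally(logs)
other_logs = logs_for_label(logs)
```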

obiwan.runmanager.status.get_deldirs(outdir, brick, rowstart, do_skipids='no', do_more='no')[source]

If a slurm job timed out or failed, the logfile will exist in the final dir but other outputs will be in the interm dir. Returns a list of all of these directories.

obiwan.runmanager.status.get_final_dir(outdir, brick, rowstart, do_skipids='no', do_more='no')[source]

Returns paths like outdir/replaceme/bri/brick/rs0

obiwan.runmanager.status.get_interm_dir(outdir, brick, rowstart, do_skipids='no', do_more='no')[source]

Returns paths like outdir/bri/brick/rs0
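
A path of the form outdir/bri/brick/rs0 could be assembled as in the sketch below. This is a guess at the layout, not the obiwan code: it assumes that bri is the first three characters of the brick name and that rs0 is the prefix rs followed by rowstart. The function name sketch_interm_dir and the example brick name are hypothetical, and the do_skipids and do_more options are ignored.

```python
import posixpath


def sketch_interm_dir(outdir, brick, rowstart):
    """Build outdir/bri/brick/rs<rowstart> under the assumptions above."""
    bri = brick[:3]               # assumption: 3-character brick prefix
    rs = "rs%d" % int(rowstart)   # assumption: 'rs' followed by rowstart
    return posixpath.join(outdir, bri, brick, rs)


# Hypothetical output directory and brick name, for illustration only.
example = sketch_interm_dir("/scratch/out", "1238p245", 0)
```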