database.client.slurm_monitoring

Mixin class for the database client, providing slurm monitoring functions

class accre.database.client.slurm_monitoring.DBClientSlurmMonitoringMixin[source]

Bases: object

Functionality related to slurm monitoring

add_month_slurm_gpu_partition_data_for_group(month, year, grp, par, n_total_jobs_1, n_finished_jobs_1, medium_waiting_time_1, average_waiting_time_1, total_gpus, gpus_rate, total_comp_time, usage_rate_per_users)[source]

Add a new row to table GROUPS_GPU_PARTITION_HISTORY_USAGE for the given month and year. The row is tagged with the year-month so that it can be found in later searches.

Parameters:
  • month (int) – the month to which the slurm data corresponds.

  • year (int) – the year to which the slurm data corresponds.

  • grp (str) – the slurm group name, which must already be present in the database. All job data here relates to the given group.

  • par (str) – the slurm partition name.

  • n_total_jobs_1 (int) – total number of jobs for the given time period.

  • n_finished_jobs_1 (int) – number of finished jobs for the given time period.

  • medium_waiting_time_1 (float) – slurm jobs’ median waiting time for the given time period.

  • average_waiting_time_1 (float) – slurm jobs’ average waiting time for the given time period.

  • total_gpus (int) – total number of GPU cards used by the slurm jobs.

  • gpus_rate (float) – the GPU utilization rate.

  • total_comp_time (float) – the total computation time, in hours, used by all jobs.

  • usage_rate_per_users (JSON) – a JSON object recording each user’s resource usage percentage.
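The history rows are tagged with a year-month string. The sketch below shows one way to build such a tag from the month and year arguments; the "YYYY-MM" format and the helper name year_month_tag are assumptions for illustration, not the client's actual storage format.

```python
from datetime import date


def year_month_tag(month: int, year: int) -> str:
    """Return a hypothetical 'YYYY-MM' time tag for a history-table row."""
    # Constructing a date validates the month (1-12) and raises ValueError otherwise.
    first_day = date(year, month, 1)
    return first_day.strftime("%Y-%m")


print(year_month_tag(3, 2024))  # prints 2024-03
```

Reusing datetime.date for validation avoids a hand-written range check on month.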

add_month_slurm_nongpu_partition_data_for_account(month, year, acc, par, n_total_jobs_1, n_finished_jobs_1, medium_waiting_time_1, average_waiting_time_1, total_cores_1, fairshare_rate_1, total_comp_time, usage_rate_per_users)[source]

Add a new row to table ACCOUNTS_NON_GPU_PARTITION_HISTORY_USAGE for the given month and year. The row is tagged with the year-month so that it can be found in later searches.

Parameters:
  • month (int) – the month to which the slurm data corresponds.

  • year (int) – the year to which the slurm data corresponds.

  • acc (str) – the slurm account name, which must already be present in the database. All job data here relates to the given account.

  • par (str) – the slurm partition name.

  • n_total_jobs_1 (int) – total number of jobs for the given time period.

  • n_finished_jobs_1 (int) – number of finished jobs for the given time period.

  • medium_waiting_time_1 (float) – slurm jobs’ median waiting time for the given time period.

  • average_waiting_time_1 (float) – slurm jobs’ average waiting time for the given time period.

  • total_cores_1 (int) – total number of CPU cores used by the slurm jobs.

  • fairshare_rate_1 (float) – the fairshare utilization rate.

  • total_comp_time (float) – the total computation time, in hours, used by all jobs.

  • usage_rate_per_users (JSON) – a JSON object recording each user’s resource usage percentage.
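The usage_rate_per_users argument records each user's share of resource usage. A minimal sketch of building that mapping from per-user usage totals follows; the input units (for example core-hours), the percentage scale, and the rounding are assumptions, and whether the client expects a dict or a JSON string is not stated here, so the example serializes with json.dumps only for display.

```python
import json


def usage_rate_per_users(usage_by_user: dict) -> dict:
    """Convert per-user usage totals (e.g. core-hours) into percentages of the total."""
    total = sum(usage_by_user.values())
    if total == 0:
        # No usage recorded: return an empty mapping rather than dividing by zero.
        return {}
    return {user: round(100.0 * used / total, 2) for user, used in usage_by_user.items()}


rates = usage_rate_per_users({"alice": 30.0, "bob": 10.0})
print(json.dumps(rates, sort_keys=True))  # prints {"alice": 75.0, "bob": 25.0}
```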

add_month_slurm_nongpu_partition_data_for_group(month, year, grp, par, n_total_jobs_1, n_finished_jobs_1, medium_waiting_time_1, average_waiting_time_1, total_cores_1, total_comp_time, usage_rate_per_users)[source]

Add a new row to table GROUPS_NON_GPU_PARTITION_HISTORY_USAGE for the given month and year. The row is tagged with the year-month so that it can be found in later searches.

Parameters:
  • month (int) – the month to which the slurm data corresponds.

  • year (int) – the year to which the slurm data corresponds.

  • grp (str) – the slurm group name, which must already be present in the database. All job data here relates to the given group.

  • par (str) – the slurm partition name.

  • n_total_jobs_1 (int) – total number of jobs for the given time period.

  • n_finished_jobs_1 (int) – number of finished jobs for the given time period.

  • medium_waiting_time_1 (float) – slurm jobs’ median waiting time for the given time period.

  • average_waiting_time_1 (float) – slurm jobs’ average waiting time for the given time period.

  • total_cores_1 (int) – total number of CPU cores used by the slurm jobs.

  • total_comp_time (float) – the total computation time, in hours, used by all jobs.

  • usage_rate_per_users (JSON) – a JSON object recording each user’s resource usage percentage.
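The medium_waiting_time_1 and average_waiting_time_1 arguments summarize how long jobs waited in the queue. A sketch using the standard statistics module, assuming each job's wait is its start time minus its submit time expressed in hours, and assuming zeros are reported when no job waited in the period:

```python
import statistics


def waiting_time_stats(wait_hours):
    """Return (median, average) waiting time for a list of per-job waits in hours."""
    if not wait_hours:
        return 0.0, 0.0  # assumption: report zeros when no jobs are in the period
    median = float(statistics.median(wait_hours))
    average = statistics.fmean(wait_hours)
    return median, average


print(waiting_time_stats([1.0, 2.0, 9.0]))  # prints (2.0, 4.0)
```

The example shows why both values are worth storing: a single long wait pulls the average up to 4.0 hours while the median stays at 2.0.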

get_slurm_partition_data_for_account(acc, par, month=None, year=None)[source]

Get a row from table ACCOUNTS_NON_GPU_PARTITION_RECENT_USAGE or ACCOUNTS_NON_GPU_PARTITION_HISTORY_USAGE. If month and year are provided, the row is read from the history table; otherwise it is read from the table of recent data.

Parameters:
  • month (int) – the month for the slurm data

  • year (int) – the year for the slurm data

  • acc (str) – the slurm account name

  • par (str) – the slurm partition name.

Returns:
  • dict – the corresponding row from the queried account table.
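The table-selection rule described above can be sketched as a small pure function. The helper name account_table_for is hypothetical, and how the real method handles a partial time tag (only month or only year) is not documented; falling back to recent data is an assumption.

```python
from typing import Optional

RECENT_TABLE = "ACCOUNTS_NON_GPU_PARTITION_RECENT_USAGE"
HISTORY_TABLE = "ACCOUNTS_NON_GPU_PARTITION_HISTORY_USAGE"


def account_table_for(month: Optional[int] = None, year: Optional[int] = None) -> str:
    """Pick the table to query, following the documented month/year rule."""
    if month is not None and year is not None:
        return HISTORY_TABLE
    # Assumption: a partial time tag (only month or only year) falls back to recent data.
    return RECENT_TABLE


print(account_table_for())         # prints ACCOUNTS_NON_GPU_PARTITION_RECENT_USAGE
print(account_table_for(5, 2024))  # prints ACCOUNTS_NON_GPU_PARTITION_HISTORY_USAGE
```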

get_slurm_partition_data_for_group(group, par, with_gpu_group, month=None, year=None)[source]

Get a row from table GROUPS_NON_GPU_PARTITION_USAGE or GROUPS_GPU_PARTITION_USAGE for the given time tag (month and year), group name, and partition.

Parameters:
  • month (int) – the month for the slurm data

  • year (int) – the year for the slurm data

  • group (str) – the slurm group name

  • par (str) – the slurm partition name.

  • with_gpu_group (bool) – whether the group is a GPU group.

Returns:
  • dict – the corresponding row from the slurm group table.
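A sketch of how with_gpu_group, month, and year could select among the group tables. It assumes the group getter mirrors the account getter (history table only when both month and year are given) and uses the *_RECENT_USAGE and *_HISTORY_USAGE table names that appear in the add and update methods; the helper name group_table_for is hypothetical.

```python
from typing import Optional


def group_table_for(with_gpu_group: bool, month: Optional[int] = None, year: Optional[int] = None) -> str:
    """Hypothetical table selection for get_slurm_partition_data_for_group."""
    prefix = "GROUPS_GPU_PARTITION" if with_gpu_group else "GROUPS_NON_GPU_PARTITION"
    # Assumption: mirrors the account getter, using history data only when both values are given.
    suffix = "HISTORY_USAGE" if month is not None and year is not None else "RECENT_USAGE"
    return f"{prefix}_{suffix}"


print(group_table_for(True))            # prints GROUPS_GPU_PARTITION_RECENT_USAGE
print(group_table_for(False, 1, 2024))  # prints GROUPS_NON_GPU_PARTITION_HISTORY_USAGE
```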

update_short_period_gpu_partition_data_for_group(grp, par, n_running_jobs_1, n_finished_jobs_1, n_pending_jobs_1, medium_waiting_time_1, average_waiting_time_1, total_used_gpu, gpu_rate, total_comp_time, usage_rate_per_users)[source]

Update or add a row in table GROUPS_GPU_PARTITION_RECENT_USAGE holding the short-period data.

Here, short-period data means data from the most recent weeks. On ACCRE a job can run for at most two weeks, so the short period is set to two weeks; the input job information should therefore reflect job statistics from the last two weeks for the given group and partition.

Because each GPU group has a fixed limit on the number of GPUs its jobs can use in the given partition, the GPU utilization rate can be calculated here.

Parameters:
  • grp (str) – the slurm group name, which must already be present in the database. All job data here relates to the given group.

  • par (str) – the slurm partition name.

  • n_running_jobs_1 (int) – number of running jobs for the given time period.

  • n_finished_jobs_1 (int) – number of finished jobs for the given time period.

  • n_pending_jobs_1 (int) – number of pending jobs for the given time period.

  • medium_waiting_time_1 (float) – slurm jobs’ median waiting time for the given time period.

  • average_waiting_time_1 (float) – slurm jobs’ average waiting time for the given time period.

  • total_used_gpu (int) – total number of GPU cards used by the slurm jobs.

  • gpu_rate (float) – the GPU utilization rate.

  • total_comp_time (float) – the total computation time, in hours, used by all jobs.

  • usage_rate_per_users (JSON) – a JSON object recording each user’s resource usage percentage.
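The GPU utilization rate is possible here because each GPU group has a fixed GPU limit. The exact formula the client uses is not given; the sketch below assumes one plausible definition: GPU-hours consumed divided by the group's GPU capacity over the two-week short period. The function name and the gpu_hours_used input are hypothetical.

```python
PERIOD_HOURS = 14 * 24  # two-week short period, in hours


def gpu_utilization_rate(gpu_hours_used: float, gpu_limit: int, period_hours: float = PERIOD_HOURS) -> float:
    """Fraction of the group's GPU capacity used over the period (assumed definition)."""
    capacity = gpu_limit * period_hours
    if capacity <= 0:
        raise ValueError("gpu_limit and period_hours must be positive")
    return gpu_hours_used / capacity


print(gpu_utilization_rate(672.0, 4))  # prints 0.5
```

With a limit of 4 GPUs, the two-week capacity is 1344 GPU-hours, so 672 GPU-hours of use gives a rate of 0.5.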

update_short_period_nongpu_partition_data_for_account(acc, par, n_running_jobs_1, n_finished_jobs_1, n_pending_jobs_1, medium_waiting_time_1, average_waiting_time_1, total_cores_1, fairshare_rate_1, total_comp_time, usage_rate_per_users)[source]

Update or add a row in table ACCOUNTS_NON_GPU_PARTITION_RECENT_USAGE holding the short-period data.

Here, short-period data means data from the most recent weeks. On ACCRE a job can run for at most two weeks, so the short period is set to two weeks; the input job information should therefore reflect job statistics from the last two weeks for the given account and partition.

Parameters:
  • acc (str) – the slurm account name, which must already be present in the database. All job data here relates to the given account.
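The two-week short period can be sketched as a filter over job records. This assumes each job is a dict with a submit_time datetime and that a job belongs to the window if it was submitted within the last two weeks; both the record shape and the choice of submit time (rather than start or end time) are assumptions.

```python
from datetime import datetime, timedelta

SHORT_PERIOD = timedelta(weeks=2)  # matches the two-week maximum job length


def jobs_in_short_period(jobs, now):
    """Keep only jobs submitted within the last two weeks before `now`."""
    window_start = now - SHORT_PERIOD
    return [job for job in jobs if window_start <= job["submit_time"] <= now]


now = datetime(2024, 3, 15)
jobs = [
    {"job_id": 1, "submit_time": datetime(2024, 3, 10)},
    {"job_id": 2, "submit_time": datetime(2024, 2, 20)},
]
print([job["job_id"] for job in jobs_in_short_period(jobs, now)])  # prints [1]
```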

  • par (str) – the slurm partition name.

  • n_running_jobs_1 (int) – number of running jobs for the given time period.

  • n_finished_jobs_1 (int) – number of finished jobs for the given time period.

  • n_pending_jobs_1 (int) – number of pending jobs for the given time period.

  • medium_waiting_time_1 (float) – slurm jobs’ median waiting time for the given time period.

  • average_waiting_time_1 (float) – slurm jobs’ average waiting time for the given time period.

  • total_cores_1 (int) – total number of CPU cores used by the slurm jobs.

  • fairshare_rate_1 (float) – the fairshare utilization rate.

  • total_comp_time (float) – the total computation time, in hours, used by all jobs.

  • usage_rate_per_users (JSON) – a JSON object recording each user’s resource usage percentage.
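The n_running_jobs_1, n_finished_jobs_1, and n_pending_jobs_1 counts could be derived from Slurm job state strings such as those reported by sacct. This sketch assumes that all of the listed terminal states count as "finished"; the client may use a narrower definition (for example COMPLETED only).

```python
from collections import Counter

# Assumption: these terminal Slurm job states all count as "finished".
FINISHED_STATES = {"COMPLETED", "FAILED", "CANCELLED", "TIMEOUT", "OUT_OF_MEMORY", "NODE_FAIL"}


def count_jobs_by_state(states):
    """Return (n_running, n_finished, n_pending) from a list of Slurm job state strings."""
    # sacct can report states such as "CANCELLED by 1234"; keep only the first word.
    counts = Counter(state.split()[0] for state in states if state.strip())
    n_running = counts["RUNNING"]
    n_pending = counts["PENDING"]
    n_finished = sum(counts[state] for state in FINISHED_STATES)
    return n_running, n_finished, n_pending


states = ["RUNNING", "PENDING", "PENDING", "COMPLETED", "CANCELLED by 1234", "FAILED"]
print(count_jobs_by_state(states))  # prints (1, 3, 2)
```

The tuple order follows the parameter order of the update methods.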

update_short_period_nongpu_partition_data_for_group(grp, par, n_running_jobs_1, n_finished_jobs_1, n_pending_jobs_1, medium_waiting_time_1, average_waiting_time_1, total_cores_1, total_comp_time, usage_rate_per_users)[source]

Update or add a row in table GROUPS_NON_GPU_PARTITION_RECENT_USAGE holding the short-period data.

Here, short-period data means data from the most recent weeks. On ACCRE a job can run for at most two weeks, so the short period is set to two weeks; the input job information should therefore reflect job statistics from the last two weeks for the given group and partition.

Different groups under the same account may share that account’s fairshare, so the fairshare utilization rate is not calculated here; only the total core usage is provided.

Parameters:
  • grp (str) – the slurm group name, which must already be present in the database. All job data here relates to the given group.

  • par (str) – the slurm partition name.

  • n_running_jobs_1 (int) – number of running jobs for the given time period.

  • n_finished_jobs_1 (int) – number of finished jobs for the given time period.

  • n_pending_jobs_1 (int) – number of pending jobs for the given time period.

  • medium_waiting_time_1 (float) – slurm jobs’ median waiting time for the given time period.

  • average_waiting_time_1 (float) – slurm jobs’ average waiting time for the given time period.

  • total_cores_1 (int) – total number of CPU cores used by the slurm jobs.

  • total_comp_time (float) – the total computation time, in hours, used by all jobs.

  • usage_rate_per_users (JSON) – a JSON object recording each user’s resource usage percentage.
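The total_comp_time argument is a total in hours. A sketch that sums job wall-clock times given as Slurm-style elapsed strings ("[D-]HH:MM:SS", or "MM:SS" for short jobs) is shown below. Treating total_comp_time as summed wall-clock hours is an assumption; if the client stores core-hours instead, each job's hours would be multiplied by its allocated cores.

```python
def elapsed_to_hours(elapsed: str) -> float:
    """Convert a Slurm-style elapsed string '[D-]HH:MM:SS' (or 'MM:SS') to hours."""
    days = 0
    if "-" in elapsed:
        day_part, elapsed = elapsed.split("-", 1)
        days = int(day_part)
    parts = [float(p) for p in elapsed.split(":")]
    # Left-pad missing fields so that 'MM:SS' is read as 0 hours.
    while len(parts) < 3:
        parts.insert(0, 0.0)
    hours, minutes, seconds = parts
    return days * 24 + hours + minutes / 60 + seconds / 3600


def total_comp_time(elapsed_values) -> float:
    """Sum wall-clock hours over all jobs (hypothetical definition of total_comp_time)."""
    return sum(elapsed_to_hours(e) for e in elapsed_values)


print(total_comp_time(["1-02:30:00", "00:30:00"]))  # prints 27.0
```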