monitor_checks.auditor

These “checks” are generally long-running audits performed by the ACCRE auditor. They are not available in the accre-monitor command unless an “auditor” field is set to true.

For an overview of the monitoring check framework see accre.monitor.

accre.monitor_checks.auditor.auditor_accre_active(opts)

Ensure that all active users in ACCRE LDAP (defined by login shell) are considered active in the ACCRE database, and that all users that are active in the ACCRE database are active users in LDAP.
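The two-way comparison described above can be illustrated with plain set differences. This is only a sketch: the function name `compare_active_users` and the use of simple username lists are assumptions for illustration, not the check's actual implementation or data sources.

```python
def compare_active_users(ldap_active, db_active):
    """Return the users that are active on only one side.

    Both arguments are iterables of usernames (hypothetical inputs; the
    real check reads them from ACCRE LDAP and the ACCRE database).
    """
    ldap_active = set(ldap_active)
    db_active = set(db_active)
    return {
        # Active in LDAP but not considered active in the database
        "ldap_only": sorted(ldap_active - db_active),
        # Active in the database but not active in LDAP
        "db_only": sorted(db_active - ldap_active),
    }


mismatches = compare_active_users(["alice", "bob"], ["bob", "dave"])
print(mismatches)
```

An audit along these lines would pass only when both lists come back empty.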

accre.monitor_checks.auditor.auditor_accre_group_membership(opts)

Ensure that all active LDAP users have primary and secondary groups matching the ACCRE database.

accre.monitor_checks.auditor.auditor_accre_groups(opts)

Ensure that all groups in ACCRE LDAP have a corresponding group in the database with the same GID, and that all active database groups are in LDAP.
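As a rough illustration of the rule above, the sketch below compares two in-memory mappings. The function name `audit_groups` and the input shapes (a name-to-GID dict for LDAP, a name-to-`(GID, is_active)` dict for the database) are hypothetical and chosen only to make the logic concrete.

```python
def audit_groups(ldap_groups, db_groups):
    """Compare LDAP groups with database groups and list discrepancies.

    ldap_groups: dict mapping group name -> GID (assumed shape)
    db_groups:   dict mapping group name -> (GID, is_active) (assumed shape)
    """
    problems = []
    # Every LDAP group needs a database group with the same GID.
    for name, gid in sorted(ldap_groups.items()):
        if name not in db_groups:
            problems.append(f"{name}: in LDAP but missing from database")
        elif db_groups[name][0] != gid:
            problems.append(
                f"{name}: GID {gid} in LDAP but {db_groups[name][0]} in database"
            )
    # Every active database group needs to exist in LDAP.
    for name, (gid, is_active) in sorted(db_groups.items()):
        if is_active and name not in ldap_groups:
            problems.append(f"{name}: active in database but missing from LDAP")
    return problems


ldap = {"physics": 1001, "chem": 1002, "bio": 1003}
db = {
    "physics": (1001, True),
    "chem": (2002, True),
    "math": (1004, True),
    "old": (1005, False),
}
for line in audit_groups(ldap, db):
    print(line)
```

Inactive database groups are deliberately ignored in the second loop, since the description only requires active database groups to be present in LDAP.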

accre.monitor_checks.auditor.auditor_compute_node_checkin(opts)

Ensure that all compute nodes responding to the SLURM scheduler have checked in with configuration management in the last 24 hours, and report a list of all dead nodes.
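The 24-hour rule can be sketched as a timestamp comparison. Everything here is an assumption for illustration: the function name `stale_nodes`, the list of responding nodes, and the dict of last check-in times stand in for whatever the scheduler and configuration management actually provide.

```python
from datetime import datetime, timedelta, timezone


def stale_nodes(responding, last_checkin, now, max_age=timedelta(hours=24)):
    """Return responding nodes with no check-in within max_age.

    responding:   iterable of node names answering the scheduler (assumed)
    last_checkin: dict mapping node name -> timezone-aware datetime (assumed)
    """
    stale = []
    for node in sorted(responding):
        seen = last_checkin.get(node)
        # A node with no record at all counts as stale too.
        if seen is None or now - seen > max_age:
            stale.append(node)
    return stale


now = datetime(2021, 1, 10, 12, 0, tzinfo=timezone.utc)
checkins = {
    "cn1": now - timedelta(hours=2),
    "cn2": now - timedelta(hours=30),
}
print(stale_nodes(["cn1", "cn2", "cn3"], checkins, now))
```

Passing `now` explicitly, rather than calling `datetime.now()` inside the function, keeps the comparison reproducible.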

accre.monitor_checks.auditor.auditor_compute_node_kernels(opts)

Determine all kernel versions running on the compute nodes that have checked in over the last week, and report them.
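A kernel report of this kind amounts to a tally of versions across nodes. The sketch below assumes a hypothetical dict of node name to kernel version string; the name `kernel_report` and the sample version strings are illustrative only.

```python
from collections import Counter


def kernel_report(node_kernels):
    """Count nodes per kernel version, most common version first.

    node_kernels: dict mapping node name -> kernel version string (assumed)
    """
    counts = Counter(node_kernels.values())
    # Sort by descending node count, then by version string for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


sample = {
    "cn1": "3.10.0-1160",
    "cn2": "3.10.0-1160",
    "cn3": "3.10.0-957",
}
for version, count in kernel_report(sample):
    print(f"{version}: {count} node(s)")
```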

accre.monitor_checks.auditor.auditor_compute_node_nagios(opts)

Ensure that all compute nodes responding to the SLURM scheduler are currently being monitored in Nagios.

accre.monitor_checks.auditor.auditor_gfps_license_count(opts)

Determine the number of GPFS client and server licenses in use and compare to the configured limits.
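Comparing a license count against a configured limit might look like the sketch below. The function name `license_status`, the 90% warning threshold, and the mapping onto OK/WARNING/CRITICAL are assumptions made for illustration; the actual thresholds and status rules used by the check are not stated here.

```python
def license_status(in_use, limit, warn_fraction=0.9):
    """Classify license usage against a configured limit.

    warn_fraction is a hypothetical threshold, not a documented setting.
    """
    if in_use > limit:
        return "CRITICAL"
    if in_use >= warn_fraction * limit:
        return "WARNING"
    return "OK"


# Hypothetical counts for client and server licenses: (in use, limit)
usage = {"client": (950, 1000), "server": (12, 20)}
for kind, (in_use, limit) in usage.items():
    print(kind, in_use, limit, license_status(in_use, limit))
```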

accre.monitor_checks.auditor.auditor_gpfs_fileset_limits(opts)

Ensure all GPFS filesets have the correct limits in GPFS as determined by the usage record, and that the usage records have been updated in the last 48 hours.

Ensure all active GPFS filesets in the database exist in GPFS and are linked correctly as specified in the database, and that no other GPFS filesets exist.

accre.monitor_checks.auditor.auditor_hello(opts)

Declare yourself as the auditor.

There should be only one auditor node. This check shows the hostname, internal IP, and RSA host key of the node.

accre.monitor_checks.auditor.auditor_scheduler_acc_associations(opts)

Ensure that all accelerated scheduler associations in SLURM match those determined by the database user, group, and partition records.

accre.monitor_checks.auditor.auditor_scheduler_accounts(opts)

Ensure that all accounts in the database have a corresponding account in the SLURM database with the correct properties.

accre.monitor_checks.auditor.auditor_scheduler_associations(opts)

Ensure that all scheduler associations in SLURM match those determined by the database user, group, and partition records. Note that this check does not look at accelerated partition associations.

accre.monitor_checks.auditor.auditor_scheduler_default_groups(opts)

Ensure that all active users have a default group in the scheduler that matches their primary group in the database if that primary group is a scheduler group.

accre.monitor_checks.auditor.auditor_scheduler_groups(opts)

Ensure that all scheduler groups have a corresponding group in the SLURM database with the correct properties.

accre.monitor_checks.auditor.auditor_sync_vuds_accre_users(opts)

Ensure all active users and PIs that are not robot accounts are in the VUIT VUDS ACCRE_Users group; make any required repairs to the group and report.

Users with legacy VUMC IDs will not be in VUDS. A text file with a list of email addresses for these users will be created and saved to the file /data/accre/accre_emails_non_vuds.txt on GPFS.

accre.monitor_checks.auditor.auditor_sync_zimbra_announce_dl(opts)

Ensure all active users and PIs are in the Zimbra announce_dl distribution list; make any required repairs to the list and report.

accre.monitor_checks.auditor.auditor_vandy_active(opts)

Ensure that all active users in ACCRE LDAP (defined by login shell) are considered active VUnetIDs in Vanderbilt LDAP.

Notice - 2021/01 - Eric - This check currently needs cleanup, and has for over a year, so for now it will not send WARNING/CRITICAL even if there are problems.