Filesystems in ACCRE

Guidelines for database model creation

For every new filesystem added to ACCRE, the following must be defined for admintools and the automation around that filesystem:

  1. A table in the models.py file that describes the new filesystem. The filesystem table must contain the following columns:
    1. Identifiers that uniquely identify a filesystem entry. For example, for gpfs, “filesystem”, “name”, and “fileset” together define a unique entry. Similarly, for panfs, it is the combination of “bladeset” and “name”.

    2. A foreign key to the groups that exist under ACCRE, where applicable. It is not applicable to user quotas and system-related quotas; in those cases the group should default to accre.

    3. The following items, if applicable:
      i. soft_quota: BigInteger
      ii. hard_quota: BigInteger
      iii. path: str
      iv. user_path: str
      v. active: bool
      vi. join_date: datetime

  2. A usage table with the following columns:
    1. soft_quota: BigInteger

    2. hard_quota: BigInteger

    3. space_used: BigInteger

    4. path: str

    5. user_path: str

    6. user, optional: str. Only applicable for paths such as /home/username, /nobackup/userspace/username

    7. last_updated, automatic: datetime. Updates itself automatically to reflect when the entry was last changed

    8. join_date: datetime
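
As a rough illustration of the two tables described above, the sketch below uses the standard-library sqlite3 module rather than the project's models.py conventions. The table names example_volume, example_usage, and accre_groups, and the group_name column, are hypothetical; only the quota, path, user, and date columns come from the list above.

```python
import sqlite3

# Hypothetical schema following the guidelines above; all table names are made up.
SCHEMA = """
CREATE TABLE accre_groups (name TEXT PRIMARY KEY);

CREATE TABLE example_volume (
    name       TEXT NOT NULL,
    bladeset   TEXT NOT NULL,
    group_name TEXT NOT NULL DEFAULT 'accre' REFERENCES accre_groups(name),
    soft_quota BIGINT,
    hard_quota BIGINT,
    path       TEXT,
    user_path  TEXT,
    active     BOOLEAN DEFAULT 1,
    join_date  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (name, bladeset)   -- identifiers that make an entry unique
);

CREATE TABLE example_usage (
    name         TEXT NOT NULL,
    bladeset     TEXT NOT NULL,
    soft_quota   BIGINT,
    hard_quota   BIGINT,
    space_used   BIGINT,
    path         TEXT,
    user_path    TEXT,
    "user"       TEXT,             -- only for per-user paths such as /home/username
    last_updated TIMESTAMP DEFAULT CURRENT_TIMESTAMP,  -- set on insert only in this sketch
    join_date    TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (name, bladeset)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores foreign keys unless enabled
conn.executescript(SCHEMA)
conn.execute("INSERT INTO accre_groups (name) VALUES ('accre')")
# No group given: the row falls back to the default group 'accre'.
conn.execute("INSERT INTO example_volume (name, bladeset) VALUES ('scratch', 'Set 1')")
print(conn.execute("SELECT name, group_name FROM example_volume").fetchall())
```

Keeping last_updated current on every change would need a trigger or an ORM-level hook, which this sketch leaves out.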

Filesystems in ACCRE

The basefilesystem.py file provides the class GenericFileSystemMixin. It is meant to be a guideline for development of filesystem handlers in ACCRE.

Implementation for FutureVolume

from typing import Optional
from pydantic import BaseModel
from sqlalchemy import Column, DateTime, JSON, String, Table, MetaData, BigInteger

from accre.util import convert_byte_unit
from accre.filesystem.basefilesystem import GenericFileSystemMixin


admin_meta = MetaData()
FUTURE_VOLUME = Table('future_volume', admin_meta,
    Column('name', String, primary_key=True, nullable=False),
    Column('path', String, primary_key=True, nullable=False),
    Column('soft_quota', BigInteger, nullable=False),
    Column('hard_quota', BigInteger, nullable=False),
    Column('join_date', DateTime)
)

FUTURE_USAGE = Table('future_usage', admin_meta,
    Column('name', String, primary_key=True, nullable=False),
    Column('path', String, primary_key=True, nullable=False),
    Column('usage', BigInteger, nullable=False),
    Column('extra_info', JSON, nullable=True),
)

class FutureVolumeModel(BaseModel):
    """Future Volume Model"""
    name: str
    path: str
    soft_quota: str
    hard_quota: str

class FutureVolumeModelPartial(BaseModel):
    """Future Volume Model Partial"""
    name: Optional[str] = ""
    path: Optional[str] = ""
    soft_quota: Optional[str] = ""
    hard_quota: Optional[str] = ""

class FutureVolume(GenericFileSystemMixin):
    """
    Test future volume
    """
    database_table = FUTURE_VOLUME
    usage_table = FUTURE_USAGE
    data_model = FutureVolumeModel
    data_model_partial = FutureVolumeModelPartial
    data_holder = {}
    simlinks = {}
    deployed_simlinks = {}

    def process_fields_db(self, validated_volume_input, insert=False) -> dict:
        return_dict = {**validated_volume_input.dict()}

        if isinstance(validated_volume_input.hard_quota, str):
            return_dict["hard_quota"] = int(convert_byte_unit(validated_volume_input.hard_quota, target='B'))
        if isinstance(validated_volume_input.soft_quota, str):
            return_dict["soft_quota"] = int(convert_byte_unit(validated_volume_input.soft_quota, target='B'))

        return return_dict

    def process_usage_fields(self, validated_volume_input, usage_details: dict):
        return_dict = {
            "name": validated_volume_input.name,
            "path": validated_volume_input.path,
            "extra_info": usage_details.get("extra_info", {})
        }
        if "usage" in usage_details:
            return_dict["usage"] = usage_details.get("usage")

        if isinstance(usage_details.get("usage"), str):
            return_dict["usage"] = int(convert_byte_unit(usage_details.get("usage"), target='B'))

        return return_dict

    def get_volume_fs(self, validated_volume_input, update_usage=True, timeout=600):
        value = self.data_holder.get(validated_volume_input.name)
        if update_usage:
            self.add_or_update_usage_table(validated_volume_input, value)

        return value

    def modify_volume_fs(self, validated_volume_input):
        self.data_holder[validated_volume_input.name] = {
            **self.data_holder[validated_volume_input.name],
            **validated_volume_input.dict(),
            "usage": "1G",
        }
        return True, "", self.data_holder[validated_volume_input.name]

    def add_volume_fs(self, validated_volume_input):
        self.data_holder[validated_volume_input.name] = {
            **validated_volume_input.dict(),
            "usage": "1G",
            "extra_info": {}
        }
        return True, "", self.data_holder[validated_volume_input.name]

    def remove_volume_fs(self, volume_input):
        self.data_holder.pop(volume_input.name)
        return True, ""

    def delete_symlink(self, validated_volume_input):
        self.simlinks.pop(validated_volume_input.name)
        return True

    def create_symlink(self, validated_volume_input):
        self.simlinks[validated_volume_input.name] = True
        return True

    def deploy_symlinks(self):
        self.deployed_simlinks = {**self.simlinks}
        return True

    def get_volumes_fs(self):
        return list(self.data_holder.values())

    def get_users_usage_fs(self):
        return [
            {
                "name": "user1",
                "path": "/home/user1",
                "usage": "1G",
                "soft_quota": "1G",
                "hard_quota": "2G",
                "extra_info": {
                    "files_used": 10,
                    "files_soft_quota": 100,
                    "files_hard_quota": 1000,
                }
            },
            {
                "name": "user2",
                "path": "/home/user2",
                "usage": "1G",
                "soft_quota": "1G",
                "hard_quota": "2G",
                "extra_info": {
                    "files_used": 10,
                    "files_soft_quota": 100,
                    "files_hard_quota": 1000,
                }
            }
        ]

Mismatched volumes and groups

Certain volumes in panfs do not map exactly to groups. For instance, /data/cqs maps to group h_cqs instead of cqs. To handle such mismatches, the mapping can be overridden like this:

from accre.filesystem.panfs.main import PanasasFileSystem
fs = PanasasFileSystem()
fs.mis_matched_volume_groups = {
    "/data/cqs": "h_cqs",
    "/nobackup/testing": "fe_accre_lab"
}

filesystem

Serves as the base of the filesystems.

class accre.filesystem.basefilesystem.GenericFileSystemMixin(client=None)[source]

Bases: object

Mixin class for generic file system implementations

add_new_volume(volume_input: dict, deploy_symlinks=False)[source]

This is the function that is exposed to the public user.

Parameters:
  • volume_input (dict) – input that is validated against the pydantic model using partial=False

  • deploy_symlinks (bool) – whether to deploy symlinks or not

Raises:
  • ACCREValueError – if the volume could not be added to the filesystem

  • ValueError – if the input is not valid

Returns:

True if the volume was added to the filesystem

Return type:

bool

add_or_update_usage_table(validated_volume_input, usage_details: dict, add_or_update: Optional[str] = None)[source]

This function modifies the existing volume’s usage information in the database

Parameters:
  • validated_volume_input (PydanticModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

  • usage_details (dict) – Contains values to update in the usage table

  • add_or_update (None|str) – whether to ‘add’ or ‘update’ the volume in the database

Returns bool:

status of whether or not the usage table was updated.

add_or_update_volume_db(validated_volume_input, add_or_update: Optional[str] = None)[source]

This function adds or updates the volume’s entry in the database

Parameters:
  • validated_volume_input (PydanticModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

  • add_or_update (None|str) – whether to ‘add’ or ‘update’ the volume in the database. This flag can be used to avoid the database query that checks whether the volume exists.

Returns bool:

status of whether or not the volume table was updated.
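
The role of the add_or_update flag can be sketched with a plain dict standing in for the database table. The function add_or_update_row and the dict layout are hypothetical; the only thing taken from the description above is that passing ‘add’ or ‘update’ skips the existence check.

```python
from typing import Optional


def add_or_update_row(table: dict, key: tuple, values: dict,
                      add_or_update: Optional[str] = None) -> bool:
    """Insert or update `values` under `key`; return True if the table changed."""
    if add_or_update is None:
        # This lookup is the existence check that the flag lets callers skip.
        add_or_update = "update" if key in table else "add"
    if add_or_update == "add":
        table[key] = dict(values)
        return True
    if add_or_update == "update":
        if key not in table:
            return False  # nothing to update
        table[key].update(values)
        return True
    raise ValueError(f"add_or_update must be 'add', 'update' or None, got {add_or_update!r}")


volumes = {}
add_or_update_row(volumes, ("vol1", "/data/vol1"), {"hard_quota": 2 * 1024**3})
add_or_update_row(volumes, ("vol1", "/data/vol1"), {"hard_quota": 4 * 1024**3}, "update")
print(volumes)
```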

add_volume_fs(validated_volume_input)[source]

This function adds the validated volume input to the actual filesystem first, and only after that succeeds to the database

Parameters:

validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

Returns:

added, error_msg – a tuple of whether the volume was added and the error message if it was not

Return type:

bool, str
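
The "filesystem first, database second" ordering described above might look roughly like the following. Both functions and the dict-based stand-ins for the filesystem and the database are hypothetical and are not the mixin's implementation.

```python
def add_volume_fs(volume: dict, filesystem: dict):
    """Hypothetical filesystem step: returns (added, error_msg)."""
    if volume["name"] in filesystem:
        return False, f"volume {volume['name']!r} already exists"
    filesystem[volume["name"]] = dict(volume)
    return True, ""


def add_new_volume(volume: dict, filesystem: dict, database: dict) -> bool:
    """Create the volume on the filesystem, then record it in the database."""
    added, error_msg = add_volume_fs(volume, filesystem)
    if not added:
        # The database is left untouched when the filesystem step fails.
        raise ValueError(error_msg)
    database[volume["name"]] = dict(volume)
    return True


filesystem, database = {}, {}
add_new_volume({"name": "vol1", "hard_quota": "2G"}, filesystem, database)
print(sorted(database))
```

Doing the filesystem step first means a failed creation never leaves a database row that points at a volume that does not exist.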

create_symlink(validated_volume_input)[source]

Create symlinks for the volume. Prints the output of the symlinks command as a side effect

Parameters:

validated_volume_input (PanasasVolume) – The validated input for the volume.

Returns:

None

cross_check_usage(remove_from_db=False)[source]

This function cross-checks the usage database against the filesystem usage and fixes the usage database entries as needed

Parameters:

remove_from_db (bool) – if True, it will remove the volumes from the database that are not in the filesystem

cross_check_volumes(remove_from_db=False)[source]

This function cross-checks the database against the filesystem and fixes the database entries as needed

Parameters:

remove_from_db (bool) – if True, it will remove the volumes from the database that are not in the filesystem
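
A cross-check of this kind can be sketched as a reconciliation between two dicts keyed by volume name. The function cross_check and the shape of its report are assumptions made for illustration; the mixin's actual behavior on mismatches may differ.

```python
def cross_check(db_volumes: dict, fs_volumes: dict, remove_from_db: bool = False) -> dict:
    """Bring `db_volumes` in line with `fs_volumes` and report what changed."""
    report = {"added": [], "updated": [], "removed": []}
    for name, details in fs_volumes.items():
        if name not in db_volumes:
            report["added"].append(name)
        elif db_volumes[name] != details:
            report["updated"].append(name)
        else:
            continue  # already in sync
        db_volumes[name] = dict(details)
    # Entries the database knows about but the filesystem does not.
    stale = [name for name in db_volumes if name not in fs_volumes]
    if remove_from_db:
        for name in stale:
            del db_volumes[name]
        report["removed"] = stale
    return report


db = {"a": {"hard_quota": 1}, "gone": {"hard_quota": 5}}
fs = {"a": {"hard_quota": 2}, "b": {"hard_quota": 3}}
print(cross_check(db, fs, remove_from_db=True))
```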

delete_symlink(validated_volume_input)[source]

Delete symlinks for the volume. Prints the output of the symlinks command as a side effect

Parameters:

validated_volume_input (PanasasVolume) – The validated input for the volume.

Returns:

None

deploy_symlinks()[source]

Deploy the symlinks to the actual filesystem

get_existing_volume(volume_input: dict, check_in: str = 'database')[source]

Checks for the existing volume in the desired location and returns the volume information.

Parameters:
  • volume_input (dict) – input that is validated against the pydantic model using partial=True

  • check_in (str) – the place to check the volume from. Default: database. Options: filesystem, database

Returns:

the volume information obtained from the fs or the db

Return type:

dict

get_existing_volumes(check_in: str = 'database')[source]

Checks the existing volumes in the desired location and returns the volume information.

Parameters:

check_in (str) – the place to check the volumes from. Default: database. Options: filesystem, database

Returns:

the volume information obtained from the fs or the db

Return type:

dict

get_model_from_dict(volume_input: dict, partial=False)[source]

Generates a pydantic model from the volume_input dictionary.

Parameters:
  • volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

  • partial (bool) – if True, only validate that the input is partially qualified.

Returns:

the pydantic model based on the volume_input

get_usage_db(usage_info: dict)[source]

Retrieve the usage information for a single entry.

Parameters:

usage_info (dict) – usage information that can be used to retrieve unique item from the database

Returns:

the volume information from the database

Return type:

dict

get_usages_db()[source]

Retrieve general information about all active volumes, or optionally all volumes in the database

Returns:

the volume information from the database

Return type:

list(dict)

get_users_usage_fs()[source]

Get the usage of a userspace volume for all users from the filesystem.

Parameters:

userspace_path (str) – The path of the userspace volume. Example: /nobackup/userspace, /home

Returns:

The usage of the userspace volume.

Return type:

dict

get_volume_db(validated_volume_input)[source]

Retrieve general information about the volume.

Parameters:

validated_volume_input (DataModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

Returns:

the volume information from the database

Return type:

dict

get_volume_fs(validated_volume_input, update_usage=True, timeout=600)[source]

Check the validated_volume_input to see if it exists and return the volume’s details from the filesystem. This function should also update the volume in the db with the volume’s usage information, using add_or_update_volume_db().

Parameters:

validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

Returns:

the volume information from the filesystem. Must contain usage information as the key “space_used”

Return type:

dict

get_volumes_db(database=None, active=None)[source]

Retrieve general information about all active volumes, or optionally all volumes in the database.

Parameters:
  • active (dict|None) – if unset/None, all elements are returned; otherwise only the ones that match

  • database (str|None) – if unset/None, all elements are returned; otherwise only the ones that match

Returns:

the volume information from the database

Return type:

list(dict)

get_volumes_fs()[source]

Check the validated_volume_input to see if it exists and return the volume’s details from the filesystem. This function should also update the volume in the db with the volume’s usage information, using _add_or_update_volume_db().

Parameters:

validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

Returns:

the volume information from the filesystem. Must contain usage information as the key “space_used”

Return type:

list[dict]

modify_existing_volume(volume_input: dict)[source]

Modifies the existing volume.

Parameters:

volume_input (dict) – input that is validated against the pydantic model using partial=True

Raises:
  • ACCREValueError – if the volume could not be modified in the filesystem

  • ValueError – if the input is not valid

Returns:

True if the volume was modified successfully

Return type:

bool

modify_volume_fs(validated_volume_input)[source]

This function modifies the existing volume in the actual filesystem first, and only after that succeeds in the database

Parameters:

validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

primary_key_query(validated_volume_input, table=None)[source]

Generates the primary key SQL query from the validated volume input

Parameters:
  • validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

  • table (Table) – the table to get the primary key query from

Returns:

the primary key SQL query based on the validated volume input

Return type:

SqlQuery

process_fields_db(validated_volume_input, insert=False)[source]

Process the fields so that they can be inserted into the database. For example, if hard_quota is 7 TB, convert it to bytes and return that value.

Parameters:
  • validated_volume_input (VolumeDataModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

  • insert (bool) – whether to process the fields for insert or update

Returns:

valid dict that can be inserted into volume database

Return type:

dict
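
The quota conversion mentioned above ("7 TB" to bytes) can be illustrated with a small stand-alone helper. quota_to_bytes is hypothetical and is not accre.util.convert_byte_unit; in particular, the choice of binary multiples (1 KB = 1024 B) is an assumption.

```python
import re

# Assumption: binary multiples (1 KB = 1024 B). The real helper may differ.
_UNITS = {"B": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}
_QUOTA = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([KMGTP]?)B?\s*$", re.IGNORECASE)


def quota_to_bytes(quota) -> int:
    """Convert a quota such as '7 TB', '1G' or 512 into an integer byte count."""
    if isinstance(quota, int):
        return quota  # already in bytes
    match = _QUOTA.match(quota)
    if match is None:
        raise ValueError(f"unrecognised quota: {quota!r}")
    number, prefix = match.groups()
    return round(float(number) * _UNITS[prefix.upper() or "B"])


for value in ("7 TB", "1G", "512", 42):
    print(value, "->", quota_to_bytes(value))
```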

process_usage_fields(validated_volume_input, usage_details: dict)[source]

Process the fields so that:
  1. the usage table entry can be identified uniquely (mostly from validated_volume_input)

  2. the usage table can be populated based on usage_details

Parameters:
  • validated_volume_input (VolumeDataModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

  • usage_details (dict) – the usage details of the volume

Returns:

valid dict that can be inserted into the usage database

Return type:

dict

remove_usage_db(validated_usage_input: dict)[source]

Removes the usage entry from the database.

Parameters:

validated_usage_input (PydanticModel) – input that is validated against the pydantic model using partial=True

Returns:

True if the volume was removed successfully

Return type:

bool

remove_volume_db(validated_volume_input: dict)[source]

Removes the volume from the database.

Parameters:

validated_volume_input (PydanticModel) – input that is validated against the pydantic model using partial=True

Returns:

True if the volume was removed successfully

Return type:

bool

remove_volume_fs(volume_input: dict)[source]

Removes the volume from the filesystem.

Parameters:

volume_input (dict) – input that is validated against the pydantic model using partial=True

Returns:

True if the volume was removed successfully

Return type:

bool

validate_volume_input(volume_input: dict, partial: bool = False, identifiable: bool = False)[source]

This function must validate volume input. Assumes identifiable is automatically partial.

Parameters:
  • volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

  • partial (bool) – if True, only validate that the input is partially qualified.

  • identifiable (bool) – if True, validates that the input can at least be used to identify a record in the database.

Raises:

ValueError – if the input is not valid with the values that are bad

Returns:

the validated data model built from the volume input

Return type:

data_model
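
The three validation modes (full, partial, identifiable) can be sketched without pydantic as follows. The field table, the choice of ("name", "path") as identifying fields, and the function check_volume_input are all hypothetical; they loosely mirror the FutureVolume example rather than the mixin itself.

```python
# Hypothetical field definitions, loosely based on the FutureVolume example.
REQUIRED_FIELDS = {"name": str, "path": str, "soft_quota": str, "hard_quota": str}
IDENTIFYING_FIELDS = ("name", "path")  # assumed primary key


def check_volume_input(volume_input: dict, partial: bool = False,
                       identifiable: bool = False) -> dict:
    """Return a copy of the input if valid; raise ValueError naming the bad values."""
    errors = []
    for key, value in volume_input.items():
        expected = REQUIRED_FIELDS.get(key)
        if expected is None:
            errors.append(f"{key}: unknown field")
        elif not isinstance(value, expected):
            errors.append(f"{key}: expected {expected.__name__}, got {value!r}")
    if identifiable:
        needed = IDENTIFYING_FIELDS        # identifiable implies partial
    elif partial:
        needed = ()                        # any subset of known fields is fine
    else:
        needed = tuple(REQUIRED_FIELDS)    # everything must be present
    errors += [f"{key}: missing" for key in needed if key not in volume_input]
    if errors:
        raise ValueError("; ".join(errors))
    return dict(volume_input)


print(check_volume_input({"name": "vol1", "path": "/data/vol1"}, identifiable=True))
```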

filesystems.panfs

Main module for the panfs functionality

class accre.filesystem.panfs.main.PanasasFileSystem(client=None)[source]

Bases: GenericFileSystemMixin

FileSystem client for PanFS

add_volume_fs(validated_volume_input: PanasasVolume)[source]

Add a volume to the file system.

Parameters:

validated_volume_input (PanasasVolume) – The validated input for the volume.

Returns:

The success of the operation, the message, and the updated volume info.

Return type:

bool, str, dict

create_symlink(validated_volume_input)[source]

Create symlinks for the volume.

Parameters:

validated_volume_input (PanasasVolume) – The validated input for the volume.

Returns:

Whether the symlink was created successfully

Return type:

bool

data_model

alias of PanasasVolume

data_model_partial

alias of PanasasVolumePartial

delete_symlink(validated_volume_input)[source]

Delete symlinks for the volume. Prints the output of the symlinks command as a side effect

Parameters:

validated_volume_input (PanasasVolume) – The validated input for the volume.

Returns:

None

deploy_symlinks()[source]

Deploy symlinks from the database.

get_users_usage_fs()[source]

Get the usage of a userspace volume for all users from the filesystem.

Parameters:

userspace_path (str) – The path of the userspace volume. Example: /nobackup/userspace, /home

Returns:

The usage of the userspace volume. Example:

"name": usage["name"],
"bladeset": fs_volume["bladeset"],
"soft_quota" (float): usage["space_soft_quota"],
"hard_quota" (float): usage["space_hard_quota"],
"space_used" (int B): usage["space_used"],
"path": f"{DEFAULT_MOUNT_POINT}/{fs_volume['name']}",
"user_path": fs_volume["name"],
"extra_info": {
    "files_used" (int): usage["files_used"],
    "files_soft_quota" (float): usage["files_soft_quota"],
    "files_hard_quota" (float): usage["files_hard_quota"],
}

Return type:

dict

get_volume_fs(validated_volume_input, update_usage=False, timeout=600)[source]

Get the details of a volume from the file system.

Parameters:
  • validated_volume_input (PanasasVolumePartial) – The validated input for the volume.

  • update_usage (bool) – Whether to update the usage table. Default is False.

  • timeout (int) – The maximum time in seconds to wait for the command to complete.

Returns:

The output of the command. Raises exception if no volume is found in the filesystem.

get_volumes_fs()[source]

This function gets the volumes from the filesystem. If volume_input is None, it gets all the volumes in the filesystem.

Returns:

The list of volumes in the filesystem as a list of dicts. Dict is of format:

name[str]: The name of the volume
bladeset[str]: The bladeset of the volume
group[str]: The group that volume belongs to, default "accre" if no group association found
soft_quota[bigint]: soft quota in bytes
hard_quota[bigint]: hard quota in bytes
space_used[bigint]: space used in bytes
user_path[str]: Path as seen by the user
path[str]: "{DEFAULT_MOUNT_POINT}/{user_path}", path as seen by the system
user[str] (optional): The user of the volume, if it is a userspace volume

Return type:

list[dict]

static modify_hard_quota(volume_name, new_hard_quota, timeout=600)[source]

Modify the hard quota of a volume.

Parameters:
  • volume_name (str) – The name of the volume.

  • new_hard_quota (int) – The new hard quota in GB.

  • timeout (int) – The maximum time in seconds to wait for the command to complete.

Returns:

The output of the command.

static modify_soft_quota(volume_name, new_soft_quota, timeout=600)[source]

Modify the soft quota of a volume.

Parameters:
  • volume_name (str) – The name of the volume.

  • new_soft_quota (int) – The new soft quota in GB.

  • timeout (int) – The maximum time in seconds to wait for the command to complete.

Returns:

The output of the command.

modify_volume_fs(validated_volume_input: PanasasVolumePartial)[source]

Modify a volume in the file system.

Parameters:

validated_volume_input (PanasasVolumePartial) – The validated input for the volume.

Returns bool, str, dict:

The success of the operation, the message, and the updated volume info.

process_fields_db(validated_volume_input, insert=False) dict[source]

Process the fields so that they can be inserted into the database. For example, if hard_quota is 7 TB, convert it to bytes and return that value.

Parameters:
  • validated_volume_input (PanasasVolume) – The validated input for the volume.

  • insert (bool) – If True, the input is for an insert operation. If False, the input is for an update operation.

Returns:

valid dict that can be inserted into volume database

Return type:

dict

process_usage_fields(validated_volume_input: PanasasVolume, usage_details: dict)[source]

Process the fields so that:
  1. the usage table entry can be identified uniquely (mostly from validated_volume_input)

  2. the usage table can be populated based on usage_details

Parameters:
  • validated_volume_input (VolumeDataModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database

  • usage_details (dict) – the usage details of the volume

Returns:

valid dict that can be inserted into the usage database

Return type:

dict

remove_volume_fs(volume_input: PanasasVolume)[source]

Remove a volume from the file system.

Parameters:

volume_input (PanasasVolume) – The validated input for the volume.

Returns:

The success of the operation, the message, and the updated volume info.

Return type:

bool, str, dict

filesystems.utils

Contains utility functions for the PanFS filesystem.

accre.filesystem.panfs.utils.get_group_from_volume_name(volume_name, volume_group_mapping=None)[source]

Get the group from the volume name.

Parameters:
  • volume_name (str) – The name of the volume.

  • volume_group_mapping (dict) – A dictionary of volume name to group name. If not present, group name is derived from volume name.

Returns:

The group of the volume.

Return type:

str
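
The lookup with an optional override mapping might be sketched as below. The function group_from_volume_name is a hypothetical stand-in, and deriving the group from the last path component is an assumption; the actual derivation rule is not stated here.

```python
def group_from_volume_name(volume_name: str, volume_group_mapping=None) -> str:
    """Return the mapped group if present, else derive it from the volume name."""
    mapping = volume_group_mapping or {}
    if volume_name in mapping:
        return mapping[volume_name]
    # Assumption: the group is the last path component of the volume name.
    return volume_name.strip("/").split("/")[-1]


overrides = {"/data/cqs": "h_cqs"}
print(group_from_volume_name("/data/cqs", overrides))
print(group_from_volume_name("/data/biostat", overrides))
```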

accre.filesystem.panfs.utils.get_key_value_from_raw(text)[source]

Parses the raw text from panFS command to get key and values in a dictionary. Returns None if the text says NO_VOLUME_FOUND_ERROR_MSG

Parameters:

text – The raw text from the panfs command.

Returns:

A dictionary of key and values or None if the volume doesn’t exist. Example Output:

{
    "name": '/data/groupname',
    "state": 'Online (No Errors)',
    "bladeset": 'Set 1',
    "raid": '...',
    "recovery_priority": '50',
    "extended_file_system_availability_mode": 'retry',
    "user_and_group_quotas_policy": 'Enforced (inherited from system)',
    "soft_quota": '4.00 TB',
    "hard_quota": '5.00 TB',
    "space_used": '3.51 TB',
    "space_available": '1.02 PB'
}
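
A parser producing a dictionary of this shape could look like the sketch below. It assumes the raw text consists of "Label: value" lines; that layout, the sample text, the NO_VOLUME_FOUND string, and the function key_values_from_raw are all hypothetical, since the real command output is not shown here.

```python
import re

NO_VOLUME_FOUND = "No volume found"  # hypothetical stand-in for NO_VOLUME_FOUND_ERROR_MSG


def key_values_from_raw(text: str):
    """Turn 'Label: value' lines into a dict with snake_case keys, or None."""
    if NO_VOLUME_FOUND in text:
        return None
    result = {}
    for line in text.splitlines():
        label, sep, value = line.partition(":")
        if not sep or not label.strip():
            continue  # not a "Label: value" line
        key = re.sub(r"[^a-z0-9]+", "_", label.strip().lower()).strip("_")
        result[key] = value.strip()
    return result


SAMPLE = """\
Name: /data/groupname
State: Online (No Errors)
Soft Quota: 4.00 TB
Hard Quota: 5.00 TB
"""
print(key_values_from_raw(SAMPLE))
```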

accre.filesystem.panfs.utils.parse_panfs_userquota_output(stdout: str, volume_prefix: str = '/nobackup', testing_mode: bool = False)[source]

Parses the output of the panfs userquota command.

Parameters:
  • stdout (str) – The standard output of the panfs userquota command.

  • volume_prefix (str) – The prefix of the volume to filter the output.

  • testing_mode (bool) – If True, this function will run in testing mode

Example stdout:

Volume              Unix         Windows     Bytes    % Bytes Used    Files    % Files Used
                    User         User         Used    Soft    Hard     Used    Soft    Hard
                                            Total   Quota   Quota    Total   Quota   Quota
/nobackup/userspace root                         0       0       0      7.0       0       0
/nobackup/userspace uid:5007             130.90 GB   65.45   52.36 110.68 K    3.69    3.16

Returns:

The parsed output.

Example Output:

{
    "name": volume_prefix,
    "user": username,
    "space_used" (int B): space_used,
    "space_soft_quota" (float): space_soft_quota,
    "space_hard_quota" (float): space_hard_quota,
    "files_used" (int): files_used,
    "files_soft_quota" (float): files_soft_quota,
    "files_hard_quota" (float): files_hard_quota,
}

Return type:

list[dict]
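
Parsing the userquota table shown above can be sketched with a single regular expression per data row. This is not the module's parser: the function parse_userquota is hypothetical, the Windows User column is assumed to be empty, byte units are assumed binary (1 GB = 1024**3 B), file-count suffixes are assumed decimal (1 K = 1000), and "name" is filled from the Volume column.

```python
import re

ROW = re.compile(r"""
    ^(?P<volume>/\S+)\s+(?P<user>\S+)\s+
    (?P<bytes>[\d.]+)(?:\s(?P<bytes_unit>[KMGTP]?B))?\s+
    (?P<bytes_soft>[\d.]+)\s+(?P<bytes_hard>[\d.]+)\s+
    (?P<files>[\d.]+)(?:\s(?P<files_unit>[KMG]))?\s+
    (?P<files_soft>[\d.]+)\s+(?P<files_hard>[\d.]+)\s*$
""", re.VERBOSE)

# Assumed multipliers; None means the column had no unit suffix.
BYTE_UNITS = {None: 1, "B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3,
              "TB": 1024**4, "PB": 1024**5}
FILE_UNITS = {None: 1, "K": 1000, "M": 1000**2, "G": 1000**3}


def parse_userquota(stdout: str, volume_prefix: str = "/nobackup") -> list:
    rows = []
    for line in stdout.splitlines():
        match = ROW.match(line.strip())
        if match is None or not match["volume"].startswith(volume_prefix):
            continue  # header lines and volumes outside the prefix
        rows.append({
            "name": match["volume"],
            "user": match["user"],
            "space_used": round(float(match["bytes"]) * BYTE_UNITS[match["bytes_unit"]]),
            "space_soft_quota": float(match["bytes_soft"]),
            "space_hard_quota": float(match["bytes_hard"]),
            "files_used": round(float(match["files"]) * FILE_UNITS[match["files_unit"]]),
            "files_soft_quota": float(match["files_soft"]),
            "files_hard_quota": float(match["files_hard"]),
        })
    return rows


SAMPLE = """\
Volume              Unix         Windows     Bytes    % Bytes Used    Files    % Files Used
/nobackup/userspace root                         0       0       0      7.0       0       0
/nobackup/userspace uid:5007             130.90 GB   65.45   52.36 110.68 K    3.69    3.16
"""
for row in parse_userquota(SAMPLE):
    print(row["user"], row["space_used"], row["files_used"])
```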

accre.filesystem.panfs.utils.parse_volume_list(stdout)[source]

Parses the volume_list output. Note: This only works for command “volume list show all”

Parameters:

stdout (str) – The standard output of the volume_list command.

Returns:

The parsed volume list.

{
    name: str,
    bladeset: str,
    raid: str,
    space_used: str,
    soft_quota: str,
    hard_quota: str,
    status: str
}

Return type:

list[dict]

accre.filesystem.panfs.utils.remove_slash(string)[source]

Remove the first and the last slash from a string.

Parameters:

string (str) – The string to remove the slashes from.

Returns:

The string with the first and the last slash removed.
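
A minimal sketch of this behavior, assuming that at most one slash is removed from each end; strip_outer_slashes is a hypothetical name, not the module's function.

```python
def strip_outer_slashes(string: str) -> str:
    """Drop one leading and one trailing slash, if present."""
    if string.startswith("/"):
        string = string[1:]
    if string.endswith("/"):
        string = string[:-1]
    return string


print(strip_outer_slashes("/data/cqs/"))
```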

accre.filesystem.panfs.utils.run_panfs_command(arglist, server='admin@accrepfs.vampire', timeout=60)[source]

Runs a PanFS command as a subprocess and returns the standard output as a string decoded from utf-8. Optionally ssh out to the configured PanFS server and use a timeout.

Parameters:
  • arglist (list(str)) – list of arguments to run

  • server (str) – ssh value for the panfs address

  • timeout (int) – Maximum time in seconds to wait for command to complete

Returns:

Standard output of command interpreted as utf-8 text

Return type:

str
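
Running a command over ssh with a timeout and decoding its output can be sketched with the standard library alone. The helpers build_ssh_command and run_command and the host admin@panfs.example are hypothetical; the demonstration runs a local Python process instead of a real PanFS command.

```python
import subprocess
import sys


def build_ssh_command(arglist, server):
    """Prefix the argument list with an ssh invocation for `server`."""
    return ["ssh", server, *arglist]


def run_command(command, timeout=60) -> str:
    """Run `command`, wait at most `timeout` seconds, return stdout as utf-8 text."""
    completed = subprocess.run(command, capture_output=True, timeout=timeout, check=True)
    return completed.stdout.decode("utf-8")


# Only builds the ssh command line; nothing is sent to a server here.
print(build_ssh_command(["volume", "list", "show", "all"], "admin@panfs.example"))
# Local stand-in for a remote command so the sketch is runnable anywhere.
print(run_command([sys.executable, "-c", "print('ok')"]).strip())
```

With check=True a non-zero exit status raises subprocess.CalledProcessError, and an overrun raises subprocess.TimeoutExpired, so callers can tell failures apart from empty output.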