Filesystems in ACCRE¶
Guidelines of database model creation¶
For any new file system added to ACCRE, the following must be defined for admintools and the automation around the filesystem:
- A table in the models.py file that will describe the new filesystem. The filesystem table must contain the following columns:
Identifiers that uniquely identify a filesystem entry. For example, for gpfs, “filesystem”, “name”, and “fileset” define a unique entry. Similarly, for panfs, it is the combination of “bladeset” and “name”.
The filesystem table must contain a foreign key to an existing ACCRE group where applicable. Where it is not applicable, as with user quotas and system-related quotas, the group should default to accre.
The table should also have the following items if applicable:
i. soft_quota: BigInteger
ii. hard_quota: BigInteger
iii. path: str
iv. user_path: str
v. active: bool
vi. join_date: datetime
- A usage table with the following columns:
soft_quota: BigInteger
hard_quota: BigInteger
space_used: BigInteger
path: str
user_path: str
user, optional: str. Only applicable for paths such as /home/username, /nobackup/userspace/username
last_updated, automatic: datetime. Updates itself automatically to reflect when the entry was most recently changed
join_date: datetime
Filesystems in ACCRE¶
The basefilesystem.py file provides the class GenericFileSystemMixin. It is meant to be a guideline for development of filesystem handlers in ACCRE.
Implementation for FutureVolume¶
from typing import Optional

from pydantic import BaseModel
from sqlalchemy import Column, DateTime, JSON, String, Table, MetaData, BigInteger

from accre.util import convert_byte_unit
from accre.filesystem.basefilesystem import GenericFileSystemMixin

admin_meta = MetaData()

# Volume table: (name, path) is the composite primary key
FUTURE_VOLUME = Table('future_volume', admin_meta,
    Column('name', String, primary_key=True, nullable=False),
    Column('path', String, primary_key=True, nullable=False),
    Column('soft_quota', BigInteger, nullable=False),
    Column('hard_quota', BigInteger, nullable=False),
    Column('join_date', DateTime)
)

# Usage table: keyed by the same (name, path) pair
FUTURE_USAGE = Table('future_usage', admin_meta,
    Column('name', String, primary_key=True, nullable=False),
    Column('path', String, primary_key=True, nullable=False),
    Column('usage', BigInteger, nullable=False),
    Column('extra_info', JSON, nullable=True),
)

class FutureVolumeModel(BaseModel):
    """Future Volume Model"""
    name: str
    path: str
    soft_quota: str
    hard_quota: str


class FutureVolumeModelPartial(BaseModel):
    """Future Volume Model Partial"""
    name: Optional[str] = ""
    path: Optional[str] = ""
    soft_quota: Optional[str] = ""
    hard_quota: Optional[str] = ""

class FutureVolume(GenericFileSystemMixin):
    """
    Test future volume
    """
    database_table = FUTURE_VOLUME
    usage_table = FUTURE_USAGE
    data_model = FutureVolumeModel
    data_model_partial = FutureVolumeModelPartial
    # In-memory stand-ins for a real filesystem backend
    data_holder = {}
    simlinks = {}
    deployed_simlinks = {}

    def process_fields_db(self, validated_volume_input, insert=False) -> dict:
        return_dict = {**validated_volume_input.dict()}
        # Quotas given as strings (e.g. "1G") are stored as bytes
        if isinstance(validated_volume_input.hard_quota, str):
            return_dict["hard_quota"] = int(convert_byte_unit(validated_volume_input.hard_quota, target='B'))
        if isinstance(validated_volume_input.soft_quota, str):
            return_dict["soft_quota"] = int(convert_byte_unit(validated_volume_input.soft_quota, target='B'))
        return return_dict

    def process_usage_fields(self, validated_volume_input, usage_details: dict):
        return_dict = {
            "name": validated_volume_input.name,
            "path": validated_volume_input.path,
            "extra_info": usage_details.get("extra_info", {})
        }
        if "usage" in usage_details:
            return_dict["usage"] = usage_details.get("usage")
            if isinstance(usage_details.get("usage"), str):
                return_dict["usage"] = int(convert_byte_unit(usage_details.get("usage"), target='B'))
        return return_dict

    def get_volume_fs(self, validated_volume_input, update_usage=True, timeout=600):
        value = self.data_holder.get(validated_volume_input.name)
        if update_usage:
            self.add_or_update_usage_table(validated_volume_input, value)
        return value

    def modify_volume_fs(self, validated_volume_input):
        self.data_holder[validated_volume_input.name] = {
            **self.data_holder[validated_volume_input.name],
            **validated_volume_input.dict(),
            "usage": "1G",
        }
        return True, "", self.data_holder[validated_volume_input.name]

    def add_volume_fs(self, validated_volume_input):
        self.data_holder[validated_volume_input.name] = {
            **validated_volume_input.dict(),
            "usage": "1G",
            "extra_info": {}
        }
        return True, "", self.data_holder[validated_volume_input.name]

    def remove_volume_fs(self, volume_input):
        self.data_holder.pop(volume_input.name)
        return True, ""

    def delete_symlink(self, validated_volume_input):
        self.simlinks.pop(validated_volume_input.name)
        return True

    def create_symlink(self, validated_volume_input):
        self.simlinks[validated_volume_input.name] = True
        return True

    def deploy_symlinks(self):
        self.deployed_simlinks = {**self.simlinks}
        return True

    def get_volumes_fs(self):
        return list(self.data_holder.values())

    def get_users_usage_fs(self):
        return [
            {
                "name": "user1",
                "path": "/home/user1",
                "usage": "1G",
                "soft_quota": "1G",
                "hard_quota": "2G",
                "extra_info": {
                    "files_used": 10,
                    "files_soft_quota": 100,
                    "files_hard_quota": 1000,
                }
            },
            {
                "name": "user2",
                "path": "/home/user2",
                "usage": "1G",
                "soft_quota": "1G",
                "hard_quota": "2G",
                "extra_info": {
                    "files_used": 10,
                    "files_soft_quota": 100,
                    "files_hard_quota": 1000,
                }
            }
        ]
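The FutureVolume handler above relies on accre.util.convert_byte_unit, whose implementation is not shown on this page. As a rough illustration of the normalization step only, the hypothetical helper below (`to_bytes`, not part of accre) converts strings such as "1G" or "7 TB" to an integer byte count. It assumes binary multiples (1K = 1024 bytes); the real function may follow different rules.

```python
import re

# Hypothetical stand-in for accre.util.convert_byte_unit(value, target='B').
# Assumption: units are binary multiples (1K = 1024 bytes).
_UNITS = {"B": 0, "K": 1, "M": 2, "G": 3, "T": 4, "P": 5}
_PATTERN = re.compile(r"^\s*([0-9]*\.?[0-9]+)\s*([BKMGTP])I?B?\s*$", re.IGNORECASE)


def to_bytes(value):
    """Convert a size such as '1G', '7 TB', or 2048 to an integer number of bytes."""
    if isinstance(value, int):
        return value  # already expressed in bytes
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * 1024 ** _UNITS[unit.upper()])
```

Storing every quota as a plain byte count keeps the BigInteger columns comparable regardless of the unit a caller typed.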
Mismatched volumes and groups¶
Certain volumes in panfs do not map exactly to groups. For instance, /data/cqs maps to group h_cqs instead of cqs. To handle such mismatches, an explicit mapping can be set on the filesystem object:
from accre.filesystem.panfs.main import PanasasFileSystem
fs = PanasasFileSystem()
fs.mis_matched_volume_groups = {
    "/data/cqs": "h_cqs",
    "/nobackup/testing": "fe_accre_lab"
}
filesystem¶
Serves as the base of the filesystems.
- class accre.filesystem.basefilesystem.GenericFileSystemMixin(client=None)[source]¶
Bases:
object
Mixin class for generic file system implementations
- add_new_volume(volume_input: dict, deploy_symlinks=False)[source]¶
This is the function that will be exposed to the public user.
- Parameters:
volume_input (dict) – input that is validated against the pydantic model using partial=False
deploy_symlinks (bool) – whether to deploy symlinks or not
- Raises:
ACCREValueError – if the volume could not be added to the filesystem
ValueError – if the input is not valid
- Returns:
True if the volume was added to the filesystem
- Return type:
bool
- add_or_update_usage_table(validated_volume_input, usage_details: dict, add_or_update: Optional[str] = None)[source]¶
This function modifies the existing volume’s usage information in the database
- Parameters:
validated_volume_input (PydanticModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
usage_details (dict) – Contains values to update in the usage table
add_or_update (None|str) – whether to ‘add’ or ‘update’ the volume in the database
- Returns bool:
status of whether or not the usage table was updated.
- add_or_update_volume_db(validated_volume_input, add_or_update: Optional[str] = None)[source]¶
This function adds or updates the volume's entry in the database
- Parameters:
validated_volume_input (PydanticModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
add_or_update (None|str) – whether to ‘add’ or ‘update’ the volume in the database. This flag can be used to avoid the database query that checks whether the volume exists.
- Returns bool:
status of whether or not the volume table was updated.
- add_volume_fs(validated_volume_input)[source]¶
This function adds the validated volume input to the actual filesystem first and, only once that has succeeded, to the database
- Parameters:
validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
- Returns:
tuple of (added, error_msg); error_msg describes the failure if the volume was not added
- Return type:
bool, str
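The ordering described for add_volume_fs (filesystem first, database second) can be sketched with in-memory stand-ins. Everything below (`add_volume`, `fake_fs_add`, `fake_fs`, `fake_db`) is hypothetical and is not the accre implementation; it only illustrates that the database is left untouched when the filesystem step fails.

```python
def add_volume(volume, fs_add, db_rows):
    """Add `volume` to the filesystem first; record it in the database only on success."""
    added, error_msg = fs_add(volume)
    if not added:
        # Filesystem step failed: leave the database untouched.
        return False, error_msg
    db_rows[volume["name"]] = dict(volume)
    return True, ""


fake_fs = {}  # stands in for the real filesystem
fake_db = {}  # stands in for the volume table


def fake_fs_add(volume):
    """Hypothetical filesystem call returning (added, error_msg)."""
    if volume["name"] in fake_fs:
        return False, f"volume {volume['name']} already exists"
    fake_fs[volume["name"]] = dict(volume)
    return True, ""


first = add_volume({"name": "vol1", "path": "/data/vol1"}, fake_fs_add, fake_db)
second = add_volume({"name": "vol1", "path": "/data/other"}, fake_fs_add, fake_db)
```

Writing to the database last means a failed filesystem command never leaves behind a record for a volume that does not exist.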
- create_symlink(validated_volume_input)[source]¶
Create symlinks for the volume. Prints the output of the symlinks command as a side effect
- Parameters:
validated_volume_input (PanasasVolume) – The validated input for the volume.
- Returns:
None
- cross_check_usage(remove_from_db=False)[source]¶
This function cross-checks the usage database against the filesystem usage and fixes the usage database entries as needed
- Parameters:
remove_from_db (bool) – if True, it will remove the volumes from the database that are not in the filesystem
- cross_check_volumes(remove_from_db=False)[source]¶
This function cross-checks the database against the filesystem and fixes the database entries as needed
- Parameters:
remove_from_db (bool) – if True, it will remove the volumes from the database that are not in the filesystem
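A cross-check of this kind amounts to a set comparison between the two sources. The sketch below is a hypothetical illustration, not the accre code: it assumes records are keyed by volume name and that the filesystem is treated as the source of truth, which this page does not state explicitly.

```python
def cross_check(db_volumes, fs_volumes, remove_from_db=False):
    """Reconcile database records against filesystem records, keyed by volume name.

    Returns (missing_in_db, missing_in_fs) as sorted lists of names.
    """
    db_names = set(db_volumes)
    fs_names = set(fs_volumes)
    missing_in_db = sorted(fs_names - db_names)
    missing_in_fs = sorted(db_names - fs_names)

    # Volumes present on the filesystem are (re)written into the database.
    for name in fs_names:
        db_volumes[name] = dict(fs_volumes[name])

    # Stale database entries are removed only when explicitly requested.
    if remove_from_db:
        for name in missing_in_fs:
            del db_volumes[name]
    return missing_in_db, missing_in_fs


db = {"a": {"hard_quota": 1}, "stale": {"hard_quota": 2}}
fs = {"a": {"hard_quota": 5}, "new": {"hard_quota": 3}}
result = cross_check(db, fs, remove_from_db=True)
```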
- delete_symlink(validated_volume_input)[source]¶
Delete symlinks for the volume. Prints the output of the symlinks command as a side effect
- Parameters:
validated_volume_input (PanasasVolume) – The validated input for the volume.
- Returns:
None
- get_existing_volume(volume_input: dict, check_in: str = 'database')[source]¶
Checks the existing volume in the desired location and returns the volume information.
- Parameters:
volume_input (dict) – input that is validated against the pydantic model using partial=True
check_in (str) – the place to check the volume from. Default: database. Options: filesystem, database
- Returns:
the volume information obtained from the fs or the db
- Return type:
dict
- get_existing_volumes(check_in: str = 'database')[source]¶
Checks the existing volumes in the desired location and returns the volume information.
- Parameters:
check_in (str) – the place to check the volumes from. Default: database. Options: filesystem, database
- Returns:
the volume information obtained from the fs or the db
- Return type:
dict
- get_model_from_dict(volume_input: dict, partial=False)[source]¶
Generates a pydantic model from the volume_input dictionary.
- Parameters:
volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
partial (bool) – if True, only validate that the input is partially qualified.
- Returns:
the pydantic model based on the volume_input
- get_usage_db(usage_info: dict)[source]¶
Retrieve the usage information for a single entry.
- Parameters:
usage_info (dict) – usage information that can be used to retrieve unique item from the database
- Returns:
the volume information from the database
- Return type:
dict
- get_usages_db()[source]¶
Retrieve general information about all active volumes, or optionally all volumes in the database
- Returns:
the volume information from the database
- Return type:
list(dict)
- get_users_usage_fs()[source]¶
Get the usage of a userspace volume for all users from the filesystem.
- Parameters:
userspace_path (str) – The path of the userspace volume. Example: /nobackup/userspace, /home
- Returns:
The usage of the userspace volume.
- Return type:
dict
- get_volume_db(validated_volume_input)[source]¶
Retrieve general information about a volume.
- Parameters:
validated_volume_input (DataModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
- Returns:
the volume information from the database
- Return type:
dict
- get_volume_fs(validated_volume_input, update_usage=True, timeout=600)[source]¶
Check the validated_volume_input to see if it exists and return the volume’s details from the file system. This function should also update the volume in the db with the volume’s usage information; use add_or_update_volume_db() for that.
- Parameters:
validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
- Returns:
the volume information from the filesystem. Must contain usage information as the key “space_used”
- Return type:
dict
- get_volumes_db(database=None, active=None)[source]¶
Retrieve general information about all active volumes, or optionally all volumes in the database.
- Parameters:
database (str|None) – if unset/None, all elements are returned; otherwise only the ones that match
active (dict|None) – if unset/None, all elements are returned; otherwise only the ones that match
- Returns:
the volume information from the database
- Return type:
list(dict)
- get_volumes_fs()[source]¶
Check the validated_volume_input to see if it exists and return the volume’s details from the file system. This function should also update the volume in the db with the volume’s usage information; use _add_or_update_volume_db() for that.
- Parameters:
validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
- Returns:
the volume information from the filesystem. Must contain usage information as the key “space_used”
- Return type:
list[dict]
- modify_existing_volume(volume_input: dict)[source]¶
Modifies the existing volume.
- Parameters:
volume_input (dict) – input that is validated against the pydantic model using partial=True
- Raises:
ACCREValueError – if the volume could not be modified in the filesystem
ValueError – if the input is not valid
- Returns:
True if the volume was modified successfully
- Return type:
bool
- modify_volume_fs(validated_volume_input)[source]¶
This function modifies the existing volume in the actual filesystem first and, only once that has succeeded, in the database
- Parameters:
validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
- primary_key_query(validated_volume_input, table=None)[source]¶
Generates, from the validated volume input, the SQL query that selects a record by its primary key
- Parameters:
validated_volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
table (Table) – the table to get the primary key query from
- Returns:
the primary key SQL query based on the validated volume input
- Return type:
SqlQuery
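The idea behind a primary-key query can be illustrated without SQLAlchemy. The sketch below uses the standard-library sqlite3 module and a hypothetical helper (`primary_key_where`); it is an assumption about the general technique, not the query that accre builds. The table mirrors the (name, path) composite key of the FutureVolume example.

```python
import sqlite3


def primary_key_where(primary_keys, validated_input):
    """Build a parameterized WHERE clause from the primary-key columns."""
    missing = [key for key in primary_keys if key not in validated_input]
    if missing:
        raise ValueError(f"missing primary key fields: {missing}")
    # Column names come from the trusted table definition; values are bound as parameters.
    clause = " AND ".join(f"{key} = ?" for key in primary_keys)
    params = [validated_input[key] for key in primary_keys]
    return clause, params


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE future_volume ("
    "name TEXT, path TEXT, hard_quota INTEGER, PRIMARY KEY (name, path))"
)
conn.execute("INSERT INTO future_volume VALUES ('vol1', '/data/vol1', 1024)")

clause, params = primary_key_where(
    ["name", "path"], {"name": "vol1", "path": "/data/vol1", "hard_quota": 2048}
)
row = conn.execute(
    f"SELECT hard_quota FROM future_volume WHERE {clause}", params
).fetchone()
```

Only the key fields take part in the lookup, so extra fields in the input (here hard_quota) do not affect which record is matched.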
- process_fields_db(validated_volume_input, insert=False)[source]¶
Process the fields so that they can be inserted into the database. For example, if hard_quota is 7 TB, convert it to bytes and return that value
- Parameters:
volume_input (VolumeDataModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
insert (bool) – whether to process the fields for insert or update
- Returns:
valid dict that can be inserted into volume database
- Return type:
dict
- process_usage_fields(validated_volume_input, usage_details: dict)[source]¶
Process the fields so that:
1. the usage table entry can be identified uniquely (mostly from validated_volume_input)
2. the usage table can be populated based on usage_details
- Parameters:
volume_input (VolumeDataModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
usage_details (dict) – the usage details of the volume
- Returns:
valid dict that can be inserted into the usage database
- Return type:
dict
- remove_usage_db(validated_usage_input: dict)[source]¶
Removes the usage entry from the database.
- Parameters:
validated_usage_input (PydanticModel) – input that is validated against the pydantic model using partial=True
- Returns:
True if the usage entry was removed successfully
- Return type:
bool
- remove_volume_db(validated_volume_input: dict)[source]¶
Removes the volume from the database.
- Parameters:
validated_volume_input (PydanticModel) – input that is validated against the pydantic model using partial=True
- Returns:
True if the volume was removed successfully
- Return type:
bool
- remove_volume_fs(volume_input: dict)[source]¶
Removes the volume from the filesystem.
- Parameters:
volume_input (dict) – input that is validated against the pydantic model using partial=True
- Returns:
True if the volume was removed successfully
- Return type:
bool
- validate_volume_input(volume_input: dict, partial: bool = False, identifiable: bool = False)[source]¶
This function must validate volume input. Assumes identifiable is automatically partial.
- Parameters:
volume_input (dict) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
partial (bool) – if True, only validate that the input is partially qualified.
identifiable (bool) – if True, validates that the input can at least be used to identify a record in the database.
- Raises:
ValueError – if the input is not valid; the message names the values that are bad
- Returns:
the validated data model built from the volume input
- Return type:
data_model
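The three validation modes (full, partial, identifiable) can be sketched without pydantic. The field names below are borrowed from the FutureVolume example, and the function is a hypothetical simplification operating on plain dicts; the real method validates against data_model and data_model_partial.

```python
REQUIRED_FIELDS = ("name", "path", "soft_quota", "hard_quota")  # full model
PRIMARY_KEYS = ("name", "path")  # enough to identify one record


def validate_volume_input(volume_input, partial=False, identifiable=False):
    """Return a cleaned copy of `volume_input` or raise ValueError."""
    unknown = sorted(set(volume_input) - set(REQUIRED_FIELDS))
    if unknown:
        raise ValueError(f"unknown fields: {unknown}")
    if identifiable:
        required = PRIMARY_KEYS  # identifiable implies partial
    elif partial:
        required = ()  # any subset of known fields is accepted
    else:
        required = REQUIRED_FIELDS
    missing = [field for field in required if not volume_input.get(field)]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return dict(volume_input)
```

For example, `{"name": "v", "path": "/p"}` passes with identifiable=True but fails full validation because both quotas are missing.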
filesystems.panfs¶
Main module for the panfs functionality
- class accre.filesystem.panfs.main.PanasasFileSystem(client=None)[source]¶
Bases:
GenericFileSystemMixin
FileSystem client for PanFS
- add_volume_fs(validated_volume_input: PanasasVolume)[source]¶
Add a volume to the file system.
- Parameters:
validated_volume_input (PanasasVolume) – The validated input for the volume.
- Returns:
The success of the operation, the message, and the updated volume info.
- Return type:
bool, str, dict
- create_symlink(validated_volume_input: PanasasVolume)[source]¶
Create symlinks for the volume.
- Parameters:
validated_volume_input (PanasasVolume) – The validated input for the volume.
- Returns:
Whether the symlink was created successfully
- Return type:
bool
- data_model¶
alias of
PanasasVolume
- data_model_partial¶
alias of
PanasasVolumePartial
- delete_symlink(validated_volume_input: PanasasVolume)[source]¶
Delete symlinks for the volume. Prints the output of the symlinks command as a side effect
- Parameters:
validated_volume_input (PanasasVolume) – The validated input for the volume.
- Returns:
None
- get_users_usage_fs()[source]¶
Get the usage of a userspace volume for all users from the filesystem.
- Parameters:
userspace_path (str) – The path of the userspace volume. Example, /nobackup/userspace, /home
- Returns:
The usage of the userspace volume. Example:
{
    "name": usage["name"],
    "bladeset": fs_volume["bladeset"],
    "soft_quota" (float): usage["space_soft_quota"],
    "hard_quota" (float): usage["space_hard_quota"],
    "space_used" (int B): usage["space_used"],
    "path": f"{DEFAULT_MOUNT_POINT}/{fs_volume['name']}",
    "user_path": fs_volume["name"],
    "extra_info": {
        "files_used" (int): usage["files_used"],
        "files_soft_quota" (float): usage["files_soft_quota"],
        "files_hard_quota" (float): usage["files_hard_quota"],
    }
}
- Return type:
dict
- get_volume_fs(validated_volume_input, update_usage=False, timeout=600)[source]¶
Get the details of a volume from the file system.
- Parameters:
validated_volume_input (PanasasVolumePartial) – The validated input for the volume.
update_usage (bool) – Whether to update the usage table. Default is False.
timeout (int) – The maximum time in seconds to wait for the command to complete.
- Returns:
The output of the command. Raises exception if no volume is found in the filesystem.
- get_volumes_fs()[source]¶
This function gets the volumes from the filesystem. If volume_input is None, it gets all the volumes in the filesystem
- Returns:
The list of volumes in the filesystem as a list of dicts. Dict is of format:
name[str]: The name of the volume
bladeset[str]: The bladeset of the volume
group[str]: The group that volume belongs to, default "accre" if no group association found
soft_quota[bigint]: soft quota in bytes
hard_quota[bigint]: hard quota in bytes
space_used[bigint]: space used in bytes
user_path[str]: Path as seen by the user
path: "{DEFAULT_MOUNT_POINT}/ user_path", path as seen by the system
user[str] (optional): The user of the volume, if it is a userspace volume
- Return type:
list[dict]
- static modify_hard_quota(volume_name, new_hard_quota, timeout=600)[source]¶
Modify the hard quota of a volume.
- Parameters:
volume_name (str) – The name of the volume.
new_hard_quota (int) – The new hard quota in GB.
timeout (int) – The maximum time in seconds to wait for the command to complete.
- Returns:
The output of the command.
- static modify_soft_quota(volume_name, new_soft_quota, timeout=600)[source]¶
Modify the soft quota of a volume.
- Parameters:
volume_name (str) – The name of the volume.
new_soft_quota (int) – The new soft quota in GB.
timeout (int) – The maximum time in seconds to wait for the command to complete.
- Returns:
The output of the command.
- modify_volume_fs(validated_volume_input: PanasasVolumePartial)[source]¶
Modify a volume in the file system.
- Parameters:
validated_volume_input (PanasasVolumePartial) – The validated input for the volume.
- Returns bool, str, dict:
The success of the operation, the message, and the updated volume info.
- process_fields_db(validated_volume_input, insert=False) dict [source]¶
Process the fields so that they can be inserted into the database. For example, if hard_quota is 7 TB, convert it to bytes and return that value
- Parameters:
validated_volume_input (PanasasVolume) – The validated input for the volume.
insert (bool) – If True, the input is for an insert operation. If False, the input is for an update operation.
- Returns:
valid dict that can be inserted into volume database
- Return type:
dict
- process_usage_fields(validated_volume_input: PanasasVolume, usage_details: dict)[source]¶
Process the fields so that:
1. the usage table entry can be identified uniquely (mostly from validated_volume_input)
2. the usage table can be populated based on usage_details
- Parameters:
volume_input (VolumeDataModel) – input that is validated against the pydantic model and can be used to retrieve unique item from the database
usage_details (dict) – the usage details of the volume
- Returns:
valid dict that can be inserted into the usage database
- Return type:
dict
filesystems.utils¶
Contains utility functions for the PanFS filesystem.
- accre.filesystem.panfs.utils.get_group_from_volume_name(volume_name, volume_group_mapping=None)[source]¶
Get the group from the volume name.
- Parameters:
volume_name (str) – The name of the volume.
volume_group_mapping (dict) – A dictionary of volume name to group name. If not present, group name is derived from volume name.
- Returns:
The group of the volume.
- Return type:
str
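The lookup order described here (explicit mapping first, then derivation from the name) can be sketched as follows. The derivation rule is an assumption: the sketch takes the last path component, which matches the /data/cqs example earlier on this page, but the real function may derive the group differently. The name `group_from_volume_name` is hypothetical.

```python
def group_from_volume_name(volume_name, volume_group_mapping=None):
    """Return the mapped group if one exists, else derive it from the volume name."""
    mapping = volume_group_mapping or {}
    if volume_name in mapping:
        return mapping[volume_name]
    # Assumption: the group is the last path component, e.g. /data/cqs -> cqs.
    return volume_name.strip("/").split("/")[-1]
```

With the mapping `{"/data/cqs": "h_cqs"}`, the volume /data/cqs resolves to h_cqs; without it, the same volume resolves to cqs.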
- accre.filesystem.panfs.utils.get_key_value_from_raw(text)[source]¶
Parses the raw text from panFS command to get key and values in a dictionary. Returns None if the text says NO_VOLUME_FOUND_ERROR_MSG
- Parameters:
text – The raw text from the panfs command.
- Returns:
A dictionary of key and values or None if the volume doesn’t exist. Example Output:
{
    "name": '/data/groupname',
    "state": 'Online (No Errors)',
    "bladeset": 'Set 1',
    "raid": '...',
    "recovery_priority": '50',
    "extended_file_system_availability_mode": 'retry',
    "user_and_group_quotas_policy": 'Enforced (inherited from system)',
    "soft_quota": '4.00 TB',
    "hard_quota": '5.00 TB',
    "space_used": '3.51 TB',
    "space_available": '1.02 PB'
}
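A parser producing output of this shape might look like the sketch below. The raw command output is not reproduced on this page, so the "Key: value" line format, the sample text, and the `not_found_marker` placeholder (standing in for NO_VOLUME_FOUND_ERROR_MSG, whose value is not shown) are all assumptions made for illustration.

```python
import re


def key_values_from_raw(text, not_found_marker="not found"):
    """Parse 'Key: value' lines into a dict with snake_case keys."""
    if not_found_marker in text:
        return None  # the volume does not exist
    parsed = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            continue  # skip lines without a 'Key: value' shape
        normalized = re.sub(r"[^a-z0-9]+", "_", key.strip().lower()).strip("_")
        parsed[normalized] = value.strip()
    return parsed


sample = """Name: /data/groupname
State: Online (No Errors)
Soft Quota: 4.00 TB
Hard Quota: 5.00 TB"""
result = key_values_from_raw(sample)
```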
- accre.filesystem.panfs.utils.parse_panfs_userquota_output(stdout: str, volume_prefix: str = '/nobackup', testing_mode: bool = False)[source]¶
Parses the output of the panfs userquota command.
- Parameters:
volume_prefix (str) – The prefix of the volume to filter the output.
testing_mode (bool) – If True, this function will run in testing mode
stdout (str) – The standard output of the panfs userquota command. Example:
Volume               Unix      Windows  Bytes      % Bytes Used   Files     % Files Used
                     User      User     Used       Soft    Hard   Used      Soft    Hard
                                        Total      Quota   Quota  Total     Quota   Quota
/nobackup/userspace  root               0          0       0      7.0       0       0
/nobackup/userspace  uid:5007           130.90 GB  65.45   52.36  110.68 K  3.69    3.16
- Returns:
The parsed output.
Example Output:
{
    "name": volume_prefix,
    "user": username,
    "space_used" (int B): space_used,
    "space_soft_quota" (float): space_soft_quota,
    "space_hard_quota" (float): space_hard_quota,
    "files_used" (int): files_used,
    "files_soft_quota" (float): files_soft_quota,
    "files_hard_quota" (float): files_hard_quota,
}
- Return type:
list[dict]
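Parsing a single data row of the sample table could be approached as in the sketch below. It is a hypothetical illustration, not the accre parser, and it rests on several assumptions: the Windows User column is empty, byte units are binary multiples, and a "K" suffix on a file count means thousands.

```python
BYTE_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3, "TB": 1024 ** 4}
COUNT_UNITS = {"K": 1_000, "M": 1_000_000}


def parse_userquota_row(line):
    """Parse one data row of the userquota table into a dict."""
    tokens = line.split()
    volume, user, rest = tokens[0], tokens[1], tokens[2:]
    # Re-attach unit suffixes ("130.90 GB", "110.68 K") to the number before them.
    values = []
    for token in rest:
        if token.isalpha() and values:
            values[-1] = (values[-1][0], token.upper())
        else:
            values.append((float(token), None))
    if len(values) != 6:
        raise ValueError(f"expected 6 numeric columns, got {len(values)}: {line!r}")
    (space, space_unit), (space_soft, _), (space_hard, _) = values[:3]
    (files, files_unit), (files_soft, _), (files_hard, _) = values[3:]
    return {
        "name": volume,
        "user": user,
        "space_used": round(space * BYTE_UNITS.get(space_unit, 1)),
        "space_soft_quota": space_soft,
        "space_hard_quota": space_hard,
        "files_used": round(files * COUNT_UNITS.get(files_unit, 1)),
        "files_soft_quota": files_soft,
        "files_hard_quota": files_hard,
    }


row = parse_userquota_row(
    "/nobackup/userspace uid:5007 130.90 GB 65.45 52.36 110.68 K 3.69 3.16"
)
```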
- accre.filesystem.panfs.utils.parse_volume_list(stdout)[source]¶
Parses the volume_list output. Note: This only works for command “volume list show all”
- Parameters:
stdout (str) – The standard output of the volume_list command.
- Returns:
The parsed volume list.
Each entry has the form:
{
    name: str,
    bladeset: str,
    raid: str,
    space_used: str,
    soft_quota: str,
    hard_quota: str,
    status: str
}
- Return type:
list[dict]
- accre.filesystem.panfs.utils.remove_slash(string)[source]¶
Remove the first and the last slash from a string.
- Parameters:
string (str) – The string to remove the slashes from.
- Returns:
The string with the first and the last slash removed.
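One way to read that description is sketched below, under the assumption that exactly one leading and one trailing slash are removed and interior slashes are kept. The name `strip_outer_slashes` is hypothetical; the real remove_slash may behave differently on edge cases such as repeated slashes.

```python
def strip_outer_slashes(string):
    """Remove one leading and one trailing slash, if present."""
    if string.startswith("/"):
        string = string[1:]
    if string.endswith("/"):
        string = string[:-1]
    return string
```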
- accre.filesystem.panfs.utils.run_panfs_command(arglist, server='admin@accrepfs.vampire', timeout=60)[source]¶
Runs a PanFS command as a subprocess and returns the standard output as a string decoded from utf-8. Optionally ssh out to the configured PanFS server and use a timeout.
- Parameters:
arglist (list(str)) – list of arguments to run
server (str) – ssh value for the panfs address
timeout (int) – Maximum time in seconds to wait for command to complete
- Returns:
Standard output of command interpreted as utf-8 text
- Return type:
str
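The general pattern of running a command with a timeout, optionally through ssh, can be sketched with the standard library. The function below (`run_command`) is a hypothetical stand-in rather than the accre implementation, and the demonstration runs a local Python process so that no ssh connection is needed.

```python
import subprocess
import sys


def run_command(arglist, server=None, timeout=60):
    """Run a command, optionally through ssh, and return its stdout as text."""
    command = ["ssh", server, *arglist] if server else list(arglist)
    completed = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,  # raises subprocess.TimeoutExpired when exceeded
        check=True,  # raises subprocess.CalledProcessError on non-zero exit
    )
    return completed.stdout.decode("utf-8")


# Local demonstration only; no ssh connection is made when server is None.
output = run_command([sys.executable, "-c", "print('ok')"])
```

Passing the arguments as a list rather than a single shell string avoids shell quoting issues on the local side.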