mth5.helpers
Helper functions for HDF5
Created on Tue Jun 2 12:37:50 2020
- copyright:
Jared Peacock (jpeacock@usgs.gov)
- license:
MIT
Attributes
Functions
| validate_compression | Validate that the input compression is supported. |
| recursive_hdf5_tree | Recursively traverse an HDF5 group and return a string representation of its structure. |
| close_open_files | Close all open HDF5 files found in memory. |
| get_tree | Recursively print the contents of an HDF5 group in a formatted tree structure. |
| to_numpy_type | Convert a value to a numpy/HDF5 compatible type. |
| validate_name | Validate and clean a name for HDF5 compatibility. |
| from_numpy_type | Convert a value from numpy/HDF5 format back to standard Python types. |
| coerce_value_to_expected_type | Coerce a value to the expected type based on metadata field definitions. |
| get_metadata_type_dict | Get a dictionary of expected data types from the metadata object. |
| get_data_type | Get the Python data type from its string representation. |
| read_attrs_to_dict | Read HDF5 attributes from a group or dataset into a dictionary. |
| inherit_doc_string | Class decorator to inherit docstring from parent classes. |
| validate_name | Validate and clean a name for HDF5 compatibility. |
| add_attributes_to_metadata_class_pydantic | Add MTH5-specific attributes to a pydantic metadata class. |
Module Contents
- mth5.helpers.validate_compression(compression: str | None, level: int | str | None) → tuple[str | None, int | str | None]
Validate that the input compression is supported.
- Parameters:
compression (str or None) – Type of lossless compression. Options are ‘lzf’, ‘gzip’, ‘szip’, or None.
level (int, str, or None) – Compression level, if supported:
  - int for ‘gzip’ (0-9)
  - str for ‘szip’ (‘ec-8’, ‘ee-10’, ‘nn-8’, ‘nn-10’)
  - None for ‘lzf’ or None compression
- Returns:
compression (str or None) – Validated compression type
level (int, str, or None) – Validated compression level
- Raises:
ValueError – If compression or level are not supported
TypeError – If compression is not a string or None, or if compression level type is incorrect for the specified compression type
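The rules above can be sketched in plain Python. This is a minimal illustration only, not the mth5 implementation: the name sketch_validate_compression is hypothetical, and returning None as the level for ‘lzf’ regardless of the level passed in is an assumption.

```python
# Hypothetical stand-in that mirrors only the documented rules;
# it is not the mth5 implementation.
SZIP_LEVELS = ("ec-8", "ee-10", "nn-8", "nn-10")


def sketch_validate_compression(compression, level):
    if compression is None:
        return None, None
    if not isinstance(compression, str):
        raise TypeError("compression must be a string or None")
    if compression not in ("lzf", "gzip", "szip"):
        raise ValueError(f"unsupported compression: {compression!r}")
    if compression == "lzf":
        # Assumption: any level passed with 'lzf' is dropped.
        return "lzf", None
    if compression == "gzip":
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(level, bool) or not isinstance(level, int):
            raise TypeError("gzip level must be an int")
        if not 0 <= level <= 9:
            raise ValueError("gzip level must be between 0 and 9")
        return "gzip", level
    # Remaining case: 'szip'
    if not isinstance(level, str):
        raise TypeError("szip level must be a string")
    if level not in SZIP_LEVELS:
        raise ValueError(f"unsupported szip level: {level!r}")
    return "szip", level


gzip_result = sketch_validate_compression("gzip", 4)
szip_result = sketch_validate_compression("szip", "nn-8")
```

Checking the type before the value keeps the two documented error classes distinct: a wrong type raises TypeError, and a right type with an unsupported value raises ValueError.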
- mth5.helpers.recursive_hdf5_tree(group: h5py.Group | h5py.File | h5py.Dataset, lines: list[str] | None = None) → str
Recursively traverse an HDF5 group and return a string representation of its structure.
- Parameters:
group (h5py.Group, h5py.File, or h5py.Dataset) – HDF5 object to traverse
lines (list of str, optional) – List to accumulate the tree representation lines. If None, an empty list is used.
- Returns:
String representation of the HDF5 tree structure
- Return type:
str
Notes
This function recursively traverses HDF5 groups and files, building a text representation of the structure including groups, datasets, and attributes.
- mth5.helpers.close_open_files() → None
Close all open HDF5 files found in memory.
This function searches through all objects in memory using garbage collection to find and close any open HDF5 files. This is useful for cleanup operations to ensure no files are left open.
Notes
This function iterates through all objects in memory and attempts to close any h5py.File objects that are found. If a file is already closed, it will log that information. Any exceptions during the process are caught and logged.
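The garbage-collector scan described above can be illustrated without an HDF5 library. In this rough sketch, FakeHandle is a hypothetical stand-in for h5py.File and close_all_handles is a hypothetical name; only gc.get_objects comes from the standard library.

```python
import gc


class FakeHandle:
    """Hypothetical stand-in for h5py.File so the sketch needs no HDF5."""

    def __init__(self, name):
        self.name = name
        self.is_open = True

    def close(self):
        self.is_open = False


def close_all_handles():
    """Close every open FakeHandle the garbage collector is tracking."""
    closed = []
    for obj in gc.get_objects():
        try:
            # An exact type check avoids triggering custom __class__ hooks
            # on unrelated objects.
            if type(obj) is FakeHandle and obj.is_open:
                obj.close()
                closed.append(obj.name)
        except Exception:
            # Mirror the documented behaviour: never let one bad object
            # stop the sweep.
            continue
    return sorted(closed)


first = FakeHandle("a.h5")
second = FakeHandle("b.h5")
second.close()  # already closed, so the sweep should skip it
closed_names = close_all_handles()
```

A sweep like this touches every tracked object in the process, so it suits cleanup at the end of a session or test run rather than routine use.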
- mth5.helpers.get_tree(parent: h5py.Group | h5py.File) → str
Recursively print the contents of an HDF5 group in a formatted tree structure.
- Parameters:
parent (h5py.Group or h5py.File) – HDF5 (sub-)tree to print
- Returns:
Formatted string representation of the HDF5 tree structure
- Return type:
str
- Raises:
TypeError – If the provided object is not an h5py.File or h5py.Group object
Notes
This function creates a hierarchical text representation of an HDF5 file or group structure, showing groups and datasets with appropriate indentation and formatting.
- mth5.helpers.to_numpy_type(value: Any) → Any
Convert a value to a numpy/HDF5 compatible type.
This function handles the conversion of various Python data types to formats that are compatible with both NumPy and HDF5. For numbers and booleans, this is straightforward as they are automatically mapped to numpy types. For strings and complex data structures, special handling is required.
- Parameters:
value (any) – The value to convert to a numpy/HDF5 compatible type
- Returns:
The converted value in a numpy/HDF5 compatible format:
  - None becomes “none” string
  - Dictionaries and lists become JSON strings
  - Type objects become string representations
  - h5py References become strings
  - Object arrays become string representations
  - Iterables with strings become numpy byte arrays
  - Other iterables become numpy arrays
  - Basic types (str, int, float, bool, complex) are returned as-is
- Return type:
various
Notes
HDF5 should only deal with ASCII characters or Unicode. No binary data is allowed. This function ensures compatibility by converting complex Python objects to appropriate string or array representations.
Lists and dictionaries are converted to JSON strings for storage in HDF5, which can be reconstructed using from_numpy_type.
- mth5.helpers.validate_name(name: str) → str
Clean a name by replacing spaces and slashes with underscores.
- Parameters:
name (str) – The name to validate and clean
- Returns:
The cleaned name with spaces and slashes replaced by underscores
- Return type:
str
Notes
This function ensures that names are compatible with HDF5 naming conventions by removing problematic characters.
- mth5.helpers.from_numpy_type(value: Any) → Any
Convert a value from numpy/HDF5 format back to standard Python types.
This function handles the reverse conversion from numpy/HDF5 compatible types back to standard Python data types. It’s the counterpart to to_numpy_type.
- Parameters:
value (any) – The value to convert from numpy/HDF5 format
- Returns:
The converted value in standard Python format:
  - “none” string becomes None
  - JSON strings become dictionaries or lists
  - h5py References become strings
  - Numpy types become standard Python types
  - Byte arrays become string lists
  - Other arrays become Python lists
- Return type:
various
- Raises:
TypeError – If the value type is not understood or supported
Notes
This function reverses the conversions made by to_numpy_type, including:
  - Converting JSON strings back to dictionaries and lists
  - Converting “none” strings back to None
  - Converting numpy arrays back to Python lists
  - Handling deprecated numpy.bool types
Numbers and booleans are mapped automatically from h5py to numpy types. Strings, especially lists of strings, require special handling. HDF5 deals with ASCII or Unicode characters; no binary data is allowed.
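The None and JSON parts of the round trip between to_numpy_type and from_numpy_type can be sketched with the standard library alone. The names sketch_encode and sketch_decode are hypothetical, numpy arrays and h5py references are left out, and the rule used to recognise a JSON string (a leading brace or bracket) is an assumption rather than the library's actual test.

```python
import json


def sketch_encode(value):
    """Hypothetical encoder covering only the None and JSON cases."""
    if value is None:
        return "none"
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value


def sketch_decode(value):
    """Hypothetical decoder that reverses sketch_encode."""
    if isinstance(value, str):
        if value == "none":
            return None
        # Assumption: a leading brace or bracket marks a JSON string.
        if value[:1] in ("{", "["):
            try:
                return json.loads(value)
            except json.JSONDecodeError:
                return value
    return value


stored = sketch_encode({"channels": ["ex", "ey"], "id": 1})
restored = sketch_decode(stored)
```

Storing the structure as a single string sidesteps the lack of a native HDF5 attribute type for nested Python containers, at the cost of a parse on every read.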
- mth5.helpers.coerce_value_to_expected_type(key: str, value: Any, expected_type: Any) → Any
Coerce a value to the expected type based on metadata field definitions.
This function handles type conversions for older MTH5 files that may have stored metadata with less strict type enforcement. The expected types are obtained from the metadata’s attribute_information method.
- Parameters:
key (str) – Metadata field name (may include dots for nested attributes).
value (Any) – Value to coerce.
expected_type (Any) – Expected value type (can be a type object or string representation).
- Returns:
Coerced value matching expected type, or original value if coercion fails.
- Return type:
Any
Examples
>>> coerced = coerce_value_to_expected_type('sample_rate', '256.0', float)
>>> print(type(coerced), coerced)
<class 'float'> 256.0

>>> coerced = coerce_value_to_expected_type('channel_number', 1.0, int)
>>> print(type(coerced), coerced)
<class 'int'> 1
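The "return the original value if coercion fails" behaviour can be sketched as follows. This is an illustration under stated assumptions, not the mth5 code: sketch_coerce is a hypothetical name, it accepts only type objects (not string representations), and the special handling of boolean strings is an assumption.

```python
def sketch_coerce(value, expected_type):
    """Hypothetical coercion helper that falls back to the input value."""
    if isinstance(value, expected_type):
        return value
    if expected_type is bool and isinstance(value, str):
        # Assumption: bool("False") is True in Python, so map text explicitly.
        lowered = value.strip().lower()
        if lowered in ("true", "1"):
            return True
        if lowered in ("false", "0"):
            return False
        return value
    try:
        return expected_type(value)
    except (TypeError, ValueError):
        # Coercion failed: hand back the original value unchanged.
        return value


as_float = sketch_coerce("256.0", float)
as_int = sketch_coerce(1.0, int)
unchanged = sketch_coerce("not a number", int)
```

Falling back instead of raising lets a file with one malformed field still be read, leaving later validation to flag the value.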
- mth5.helpers.get_metadata_type_dict(metadata_class: mt_metadata.base.MetadataBase) → dict[str, Type[Any]]
Get a dictionary of expected data types from the metadata object.
- Parameters:
metadata_class (MetadataBase) – Metadata class to extract data types from
- Returns:
Dictionary mapping metadata field names to their expected data types.
- Return type:
dict[str, Type[Any]]
- mth5.helpers.get_data_type(string_representation: str) → Type[Any]
Get the Python data type from its string representation.
- Parameters:
string_representation (str) – String representation of the data type (e.g., ‘int’, ‘float’, ‘str’).
- Returns:
Corresponding Python data type.
- Return type:
type
- Raises:
ValueError – If the string representation does not correspond to a known data type.
Notes
This function maps common string representations of data types to their corresponding Python types. It supports basic types like int, float, str, bool, list, and dict.
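A lookup table is one simple way to implement the mapping described above. In this sketch the name sketch_get_data_type and the exact set of keys are assumptions drawn from the types listed in the note; the real function may accept additional spellings.

```python
# Hypothetical lookup table built from the types named in the note above.
_TYPE_NAMES = {
    "int": int,
    "float": float,
    "str": str,
    "bool": bool,
    "list": list,
    "dict": dict,
}


def sketch_get_data_type(string_representation):
    """Return the Python type for a known name, else raise ValueError."""
    try:
        return _TYPE_NAMES[string_representation]
    except KeyError:
        raise ValueError(
            f"unknown data type: {string_representation!r}"
        ) from None


float_type = sketch_get_data_type("float")
```

An explicit table avoids calling eval on strings read from a file, which matters when the type names come from stored metadata.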
- mth5.helpers.read_attrs_to_dict(attrs_dict: dict[str, Any], metadata_object: mt_metadata.base.MetadataBase) → dict[str, Any]
Read HDF5 attributes from a group or dataset into a dictionary.
- Parameters:
attrs_dict (dict[str, Any]) – Dictionary of attributes to read and convert.
metadata_object (MetadataBase) – Metadata object to use for type information.
- Returns:
Dictionary containing attribute names and their corresponding values.
- Return type:
dict[str, Any]
- mth5.helpers.inherit_doc_string(cls: Type[Any]) → Type[Any]
Class decorator to inherit docstring from parent classes.
This decorator searches through the method resolution order (MRO) of a class to find the first parent class with a docstring and applies it to the current class.
- Parameters:
cls (type) – The class to apply docstring inheritance to
- Returns:
The same class with inherited docstring if found
- Return type:
type
Notes
This is useful for subclasses that should inherit documentation from their parent classes when they don’t have their own docstring defined.
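The MRO search described above can be sketched as a small class decorator. The name sketch_inherit_doc is hypothetical, and two details are assumptions: a class that already has a docstring is left alone, and object is skipped so its generic docstring is never inherited.

```python
def sketch_inherit_doc(cls):
    """Hypothetical decorator: copy the nearest ancestor docstring."""
    if cls.__doc__:
        # Assumption: an existing docstring is never overwritten.
        return cls
    for parent in cls.__mro__[1:]:
        if parent is object:
            # Assumption: skip object's generic docstring.
            continue
        if parent.__doc__:
            cls.__doc__ = parent.__doc__
            break
    return cls


class Base:
    """Base docs."""


class Middle(Base):
    pass


@sketch_inherit_doc
class Child(Middle):
    pass


child_doc = Child.__doc__
```

The decorator is needed because Python does not inherit a class's __doc__ automatically: without it, Child.__doc__ would be None.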
- mth5.helpers.validate_name(name: str | None, pattern: str | None = None) → str
Validate and clean a name for HDF5 compatibility.
- Parameters:
name (str or None) – The name to validate and clean
pattern (str, optional) – Pattern for validation (currently not used but reserved for future use)
- Returns:
The cleaned name with spaces replaced by underscores and commas removed. Returns “unknown” if input name is None.
- Return type:
str
Notes
This function ensures that names are compatible with HDF5 naming conventions by removing problematic characters. If the input name is None, it returns “unknown” as a default value.
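The cleaning rules stated above are small enough to sketch directly. The name sketch_validate_name is hypothetical, and the unused pattern argument is omitted from the sketch.

```python
def sketch_validate_name(name):
    """Hypothetical cleaner that follows the documented rules only."""
    if name is None:
        return "unknown"
    # Spaces become underscores; commas are dropped.
    return name.replace(" ", "_").replace(",", "")


cleaned = sketch_validate_name("MT 001, north")
fallback = sketch_validate_name(None)
```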
- mth5.helpers.add_attributes_to_metadata_class_pydantic(obj: Type[Any]) → Type[Any]
Add MTH5-specific attributes to a pydantic metadata class.
This function enhances a pydantic class by adding two important fields:
  - mth5_type: derived from the class name, indicates the type of MTH5 group
  - hdf5_reference: stores the HDF5 internal reference
- Parameters:
obj (type) – A pydantic class to enhance with MTH5 attributes
- Returns:
An instance of the enhanced class with added MTH5-specific fields
- Return type:
object
- Raises:
TypeError – If the input is not a class
Notes
This function is used to dynamically add metadata fields that are required for MTH5 group management. The mth5_type field is derived from the class name by removing “Group” suffix, and the hdf5_reference field is initialized to None but will be set when the object is associated with an HDF5 group.