mth5.tables.mth5_table
MTH5 table utilities.
This module provides the MTH5Table base class which wraps an HDF5 dataset and offers convenience methods for row management, locating entries, and exporting to pandas.DataFrame.
Notes
- Designed as a thin layer on top of NumPy/HDF5; for complex querying, prefer converting to a DataFrame via to_dataframe().
- Datatypes are validated and kept consistent with the underlying dataset.
Classes
MTH5Table – Base wrapper around an HDF5 dataset representing a typed table.
Module Contents
- class mth5.tables.mth5_table.MTH5Table(hdf5_dataset: h5py.Dataset, default_dtype: numpy.dtype)
Base wrapper around an HDF5 dataset representing a typed table.
Provides simple NumPy-based operations including row insertion/removal, basic locating utilities, and conversion to pandas.DataFrame.
- Parameters:
hdf5_dataset (h5py.Dataset) – The HDF5 dataset that stores the table.
default_dtype (numpy.dtype) – The default dtype schema for the table entries.
- Raises:
MTH5TableError – If hdf5_dataset is not an instance of h5py.Dataset.
Examples
Create a simple table and add a row:
>>> import h5py, numpy as np
>>> f = h5py.File('example.h5', 'w')
>>> dtype = np.dtype([('name', 'S16'), ('value', 'f8')])
>>> ds = f.create_dataset('table', (1,), maxshape=(None,), dtype=dtype)
>>> from mth5.tables.mth5_table import MTH5Table
>>> t = MTH5Table(ds, dtype)
>>> row = np.array([('alpha'.encode('utf-8'), 1.23)], dtype=dtype)
>>> t.add_row(row)
1
>>> df = t.to_dataframe()
>>> df.head()
- check_dtypes(other_dtype: numpy.dtype) -> bool
Check that dtypes match the table’s dtype (including field names).
- Parameters:
other_dtype (numpy.dtype) – The dtype to compare against the table’s dtype.
- Returns:
True if the dtypes match; otherwise False.
- Return type:
bool
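The comparison can be sketched with plain NumPy structured dtypes. This is a minimal illustration, not the method's actual code. The function name `dtypes_match` is hypothetical. The sketch assumes that "match" means identical field names in the same order plus identical per-field types. NumPy's dtype equality already compares field names, so the explicit names check only makes that intent visible.

```python
import numpy as np


def dtypes_match(table_dtype: np.dtype, other_dtype: np.dtype) -> bool:
    """Hypothetical stand-in for MTH5Table.check_dtypes (an assumption)."""
    # Field names (and their order) must agree first.
    if table_dtype.names != other_dtype.names:
        return False
    # Structured dtype equality also compares each field's type and size.
    return bool(table_dtype == other_dtype)


table_dtype = np.dtype([("name", "S16"), ("value", "f8")])
same = np.dtype([("name", "S16"), ("value", "f8")])
renamed = np.dtype([("label", "S16"), ("value", "f8")])
wider = np.dtype([("name", "S32"), ("value", "f8")])

print(dtypes_match(table_dtype, same))     # True
print(dtypes_match(table_dtype, renamed))  # False: different field name
print(dtypes_match(table_dtype, wider))    # False: S32 is not S16
```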
- locate(column: str, value: Any, test: Literal['eq', 'lt', 'le', 'gt', 'ge', 'be', 'bt'] = 'eq') -> numpy.ndarray
Locate row indices where a column satisfies a comparison.
- Parameters:
column (str) – Name of the column to test.
value (Any) – Value to compare against. For string columns, a str is converted to a numpy.bytes_. For time columns (start, end, start_date, end_date), values are coerced to numpy.datetime64.
test ({'eq','lt','le','gt','ge','be','bt'}, default 'eq') – Type of comparison to perform:
  - 'eq': equal to
  - 'lt': less than
  - 'le': less than or equal to
  - 'gt': greater than
  - 'ge': greater than or equal to
  - 'be': strictly between the two values given in value
  - 'bt': alias for 'be'
- Returns:
Array of matching row indices.
- Return type:
numpy.ndarray
- Raises:
ValueError – If test is 'be' or 'bt' and value is not an iterable of length 2.
Examples
Find rows with value greater than 10:
>>> idx = t.locate('value', 10, test='gt')
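The documented comparison logic can be sketched on an in-memory structured array. This is a hypothetical re-implementation (`locate_rows` is not part of mth5). It follows the coercions described above: str becomes numpy.bytes_, and the listed time columns become numpy.datetime64. It uses `operator` for the single-value tests and a strict `>`/`<` pair for 'be'/'bt'. Raising ValueError for an unknown test code is an assumption.

```python
import operator
from typing import Any

import numpy as np

# Columns the docs say are coerced to numpy.datetime64.
TIME_COLUMNS = {"start", "end", "start_date", "end_date"}
# Single-value comparisons keyed by the documented test codes.
OPS = {"eq": operator.eq, "lt": operator.lt, "le": operator.le,
       "gt": operator.gt, "ge": operator.ge}


def locate_rows(table: np.ndarray, column: str, value: Any, test: str = "eq") -> np.ndarray:
    """Hypothetical stand-in for MTH5Table.locate on a plain structured array."""

    def coerce(v: Any) -> Any:
        # Mirror the documented coercions for time and string columns.
        if column in TIME_COLUMNS:
            return np.datetime64(v)
        if isinstance(v, str):
            return np.bytes_(v.encode("utf-8"))
        return v

    col = table[column]
    if test in ("be", "bt"):
        try:
            low, high = value
        except (TypeError, ValueError):
            raise ValueError("'be'/'bt' needs an iterable of length 2") from None
        mask = (col > coerce(low)) & (col < coerce(high))  # strictly between
    elif test in OPS:
        mask = OPS[test](col, coerce(value))
    else:
        raise ValueError(f"unknown test {test!r}")
    return np.flatnonzero(mask)  # indices of matching rows


dtype = np.dtype([("name", "S16"), ("value", "f8"), ("start", "datetime64[s]")])
table = np.array(
    [
        (b"alpha", 1.5, np.datetime64("2020-01-01T00:00:00")),
        (b"beta", 12.0, np.datetime64("2020-01-02T00:00:00")),
        (b"gamma", 25.0, np.datetime64("2020-01-03T00:00:00")),
    ],
    dtype=dtype,
)
print(locate_rows(table, "value", 10, test="gt"))       # [1 2]
print(locate_rows(table, "name", "beta"))               # [1]
print(locate_rows(table, "value", (1.5, 25.0), "be"))   # [1]
print(locate_rows(table, "start", "2020-01-02", "ge"))  # [1 2]
```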
- add_row(row: numpy.ndarray, index: int | None = None) -> int
Add a row to the table.
- Parameters:
row (numpy.ndarray) – Row to insert. Must have the same dtype (or same field names, allowing safe casting) as the table.
index (int, optional) – Index at which to insert the row. If None, appends to the end.
- Returns:
Index of the inserted row.
- Return type:
int
- Raises:
TypeError – If row is not a numpy.ndarray.
ValueError – If the dtype is incompatible with the table.
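The append path (index=None) with the documented validation can be sketched in plain NumPy. The checks are: TypeError for non-arrays, and ValueError for mismatched field names or a field that cannot be cast safely. `append_row` is hypothetical. It works on an in-memory array because NumPy arrays cannot grow in place, whereas an h5py dataset created with maxshape=(None,) can be enlarged with Dataset.resize. Checking safety field by field with np.can_cast is an assumed interpretation of "allowing safe casting".

```python
from typing import Tuple

import numpy as np


def append_row(table: np.ndarray, row: np.ndarray) -> Tuple[np.ndarray, int]:
    """Hypothetical sketch of add_row(row, index=None) on a plain NumPy table.

    Returns the enlarged table and the index of the appended row.
    """
    if not isinstance(row, np.ndarray):
        raise TypeError(f"row must be a numpy.ndarray, not {type(row).__name__}")
    if row.dtype.names != table.dtype.names:
        raise ValueError("row field names do not match the table")
    # Accept per-field safe casts only, e.g. S8 -> S16 or f4 -> f8.
    for name in table.dtype.names:
        if not np.can_cast(row.dtype[name], table.dtype[name], casting="safe"):
            raise ValueError(f"field {name!r} cannot be cast safely")
    row = row.reshape(-1)
    if row.shape[0] != 1:
        raise ValueError("expected exactly one row")
    # Copy field by field into a record of the table's exact dtype.
    converted = np.empty(1, dtype=table.dtype)
    for name in table.dtype.names:
        converted[name] = row[name]
    new_index = table.shape[0]
    return np.concatenate([table, converted]), new_index


dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.zeros(1, dtype=dtype)  # one placeholder row, as in the class example
row = np.array([(b"alpha", 1.23)], dtype=[("name", "S8"), ("value", "f4")])
table, idx = append_row(table, row)
print(idx, table[idx]["name"])  # 1 b'alpha'
```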
- update_row(entry: numpy.ndarray) -> int
Update a row by locating its index and rewriting the entry.
- Parameters:
entry (numpy.ndarray) – Entry to update, with the same dtype as the table.
- Returns:
Index of the updated row, or the index of the newly appended row if the entry was not found.
- Return type:
int
Notes
Matching by hdf5_reference is not reliable; this uses add_row and will append if the original row cannot be located.
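The "overwrite if found, otherwise append" behaviour can be sketched as follows. The matching rule here is an explicit assumption: a single key column (hypothetically "name"). The real method's matching criterion is not documented on this page. `upsert_by_key` is a hypothetical helper that operates on an in-memory array, and the entry is assumed to share the table's dtype.

```python
import numpy as np


def upsert_by_key(table: np.ndarray, entry: np.ndarray, key: str = "name"):
    """Hypothetical sketch of update_row's locate-then-write-or-append logic.

    Returns (table, index): the first row whose `key` equals the entry's is
    overwritten in place; if none matches, the entry is appended.
    """
    entry = entry.reshape(-1)
    matches = np.flatnonzero(table[key] == entry[key][0])
    if matches.size:
        index = int(matches[0])
        table[index] = entry[0]  # rewrite the existing record
        return table, index
    # Not found: append, so the new index equals the old row count.
    return np.concatenate([table, entry]), table.shape[0]


dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array([(b"alpha", 1.0), (b"beta", 2.0)], dtype=dtype)
table, i = upsert_by_key(table, np.array([(b"beta", 20.0)], dtype=dtype))
print(i, table["value"].tolist())  # 1 [1.0, 20.0]
table, j = upsert_by_key(table, np.array([(b"gamma", 3.0)], dtype=dtype))
print(j, table.shape)  # 2 (3,)
```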
- remove_row(index: int) -> int
Remove a row by replacing it with a null entry.
- Parameters:
index (int) – Index of the row to remove.
- Returns:
Index that was updated with a null row.
- Return type:
int
- Raises:
IndexError – If the index is out of bounds for the current shape.
Notes
There is no intrinsic index stored within the array; indices are computed on the fly from row positions. Prefer the HDF5 reference column for robust identification.
The current approach overwrites the row at the specified index with a null row, so the number of rows does not change.
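Replacing a row with a null entry can be sketched in NumPy. The sketch assumes that a "null" row is an all-zero record of the table's dtype. That is an assumption, and the library may construct its null row differently. `null_out_row` is hypothetical. Because the row count is unchanged, the positions of the other rows stay the same.

```python
import numpy as np


def null_out_row(table: np.ndarray, index: int) -> int:
    """Hypothetical sketch of remove_row: overwrite one row with a null record."""
    if not 0 <= index < table.shape[0]:
        raise IndexError(f"index {index} out of bounds for shape {table.shape}")
    # Assumed null record: every field zeroed (empty bytes, 0.0, ...).
    table[index] = np.zeros(1, dtype=table.dtype)[0]
    return index


dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array([(b"alpha", 1.0), (b"beta", 2.0), (b"gamma", 3.0)], dtype=dtype)
null_out_row(table, 1)
print(table["name"].tolist())  # [b'alpha', b'', b'gamma']
```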
- to_dataframe() -> pandas.DataFrame
Convert the table into a pandas.DataFrame.
- Returns:
DataFrame with decoded string columns where applicable.
- Return type:
pandas.DataFrame
Examples
Convert and preview:
>>> df = t.to_dataframe()
>>> df.head()
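The "decoded string columns" step can be sketched independently of pandas. Fixed-width byte fields (dtype kind 'S') are decoded to str, and the resulting dict of columns can then be passed to pandas.DataFrame. `decoded_columns` is a hypothetical helper, and UTF-8 as the encoding is an assumption.

```python
import numpy as np


def decoded_columns(table: np.ndarray, encoding: str = "utf-8") -> dict:
    """Hypothetical sketch of the decode step behind to_dataframe().

    Byte-string fields become lists of str; other fields pass through as
    lists. pandas.DataFrame(result) would then give readable columns.
    """
    columns = {}
    for name in table.dtype.names:
        col = table[name]
        if col.dtype.kind == "S":  # fixed-width bytes, e.g. 'S16'
            columns[name] = [b.decode(encoding) for b in col.tolist()]
        else:
            columns[name] = col.tolist()
    return columns


dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array([(b"alpha", 1.23), (b"beta", 4.5)], dtype=dtype)
print(decoded_columns(table))  # {'name': ['alpha', 'beta'], 'value': [1.23, 4.5]}
```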
- clear_table() -> None
Reset the table by recreating the dataset with a single null row.
Notes
Deletes the current dataset and replaces it with a new dataset that has the same dtype, compression, and other dataset options, but shape (1,).
- update_dtype(new_dtype: numpy.dtype) -> None
Update the dataset’s dtype while preserving data and field names.
- Parameters:
new_dtype (numpy.dtype) – New dtype to apply. Must have identical field names.
Notes
Performs a manual copy into a new array to avoid unsafe casting errors, then recreates the dataset with the new dtype and same dataset options.
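The manual copy described above can be sketched as a field-by-field assignment into a fresh array of the new dtype. Per-field assignment lets NumPy convert each field on its own (for example, widening S16 to S32). `copy_to_new_dtype` is a hypothetical helper. It stops at the in-memory copy and does not show recreating the HDF5 dataset.

```python
import numpy as np


def copy_to_new_dtype(data: np.ndarray, new_dtype) -> np.ndarray:
    """Hypothetical sketch of update_dtype's copy step (in memory only)."""
    new_dtype = np.dtype(new_dtype)
    if data.dtype.names != new_dtype.names:
        raise ValueError("new dtype must have identical field names")
    out = np.empty(data.shape, dtype=new_dtype)
    # Assign each field separately; NumPy converts values per field.
    for name in new_dtype.names:
        out[name] = data[name]
    return out


old = np.dtype([("name", "S16"), ("value", "f4")])
new = np.dtype([("name", "S32"), ("value", "f8")])
data = np.array([(b"alpha", 1.5), (b"beta", 2.5)], dtype=old)
converted = copy_to_new_dtype(data, new)
print(converted.dtype == new, converted["name"].tolist())  # True [b'alpha', b'beta']
```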