mth5.tables.mth5_table

MTH5 table utilities.

This module provides the MTH5Table base class, which wraps an HDF5 dataset and offers convenience methods for managing rows, locating entries, and exporting to a pandas.DataFrame.

Notes

Designed as a thin layer on top of NumPy/HDF5; for complex querying, prefer converting to a DataFrame via to_dataframe().
Datatypes are validated and kept consistent with the underlying dataset.

Classes

MTH5Table

Base wrapper around an HDF5 dataset representing a typed table.

Module Contents

class mth5.tables.mth5_table.MTH5Table(hdf5_dataset: h5py.Dataset, default_dtype: numpy.dtype)

Base wrapper around an HDF5 dataset representing a typed table.

Provides simple NumPy-based operations including row insertion/removal, basic locating utilities, and conversion to pandas.DataFrame.

Parameters:

hdf5_dataset (h5py.Dataset) – The HDF5 dataset that stores the table.
default_dtype (numpy.dtype) – The default dtype schema for the table entries.

Raises:

MTH5TableError – If hdf5_dataset is not an instance of h5py.Dataset.

Examples

Create a simple table and add a row:

>>> import h5py
>>> import numpy as np
>>> from mth5.tables.mth5_table import MTH5Table
>>> f = h5py.File('example.h5', 'w')
>>> dtype = np.dtype([('name', 'S16'), ('value', 'f8')])
>>> # Resizable 1-D dataset that starts with one zero-filled row
>>> ds = f.create_dataset('table', (1,), maxshape=(None,), dtype=dtype)
>>> t = MTH5Table(ds, dtype)
>>> # A row is a length-1 structured array with the table's dtype
>>> row = np.array([(b'alpha', 1.23)], dtype=dtype)
>>> t.add_row(row)
1
>>> df = t.to_dataframe()
>>> df.head()
>>> f.close()

logger

property hdf5_reference: object

property dtype: numpy.dtype

check_dtypes(other_dtype: numpy.dtype) → bool

Check that dtypes match the table’s dtype (including field names).

Parameters:

other_dtype (numpy.dtype) – The dtype to compare against the table’s dtype.

Returns:

True if the dtypes match; otherwise False.

Return type:

bool
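
The comparison can be pictured with plain NumPy. The sketch below is not taken from the MTH5 source: dtypes_match is a hypothetical helper, and it assumes that "match" means identical field names and identical field types.

```python
import numpy as np


def dtypes_match(table_dtype: np.dtype, other_dtype: np.dtype) -> bool:
    """Hypothetical helper: True only if field names and field types agree."""
    # Structured dtypes compare equal only when their fields
    # (names and types) are the same.
    return table_dtype == other_dtype


table_dtype = np.dtype([("name", "S16"), ("value", "f8")])
same = np.dtype([("name", "S16"), ("value", "f8")])
renamed = np.dtype([("label", "S16"), ("value", "f8")])    # different field name
narrower = np.dtype([("name", "S16"), ("value", "f4")])    # different field type

print(dtypes_match(table_dtype, same))      # True
print(dtypes_match(table_dtype, renamed))   # False
print(dtypes_match(table_dtype, narrower))  # False
```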

property shape: tuple[int, ...]

property nrows: int

locate(column: str, value: Any, test: Literal['eq', 'lt', 'le', 'gt', 'ge', 'be', 'bt'] = 'eq') → numpy.ndarray

Locate row indices where a column satisfies a comparison.

Parameters:

column (str) – Name of the column to test.
value (Any) – Value to compare against. For string columns, a str is converted to a numpy.bytes_. For time columns (start, end, start_date, end_date), values are coerced to numpy.datetime64.
test ({'eq','lt','le','gt','ge','be','bt'}, default 'eq') – Type of comparison to perform:

- 'eq': equals
- 'lt': less than
- 'le': less than or equal to
- 'gt': greater than
- 'ge': greater than or equal to
- 'be': strictly between
- 'bt': alias for 'be'

Returns:

Array of matching row indices.

Return type:

numpy.ndarray

Raises:

ValueError – If test is 'be' or 'bt' and value is not a 2-length iterable.

Examples

Find rows with value greater than 10:

>>> idx = t.locate('value', 10, test='gt')
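
The comparison codes map naturally onto NumPy boolean masks. The following is a minimal sketch of that idea on an in-memory structured array, not the MTH5 implementation: locate_sketch is a hypothetical function, and it omits the string-to-bytes and datetime coercion described above.

```python
import operator

import numpy as np


def locate_sketch(table: np.ndarray, column: str, value, test: str = "eq") -> np.ndarray:
    """Hypothetical stand-in for the comparison logic, using boolean masks."""
    col = table[column]
    if test in ("be", "bt"):
        # Range tests need exactly two bounds: (low, high).
        if not isinstance(value, (list, tuple, np.ndarray)) or len(value) != 2:
            raise ValueError("value must be a 2-length iterable for 'be'/'bt'")
        low, high = value
        mask = (col > low) & (col < high)  # strictly between
    else:
        ops = {
            "eq": operator.eq,
            "lt": operator.lt,
            "le": operator.le,
            "gt": operator.gt,
            "ge": operator.ge,
        }
        mask = ops[test](col, value)
    # Indices of the rows where the mask is True.
    return np.where(mask)[0]


dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array(
    [(b"a", 5.0), (b"b", 10.0), (b"c", 15.0), (b"d", 20.0)], dtype=dtype
)

gt_idx = locate_sketch(table, "value", 10, test="gt")
be_idx = locate_sketch(table, "value", (5, 20), test="be")
name_idx = locate_sketch(table, "name", b"b")

print(gt_idx.tolist())    # [2, 3]
print(be_idx.tolist())    # [1, 2]
print(name_idx.tolist())  # [1]
```

Because 'be' is strict, the rows holding exactly 5.0 and 20.0 are excluded from the second result.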

add_row(row: numpy.ndarray, index: int | None = None) → int

Add a row to the table.

Parameters:

row (numpy.ndarray) – Row to insert. Must have the same dtype (or same field names, allowing safe casting) as the table.
index (int, optional) – Index at which to insert the row. If None, appends to the end.

Returns:

Index of the inserted row.

Return type:

int

Raises:

TypeError – If row is not a numpy.ndarray.
ValueError – If the dtype is incompatible with the table.
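
One way to read "same field names, allowing safe casting" is a field-by-field check with numpy.can_cast. The sketch below illustrates that reading only; is_compatible is a hypothetical helper, and the actual validation inside add_row may differ.

```python
import numpy as np


def is_compatible(row: np.ndarray, table_dtype: np.dtype) -> bool:
    """Hypothetical check: same field names, each field safely castable."""
    if row.dtype == table_dtype:
        return True
    if row.dtype.names != table_dtype.names:
        return False
    # Compare the fields one by one under NumPy's "safe" casting rule.
    return all(
        np.can_cast(row.dtype[name], table_dtype[name], casting="safe")
        for name in table_dtype.names
    )


table_dtype = np.dtype([("count", "i8"), ("value", "f8")])

exact = np.array([(1, 1.5)], dtype=table_dtype)
narrower = np.array([(1, 1.5)], dtype=[("count", "i4"), ("value", "f4")])
renamed = np.array([(1, 1.5)], dtype=[("n", "i8"), ("value", "f8")])
lossy = np.array([(1.0, 1.5)], dtype=[("count", "f8"), ("value", "f8")])

print(is_compatible(exact, table_dtype))     # True
print(is_compatible(narrower, table_dtype))  # True: i4 -> i8 and f4 -> f8 are safe
print(is_compatible(renamed, table_dtype))   # False: field names differ
print(is_compatible(lossy, table_dtype))     # False: f8 -> i8 is not a safe cast

# A compatible row can then be converted to the table dtype before writing.
converted = narrower.astype(table_dtype)
```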

update_row(entry: numpy.ndarray) → int

Update a row by locating its index and rewriting the entry.

Parameters:

entry (numpy.ndarray) – Entry to update, with the same dtype as the table.

Returns:

Row index that was updated, or the new row index if not found.

Return type:

int

Notes

Matching by hdf5_reference is not reliable. This method uses add_row, so the entry is appended as a new row if the original row cannot be located.

remove_row(index: int) → int

Remove a row by replacing it with a null entry.

Parameters:

index (int) – Index of the row to remove.

Returns:

Index that was updated with a null row.

Return type:

int

Raises:

IndexError – If the index is out of bounds for the current shape.

Notes

There is no intrinsic index stored within the array; indexing is on-the-fly. Prefer using the HDF5 reference column for robust identification.
The current approach inserts a null row at the specified index.
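
The effect of replacing a row with a null entry can be illustrated on an in-memory structured array. This is a sketch under two assumptions that the text above does not confirm: that a null entry is a zero-filled record, and that the table keeps its length after removal.

```python
import numpy as np

dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array(
    [(b"alpha", 1.0), (b"beta", 2.0), (b"gamma", 3.0)], dtype=dtype
)

# Assumption: a "null entry" is modeled here as a zero-filled record.
null_row = np.zeros(1, dtype=dtype)

index = 1
if not 0 <= index < table.shape[0]:
    raise IndexError(f"index {index} is out of bounds for shape {table.shape}")

# Overwrite the row in place; the other rows keep their positions.
table[index] = null_row[0]

print(table.shape)  # (3,)
```

Since the remaining rows do not shift, indices obtained before the removal still point at the same entries afterwards.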

to_dataframe() → pandas.DataFrame

Convert the table into a pandas.DataFrame.

Returns:

DataFrame with decoded string columns where applicable.

Return type:

pandas.DataFrame

Examples

Convert and preview:

>>> df = t.to_dataframe()
>>> df.head()

clear_table() → None

Reset the table by recreating the dataset with a single null row.

Notes

Deletes the current dataset and replaces it with a new dataset with the same compression/options and dtype, but shape (1,).

update_dtype(new_dtype: numpy.dtype) → None

Update the dataset’s dtype while preserving data and field names.

Parameters:

new_dtype (numpy.dtype) – New dtype to apply. Must have identical field names.

Notes

Performs a manual copy into a new array to avoid unsafe casting errors, then recreates the dataset with the new dtype and same dataset options.
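
The manual copy step can be sketched with NumPy alone. The example below is illustrative rather than the MTH5 code: it works on an in-memory array, leaves out recreating the HDF5 dataset, and the field widths are chosen only for demonstration.

```python
import numpy as np

old_dtype = np.dtype([("name", "S8"), ("value", "f4")])
new_dtype = np.dtype([("name", "S16"), ("value", "f8")])

old = np.array([(b"alpha", 1.5), (b"beta", 2.5)], dtype=old_dtype)

# The new dtype must keep the same field names, in the same order.
if old_dtype.names != new_dtype.names:
    raise ValueError("new dtype must have identical field names")

# Copy column by column so each field is cast on its own.
new = np.zeros(old.shape, dtype=new_dtype)
for name in new_dtype.names:
    new[name] = old[name]

print(new.dtype == new_dtype)  # True
print(new["value"].tolist())   # [1.5, 2.5]
```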