mth5.tables.mth5_table

MTH5 table utilities.

This module provides the MTH5Table base class, which wraps an HDF5 dataset and offers convenience methods for managing rows, locating entries, and exporting to a pandas.DataFrame.

Notes

  • Designed as a thin layer on top of NumPy/HDF5; for complex querying, prefer converting to a DataFrame via to_dataframe().

  • Datatypes are validated and kept consistent with the underlying dataset.
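To illustrate why a DataFrame is the better tool for compound queries, the sketch below builds a DataFrame from a plain NumPy structured array and filters on two columns at once. It does not call MTH5Table; the array rows stands in for table contents, and the byte-string decoding step is an assumption about what a converted table would look like.

```python
import numpy as np
import pandas as pd

# Hypothetical table contents held in a plain structured array.
dtype = np.dtype([("name", "S16"), ("value", "f8")])
rows = np.array(
    [(b"alpha", 1.23), (b"beta", 12.5), (b"gamma", 20.0)], dtype=dtype
)

# pandas accepts a structured array directly; each field becomes a column.
df = pd.DataFrame(rows)

# Byte strings are easier to query once decoded to str.
df["name"] = df["name"].str.decode("utf-8")

# A compound condition that would need several separate single-column lookups.
selected = df[(df["value"] > 10) & df["name"].str.startswith("g")]
selected_names = selected["name"].tolist()
```

With a real table, the same filtering would apply to the result of to_dataframe().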

Classes

MTH5Table

Base wrapper around an HDF5 dataset representing a typed table.

Module Contents

class mth5.tables.mth5_table.MTH5Table(hdf5_dataset: h5py.Dataset, default_dtype: numpy.dtype)

Base wrapper around an HDF5 dataset representing a typed table.

Provides simple NumPy-based operations including row insertion/removal, basic locating utilities, and conversion to pandas.DataFrame.

Parameters:
  • hdf5_dataset (h5py.Dataset) – The HDF5 dataset that stores the table.

  • default_dtype (numpy.dtype) – The default dtype schema for the table entries.

Raises:

MTH5TableError – If hdf5_dataset is not an instance of h5py.Dataset.

Examples

Create a simple table and add a row:

>>> import h5py, numpy as np
>>> f = h5py.File('example.h5', 'w')
>>> dtype = np.dtype([('name', 'S16'), ('value', 'f8')])
>>> ds = f.create_dataset('table', (1,), maxshape=(None,), dtype=dtype)
>>> from mth5.tables.mth5_table import MTH5Table
>>> t = MTH5Table(ds, dtype)
>>> row = np.array([('alpha'.encode('utf-8'), 1.23)], dtype=dtype)
>>> t.add_row(row)
1
>>> df = t.to_dataframe()
>>> df.head()

logger
property hdf5_reference: object
property dtype: numpy.dtype
check_dtypes(other_dtype: numpy.dtype) → bool

Check that dtypes match the table’s dtype (including field names).

Parameters:

other_dtype (numpy.dtype) – The dtype to compare against the table’s dtype.

Returns:

True if the dtypes match; otherwise False.

Return type:

bool
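As a rough illustration of what a field-aware dtype comparison means, the sketch below compares plain NumPy structured dtypes directly. It does not call check_dtypes; it only assumes that equality of structured dtypes is the relevant notion of a match.

```python
import numpy as np

base = np.dtype([("name", "S16"), ("value", "f8")])

# Same field names and same field types: considered equal.
same_layout = np.dtype([("name", "S16"), ("value", "f8")])

# Same types but a different field name: not equal.
renamed = np.dtype([("label", "S16"), ("value", "f8")])

# Same names but a narrower float: not equal.
narrower = np.dtype([("name", "S16"), ("value", "f4")])

matches_same = base == same_layout
matches_renamed = base == renamed
matches_narrower = base == narrower
```

Here matches_same is True, while the other two comparisons are False, because structured dtype equality takes both field names and field types into account.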

property shape: tuple[int, ...]
property nrows: int
locate(column: str, value: Any, test: Literal['eq', 'lt', 'le', 'gt', 'ge', 'be', 'bt'] = 'eq') → numpy.ndarray

Locate row indices where a column satisfies a comparison.

Parameters:
  • column (str) – Name of the column to test.

  • value (Any) – Value to compare against. For string columns, a str is converted to a numpy.bytes_. For time columns (start, end, start_date, end_date), values are coerced to numpy.datetime64.

  • test ({'eq','lt','le','gt','ge','be','bt'}, default 'eq') – Type of comparison to perform:

      - 'eq': equals
      - 'lt': less than
      - 'le': less than or equal to
      - 'gt': greater than
      - 'ge': greater than or equal to
      - 'be': strictly between
      - 'bt': alias for 'be'

Returns:

Array of matching row indices.

Return type:

numpy.ndarray

Raises:

ValueError – If test is 'be' or 'bt' and value is not an iterable of length 2.

Examples

Find rows with value greater than 10:

>>> idx = t.locate('value', 10, test='gt')
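The comparison modes can be pictured with a small NumPy-only sketch. The helper locate_sketch below is hypothetical and operates on a plain structured array, not on an MTH5Table; it mirrors the documented behavior (str to bytes conversion for string columns, a two-element value for 'be'/'bt') but is not the library's implementation, and it omits the time-column coercion.

```python
import operator

import numpy as np

# Map the documented test names onto Python comparison operators.
_OPS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def locate_sketch(table, column, value, test="eq"):
    """Hypothetical stand-in for locate() on a plain structured array."""
    col = table[column]
    # Byte-string columns: convert a str value to bytes before comparing.
    if col.dtype.kind == "S" and isinstance(value, str):
        value = value.encode("utf-8")
    if test in ("be", "bt"):
        try:
            low, high = value
        except (TypeError, ValueError):
            raise ValueError("'be'/'bt' requires a (low, high) pair") from None
        mask = (col > low) & (col < high)  # strictly between
    else:
        mask = _OPS[test](col, value)
    return np.where(mask)[0]


dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array(
    [(b"alpha", 1.0), (b"beta", 12.5), (b"gamma", 20.0)], dtype=dtype
)

greater = locate_sketch(table, "value", 10, test="gt")
named = locate_sketch(table, "name", "beta")
between = locate_sketch(table, "value", (1.0, 20.0), test="be")
```

Because 'be' is strict, the endpoints 1.0 and 20.0 are excluded and only the middle row matches.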

add_row(row: numpy.ndarray, index: int | None = None) → int

Add a row to the table.

Parameters:
  • row (numpy.ndarray) – Row to insert. Must have the same dtype (or same field names, allowing safe casting) as the table.

  • index (int, optional) – Index at which to insert the row. If None, appends to the end.

Returns:

Index of the inserted row.

Return type:

int

Raises:
  • TypeError – If row is not a numpy.ndarray.

  • ValueError – If the dtype is incompatible with the table.
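For readers unfamiliar with resizable HDF5 datasets, the sketch below shows one way an append could work using h5py alone. The helper append_row_sketch is hypothetical and is not the add_row implementation; it assumes a one-dimensional dataset created with maxshape=(None,) and uses an in-memory file so nothing is written to disk.

```python
import h5py
import numpy as np

dtype = np.dtype([("name", "S16"), ("value", "f8")])


def append_row_sketch(ds, row):
    """Hypothetical append: grow a 1-D resizable dataset by one row."""
    if not isinstance(row, np.ndarray):
        raise TypeError("row must be a numpy.ndarray")
    if row.dtype.names != ds.dtype.names:
        raise ValueError("field names do not match the dataset")
    index = ds.shape[0]
    ds.resize((index + 1,))              # requires maxshape=(None,)
    ds[index] = row.astype(ds.dtype)[0]  # cast, then write the single record
    return index


# driver="core" with backing_store=False keeps the file in memory only.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("table", (1,), maxshape=(None,), dtype=dtype)
    row = np.array([(b"alpha", 1.23)], dtype=dtype)
    new_index = append_row_sketch(ds, row)
    n_rows = ds.shape[0]
    stored_value = float(ds[new_index]["value"])
```

The dataset starts with one row, so the appended row lands at index 1, matching the return value shown in the class-level example.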

update_row(entry: numpy.ndarray) → int

Update a row by locating its index and rewriting the entry.

Parameters:

entry (numpy.ndarray) – Entry to update, with the same dtype as the table.

Returns:

Row index that was updated, or the new row index if not found.

Return type:

int

Notes

Matching by hdf5_reference is not reliable; this uses add_row and will append if the original row cannot be located.

remove_row(index: int) → int

Remove a row by replacing it with a null entry.

Parameters:

index (int) – Index of the row to remove.

Returns:

Index that was updated with a null row.

Return type:

int

Raises:

IndexError – If the index is out of bounds for the current shape.

Notes

  • There is no intrinsic index stored within the array; indexing is on-the-fly. Prefer using the HDF5 reference column for robust identification.

  • The current approach overwrites the row at the specified index with a null row.
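The effect of replacing a row with a null entry can be sketched on a plain structured array. The helper null_out_row is hypothetical, and using a zero-filled record as the null row is an assumption; the actual null representation used by remove_row may differ.

```python
import numpy as np

dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array(
    [(b"alpha", 1.23), (b"beta", 12.5), (b"gamma", 20.0)], dtype=dtype
)


def null_out_row(array, index):
    """Hypothetical removal: overwrite one row with a zero-filled record."""
    if not 0 <= index < array.shape[0]:
        raise IndexError(f"index {index} is out of bounds")
    array[index] = np.zeros((1,), dtype=array.dtype)[0]
    return index


removed = null_out_row(table, 1)
```

After the call the array still has three rows, so the positions of the other rows are unchanged; only the content at index 1 is blanked.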

to_dataframe() → pandas.DataFrame

Convert the table into a pandas.DataFrame.

Returns:

DataFrame with decoded string columns where applicable.

Return type:

pandas.DataFrame

Examples

Convert and preview:

>>> df = t.to_dataframe()
>>> df.head()

clear_table() → None

Reset the table by recreating the dataset with a single null row.

Notes

Deletes the current dataset and replaces it with a new dataset with the same compression/options and dtype, but shape (1,).

update_dtype(new_dtype: numpy.dtype) → None

Update the dataset’s dtype while preserving data and field names.

Parameters:

new_dtype (numpy.dtype) – New dtype to apply. Must have identical field names.

Notes

Performs a manual copy into a new array to avoid unsafe casting errors, then recreates the dataset with the new dtype and same dataset options.
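The manual copy described above can be pictured as a field-by-field assignment into a freshly allocated array. The helper recast_sketch is hypothetical and works on an in-memory array only; recreating the HDF5 dataset is left out, and this is a sketch of the idea, not the library's code.

```python
import numpy as np

old_dtype = np.dtype([("name", "S8"), ("value", "f4")])
new_dtype = np.dtype([("name", "S16"), ("value", "f8")])

old = np.array([(b"alpha", 1.5), (b"beta", 2.5)], dtype=old_dtype)


def recast_sketch(data, target_dtype):
    """Hypothetical recast: copy each field into an array of the new dtype."""
    if data.dtype.names != target_dtype.names:
        raise ValueError("new dtype must have identical field names")
    out = np.zeros(data.shape, dtype=target_dtype)
    for field in target_dtype.names:
        # Per-field assignment converts each column on its own.
        out[field] = data[field]
    return out


new = recast_sketch(old, new_dtype)
```

Copying one field at a time keeps the conversion explicit for each column and sidesteps casting the whole record in a single step.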