mth5.tables

Classes

MTH5Table

Base wrapper around an HDF5 dataset representing a typed table.

ChannelSummaryTable

Convenience wrapper around the channel summary dataset.

FCSummaryTable

Summary table for Fourier coefficients.

TFSummaryTable

Summary table for TransferFunction groups.

Package Contents

class mth5.tables.MTH5Table(hdf5_dataset: h5py.Dataset, default_dtype: numpy.dtype)

Base wrapper around an HDF5 dataset representing a typed table.

Provides simple NumPy-based operations including row insertion/removal, basic locating utilities, and conversion to pandas.DataFrame.

Parameters:
  • hdf5_dataset (h5py.Dataset) – The HDF5 dataset that stores the table.

  • default_dtype (numpy.dtype) – The default dtype schema for the table entries.

Raises:

MTH5TableError – If hdf5_dataset is not an instance of h5py.Dataset.

Examples

Create a simple table and add a row:

>>> import h5py, numpy as np
>>> f = h5py.File('example.h5', 'w')
>>> dtype = np.dtype([('name', 'S16'), ('value', 'f8')])
>>> ds = f.create_dataset('table', (1,), maxshape=(None,), dtype=dtype)
>>> from mth5.tables.mth5_table import MTH5Table
>>> t = MTH5Table(ds, dtype)
>>> row = np.array([('alpha'.encode('utf-8'), 1.23)], dtype=dtype)
>>> t.add_row(row)
1
>>> df = t.to_dataframe()
>>> df.head()
logger
property hdf5_reference: object
property dtype: numpy.dtype
check_dtypes(other_dtype: numpy.dtype) -> bool

Check that dtypes match the table’s dtype (including field names).

Parameters:

other_dtype (numpy.dtype) – The dtype to compare against the table’s dtype.

Returns:

True if the dtypes match; otherwise False.

Return type:

bool
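
As a rough illustration of what a field-name-aware comparison involves, the sketch below compares structured dtypes with plain NumPy. `dtypes_match` is a hypothetical helper written for this example, not part of mth5, and the actual logic inside `check_dtypes` may differ.

```python
import numpy as np


def dtypes_match(table_dtype: np.dtype, other_dtype: np.dtype) -> bool:
    """Hypothetical helper: True only if field names and field types agree."""
    # Equality of structured dtypes in NumPy takes field names and
    # field types into account, so a plain == is enough for this sketch.
    return table_dtype == other_dtype


table_dtype = np.dtype([("name", "S16"), ("value", "f8")])
same = np.dtype([("name", "S16"), ("value", "f8")])
renamed = np.dtype([("label", "S16"), ("value", "f8")])   # different field name
narrower = np.dtype([("name", "S16"), ("value", "f4")])   # different field type

print(dtypes_match(table_dtype, same))
print(dtypes_match(table_dtype, renamed))
print(dtypes_match(table_dtype, narrower))
```

Only the first comparison matches; a renamed field or a changed field type is enough to make the dtypes differ.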

property shape: tuple[int, ...]
property nrows: int
locate(column: str, value: Any, test: Literal['eq', 'lt', 'le', 'gt', 'ge', 'be', 'bt'] = 'eq') -> numpy.ndarray

Locate row indices where a column satisfies a comparison.

Parameters:
  • column (str) – Name of the column to test.

  • value (Any) – Value to compare against. For string columns, a str is converted to a numpy.bytes_. For time columns (start, end, start_date, end_date), values are coerced to numpy.datetime64.

  • test ({'eq','lt','le','gt','ge','be','bt'}, default 'eq') – Type of comparison to perform:

      - ‘eq’: equals
      - ‘lt’: less than
      - ‘le’: less than or equal to
      - ‘gt’: greater than
      - ‘ge’: greater than or equal to
      - ‘be’: strictly between
      - ‘bt’: alias for ‘be’

Returns:

Array of matching row indices.

Return type:

numpy.ndarray

Raises:

ValueError – If test is ‘be’ or ‘bt’ and value is not a 2-length iterable.

Examples

Find rows with value greater than 10:

>>> idx = t.locate('value', 10, test='gt')
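
The comparison semantics described above can be sketched on an in-memory structured array. `locate_rows` is a hypothetical stand-in written for this example, not the mth5 implementation; it uses `str.encode` for the string conversion and omits the datetime coercion for time columns.

```python
import operator

import numpy as np

# Map each documented test name onto a Python comparison operator.
_TESTS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def locate_rows(array, column, value, test="eq"):
    """Hypothetical stand-in for locate(): return indices of matching rows."""
    col = array[column]
    if test in ("be", "bt"):
        # 'be'/'bt' need a (low, high) pair; anything else is a ValueError.
        try:
            low, high = value
        except (TypeError, ValueError):
            raise ValueError("'be'/'bt' requires a 2-length iterable") from None
        mask = (col > low) & (col < high)  # strictly between
    else:
        if isinstance(value, str):
            value = value.encode("utf-8")  # string columns are stored as bytes
        mask = _TESTS[test](col, value)
    return np.flatnonzero(mask)


rows = np.array(
    [(b"alpha", 1.0), (b"beta", 12.5), (b"gamma", 40.0)],
    dtype=[("name", "S16"), ("value", "f8")],
)

gt_idx = locate_rows(rows, "value", 10, test="gt")            # rows 1 and 2
between_idx = locate_rows(rows, "value", (5, 20), test="be")  # row 1 only
name_idx = locate_rows(rows, "name", "beta")                  # row 1 only
```
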
add_row(row: numpy.ndarray, index: int | None = None) -> int

Add a row to the table.

Parameters:
  • row (numpy.ndarray) – Row to insert. Must have the same dtype (or same field names, allowing safe casting) as the table.

  • index (int, optional) – Index at which to insert the row. If None, appends to the end.

Returns:

Index of the inserted row.

Return type:

int

Raises:
  • TypeError – If row is not a numpy.ndarray.

  • ValueError – If the dtype is incompatible with the table.
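
To make the "same field names, allowing safe casting" condition concrete, here is a minimal sketch of a compatibility check a caller could run before inserting. `can_insert` is a hypothetical helper for illustration only; how `add_row` performs its own check is not shown in this reference.

```python
import numpy as np


def can_insert(row: np.ndarray, table_dtype: np.dtype) -> bool:
    """Hypothetical pre-check: same field names, each field safely castable."""
    if not isinstance(row, np.ndarray):
        raise TypeError("row must be a numpy.ndarray")
    # Field names must match exactly, including their order.
    if row.dtype.names != table_dtype.names:
        return False
    # Every field must be castable without losing information.
    return all(
        np.can_cast(row.dtype[name], table_dtype[name], casting="safe")
        for name in table_dtype.names
    )


table_dtype = np.dtype([("name", "S16"), ("value", "f8")])

# float32 widens safely to float64.
widening = np.zeros(1, dtype=[("name", "S16"), ("value", "f4")])
# complex128 cannot be safely cast to float64.
lossy = np.zeros(1, dtype=[("name", "S16"), ("value", "c16")])
# Different field name.
renamed = np.zeros(1, dtype=[("label", "S16"), ("value", "f8")])

print(can_insert(widening, table_dtype))
print(can_insert(lossy, table_dtype))
print(can_insert(renamed, table_dtype))
```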

update_row(entry: numpy.ndarray) -> int

Update a row by locating its index and rewriting the entry.

Parameters:

entry (numpy.ndarray) – Entry to update, with the same dtype as the table.

Returns:

Row index that was updated, or the new row index if not found.

Return type:

int

Notes

Matching on hdf5_reference is not reliable. This method uses add_row, so the entry is appended as a new row if the original row cannot be located.

remove_row(index: int) -> int

Remove a row by replacing it with a null entry.

Parameters:

index (int) – Index of the row to remove.

Returns:

Index that was updated with a null row.

Return type:

int

Raises:

IndexError – If the index is out of bounds for the current shape.

Notes

  • There is no intrinsic index stored within the array; indexing is on-the-fly. Prefer using the HDF5 reference column for robust identification.

  • The current approach overwrites the row at the specified index with a null row.
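
The effect of replacing a row with a null entry can be sketched on a plain structured array. This assumes a null row is a zero-filled record of the table dtype; the exact null value mth5 writes may differ.

```python
import numpy as np

dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array(
    [(b"alpha", 1.0), (b"beta", 2.0), (b"gamma", 3.0)], dtype=dtype
)

# Assumption: a "null" row is a zero-filled record (empty bytes, 0.0).
null_row = np.zeros(1, dtype=dtype)

index = 1
if not 0 <= index < table.shape[0]:
    raise IndexError(f"index {index} is out of bounds for shape {table.shape}")

# Overwrite in place: the row count and all other row positions are unchanged.
table[index] = null_row[0]

print(table.shape)
print(table[index])
```

Because nothing is shifted, indices obtained earlier for other rows stay valid after a removal.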

to_dataframe() -> pandas.DataFrame

Convert the table into a pandas.DataFrame.

Returns:

DataFrame with decoded string columns where applicable.

Return type:

pandas.DataFrame

Examples

Convert and preview:

>>> df = t.to_dataframe()
>>> df.head()
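
For readers unfamiliar with what "decoded string columns" means here, the following sketch converts a structured array with a bytes column into a DataFrame and decodes it. It is an illustration with made-up data, not the code `to_dataframe` runs.

```python
import numpy as np
import pandas as pd

dtype = np.dtype([("name", "S16"), ("value", "f8")])
records = np.array([(b"alpha", 1.23), (b"beta", 4.56)], dtype=dtype)

# A structured array maps directly onto DataFrame columns.
df = pd.DataFrame(records)

# Fixed-width byte-string fields arrive as bytes objects; decode them to str.
for column in df.columns:
    if records.dtype[column].kind == "S":
        df[column] = df[column].str.decode("utf-8")

print(df)
```
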
clear_table() -> None

Reset the table by recreating the dataset with a single null row.

Notes

Deletes the current dataset and replaces it with a new dataset with the same compression/options and dtype, but shape (1,).

update_dtype(new_dtype: numpy.dtype) -> None

Update the dataset’s dtype while preserving data and field names.

Parameters:

new_dtype (numpy.dtype) – New dtype to apply. Must have identical field names.

Notes

Performs a manual copy into a new array to avoid unsafe casting errors, then recreates the dataset with the new dtype and same dataset options.
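
The manual copy described in the note can be pictured as a field-by-field assignment into a freshly allocated array. The sketch below shows only that in-memory step with made-up dtypes; recreating the HDF5 dataset is left out.

```python
import numpy as np

old_dtype = np.dtype([("name", "S8"), ("value", "f4")])
new_dtype = np.dtype([("name", "S16"), ("value", "f8")])

old = np.array([(b"alpha", 1.5), (b"beta", 2.25)], dtype=old_dtype)

# The new dtype must keep the same field names.
if old_dtype.names != new_dtype.names:
    raise ValueError("new dtype must have identical field names")

# Allocate the target array, then copy one field at a time so each
# column is converted independently of the others.
new = np.zeros(old.shape, dtype=new_dtype)
for name in new_dtype.names:
    new[name] = old[name]

print(new.dtype)
print(new)
```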

class mth5.tables.ChannelSummaryTable(hdf5_dataset: h5py.Dataset)

Bases: mth5.tables.MTH5Table

Convenience wrapper around the channel summary dataset.

Provides helpers to summarize channels, convert to pandas, and derive run-level summaries.

Examples

>>> ch_table = ChannelSummaryTable(hdf5_dataset)
>>> df = ch_table.to_dataframe()
>>> run_df = ch_table.to_run_summary()
to_dataframe() -> pandas.DataFrame

Convert the channel summary to a pandas DataFrame.

Returns:

Channel summary with decoded string columns and parsed datetimes.

Return type:

pandas.DataFrame

Examples

>>> df = ch_table.to_dataframe()
>>> df.head()
summarize() -> None

Populate the summary table from channel datasets in the file.

to_run_summary(allowed_input_channels: Iterable[str] = ALLOWED_INPUT_CHANNELS, allowed_output_channels: Iterable[str] = ALLOWED_OUTPUT_CHANNELS, sortby: list[str] | None = None) -> pandas.DataFrame

Compress channel summary into a run-level summary (one row per run).

Parameters:
  • allowed_input_channels (Iterable[str], optional) – Allowed input channel names, by default ALLOWED_INPUT_CHANNELS.

  • allowed_output_channels (Iterable[str], optional) – Allowed output channel names, by default ALLOWED_OUTPUT_CHANNELS.

  • sortby (list of str or None, optional) – Columns to sort by; defaults to ["station", "start"] when None.

Returns:

Run-level summary including channels, durations, and references.

Return type:

pandas.DataFrame

Examples

>>> run_df = ch_table.to_run_summary()
>>> run_df.columns[:4].tolist()
['survey', 'station', 'run', 'start']
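
The idea of compressing per-channel rows into one row per run can be sketched with a pandas groupby. Everything below is illustrative: the sample data, the `INPUTS`/`OUTPUTS` tuples (stand-ins for ALLOWED_INPUT_CHANNELS and ALLOWED_OUTPUT_CHANNELS, whose real values are not listed here), and the output columns beyond the first four are assumptions, not the mth5 schema.

```python
import pandas as pd

# Made-up channel summary: one row per channel.
channel_df = pd.DataFrame(
    {
        "survey": ["s1"] * 5,
        "station": ["mt02", "mt02", "mt01", "mt01", "mt01"],
        "run": ["a"] * 5,
        "component": ["ex", "hx", "ex", "hx", "hy"],
        "start": pd.to_datetime(["2020-01-02"] * 2 + ["2020-01-01"] * 3),
        "end": pd.to_datetime(
            ["2020-01-02 01:00"] * 2 + ["2020-01-01 01:00"] * 3
        ),
    }
)

INPUTS = ("hx", "hy")         # hypothetical allowed input channels
OUTPUTS = ("ex", "ey", "hz")  # hypothetical allowed output channels

rows = []
for (survey, station, run), group in channel_df.groupby(
    ["survey", "station", "run"]
):
    components = group["component"].tolist()
    start, end = group["start"].min(), group["end"].max()
    rows.append(
        {
            "survey": survey,
            "station": station,
            "run": run,
            "start": start,
            "end": end,
            "duration": (end - start).total_seconds(),
            "input_channels": [c for c in components if c in INPUTS],
            "output_channels": [c for c in components if c in OUTPUTS],
        }
    )

# Default sort order described above: station, then start.
run_df = pd.DataFrame(rows).sort_values(["station", "start"]).reset_index(drop=True)
print(run_df)
```
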
class mth5.tables.FCSummaryTable(hdf5_dataset: h5py.Dataset)

Bases: mth5.tables.MTH5Table

Summary table for Fourier coefficients.

This class wraps an HDF5 dataset that stores a summary of Fourier coefficient datasets and provides convenience functions such as summarize() (to populate the table) and to_dataframe() (to export entries).

Examples

Populate and export a summary from an existing MTH5 file:

>>> import h5py
>>> from mth5.tables.fc_table import FCSummaryTable
>>> f = h5py.File('example.mth5', 'r+')  # read/write: summarize() modifies the table
>>> # Assume the summary dataset already exists at this path
>>> table_ds = f['Exchange']['FC_Summary']
>>> fc_table = FCSummaryTable(table_ds)
>>> fc_table.summarize()  # walk the file and fill entries
>>> df = fc_table.to_dataframe()
>>> df.head()
to_dataframe() -> pandas.DataFrame

Convert the table to a pandas.DataFrame for easier querying.

Returns:

A dataframe with decoded string columns and parsed start/end timestamps.

Return type:

pandas.DataFrame

Examples

Export to a dataframe and filter by component:

>>> df = fc_table.to_dataframe()
>>> df[df.component == 'ex']
summarize() -> None

Populate the summary table by traversing the HDF5 hierarchy.

The traversal searches for datasets with attribute mth5_type == 'FCChannel' and adds a corresponding summary row for each.

Return type:

None

Notes

  • If the table contains rows from a different OS/encoding, row insertion can raise a ValueError. A warning is logged and processing continues for subsequent rows.

Examples

Refresh the table entries:

>>> fc_table.clear_table()
>>> fc_table.summarize()
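
The attribute-based traversal described for summarize() can be sketched with h5py's `visititems`. The file layout and the `collect_fc_channels` callback below are made up for illustration, and an in-memory file is used so nothing is written to disk; the real method also builds a summary row for each match, which is omitted here.

```python
import h5py

found = []


def collect_fc_channels(name, obj):
    """visititems callback: record datasets tagged with mth5_type == 'FCChannel'."""
    if isinstance(obj, h5py.Dataset) and obj.attrs.get("mth5_type") == "FCChannel":
        found.append(name)


# In-memory HDF5 file (core driver, no backing store) with a made-up layout.
with h5py.File("traversal_sketch.h5", "w", driver="core", backing_store=False) as f:
    fc = f.create_dataset("station/fcs/run_001/ex", data=[1.0, 2.0])
    fc.attrs["mth5_type"] = "FCChannel"

    other = f.create_dataset("station/run_001/ex", data=[1.0, 2.0])
    other.attrs["mth5_type"] = "Electric"

    # Walk every group and dataset below the root.
    f.visititems(collect_fc_channels)

print(found)
```
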
class mth5.tables.TFSummaryTable(hdf5_dataset: h5py.Dataset)

Bases: mth5.tables.MTH5Table

Summary table for TransferFunction groups.

Provides convenience functions to populate the table (summarize) and export to pandas.DataFrame (to_dataframe).

Examples

Build and export a TF summary:

>>> import h5py
>>> from mth5.tables.tf_table import TFSummaryTable
>>> f = h5py.File('example.mth5', 'r+')  # read/write: summarize() modifies the table
>>> tf_summary_ds = f['Exchange']['TF_Summary']
>>> tf_table = TFSummaryTable(tf_summary_ds)
>>> tf_table.summarize()
>>> df = tf_table.to_dataframe()
>>> df.head()
to_dataframe() -> pandas.DataFrame

Convert the table to a pandas.DataFrame for easier querying.

Returns:

A dataframe with decoded string columns.

Return type:

pandas.DataFrame

Examples

Filter transfer functions that include tipper:

>>> df = tf_table.to_dataframe()
>>> df[df.has_tipper]
summarize() -> None

Populate the summary table by traversing the HDF5 hierarchy.

Searches for groups where mth5_type equals 'transferfunction' and adds a row indicating available datasets (impedance, tipper, covariance), period min/max, and relevant references.

Return type:

None

Examples

Refresh the TF summary:

>>> tf_table.clear_table()
>>> tf_table.summarize()