mth5.tables package

Submodules

mth5.tables.channel_table module

class mth5.tables.channel_table.ChannelSummaryTable(hdf5_dataset: Dataset)

Bases: MTH5Table

Convenience wrapper around the channel summary dataset.

Provides helpers to summarize channels, convert to pandas, and derive run-level summaries.

Examples

>>> ch_table = ChannelSummaryTable(hdf5_dataset)
>>> df = ch_table.to_dataframe()
>>> run_df = ch_table.to_run_summary()

summarize() → None[source]: Populate the summary table from channel datasets in the file.

to_dataframe() → DataFrame[source]

Convert the channel summary to a pandas DataFrame.

Returns:: Channel summary with decoded string columns and parsed datetimes.
Return type:: pandas.DataFrame

Examples

>>> df = ch_table.to_dataframe()
>>> df.head()

to_run_summary(allowed_input_channels: Iterable[str] = ['bx', 'h1', 'hx', 'hy', 'h2', 'by'], allowed_output_channels: Iterable[str] = ['bz', 'hz', 'h3', 'ex', 'e3', 'e1', 'ey', 'e2', 'e4'], sortby: list[str] | None = None) → DataFrame

Compress channel summary into a run-level summary (one row per run).

Parameters:

allowed_input_channels (Iterable[str], optional) – Allowed input channel names, by default ALLOWED_INPUT_CHANNELS.
allowed_output_channels (Iterable[str], optional) – Allowed output channel names, by default ALLOWED_OUTPUT_CHANNELS.
sortby (list of str or None, optional) – Columns to sort by; defaults to ["station", "start"] when None.

Returns:

Run-level summary including channels, durations, and references.

Return type:

pandas.DataFrame

Examples

>>> run_df = ch_table.to_run_summary()
>>> run_df.columns[:4].tolist()
['survey', 'station', 'run', 'start']
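
The compression from channel rows to run rows can be pictured with a small pure-Python sketch. It is not the library's implementation: the column names component, end, input_channels, and output_channels, the helper name run_summary, and the choice of earliest start and latest end per run are assumptions made for illustration. Only the default channel lists, the first four column names, and the default sort order come from the documentation above.

```python
from collections import defaultdict

# Default channel lists copied from the to_run_summary() signature.
ALLOWED_INPUT = ["bx", "h1", "hx", "hy", "h2", "by"]
ALLOWED_OUTPUT = ["bz", "hz", "h3", "ex", "e3", "e1", "ey", "e2", "e4"]


def run_summary(channel_rows, allowed_input=ALLOWED_INPUT,
                allowed_output=ALLOWED_OUTPUT, sortby=None):
    """Collapse channel rows into one row per (survey, station, run)."""
    sortby = ["station", "start"] if sortby is None else sortby

    # Group the channel rows that belong to the same run.
    groups = defaultdict(list)
    for row in channel_rows:
        groups[(row["survey"], row["station"], row["run"])].append(row)

    summary = []
    for (survey, station, run), rows in groups.items():
        components = [r["component"] for r in rows]
        summary.append({
            "survey": survey,
            "station": station,
            "run": run,
            # Assumption: a run spans the earliest start to the latest end.
            "start": min(r["start"] for r in rows),
            "end": max(r["end"] for r in rows),
            "input_channels": [c for c in components if c in allowed_input],
            "output_channels": [c for c in components if c in allowed_output],
        })
    return sorted(summary, key=lambda r: tuple(r[k] for k in sortby))


# Hypothetical channel rows; ISO-8601 strings of equal format sort correctly.
rows = [
    {"survey": "s1", "station": "mt02", "run": "a", "component": "ex",
     "start": "2020-01-02T00:00:00", "end": "2020-01-02T06:00:00"},
    {"survey": "s1", "station": "mt02", "run": "a", "component": "hz",
     "start": "2020-01-02T00:00:00", "end": "2020-01-02T06:00:00"},
    {"survey": "s1", "station": "mt01", "run": "a", "component": "hx",
     "start": "2020-01-01T00:00:00", "end": "2020-01-01T06:00:00"},
    {"survey": "s1", "station": "mt01", "run": "a", "component": "hy",
     "start": "2020-01-01T00:00:05", "end": "2020-01-01T06:00:00"},
    {"survey": "s1", "station": "mt01", "run": "a", "component": "ex",
     "start": "2020-01-01T00:00:00", "end": "2020-01-01T06:00:10"},
]
summary = run_summary(rows)
```

Five channel rows become two run rows here, sorted by station and then start.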

mth5.tables.fc_table module

Tabulate Fourier coefficients stored in an MTH5 file.

This module provides a small utility for summarizing Fourier-coefficient datasets (e.g., FCChannel) into a structured table and exporting to a convenient pandas.DataFrame for querying and analysis.

Notes

A basic test for this module exists under mth5/tests/version_1/test_fcs.py.
The table is populated by traversing the HDF5 hierarchy and collecting entries for datasets labeled with the attribute mth5_type='FCChannel'.

class mth5.tables.fc_table.FCSummaryTable(hdf5_dataset: Dataset)

Bases: MTH5Table

Summary table for Fourier coefficients.

This class wraps an HDF5 dataset that stores a summary of Fourier coefficient datasets and provides convenience functions such as summarize() (to populate the table) and to_dataframe() (to export entries).

Examples

Populate and export a summary from an existing MTH5 file:

>>> import h5py
>>> from mth5.tables.fc_table import FCSummaryTable
>>> # summarize() writes rows to the table, so open the file read/write
>>> f = h5py.File('example.mth5', 'r+')
>>> # Assume the summary dataset already exists at this path
>>> table_ds = f['Exchange']['FC_Summary']
>>> fc_table = FCSummaryTable(table_ds)
>>> fc_table.summarize()  # walk the file and fill entries
>>> df = fc_table.to_dataframe()
>>> df.head()

summarize() → None

Populate the summary table by traversing the HDF5 hierarchy.

The traversal searches for datasets with attribute mth5_type == 'FCChannel' and adds a corresponding summary row for each.

Return type: None

Notes

If the table contains rows written on a different OS or with a different encoding, inserting a row can raise a ValueError. In that case a warning is logged and processing continues with the remaining rows.

Examples

Refresh the table entries:

>>> fc_table.clear_table()
>>> fc_table.summarize()

to_dataframe() → DataFrame

Convert the table to a pandas.DataFrame for easier querying.

Returns: A dataframe with decoded string columns and parsed start/end timestamps.
Return type: pandas.DataFrame

Examples

Export to a dataframe and filter by component:

>>> df = fc_table.to_dataframe()
>>> df[df.component == 'ex']

mth5.tables.mth5_table module

MTH5 table utilities.

This module provides the MTH5Table base class which wraps an HDF5 dataset and offers convenience methods for row management, locating entries, and exporting to pandas.DataFrame.

Notes

Designed as a thin layer on top of NumPy/HDF5; for complex querying, prefer converting to a DataFrame via to_dataframe().
Datatypes are validated and kept consistent with the underlying dataset.

class mth5.tables.mth5_table.MTH5Table(hdf5_dataset: Dataset, default_dtype: dtype)

Bases: object

Base wrapper around an HDF5 dataset representing a typed table.

Provides simple NumPy-based operations including row insertion/removal, basic locating utilities, and conversion to pandas.DataFrame.

Parameters:

hdf5_dataset (h5py.Dataset) – The HDF5 dataset that stores the table.
default_dtype (numpy.dtype) – The default dtype schema for the table entries.

Raises:

MTH5TableError – If hdf5_dataset is not an instance of h5py.Dataset.

Examples

Create a simple table and add a row:

>>> import h5py, numpy as np
>>> f = h5py.File('example.h5', 'w')
>>> dtype = np.dtype([('name', 'S16'), ('value', 'f8')])
>>> ds = f.create_dataset('table', (1,), maxshape=(None,), dtype=dtype)
>>> from mth5.tables.mth5_table import MTH5Table
>>> t = MTH5Table(ds, dtype)
>>> row = np.array([('alpha'.encode('utf-8'), 1.23)], dtype=dtype)
>>> t.add_row(row)
1
>>> df = t.to_dataframe()
>>> df.head()

add_row(row: ndarray, index: int | None = None) → int

Add a row to the table.

Parameters:

row (numpy.ndarray) – Row to insert. Must have the same dtype (or same field names, allowing safe casting) as the table.
index (int, optional) – Index at which to insert the row. If None, appends to the end.

Returns:

Index of the inserted row.

Return type:

int

Raises:

TypeError – If row is not a numpy.ndarray.
ValueError – If the dtype is incompatible with the table.
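
The documented checks (ndarray only, matching dtype or matching field names with safe casting) can be sketched on a plain in-memory structured array. The helper names validate_row and append_row are hypothetical, and the per-field np.can_cast test is one reasonable reading of "safe casting", not necessarily what add_row does internally.

```python
import numpy as np


def validate_row(row, table_dtype):
    """Return `row` cast to `table_dtype`, raising as add_row() is documented to."""
    if not isinstance(row, np.ndarray):
        raise TypeError(f"row must be a numpy.ndarray, not {type(row).__name__}")
    if row.dtype == table_dtype:
        return row
    # Accept the same field names (same order) when every field casts safely.
    if row.dtype.names != table_dtype.names or not all(
        np.can_cast(row.dtype[name], table_dtype[name], casting="safe")
        for name in table_dtype.names
    ):
        raise ValueError(f"incompatible dtype {row.dtype}, expected {table_dtype}")
    return row.astype(table_dtype)


def append_row(table, row):
    """Append a validated row to an in-memory table; return (new_table, index)."""
    row = validate_row(row, table.dtype)
    return np.concatenate([table, row]), table.shape[0]


table_dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.zeros(1, dtype=table_dtype)

# Narrower fields (S8 -> S16, f4 -> f8) cast safely, so this row is accepted.
narrow_row = np.array([(b"alpha", 1.5)], dtype=[("name", "S8"), ("value", "f4")])
table, new_index = append_row(table, narrow_row)
```

A row with different field names, or with wider fields than the table, would raise ValueError in this sketch.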

check_dtypes(other_dtype: dtype) → bool

Check that dtypes match the table’s dtype (including field names).

Parameters: other_dtype (numpy.dtype) – The dtype to compare against the table’s dtype.
Returns: True if the dtypes match; otherwise False.
Return type: bool
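
For intuition about what "match (including field names)" means, plain NumPy dtype equality already distinguishes field names and field widths. The snippet below shows only that NumPy behavior; whether check_dtypes relies on the same comparison internally is an assumption.

```python
import numpy as np

table_dtype = np.dtype([("name", "S16"), ("value", "f8")])

identical = np.dtype([("name", "S16"), ("value", "f8")])
renamed = np.dtype([("label", "S16"), ("value", "f8")])   # different field name
narrower = np.dtype([("name", "S8"), ("value", "f8")])    # different string width

print(table_dtype == identical)  # True
print(table_dtype == renamed)    # False
print(table_dtype == narrower)   # False
```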

clear_table() → None

Reset the table by recreating the dataset with a single null row.

Notes

Deletes the current dataset and replaces it with a new dataset with the same compression/options and dtype, but shape (1,).

property dtype: dtype

property hdf5_reference: object

locate(column: str, value: Any, test: Literal['eq', 'lt', 'le', 'gt', 'ge', 'be', 'bt'] = 'eq') → ndarray

Locate row indices where a column satisfies a comparison.

Parameters:

column (str) – Name of the column to test.
value (Any) – Value to compare against. For string columns, a str is converted to a numpy.bytes_. For time columns (start, end, start_date, end_date), values are coerced to numpy.datetime64.
test ({'eq','lt','le','gt','ge','be','bt'}, default 'eq') – Type of comparison to perform:
- 'eq': equals
- 'lt': less than
- 'le': less than or equal to
- 'gt': greater than
- 'ge': greater than or equal to
- 'be': strictly between
- 'bt': alias for 'be'

Returns:

Array of matching row indices.

Return type:

numpy.ndarray

Raises:

ValueError – If test is 'be' or 'bt' and value is not a 2-length iterable.

Examples

Find rows with value greater than 10:

>>> idx = t.locate('value', 10, test='gt')
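
The comparison names and value coercions described above can be sketched against an in-memory structured array. The function name locate_rows and the use of a datetime64 field for the start column are assumptions for illustration; the real method operates on the HDF5-backed table and may store times differently.

```python
import operator

import numpy as np

# Comparison names from the locate() signature mapped to Python operators.
COMPARISONS = {
    "eq": operator.eq, "lt": operator.lt, "le": operator.le,
    "gt": operator.gt, "ge": operator.ge,
}
TIME_COLUMNS = ("start", "end", "start_date", "end_date")


def locate_rows(table, column, value, test="eq"):
    """Return indices of rows in `table` where `column` passes `test`."""

    def coerce(v):
        if column in TIME_COLUMNS:
            return np.datetime64(v)          # time columns compare as datetimes
        if isinstance(v, str):
            return np.bytes_(v.encode("utf-8"))  # string columns hold bytes
        return v

    values = table[column]
    if test in ("be", "bt"):
        try:
            low, high = value
        except (TypeError, ValueError):
            raise ValueError("'be'/'bt' needs a (low, high) pair") from None
        low, high = coerce(low), coerce(high)
        mask = (values > low) & (values < high)  # strictly between
    else:
        mask = COMPARISONS[test](values, coerce(value))
    return np.where(mask)[0]


dtype = np.dtype([("component", "S4"), ("value", "f8"), ("start", "datetime64[s]")])
table = np.array(
    [
        (b"ex", 5.0, np.datetime64("2020-01-01T00:00:00")),
        (b"hy", 12.0, np.datetime64("2020-01-02T00:00:00")),
        (b"ex", 20.0, np.datetime64("2020-01-03T00:00:00")),
    ],
    dtype=dtype,
)
print(locate_rows(table, "value", 10, test="gt"))  # [1 2]
```

Because 'be' is strict, locate_rows(table, "value", (5, 20), test="be") matches only the middle row.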

property nrows: int

remove_row(index: int) → int

Remove a row by replacing it with a null entry.

Parameters: index (int) – Index of the row to remove.
Returns: Index that was updated with a null row.
Return type: int
Raises: IndexError – If the index is out of bounds for the current shape.

Notes

There is no intrinsic index stored within the array; row indices are positional and computed on the fly. Prefer the HDF5 reference column for robust identification.
The current approach overwrites the row at the specified index with a null row, so the number of rows does not change.
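
A minimal sketch of removal by nulling, on an in-memory array: the helper name null_out_row is hypothetical, and treating an all-zero record as the null row is an assumption about what a null entry looks like.

```python
import numpy as np


def null_out_row(table, index):
    """Overwrite table[index] with a zeroed record and return the index."""
    if not 0 <= index < table.shape[0]:
        raise IndexError(f"index {index} out of bounds for shape {table.shape}")
    # Assumption: the null row is an all-zero record (empty bytes, 0.0).
    table[index] = np.zeros(1, dtype=table.dtype)[0]
    return index


dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array([(b"a", 1.0), (b"b", 2.0), (b"c", 3.0)], dtype=dtype)
removed = null_out_row(table, 1)
```

The table still has three rows afterwards, so the positions of later rows are unaffected.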

property shape: tuple[int, ...]

to_dataframe() → DataFrame

Convert the table into a pandas.DataFrame.

Returns: DataFrame with decoded string columns where applicable.
Return type: pandas.DataFrame

Examples

Convert and preview:

>>> df = t.to_dataframe()
>>> df.head()
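
The "decoded string columns" step matters because HDF5-backed string fields hold bytes, and b'ex' does not equal 'ex' in Python 3. The sketch below decodes the byte-string fields of a structured array into plain columns; the helper name decode_columns is hypothetical and UTF-8 is an assumed encoding.

```python
import numpy as np


def decode_columns(table):
    """Return {column: list} with byte-string fields decoded to str."""
    columns = {}
    for name in table.dtype.names:
        column = table[name]
        if column.dtype.kind == "S":  # fixed-width bytes field
            columns[name] = [v.decode("utf-8") for v in column.tolist()]
        else:
            columns[name] = column.tolist()
    return columns


dtype = np.dtype([("component", "S4"), ("value", "f8")])
table = np.array([(b"ex", 1.0), (b"hy", 2.0)], dtype=dtype)
columns = decode_columns(table)
print(columns["component"])  # ['ex', 'hy']
```

A dictionary of equal-length lists like this can be passed to pandas.DataFrame, after which comparisons such as df.component == 'ex' work on str values.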

update_dtype(new_dtype: dtype) → None

Update the dataset’s dtype while preserving data and field names.

Parameters: new_dtype (numpy.dtype) – New dtype to apply. Must have identical field names.

Notes

Performs a manual copy into a new array to avoid unsafe casting errors, then recreates the dataset with the new dtype and same dataset options.
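
The manual copy described in the note can be sketched as a field-by-field assignment into a freshly allocated array. The helper name recast is hypothetical, raising ValueError on mismatched field names is an assumption, and the step of recreating the HDF5 dataset is omitted.

```python
import numpy as np


def recast(table, new_dtype):
    """Copy `table` into a new array of `new_dtype`, one field at a time."""
    if table.dtype.names != new_dtype.names:
        raise ValueError("new dtype must have identical field names")
    new_table = np.zeros(table.shape, dtype=new_dtype)
    for name in new_dtype.names:
        # Per-field assignment converts each column to its new field type.
        new_table[name] = table[name]
    return new_table


old_dtype = np.dtype([("name", "S8"), ("value", "f4")])
new_dtype = np.dtype([("name", "S16"), ("value", "f8")])
old_table = np.array([(b"alpha", 0.25), (b"beta", 2.0)], dtype=old_dtype)
new_table = recast(old_table, new_dtype)
```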

update_row(entry: ndarray) → int

Update a row by locating its index and rewriting the entry.

Parameters: entry (numpy.ndarray) – Entry to update, with the same dtype as the table.
Returns: Row index that was updated, or the new row index if not found.
Return type: int

Notes

Matching by hdf5_reference is not reliable; this uses add_row and will append if the original row cannot be located.

mth5.tables.tf_table module

Transfer function summary table utilities.

Summarize TransferFunction groups stored in an MTH5 file into a structured table and provide a convenient pandas.DataFrame view for querying.

Notes

Traversal searches for groups with attribute mth5_type='transferfunction' and collects basic availability flags (impedance, tipper, covariance) along with period range and references.

class mth5.tables.tf_table.TFSummaryTable(hdf5_dataset: Dataset)

Bases: MTH5Table

Summary table for TransferFunction groups.

Provides convenience functions to populate the table (summarize) and export to pandas.DataFrame (to_dataframe).

Examples

Build and export a TF summary:

>>> import h5py
>>> from mth5.tables.tf_table import TFSummaryTable
>>> # summarize() writes rows to the table, so open the file read/write
>>> f = h5py.File('example.mth5', 'r+')
>>> tf_summary_ds = f['Exchange']['TF_Summary']
>>> tf_table = TFSummaryTable(tf_summary_ds)
>>> tf_table.summarize()
>>> df = tf_table.to_dataframe()
>>> df.head()

summarize() → None

Populate the summary table by traversing the HDF5 hierarchy.

Searches for groups where mth5_type equals 'transferfunction' and adds a row indicating available datasets (impedance, tipper, covariance), period min/max, and relevant references.

Return type: None

Examples

Refresh the TF summary:

>>> tf_table.clear_table()
>>> tf_table.summarize()

to_dataframe() → DataFrame

Convert the table to a pandas.DataFrame for easier querying.

Returns: A dataframe with decoded string columns.
Return type: pandas.DataFrame

Examples

Filter transfer functions that include tipper:

>>> df = tf_table.to_dataframe()
>>> df[df.has_tipper]

Module contents

class mth5.tables.ChannelSummaryTable(hdf5_dataset: Dataset)

Bases: MTH5Table

class mth5.tables.FCSummaryTable(hdf5_dataset: Dataset)

Bases: MTH5Table

class mth5.tables.MTH5Table(hdf5_dataset: Dataset, default_dtype: dtype)

Bases: object

class mth5.tables.TFSummaryTable(hdf5_dataset: Dataset)

Bases: MTH5Table
