mth5.tables
Submodules
Classes
MTH5Table | Base wrapper around an HDF5 dataset representing a typed table.
ChannelSummaryTable | Convenience wrapper around the channel summary dataset.
FCSummaryTable | Summary table for Fourier coefficients.
TFSummaryTable | Summary table for TransferFunction groups.
Package Contents
- class mth5.tables.MTH5Table(hdf5_dataset: h5py.Dataset, default_dtype: numpy.dtype)[source]
Base wrapper around an HDF5 dataset representing a typed table.
Provides simple NumPy-based operations including row insertion/removal, basic locating utilities, and conversion to pandas.DataFrame.
- Parameters:
hdf5_dataset (h5py.Dataset) – The HDF5 dataset that stores the table.
default_dtype (numpy.dtype) – The default dtype schema for the table entries.
- Raises:
MTH5TableError – If hdf5_dataset is not an instance of h5py.Dataset.
Examples
Create a simple table and add a row:
>>> import h5py, numpy as np
>>> f = h5py.File('example.h5', 'w')
>>> dtype = np.dtype([('name', 'S16'), ('value', 'f8')])
>>> ds = f.create_dataset('table', (1,), maxshape=(None,), dtype=dtype)
>>> from mth5.tables.mth5_table import MTH5Table
>>> t = MTH5Table(ds, dtype)
>>> row = np.array([('alpha'.encode('utf-8'), 1.23)], dtype=dtype)
>>> t.add_row(row)
1
>>> df = t.to_dataframe()
>>> df.head()
- logger
- property hdf5_reference: object
- property dtype: numpy.dtype
- check_dtypes(other_dtype: numpy.dtype) bool[source]
Check that dtypes match the table’s dtype (including field names).
- Parameters:
other_dtype (numpy.dtype) – The dtype to compare against the table’s dtype.
- Returns:
True if the dtypes match; otherwise False.
- Return type:
bool
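As a rough illustration of what "matching, including field names" means for NumPy structured dtypes, the sketch below uses plain NumPy dtype equality. It is an assumption that check_dtypes behaves like this comparison, and the field names are invented for the example.

```python
import numpy as np

# Hypothetical table schema, for illustration only.
table_dtype = np.dtype([("name", "S16"), ("value", "f8")])

# Same field names and types: the dtypes compare equal.
same = np.dtype([("name", "S16"), ("value", "f8")])

# Same types but a renamed field: the dtypes differ.
renamed = np.dtype([("label", "S16"), ("value", "f8")])

# Same field names but a wider string field: the dtypes still differ.
resized = np.dtype([("name", "S32"), ("value", "f8")])

matches_same = table_dtype == same
matches_renamed = table_dtype == renamed
matches_resized = table_dtype == resized

# Field names alone can be compared separately through dtype.names.
same_names = table_dtype.names == resized.names
```

Comparing dtype.names on its own is useful when only the field layout matters, for example before attempting a cast between compatible schemas.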
- property shape: tuple[int, ...]
- property nrows: int
- locate(column: str, value: Any, test: Literal['eq', 'lt', 'le', 'gt', 'ge', 'be', 'bt'] = 'eq') numpy.ndarray[source]
Locate row indices where a column satisfies a comparison.
- Parameters:
column (str) – Name of the column to test.
value (Any) – Value to compare against. For string columns, a str is converted to a numpy.bytes_. For time columns (start, end, start_date, end_date), values are coerced to numpy.datetime64.
test ({'eq','lt','le','gt','ge','be','bt'}, default 'eq') – Type of comparison to perform:
- 'eq': equals
- 'lt': less than
- 'le': less than or equal to
- 'gt': greater than
- 'ge': greater than or equal to
- 'be': strictly between
- 'bt': alias for 'be'
- Returns:
Array of matching row indices.
- Return type:
numpy.ndarray
- Raises:
ValueError – If test is 'be' or 'bt' and value is not a 2-length iterable.
Examples
Find rows with value greater than 10:
>>> idx = t.locate('value', 10, test='gt')
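The comparison codes listed above can be pictured with a small pure-Python sketch. The helper locate_sketch is hypothetical and works on a plain list rather than an HDF5-backed column; it only mirrors the documented semantics of the test argument and is not the library's implementation.

```python
import operator

# Comparison codes as listed in the documentation above.
_SIMPLE_TESTS = {
    "eq": operator.eq,
    "lt": operator.lt,
    "le": operator.le,
    "gt": operator.gt,
    "ge": operator.ge,
}


def locate_sketch(values, value, test="eq"):
    """Return indices of `values` satisfying the comparison (hypothetical helper)."""
    if test in ("be", "bt"):
        # 'be'/'bt' need a (low, high) pair; anything else is rejected.
        try:
            low, high = value
        except (TypeError, ValueError):
            raise ValueError("'be'/'bt' require a 2-length iterable") from None
        # Strictly between: both bounds are excluded.
        return [i for i, v in enumerate(values) if low < v < high]
    compare = _SIMPLE_TESTS[test]
    return [i for i, v in enumerate(values) if compare(v, value)]


column = [5.0, 10.0, 12.5, 20.0]
greater = locate_sketch(column, 10, test="gt")       # indices of 12.5 and 20.0
between = locate_sketch(column, (5, 20), test="be")  # indices of 10.0 and 12.5
```

Because 'be' is strict, the bounds 5 and 20 themselves are excluded from the result.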
- add_row(row: numpy.ndarray, index: int | None = None) int[source]
Add a row to the table.
- Parameters:
row (numpy.ndarray) – Row to insert. Must have the same dtype (or same field names, allowing safe casting) as the table.
index (int, optional) – Index at which to insert the row. If None, appends to the end.
- Returns:
Index of the inserted row.
- Return type:
int
- Raises:
TypeError – If row is not a numpy.ndarray.
ValueError – If the dtype is incompatible with the table.
- update_row(entry: numpy.ndarray) int[source]
Update a row by locating its index and rewriting the entry.
- Parameters:
entry (numpy.ndarray) – Entry to update, with the same dtype as the table.
- Returns:
Row index that was updated, or the new row index if not found.
- Return type:
int
Notes
Matching by hdf5_reference is not reliable; this uses add_row and will append if the original row cannot be located.
- remove_row(index: int) int[source]
Remove a row by replacing it with a null entry.
- Parameters:
index (int) – Index of the row to remove.
- Returns:
Index that was updated with a null row.
- Return type:
int
- Raises:
IndexError – If the index is out of bounds for the current shape.
Notes
There is no intrinsic index stored within the array; indexing is on-the-fly. Prefer using the HDF5 reference column for robust identification.
The current approach overwrites the row at the specified index with a null row rather than deleting it.
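The effect of replacing a row with a null entry can be sketched on an in-memory NumPy structured array. The helper null_out_row is hypothetical, and treating the dtype's zero value as the "null" entry is an assumption made for this example; the library may fill the row differently.

```python
import numpy as np

dtype = np.dtype([("name", "S16"), ("value", "f8")])
table = np.array(
    [(b"alpha", 1.0), (b"beta", 2.0), (b"gamma", 3.0)], dtype=dtype
)


def null_out_row(array, index):
    """Overwrite one row with a zero-filled entry (hypothetical helper)."""
    if not 0 <= index < array.shape[0]:
        raise IndexError(
            f"index {index} is out of bounds for shape {array.shape}"
        )
    # Assumption: a "null" entry is the zero value of the dtype
    # (empty bytes for string fields, 0.0 for float fields).
    array[index] = np.zeros(1, dtype=array.dtype)[0]
    return index


removed = null_out_row(table, 1)
```

In this sketch the array keeps its length, so the indices of the remaining rows do not shift after a removal.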
- to_dataframe() pandas.DataFrame[source]
Convert the table into a pandas.DataFrame.
- Returns:
DataFrame with decoded string columns where applicable.
- Return type:
pandas.DataFrame
Examples
Convert and preview:
>>> df = t.to_dataframe()
>>> df.head()
- clear_table() None[source]
Reset the table by recreating the dataset with a single null row.
Notes
Deletes the current dataset and replaces it with a new dataset with the same compression/options and dtype, but shape (1,).
- update_dtype(new_dtype: numpy.dtype) None[source]
Update the dataset’s dtype while preserving data and field names.
- Parameters:
new_dtype (numpy.dtype) – New dtype to apply. Must have identical field names.
Notes
Performs a manual copy into a new array to avoid unsafe casting errors, then recreates the dataset with the new dtype and same dataset options.
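The manual copy described in the note can be sketched with NumPy alone. This is a minimal illustration under the assumption that each field is copied by name into a freshly allocated array; the step that recreates the HDF5 dataset is omitted, and the field names and widths are invented.

```python
import numpy as np

old_dtype = np.dtype([("name", "S8"), ("value", "f4")])
new_dtype = np.dtype([("name", "S16"), ("value", "f8")])

old = np.array([(b"alpha", 1.5), (b"beta", 2.5)], dtype=old_dtype)

# The field names must be identical; only the per-field types may change.
if old_dtype.names != new_dtype.names:
    raise ValueError("new dtype must have the same field names")

# Copy field by field into a freshly allocated array of the new dtype.
new = np.zeros(old.shape, dtype=new_dtype)
for field in new_dtype.names:
    new[field] = old[field]
```

Copying per field sidesteps a whole-array astype call, which is the kind of cast the note says can raise unsafe-casting errors.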
- class mth5.tables.ChannelSummaryTable(hdf5_dataset: h5py.Dataset)[source]
Bases: mth5.tables.MTH5Table
Convenience wrapper around the channel summary dataset.
Provides helpers to summarize channels, convert to pandas, and derive run-level summaries.
Examples
>>> ch_table = ChannelSummaryTable(hdf5_dataset)
>>> df = ch_table.to_dataframe()
>>> run_df = ch_table.to_run_summary()
- to_dataframe() pandas.DataFrame[source]
Convert the channel summary to a pandas DataFrame.
- Returns:
Channel summary with decoded string columns and parsed datetimes.
- Return type:
pandas.DataFrame
Examples
>>> df = ch_table.to_dataframe()
>>> df.head()
- to_run_summary(allowed_input_channels: Iterable[str] = ALLOWED_INPUT_CHANNELS, allowed_output_channels: Iterable[str] = ALLOWED_OUTPUT_CHANNELS, sortby: list[str] | None = None) pandas.DataFrame[source]
Compress channel summary into a run-level summary (one row per run).
- Parameters:
allowed_input_channels (Iterable[str], optional) – Allowed input channel names, by default ALLOWED_INPUT_CHANNELS.
allowed_output_channels (Iterable[str], optional) – Allowed output channel names, by default ALLOWED_OUTPUT_CHANNELS.
sortby (list of str or None, optional) – Columns to sort by; defaults to ["station", "start"] when None.
- Returns:
Run-level summary including channels, durations, and references.
- Return type:
pandas.DataFrame
Examples
>>> run_df = ch_table.to_run_summary()
>>> run_df.columns[:4].tolist()
['survey', 'station', 'run', 'start']
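The idea of compressing channel rows into one row per run can be sketched in pure Python. Everything here is hypothetical: the helper run_summary_sketch, the component and end columns, the output column names, and the channel lists passed in are assumptions for illustration, not the library's schema or defaults.

```python
from collections import defaultdict

# Hypothetical channel-level rows; columns other than survey, station,
# run and start are assumptions made for this sketch.
channel_rows = [
    {"survey": "s1", "station": "mt02", "run": "a", "component": "ex",
     "start": "2020-01-02T00:00:00", "end": "2020-01-02T02:00:00"},
    {"survey": "s1", "station": "mt01", "run": "a", "component": "hx",
     "start": "2020-01-01T00:00:00", "end": "2020-01-01T01:00:00"},
    {"survey": "s1", "station": "mt01", "run": "a", "component": "ex",
     "start": "2020-01-01T00:00:00", "end": "2020-01-01T01:00:00"},
    {"survey": "s1", "station": "mt02", "run": "a", "component": "hy",
     "start": "2020-01-02T00:00:00", "end": "2020-01-02T02:00:00"},
]


def run_summary_sketch(rows, input_channels, output_channels, sortby=None):
    """Group channel rows into one summary row per run (hypothetical helper)."""
    if sortby is None:
        sortby = ["station", "start"]
    groups = defaultdict(list)
    for row in rows:
        groups[(row["survey"], row["station"], row["run"])].append(row)
    summary = []
    for (survey, station, run), members in groups.items():
        components = [m["component"] for m in members]
        summary.append({
            "survey": survey,
            "station": station,
            "run": run,
            # ISO 8601 strings of equal format sort chronologically.
            "start": min(m["start"] for m in members),
            "end": max(m["end"] for m in members),
            "input_channels": [c for c in components if c in input_channels],
            "output_channels": [c for c in components if c in output_channels],
        })
    summary.sort(key=lambda r: tuple(r[k] for k in sortby))
    return summary


run_rows = run_summary_sketch(
    channel_rows,
    input_channels=("hx", "hy"),        # assumed values
    output_channels=("ex", "ey", "hz"),  # assumed values
)
```

With the sample rows, the four channel entries collapse into two run rows, ordered by station and then start time.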
- class mth5.tables.FCSummaryTable(hdf5_dataset: h5py.Dataset)[source]
Bases: mth5.tables.MTH5Table
Summary table for Fourier coefficients.
This class wraps an HDF5 dataset that stores a summary of Fourier coefficient datasets and provides convenience functions such as summarize() (to populate the table) and to_dataframe() (to export entries).
Examples
Populate and export a summary from an existing MTH5 file:
>>> import h5py
>>> from mth5.tables.fc_table import FCSummaryTable
>>> # Open read/write, because summarize() adds rows to the table
>>> f = h5py.File('example.mth5', 'r+')
>>> # Assume the summary dataset already exists at this path
>>> table_ds = f['Exchange']['FC_Summary']
>>> fc_table = FCSummaryTable(table_ds)
>>> fc_table.summarize()  # walk the file and fill entries
>>> df = fc_table.to_dataframe()
>>> df.head()
- to_dataframe() pandas.DataFrame[source]
Convert the table to a pandas.DataFrame for easier querying.
- Returns:
A dataframe with decoded string columns and parsed start/end timestamps.
- Return type:
pandas.DataFrame
Examples
Export to a dataframe and filter by component:
>>> df = fc_table.to_dataframe()
>>> df[df.component == 'ex']
- summarize() None[source]
Populate the summary table by traversing the HDF5 hierarchy.
The traversal searches for datasets with attribute mth5_type == 'FCChannel' and adds a corresponding summary row for each.
- Return type:
None
Notes
If the table contains rows from a different OS/encoding, row insertion can raise a ValueError. A warning is logged and processing continues for subsequent rows.
Examples
Refresh the table entries:
>>> fc_table.clear_table()
>>> fc_table.summarize()
- class mth5.tables.TFSummaryTable(hdf5_dataset: h5py.Dataset)[source]
Bases: mth5.tables.MTH5Table
Summary table for TransferFunction groups.
Provides convenience functions to populate the table (summarize) and export to pandas.DataFrame (to_dataframe).
Examples
Build and export a TF summary:
>>> import h5py
>>> from mth5.tables.tf_table import TFSummaryTable
>>> # Open read/write, because summarize() adds rows to the table
>>> f = h5py.File('example.mth5', 'r+')
>>> tf_summary_ds = f['Exchange']['TF_Summary']
>>> tf_table = TFSummaryTable(tf_summary_ds)
>>> tf_table.summarize()
>>> df = tf_table.to_dataframe()
>>> df.head()
- to_dataframe() pandas.DataFrame[source]
Convert the table to a pandas.DataFrame for easier querying.
- Returns:
A dataframe with decoded string columns.
- Return type:
pandas.DataFrame
Examples
Filter transfer functions that include tipper:
>>> df = tf_table.to_dataframe()
>>> df[df.has_tipper]
- summarize() None[source]
Populate the summary table by traversing the HDF5 hierarchy.
Searches for groups where mth5_type equals 'transferfunction' and adds a row indicating available datasets (impedance, tipper, covariance), period min/max, and relevant references.
- Return type:
None
Examples
Refresh the TF summary:
>>> tf_table.clear_table()
>>> tf_table.summarize()