mth5.groups package

Submodules

mth5.groups.base module

Base Group Class

Contains all the base functions that will be used by group classes.

Created on Fri May 29 15:09:48 2020

copyright

Jared Peacock (jpeacock@usgs.gov)

license

MIT

class mth5.groups.base.BaseGroup(group, group_metadata=None, **kwargs)[source]

Bases: object

Generic object that will have functionality for reading/writing groups, including attributes. To access the hdf5 group directly use the BaseGroup.hdf5_group property.

>>> base = BaseGroup(hdf5_group)
>>> base.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object; that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the BaseGroup.write_metadata method. This is a temporary solution; an automatic updater for metadata changes is in the works.

>>> base.metadata.existing_attribute = 'update_existing_attribute'
>>> base.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> base.metadata.add_base_attribute('new_attribute',
...                                  'new_attribute_value',
...                                  {'type':str,
...                                   'required':True,
...                                   'style':'free form',
...                                   'description': 'new attribute desc.',
...                                   'units':None,
...                                   'options':[],
...                                   'alias':[],
...                                   'example':'new attribute'})

Includes initializing functions that make a summary table and write metadata.

property dataset_options
property groups_list
initialize_group(**kwargs)[source]

Initialize group by making a summary table and writing metadata

property metadata

Metadata for the Group based on mt_metadata.timeseries

read_metadata()[source]

read metadata from the HDF5 group into metadata object

write_metadata()[source]

Write HDF5 metadata from metadata object.

mth5.groups.filters module

Created on Wed Dec 23 17:08:40 2020

Need to make a group for FAP and FIR filters.

copyright

Jared Peacock (jpeacock@usgs.gov)

license

MIT

class mth5.groups.filters.FiltersGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

Not implemented yet

add_filter(filter_object)[source]

Add a filter dataset based on type

current types are:
  • zpk -> zeros, poles, gain

  • fap -> frequency look-up table

  • time_delay -> time delay filter

  • coefficient -> coefficient filter

Parameters

filter_object (mt_metadata.timeseries.filters) – An MT metadata filter object

property filter_dict
get_filter(name)[source]

Get a filter by name

to_filter_object(name)[source]

return the MT metadata representation of the filter
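The type-based storage that add_filter describes can be pictured as a simple dispatch table. The sketch below is illustrative only; the mapping and the helper name group_for_filter are assumptions, not the actual FiltersGroup internals.

```python
# Illustrative sketch only: map a filter type to the sub-group name it
# would be stored under; the real logic lives in mth5.groups.filters.
FILTER_GROUPS = {
    "zpk": "zpk",                  # zeros, poles, gain
    "fap": "fap",                  # frequency look-up table
    "time_delay": "time_delay",    # time delay filter
    "coefficient": "coefficient",  # coefficient filter
}

def group_for_filter(filter_type):
    """Return the sub-group a filter of the given type belongs in."""
    try:
        return FILTER_GROUPS[filter_type.lower()]
    except KeyError:
        raise ValueError(f"Unsupported filter type: {filter_type!r}")
```

An unknown type raises, mirroring the fact that only the four types above are handled.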

mth5.groups.master_station_run_channel module

Created on Wed Dec 23 17:18:29 2020

Note

Need to keep these groups together; if you split them into separate files you get a circular import.

copyright

Jared Peacock (jpeacock@usgs.gov)

license

MIT

class mth5.groups.master_station_run_channel.AuxiliaryDataset(group, **kwargs)[source]

Bases: mth5.groups.master_station_run_channel.ChannelDataset

Holds a channel dataset. This is a simple container for the data to make sure that the user has the flexibility to turn the channel into an object they want to deal with.

For now all the numpy type slicing can be used on hdf5_dataset

Parameters
  • dataset (h5py.Dataset) – dataset object for the channel

  • dataset_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If the dataset is not of the correct type

Utilities will be written to create some common objects like:

  • xarray.DataArray

  • pandas.DataFrame

  • zarr

  • dask.Array

The benefit of these other objects is that they can be indexed by time, and they have much more built-in functionality.

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
    component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
class mth5.groups.master_station_run_channel.ChannelDataset(dataset, dataset_metadata=None, **kwargs)[source]

Bases: object

Holds a channel dataset. This is a simple container for the data to make sure that the user has the flexibility to turn the channel into an object they want to deal with.

For now all the numpy type slicing can be used on hdf5_dataset

Parameters
  • dataset (h5py.Dataset) – dataset object for the channel

  • dataset_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If the dataset is not of the correct type

Utilities will be written to create some common objects like:

  • xarray.DataArray

  • pandas.DataFrame

  • zarr

  • dask.Array

The benefit of these other objects is that they can be indexed by time, and they have much more built-in functionality.

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
    component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
property channel_entry

channel entry that will go into a full channel summary of the entire survey

property channel_response_filter
property end

return end time based on the data

extend_dataset(new_data_array, start_time, sample_rate, fill=None, max_gap_seconds=1, fill_window=10)[source]

Append data according to how the start time aligns with existing data. If the start time is before the existing start time, the data is prepended; similarly, if the start time is near the end, the data will be appended.

If the start time is within the existing time range, existing data will be replaced with the new data.

If there is a gap between the start or end time of the new data and the existing data, you can either fill the gap with a constant value or an error will be raised, depending on the value of fill.

Parameters
  • new_data_array (numpy.ndarray) – new data array with shape (npts, )

  • start_time (string or mth5.utils.mttime.MTime) – start time of the new data array in UTC

  • sample_rate (float) – Sample rate of the new data array, must match existing sample rate

  • fill (string, None, float, integer) – If there is a data gap how do you want to fill the gap:

    • None -> will raise an mth5.utils.exceptions.MTH5Error

    • 'mean' -> will fill with the mean of each data set within the fill window

    • 'median' -> will fill with the median of each data set within the fill window

    • value -> can be an integer or float to fill the gap

    • 'nan' -> will fill the gap with NaN

  • max_gap_seconds (float or integer) – sets a maximum number of seconds the gap can be. Anything over this number will raise a mth5.utils.exceptions.MTH5Error.

  • fill_window (integer) – number of points from the end of each data set to estimate fill value from.

Raises

mth5.utils.exceptions.MTH5Error – if the sample rate is not the same, or the fill value is not understood

Rubric

>>> ex = mth5_obj.get_channel('MT001', 'MT001a', 'Ex')
>>> ex.n_samples
4096
>>> ex.end
2015-01-08T19:32:09.500000+00:00
>>> t = timeseries.ChannelTS('electric',
...                          data=2*np.cos(4 * np.pi * .05 *
...                                        np.linspace(0, 4096, num=4096) *
...                                        .01),
...                          channel_metadata={'electric':{
...                             'component': 'ex',
...                             'sample_rate': 8,
...                             'time_period.start':(ex.end+(1)).iso_str}})
>>> ex.extend_dataset(t.ts, t.start, t.sample_rate, fill='median',
...                   max_gap_seconds=2)
2020-07-02T18:02:47 - mth5.groups.Electric.extend_dataset - INFO -
filling data gap with 1.0385180759767025
>>> ex.n_samples
8200
>>> ex.end
2015-01-08T19:40:42.500000+00:00
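The fill behaviour described above can be illustrated with plain numpy. This is a sketch of the gap-filling idea only; fill_gap is a hypothetical helper, not the actual extend_dataset implementation.

```python
import numpy as np

def fill_gap(existing, new, n_gap, fill="median", fill_window=10):
    """Join two arrays across a gap of n_gap missing samples.

    The fill value is estimated from fill_window points at the end of
    the existing data and the start of the new data, as described above.
    """
    window = np.concatenate([existing[-fill_window:], new[:fill_window]])
    if fill == "median":
        value = np.median(window)
    elif fill == "mean":
        value = np.mean(window)
    elif fill == "nan":
        value = np.nan
    elif fill is None:
        raise ValueError("data gap found and fill is None")
    else:
        value = float(fill)  # a constant numeric fill value
    return np.concatenate([existing, np.full(n_gap, value), new])
```

For example, joining twenty 1.0 samples and twenty 3.0 samples across a four-sample gap with fill='median' inserts four samples of 2.0.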
from_channel_ts(channel_ts_obj, how='replace', fill=None, max_gap_seconds=1, fill_window=10)[source]

Fill the data set from a mth5.timeseries.ChannelTS object.

Will check for time alignment and metadata.

Parameters
  • channel_ts_obj (mth5.timeseries.ChannelTS) – time series object

  • how

    how the new array will be input to the existing dataset:

    • 'replace' -> replace the entire dataset, nothing is left over.

    • 'extend' -> add onto the existing dataset; any overlapping values will be rewritten, and if there are gaps between data sets those will be handled depending on the value of fill.

  • fill – If there is a data gap how do you want to fill the gap:

    • None -> will raise an mth5.utils.exceptions.MTH5Error

    • 'mean' -> will fill with the mean of each data set within the fill window

    • 'median' -> will fill with the median of each data set within the fill window

    • value -> can be an integer or float to fill the gap

    • 'nan' -> will fill the gap with NaN

  • max_gap_seconds (float or integer) – sets a maximum number of seconds the gap can be. Anything over this number will raise a mth5.utils.exceptions.MTH5Error.

  • fill_window (integer) – number of points from the end of each data set to estimate fill value from.
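The max_gap_seconds rule can be sketched with plain numbers (seconds since epoch); check_gap_seconds is a hypothetical helper illustrating the check, not the library's code.

```python
def check_gap_seconds(existing_end, new_start, sample_rate, max_gap_seconds=1.0):
    """Seconds of missing data between the end of one data set and the
    start of the next; raises if the gap exceeds max_gap_seconds."""
    # one sample interval between consecutive points is not a gap
    gap = (new_start - existing_end) - 1.0 / sample_rate
    if gap > max_gap_seconds:
        raise ValueError(
            f"gap of {gap:.3f} s exceeds max_gap_seconds={max_gap_seconds}"
        )
    return max(gap, 0.0)
```

At 8 samples per second, a new start exactly one sample interval after the existing end is contiguous (gap of zero).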

from_xarray(data_array, how='replace', fill=None, max_gap_seconds=1, fill_window=10)[source]

Fill the data set from an xarray.DataArray object.

Will check for time alignment and metadata.

Parameters
  • data_array (xarray.DataArray) – Xarray data array

  • how

    how the new array will be input to the existing dataset:

    • 'replace' -> replace the entire dataset, nothing is left over.

    • 'extend' -> add onto the existing dataset; any overlapping values will be rewritten, and if there are gaps between data sets those will be handled depending on the value of fill.

  • fill – If there is a data gap how do you want to fill the gap:

    • None -> will raise an mth5.utils.exceptions.MTH5Error

    • 'mean' -> will fill with the mean of each data set within the fill window

    • 'median' -> will fill with the median of each data set within the fill window

    • value -> can be an integer or float to fill the gap

    • 'nan' -> will fill the gap with NaN

  • max_gap_seconds (float or integer) – sets a maximum number of seconds the gap can be. Anything over this number will raise a mth5.utils.exceptions.MTH5Error.

  • fill_window (integer) – number of points from the end of each data set to estimate fill value from.

get_index_from_time(given_time)[source]

get the appropriate index for a given time.

Parameters

given_time (string or mth5.utils.mttime.MTime) – time to find the index of

Returns

index of the sample closest to the given time

Return type

int
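For uniformly sampled data the index computation reduces to scaling the elapsed time by the sample rate. A minimal sketch, using seconds-since-start floats rather than MTime objects:

```python
def index_from_time(given_time, start_time, sample_rate):
    """Index of the sample nearest to given_time (times in seconds)."""
    # elapsed seconds times samples-per-second gives a fractional index;
    # round to the nearest whole sample
    return int(round((given_time - start_time) * sample_rate))
```

For example, at 8 samples per second a time 0.5 s after the start maps to index 4.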

property master_station_group

shortcut to master station group

property n_samples
read_metadata()[source]

Read metadata from the HDF5 file into the metadata container, that way it can be validated.

replace_dataset(new_data_array)[source]

replace the entire dataset with a new one, nothing left behind

Parameters

new_data_array (numpy.ndarray) – new data array shape (npts, )

property run_group

shortcut to run group

property sample_rate
property start
property station_group

shortcut to station group

property table_entry

Create a table entry to put into the run summary table.

property time_index

time index given parameters in metadata

Return type

pandas.DatetimeIndex
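A DatetimeIndex like the one this property returns can be built from the start time, sample rate, and number of samples. A minimal sketch assuming a fixed sample rate; make_time_index is a hypothetical helper, not the property's internals.

```python
import pandas as pd

def make_time_index(start, sample_rate, n_samples):
    """DatetimeIndex implied by a start time, sample rate, and length."""
    # one step per sample interval
    step = pd.Timedelta(seconds=1.0 / sample_rate)
    return pd.date_range(start=start, periods=n_samples, freq=step)
```

At 8 samples per second the index steps by 125 ms between entries.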

time_slice(start_time, end_time=None, n_samples=None, return_type='channel_ts')[source]

Get a time slice from the channel and return the appropriate type

  • numpy array with metadata

  • pandas.Dataframe with metadata

  • xarray.DataArray with metadata

  • mth5.timeseries.ChannelTS ‘default’

  • dask.DataFrame with metadata ‘not yet’

Parameters
  • start_time (string or mth5.utils.mttime.MTime) – start time of the slice

  • end_time (string or mth5.utils.mttime.MTime, optional) – end time of the slice

  • n_samples (integer, optional) – number of samples to read in

Returns

the correct container for the time series.

Return type

[ xarray.DataArray | pandas.DataFrame | mth5.timeseries.ChannelTS | numpy.ndarray ]

Raises

ValueError – if both end_time and n_samples are None, or if both are given.

Example with number of samples

>>> ex = mth5_obj.get_channel('FL001', 'FL001a', 'Ex')
>>> ex_slice = ex.time_slice("2015-01-08T19:49:15", n_samples=4096)
>>> ex_slice
<xarray.DataArray (time: 4096)>
array([0.93115046, 0.14233688, 0.87917119, ..., 0.26073634, 0.7137319 ,
       0.88154395])
Coordinates:
  * time     (time) datetime64[ns] 2015-01-08T19:49:15 ... 2015-01-08T19:57:46.875000
Attributes:
    ac.end:                      None
    ac.start:                    None
    ...

>>> type(ex_slice)
mth5.timeseries.ChannelTS

# plot the time series
>>> ex_slice.ts.plot()
Example with start and end time

>>> ex_slice = ex.time_slice("2015-01-08T19:49:15",
...                          end_time="2015-01-09T19:49:15")
Raises Example

>>> ex_slice = ex.time_slice("2015-01-08T19:49:15",
...                          end_time="2015-01-09T19:49:15",
...                          n_samples=4096)
ValueError: Must input either end_time or n_samples, not both.
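The end_time/n_samples bookkeeping behind time_slice can be sketched with plain numbers (seconds since start). slice_bounds is a hypothetical helper illustrating the rule, not the method's actual internals.

```python
def slice_bounds(start, end=None, n_samples=None, sample_rate=1.0):
    """Resolve a slice given exactly one of end or n_samples."""
    if (end is None) == (n_samples is None):
        raise ValueError("Must input either end_time or n_samples, not both.")
    if n_samples is None:
        # count of samples between start and end, inclusive of both
        n_samples = int(round((end - start) * sample_rate)) + 1
    else:
        # last sample lands (n_samples - 1) intervals after the start
        end = start + (n_samples - 1) / sample_rate
    return end, n_samples
```

Passing both arguments, or neither, raises the ValueError shown in the example above.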
to_channel_ts()[source]
Returns

a Timeseries with the appropriate time index and metadata

Return type

mth5.timeseries.ChannelTS

loads into RAM (nearly half the size of xarray alone, not sure why)

to_dataframe()[source]
Returns

a dataframe where data is stored in the ‘data’ column and attributes are stored in the experimental attrs attribute

Return type

pandas.DataFrame

Note

that metadata will not be validated if changed in an xarray.

loads into RAM

to_numpy()[source]
Returns

a numpy structured array with 2 columns (time, channel_data)

Return type

numpy.core.records

data is a built-in attribute of numpy record arrays and cannot be used as a field name

loads into RAM
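The two-column structured array can be reproduced with numpy record arrays; the values below are made up for illustration.

```python
import numpy as np

# "data" is reserved on numpy record arrays, hence the field name
# "channel_data" noted above.
times = np.arange(4) / 8.0  # seconds at an 8 Hz sample rate
values = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)
rec = np.rec.fromarrays([times, values], names="time,channel_data")
```

The fields are then available as attributes, e.g. rec.time and rec.channel_data.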

to_xarray()[source]
Returns

an xarray DataArray with appropriate metadata and the appropriate time index.

Return type

xarray.DataArray

Note

that metadata will not be validated if changed in an xarray.

loads into RAM

write_metadata()[source]

Write metadata from the metadata container to the HDF5 attrs dictionary.

class mth5.groups.master_station_run_channel.ElectricDataset(group, **kwargs)[source]

Bases: mth5.groups.master_station_run_channel.ChannelDataset

Holds a channel dataset. This is a simple container for the data to make sure that the user has the flexibility to turn the channel into an object they want to deal with.

For now all the numpy type slicing can be used on hdf5_dataset

Parameters
  • dataset (h5py.Dataset) – dataset object for the channel

  • dataset_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If the dataset is not of the correct type

Utilities will be written to create some common objects like:

  • xarray.DataArray

  • pandas.DataFrame

  • zarr

  • dask.Array

The benefit of these other objects is that they can be indexed by time, and they have much more built-in functionality.

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
    component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
class mth5.groups.master_station_run_channel.MagneticDataset(group, **kwargs)[source]

Bases: mth5.groups.master_station_run_channel.ChannelDataset

Holds a channel dataset. This is a simple container for the data to make sure that the user has the flexibility to turn the channel into an object they want to deal with.

For now all the numpy type slicing can be used on hdf5_dataset

Parameters
  • dataset (h5py.Dataset) – dataset object for the channel

  • dataset_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If the dataset is not of the correct type

Utilities will be written to create some common objects like:

  • xarray.DataArray

  • pandas.DataFrame

  • zarr

  • dask.Array

The benefit of these other objects is that they can be indexed by time, and they have much more built-in functionality.

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
    component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
class mth5.groups.master_station_run_channel.MasterStationGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

Utility class that holds information about the stations within a survey and accompanying metadata. This class is the next level down from Survey for stations: /Survey/Stations. It provides methods to add and get stations. A summary table of all existing stations is also provided as a convenience look-up table to make searching easier.

To access MasterStationGroup from an open MTH5 file:

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> stations = mth5_obj.stations_group

To check what stations exist

>>> stations.groups_list
['summary', 'MT001', 'MT002', 'MT003']

To access the hdf5 group directly use MasterStationGroup.hdf5_group.

>>> stations.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object; that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the MasterStationGroup.write_metadata() method. This is a temporary solution; an automatic updater for metadata changes is in the works.

>>> stations.metadata.existing_attribute = 'update_existing_attribute'
>>> stations.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> stations.metadata.add_base_attribute('new_attribute',
...                                      'new_attribute_value',
...                                      {'type':str,
...                                       'required':True,
...                                       'style':'free form',
...                                       'description': 'new attribute desc.',
...                                       'units':None,
...                                       'options':[],
...                                       'alias':[],
...                                       'example':'new attribute'})

To add a station:

>>> new_station = stations.add_station('new_station')
>>> stations
/Survey/Stations:
====================
    --> Dataset: summary
    ......................
    |- Group: new_station
    ---------------------
        --> Dataset: summary
        ......................

Add a station with metadata:

>>> from mth5.metadata import Station
>>> station_metadata = Station()
>>> station_metadata.id = 'MT004'
>>> station_metadata.time_period.start = '2020-01-01T12:30:00'
>>> station_metadata.location.latitude = 40.000
>>> station_metadata.location.longitude = -120.000
>>> new_station = stations.add_station('Test_01', station_metadata)
>>> # to look at the metadata
>>> new_station.metadata
{
    "station": {
        "acquired_by.author": null,
        "acquired_by.comments": null,
        "id": "MT004",
        ...
        }
}

See also

mth5.metadata for details on how to add metadata from various files and python objects.

To remove a station:

>>> stations.remove_station('new_station')
>>> stations
/Survey/Stations:
====================
    --> Dataset: summary
    ......................

Note

Deleting a station is not as simple as del(station). In HDF5 this does not free up memory, it simply removes the reference to that station. The common way to get around this is to copy what you want into a new file, or overwrite the station.

To get a station:

>>> existing_station = stations.get_station('existing_station_name')
>>> existing_station
/Survey/Stations/existing_station_name:
=======================================
    --> Dataset: summary
    ......................
    |- Group: run_01
    ----------------
        --> Dataset: summary
        ......................
        --> Dataset: Ex
        ......................
        --> Dataset: Ey
        ......................
        --> Dataset: Hx
        ......................
        --> Dataset: Hy
        ......................
        --> Dataset: Hz
        ......................

A summary table is provided to make searching easier. The table summarizes all stations within a survey. To see what names are in the summary table:

>>> stations.summary_table.dtype.descr
[('id', ('|S5', {'h5py_encoding': 'ascii'})),
 ('start', ('|S32', {'h5py_encoding': 'ascii'})),
 ('end', ('|S32', {'h5py_encoding': 'ascii'})),
 ('components', ('|S100', {'h5py_encoding': 'ascii'})),
 ('measurement_type', ('|S12', {'h5py_encoding': 'ascii'})),
 ('sample_rate', '<f8')]
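The dtype above can be reproduced with plain numpy, dropping the h5py-specific ascii-encoding metadata; the row values below are illustrative.

```python
import numpy as np

# Plain-numpy equivalent of the station summary table dtype.
summary_dtype = np.dtype([
    ("id", "S5"),
    ("start", "S32"),
    ("end", "S32"),
    ("components", "S100"),
    ("measurement_type", "S12"),
    ("sample_rate", "<f8"),
])

# One example row, as it might be pulled from station metadata.
row = np.array(
    [(b"MT001", b"1980-01-01T00:00:00+00:00", b"1980-01-01T00:00:00+00:00",
      b"Ex,Ey,Hx,Hy,Hz", b"BBMT", 100.0)],
    dtype=summary_dtype,
)
```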

Note

When a station is added an entry is added to the summary table, where the information is pulled from the metadata.

>>> stations.summary_table
index |   id    |           start            |            end             |   components   | measurement_type | sample_rate
---------------------------------------------------------------------------------------------------------------------------
  0   | Test_01 | 1980-01-01T00:00:00+00:00  | 1980-01-01T00:00:00+00:00  | Ex,Ey,Hx,Hy,Hz |       BBMT       |     100
add_station(station_name, station_metadata=None)[source]
Add a station, with metadata if given, at the path:

/Survey/Stations/station_name

If the station already exists, the existing station is returned and nothing is added.

Parameters
  • station_name (string) – Name of the station, should be the same as metadata.id

  • station_metadata (mth5.metadata.Station, optional) – Station metadata container, defaults to None

Returns

A convenience class for the added station

Return type

mth5_groups.StationGroup

Example
>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> stations = mth5_obj.stations_group
>>> new_station = stations.add_station('MT001')
>>> # another option
>>> new_station = mth5_obj.stations_group.add_station('MT001')
property channel_summary

Summary of all channels in the file.

get_station(station_name)[source]

Get a station with the same name as station_name

Parameters

station_name (string) – existing station name

Returns

convenience station class

Return type

mth5.mth5_groups.StationGroup

Raises

MTH5Error – if the station name is not found.

Example

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> stations = mth5_obj.stations_group
>>> existing_station = stations.get_station('MT001')
>>> # another option
>>> existing_station = mth5_obj.stations_group.get_station('MT001')
>>> # if the station does not exist an MTH5Error is raised
MTH5Error: MT001 does not exist, check station_list for existing names
remove_station(station_name)[source]

Remove a station from the file.

Note

Deleting a station is not as simple as del(station). In HDF5 this does not free up memory, it simply removes the reference to that station. The common way to get around this is to copy what you want into a new file, or overwrite the station.

Parameters

station_name (string) – existing station name

Example
>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> stations = mth5_obj.stations_group
>>> stations.remove_station('MT001')
>>> # another option
>>> mth5_obj.stations_group.remove_station('MT001')
property station_summary

Summary of stations in the file

Returns

a summary table of the stations in the file

Return type

pandas.DataFrame

class mth5.groups.master_station_run_channel.RunGroup(group, run_metadata=None, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

RunGroup is a utility class to hold information about a single run and accompanying metadata. This class is the next level down from Stations –> /Survey/Stations/station/station{a-z}.

This class provides methods to add and get channels. A summary table of all existing channels in the run is also provided as a convenience look up table to make searching easier.

Parameters
  • group (h5py.Group) – HDF5 group for a station, should have a path /Survey/Stations/station_name/run_name

  • run_metadata (mth5.metadata.Run, optional) – metadata container, defaults to None

Access RunGroup from an open MTH5 file

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
Check what channels exist

>>> run.groups_list
['Ex', 'Ey', 'Hx', 'Hy']

To access the hdf5 group directly use RunGroup.hdf5_group

>>> run.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object; that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the RunGroup.write_metadata() method. This is a temporary solution; an automatic updater for metadata changes is in the works.

>>> run.metadata.existing_attribute = 'update_existing_attribute'
>>> run.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> run.metadata.add_base_attribute('new_attribute',
...                                 'new_attribute_value',
...                                 {'type':str,
...                                  'required':True,
...                                  'style':'free form',
...                                  'description': 'new attribute desc.',
...                                  'units':None,
...                                  'options':[],
...                                  'alias':[],
...                                  'example':'new attribute'})
Add a channel

>>> new_channel = run.add_channel('Ex', 'electric',
...                               data=numpy.random.rand(4096))
>>> run
/Survey/Stations/MT001/MT001a:
=======================================
    --> Dataset: summary
    ......................
    --> Dataset: Ex
    ......................
    --> Dataset: Ey
    ......................
    --> Dataset: Hx
    ......................
    --> Dataset: Hy
    ......................
Add a channel with metadata

>>> from mth5.metadata import Electric
>>> ex_metadata = Electric()
>>> ex_metadata.time_period.start = '2020-01-01T12:30:00'
>>> ex_metadata.time_period.end = '2020-01-03T16:30:00'
>>> new_ex = run.add_channel('Ex', 'electric',
>>> ...                       channel_metadata=ex_metadata)
>>> # to look at the metadata
>>> new_ex.metadata
{
     "electric": {
        "ac.end": 1.2,
        "ac.start": 2.3,
        ...
        }
}

See also

mth5.metadata for details on how to add metadata from various files and python objects.

Remove a channel

>>> run.remove_channel('Ex')
>>> run
/Survey/Stations/MT001/MT001a:
=======================================
    --> Dataset: summary
    ......................
    --> Dataset: Ey
    ......................
    --> Dataset: Hx
    ......................
    --> Dataset: Hy
    ......................

Note

Deleting a channel is not as simple as del(channel). In HDF5 this does not free up memory, it simply removes the reference to that channel. The common way to get around this is to copy what you want into a new file, or overwrite the channel.

Get a channel

>>> existing_ex = run.get_channel('Ex')
>>> existing_ex
Channel Electric:
-------------------
    component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:08:32+00:00
    sample rate:      8
Summary Table

A summary table is provided to make searching easier. The table summarizes all channels within the run. To see what names are in the summary table:

>>> run.summary_table.dtype.descr
[('component', ('|S5', {'h5py_encoding': 'ascii'})),
 ('start', ('|S32', {'h5py_encoding': 'ascii'})),
 ('end', ('|S32', {'h5py_encoding': 'ascii'})),
 ('n_samples', '<i4'),
 ('measurement_type', ('|S12', {'h5py_encoding': 'ascii'})),
 ('units', ('|S25', {'h5py_encoding': 'ascii'})),
 ('hdf5_reference', ('|O', {'ref': h5py.h5r.Reference}))]

Note

When a run is added an entry is added to the summary table, where the information is pulled from the metadata.

>>> new_run.summary_table
index | component | start | end | n_samples | measurement_type | units | hdf5_reference
----------------------------------------------------------------------------------------
add_channel(channel_name, channel_type, data, channel_dtype='int32', max_shape=(None), chunks=True, channel_metadata=None, **kwargs)[source]

add a channel to the run

Parameters
  • channel_name (string) – name of the channel

  • channel_type (string) – [ electric | magnetic | auxiliary ]

  • channel_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If channel type is not correct

Returns

Channel container

Return type

[ mth5.mth5_groups.ElectricDataset | mth5.mth5_groups.MagneticDataset | mth5.mth5_groups.AuxiliaryDataset ]

>>> new_channel = run.add_channel('Ex', 'electric', None)
>>> new_channel
Channel Electric:
-------------------
    component:        None
    data type:        electric
    data format:      float32
    data shape:       (1,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:00+00:00
    sample rate:      None
property channel_summary

Summary of the channels in the run.

from_channel_ts(channel_ts_obj)[source]

create a channel data set from a mth5.timeseries.ChannelTS object and update metadata.

Parameters

channel_ts_obj (mth5.timeseries.ChannelTS) – a single time series object

Returns

new channel dataset

Return type

mth5.groups.ChannelDataset

from_runts(run_ts_obj, **kwargs)[source]

create channel datasets from a mth5.timeseries.RunTS object and update metadata.

Parameters

run_ts_obj (mth5.timeseries.RunTS) – Run object with all the appropriate channels and metadata.

Will create a run group and appropriate channel datasets.

get_channel(channel_name)[source]

Get a channel from an existing name. Returns the appropriate container.

Parameters

channel_name (string) – name of the channel

Returns

Channel container

Return type

[ mth5.mth5_groups.ElectricDataset | mth5.mth5_groups.MagneticDataset | mth5.mth5_groups.AuxiliaryDataset ]

Raises

MTH5Error – If no channel is found

Example

>>> existing_channel = run.get_channel('Ex')
MTH5Error: Ex does not exist, check groups_list for existing names
>>> run.groups_list
['Ey', 'Hx', 'Hz']
>>> existing_channel = run.get_channel('Ey')
>>> existing_channel
Channel Electric:
-------------------
                component:        Ey
        data type:        electric
        data format:      float32
        data shape:       (4096,)
        start:            1980-01-01T00:00:00+00:00
        end:              1980-01-01T00:00:01+00:00
        sample rate:      4096
property master_station_group

shortcut to master station group

property metadata

Overwrite get metadata to include channel information in the runs

remove_channel(channel_name)[source]

Remove a channel from the run.

Note

Deleting a channel is not as simple as del(channel). In HDF5 this does not free up memory, it simply removes the reference to that channel. The common way to get around this is to copy what you want into a new file, or overwrite the channel.

Parameters

channel_name (string) – existing channel name

Example

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> run.remove_channel('Ex')
property station_group

shortcut to station group

property table_entry

Get a run table entry

Returns

a properly formatted run table entry

Return type

numpy.ndarray with dtype:

>>> dtype([('id', 'S20'),
         ('start', 'S32'),
         ('end', 'S32'),
         ('components', 'S100'),
         ('measurement_type', 'S12'),
         ('sample_rate', float),
         ('hdf5_reference', h5py.ref_dtype)])
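This entry dtype can be reproduced in plain numpy. A minimal sketch, substituting a generic object column for h5py.ref_dtype so h5py is not required (the values are illustrative):

```python
import numpy as np

# Sketch of a run table entry; 'hdf5_reference' is stored as a generic
# object column here in place of h5py.ref_dtype.
run_dtype = np.dtype([('id', 'S20'),
                      ('start', 'S32'),
                      ('end', 'S32'),
                      ('components', 'S100'),
                      ('measurement_type', 'S12'),
                      ('sample_rate', float),
                      ('hdf5_reference', object)])

entry = np.array([(b'MT001a', b'2020-01-01T12:30:00+00:00',
                   b'2020-01-03T16:30:00+00:00', b'Ex,Ey,Hx,Hy,Hz',
                   b'BBMT', 256.0, None)], dtype=run_dtype)

print(entry['id'][0])           # b'MT001a'
print(entry['sample_rate'][0])  # 256.0
```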
to_runts()[source]

create a mth5.timeseries.RunTS object from channels of the run

Returns

a RunTS object containing all channels of the run

Return type

mth5.timeseries.RunTS

validate_run_metadata()[source]

Update metadata and table entries to ensure consistency


write_metadata()[source]

Overwrite BaseGroup.write_metadata to update the table entry and write HDF5 metadata from the metadata object.

class mth5.groups.master_station_run_channel.StationGroup(group, station_metadata=None, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

StationGroup is a utility class to hold information about a single station and accompanying metadata. This class is the next level down from Stations –> /Survey/Stations/station_name.

This class provides methods to add and get runs. A summary table of all existing runs in the station is also provided as a convenience look up table to make searching easier.

Parameters
  • group (h5py.Group) – HDF5 group for a station, should have a path /Survey/Stations/station_name

  • station_metadata (mth5.metadata.Station, optional) – metadata container, defaults to None

Usage

Access StationGroup from an open MTH5 file

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> station = mth5_obj.stations_group.get_station('MT001')
Check what runs exist

>>> station.groups_list
['MT001a', 'MT001b', 'MT001c', 'MT001d']

To access the hdf5 group directly use StationGroup.hdf5_group.

>>> station.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object; that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the StationGroup.write_metadata() method. This is a temporary solution; an automatic updater for changed metadata is in the works.

>>> station.metadata.existing_attribute = 'update_existing_attribute'
>>> station.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> station.metadata.add_base_attribute('new_attribute',
...                                     'new_attribute_value',
...                                     {'type':str,
...                                      'required':True,
...                                      'style':'free form',
...                                      'description': 'new attribute desc.',
...                                      'units':None,
...                                      'options':[],
...                                      'alias':[],
...                                      'example':'new attribute'})

To add a run

>>> new_run = station.add_run('MT001e')
>>> new_run
/Survey/Stations/Test_01:
=========================
    |- Group: MT001e
    -----------------
        --> Dataset: summary
        ......................
    --> Dataset: summary
    ......................
Add a run with metadata

>>> from mth5.metadata import Run
>>> run_metadata = Run()
>>> run_metadata.time_period.start = '2020-01-01T12:30:00'
>>> run_metadata.time_period.end = '2020-01-03T16:30:00'
>>> run_metadata.location.latitude = 40.000
>>> run_metadata.location.longitude = -120.000
>>> new_run = station.add_run('Test_01', run_metadata)
>>> # to look at the metadata
>>> new_run.metadata
{
    "run": {
        "acquired_by.author": "new_user",
        "acquired_by.comments": "First time",
        "channels_recorded_auxiliary": ['T'],
        ...
        }
}

See also

mth5.metadata for details on how to add metadata from various files and python objects.

Remove a run

>>> station.remove_run('new_run')
>>> station
/Survey/Stations/Test_01:
=========================
    --> Dataset: summary
    ......................

Note

Deleting a station is not as simple as del(station). In HDF5 this does not free up memory, it simply removes the reference to that station. The common way to get around this is to copy what you want into a new file, or overwrite the station.

Get a run

>>> existing_run = station.get_run('MT001a')
>>> existing_run
/Survey/Stations/MT001/MT001a:
=======================================
    --> Dataset: summary
    ......................
    --> Dataset: Ex
    ......................
    --> Dataset: Ey
    ......................
    --> Dataset: Hx
    ......................
    --> Dataset: Hy
    ......................
    --> Dataset: Hz
    ......................
Summary Table

A summary table is provided to make searching easier. The table summarizes all runs within the station. To see what names are in the summary table:

>>> new_run.summary_table.dtype.descr
[('id', ('|S20', {'h5py_encoding': 'ascii'})),
 ('start', ('|S32', {'h5py_encoding': 'ascii'})),
 ('end', ('|S32', {'h5py_encoding': 'ascii'})),
 ('components', ('|S100', {'h5py_encoding': 'ascii'})),
 ('measurement_type', ('|S12', {'h5py_encoding': 'ascii'})),
 ('sample_rate', '<f8'),
 ('hdf5_reference', ('|O', {'ref': h5py.h5r.Reference}))]

Note

When a run is added an entry is added to the summary table, where the information is pulled from the metadata.

>>> station.summary_table
index | id | start | end | components | measurement_type | sample_rate | hdf5_reference
---------------------------------------------------------------------------------------
add_run(run_name, run_metadata=None)[source]

Add a run to a station.

Parameters
  • run_name (string) – run name, should be id{a-z}

  • run_metadata (mth5.metadata.Run, optional) – metadata container, defaults to None

The run metadata is used to fill an entry in the summary table.

get_run(run_name)[source]

get a run from run name

Parameters

run_name (string) – existing run name

Returns

Run object

Return type

mth5.mth5_groups.RunGroup

>>> existing_run = station.get_run('MT001')
locate_run(sample_rate, start)[source]

Locate a run based on sample rate and start time from the summary table

Parameters
  • sample_rate (float) – sample rate in samples/seconds

  • start (string or mth5.utils.mttime.MTime) – start time

Returns

appropriate run name, None if not found

Return type

string or None
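The lookup amounts to matching sample rate and start time against the summary table. A simplified stand-in using a plain list of rows (the row layout here is illustrative, not the real summary-table API):

```python
from datetime import datetime

# Illustrative stand-in for the station summary table.
run_table = [
    {'id': 'MT001a', 'sample_rate': 256.0, 'start': '2020-01-01T12:30:00'},
    {'id': 'MT001b', 'sample_rate': 8.0,   'start': '2020-01-03T17:00:00'},
]

def locate_run(sample_rate, start):
    """Return the run id matching sample rate and start time, or None."""
    start = datetime.fromisoformat(start)
    for row in run_table:
        if row['sample_rate'] == sample_rate and \
                datetime.fromisoformat(row['start']) == start:
            return row['id']
    return None

print(locate_run(8.0, '2020-01-03T17:00:00'))    # MT001b
print(locate_run(256.0, '2020-01-02T00:00:00'))  # None
```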

make_run_name()[source]

Make a run name that will be the next alphabet letter extracted from the run list. Expects that all runs are labeled as id{a-z}.

Returns

metadata.id + next letter

Return type

string

>>> station.metadata.id = 'MT001'
>>> station.make_run_name()
'MT001a'
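The next-letter logic can be sketched as follows; this is a simplified stand-in for the actual implementation, assuming runs are labeled id{a-z}:

```python
import string

def make_run_name(station_id, existing_runs):
    """Return station_id plus the next unused letter a-z."""
    used = {name[len(station_id):] for name in existing_runs
            if name.startswith(station_id)}
    for letter in string.ascii_lowercase:
        if letter not in used:
            return station_id + letter
    raise ValueError("all run letters a-z are used")

print(make_run_name('MT001', []))                    # MT001a
print(make_run_name('MT001', ['MT001a', 'MT001b']))  # MT001c
```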
property master_station_group

shortcut to master station group

property metadata

Overwrite get metadata to include run information in the station

property name
remove_run(run_name)[source]

Remove a run from the station.

Note

Deleting a run is not as simple as del(run). In HDF5 this does not free up memory, it simply removes the reference to that run. The common way to get around this is to copy what you want into a new file, or overwrite the run.

Parameters

run_name (string) – existing run name

Example
>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> station = mth5_obj.stations_group.get_station('MT001')
>>> station.remove_run('MT001a')
property run_summary

Summary of runs in the station


property table_entry

make table entry

validate_station_metadata()[source]

Check metadata from the runs and make sure it matches the station metadata


mth5.groups.reports module

Created on Wed Dec 23 17:03:53 2020

copyright

Jared Peacock (jpeacock@usgs.gov)

license

MIT

class mth5.groups.reports.ReportsGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

Not sure how to handle this yet

add_report(report_name, report_metadata=None, report_data=None)[source]
Parameters
  • report_name (TYPE) – DESCRIPTION

  • report_metadata (TYPE, optional) – DESCRIPTION, defaults to None

  • report_data (TYPE, optional) – DESCRIPTION, defaults to None


mth5.groups.standards module

Created on Wed Dec 23 17:05:33 2020

copyright

Jared Peacock (jpeacock@usgs.gov)

license

MIT

class mth5.groups.standards.StandardsGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

The StandardsGroup is a convenience group that stores the metadata standards that were used to make the current file. This is to help a user understand the metadata directly from the file and not have to look up documentation that might not be updated.

The metadata standards are stored in the summary table /Survey/Standards/summary

>>> standards = mth5_obj.standards_group
>>> standards.summary_table
index | attribute | type | required | style | units | description | options | alias | example
---------------------------------------------------------------------------------------------
get_attribute_information(attribute_name)[source]

get information about an attribute

The attribute name should be in the summary table.

Parameters

attribute_name (string) – attribute name

Returns

prints a description of the attribute

Raises

MTH5TableError – if attribute is not found

>>> standards = mth5_obj.standards_group
>>> standards.get_attribute_information('survey.release_license')
survey.release_license
--------------------------
        type          : string
        required      : True
        style         : controlled vocabulary
        units         :
        description   : How the data can be used. The options are based on
                 Creative Commons licenses. For details visit
                 https://creativecommons.org/licenses/
        options       : CC-0,CC-BY,CC-BY-SA,CC-BY-ND,CC-BY-NC-SA,CC-BY-NC-ND
        alias         :
        example       : CC-0
initialize_group()[source]

Initialize the group by making a summary table that summarizes the metadata standards used to describe the data.

Also, write generic metadata information.

property summary_table
summary_table_from_dict(summary_dict)[source]

Fill summary table from a dictionary that summarizes the metadata for the entire survey.

Parameters

summary_dict (dictionary) – Flattened dictionary of all metadata standards within the survey.
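A flattened dictionary joins nested keys with dots. A minimal sketch of producing one (the attribute names are illustrative):

```python
def flatten(d, prefix=''):
    """Flatten nested dictionaries into dot-separated keys."""
    flat = {}
    for key, value in d.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, full_key))
        else:
            flat[full_key] = value
    return flat

standards = {'survey': {'release_license': {'type': 'string',
                                            'required': True}}}
print(flatten(standards))
# {'survey.release_license.type': 'string',
#  'survey.release_license.required': True}
```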

mth5.groups.standards.summarize_metadata_standards()[source]

Summarize metadata standards into a dictionary

mth5.groups.survey module

Created on Wed Dec 23 16:59:45 2020

copyright

Jared Peacock (jpeacock@usgs.gov)

license

MIT

class mth5.groups.survey.SurveyGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

Utility class that holds general information and accompanying metadata for an MT survey.

To access the hdf5 group directly use SurveyGroup.hdf5_group.

>>> survey = SurveyGroup(hdf5_group)
>>> survey.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object; that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the SurveyGroup.write_metadata() method. This is a temporary solution; an automatic updater for changed metadata is in the works.

>>> survey.metadata.existing_attribute = 'update_existing_attribute'
>>> survey.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> survey.metadata.add_base_attribute('new_attribute',
...                                    'new_attribute_value',
...                                    {'type':str,
...                                     'required':True,
...                                     'style':'free form',
...                                     'description': 'new attribute desc.',
...                                     'units':None,
...                                     'options':[],
...                                     'alias':[],
...                                     'example':'new attribute'})

Tip

If you want to add stations, reports, etc. to the survey, this should be done from the MTH5 object. This is to avoid duplication, at least for now.

To look at what the structure of /Survey looks like:

>>> survey
/Survey:
====================
    |- Group: Filters
    -----------------
        --> Dataset: summary
    -----------------
    |- Group: Reports
    -----------------
        --> Dataset: summary
        -----------------
    |- Group: Standards
    -------------------
        --> Dataset: summary
        -----------------
    |- Group: Stations
    ------------------
        --> Dataset: summary
        -----------------
property metadata

Overwrite get metadata to include station information

property stations_group
update_survey_metadata(survey_dict=None)[source]

Update start and end dates and location corners from stations_group.summary_table.
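The update amounts to taking the earliest start, the latest end, and the bounding box of station locations. A sketch with illustrative station rows; the survey key names below are assumptions, not the exact mt_metadata attribute names:

```python
# Illustrative station summary entries.
stations = [
    {'start': '2020-01-01T12:30:00', 'end': '2020-01-03T16:30:00',
     'latitude': 40.0, 'longitude': -120.0},
    {'start': '2020-01-02T00:00:00', 'end': '2020-01-05T08:00:00',
     'latitude': 40.5, 'longitude': -119.5},
]

# ISO-8601 strings sort chronologically, so min/max work directly.
survey = {
    'time_period.start_date': min(s['start'] for s in stations),
    'time_period.end_date': max(s['end'] for s in stations),
    'northwest_corner.latitude': max(s['latitude'] for s in stations),
    'northwest_corner.longitude': min(s['longitude'] for s in stations),
    'southeast_corner.latitude': min(s['latitude'] for s in stations),
    'southeast_corner.longitude': max(s['longitude'] for s in stations),
}
print(survey['time_period.start_date'])  # 2020-01-01T12:30:00
```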

Module contents

Import all Group objects

class mth5.groups.AuxiliaryDataset(group, **kwargs)[source]

Bases: mth5.groups.master_station_run_channel.ChannelDataset

Holds a channel dataset. This is a simple container for the data to make sure that the user has the flexibility to turn the channel into an object they want to deal with.

For now all the numpy type slicing can be used on hdf5_dataset

Parameters
  • dataset (h5py.Dataset) – dataset object for the channel

  • dataset_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If the dataset is not of the correct type

Utilities will be written to create some common objects like:

  • xarray.DataArray

  • pandas.DataFrame

  • zarr

  • dask.Array

The benefit of these other objects is that they can be indexed by time, and they have much more built-in functionality.

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
            component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
class mth5.groups.BaseGroup(group, group_metadata=None, **kwargs)[source]

Bases: object

Generic object that will have functionality for reading/writing groups, including attributes. To access the hdf5 group directly use the BaseGroup.hdf5_group property.

>>> base = BaseGroup(hdf5_group)
>>> base.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object; that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the BaseGroup.write_metadata method. This is a temporary solution; an automatic updater for changed metadata is in the works.

>>> base.metadata.existing_attribute = 'update_existing_attribute'
>>> base.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> base.metadata.add_base_attribute('new_attribute',
...                                  'new_attribute_value',
...                                  {'type':str,
...                                   'required':True,
...                                   'style':'free form',
...                                   'description': 'new attribute desc.',
...                                   'units':None,
...                                   'options':[],
...                                   'alias':[],
...                                   'example':'new attribute'})

Includes initializing functions that make a summary table and write metadata.

property dataset_options
property groups_list
initialize_group(**kwargs)[source]

Initialize group by making a summary table and writing metadata

property metadata

Metadata for the Group based on mt_metadata.timeseries

read_metadata()[source]

read metadata from the HDF5 group into metadata object

write_metadata()[source]

Write HDF5 metadata from metadata object.

class mth5.groups.ChannelDataset(dataset, dataset_metadata=None, **kwargs)[source]

Bases: object

Holds a channel dataset. This is a simple container for the data to make sure that the user has the flexibility to turn the channel into an object they want to deal with.

For now all the numpy type slicing can be used on hdf5_dataset

Parameters
  • dataset (h5py.Dataset) – dataset object for the channel

  • dataset_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If the dataset is not of the correct type

Utilities will be written to create some common objects like:

  • xarray.DataArray

  • pandas.DataFrame

  • zarr

  • dask.Array

The benefit of these other objects is that they can be indexed by time, and they have much more built-in functionality.

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
            component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
property channel_entry

channel entry that will go into a full channel summary of the entire survey

property channel_response_filter
property end

return end time based on the data

extend_dataset(new_data_array, start_time, sample_rate, fill=None, max_gap_seconds=1, fill_window=10)[source]

Append data according to how the start time aligns with existing data. If the start time is before existing start time the data is prepended, similarly if the start time is near the end data will be appended.

If the start time is within the existing time range, existing data will be replace with the new data.

If there is a gap between start or end time of the new data with the existing data you can either fill the data with a constant value or an error will be raise depending on the value of fill.

Parameters
  • new_data_array (numpy.ndarray) – new data array with shape (npts, )

  • start_time (string or mth5.utils.mttime.MTime) – start time of the new data array in UTC

  • sample_rate (float) – Sample rate of the new data array, must match existing sample rate

  • fill (string, None, float, integer) –

    If there is a data gap how do you want to fill the gap:

    • None -> will raise an mth5.utils.exceptions.MTH5Error

    • ’mean’ -> will fill with the mean of each data set within the fill window

    • ’median’ -> will fill with the median of each data set within the fill window

    • value -> can be an integer or float to fill the gap

    • ’nan’ -> will fill the gap with NaN

  • max_gap_seconds (float or integer) – sets a maximum number of seconds the gap can be. Anything over this number will raise a mth5.utils.exceptions.MTH5Error.

  • fill_window (integer) – number of points from the end of each data set to estimate fill value from.

Raises

mth5.utils.exceptions.MTH5Error – if the sample rate is not the same or the fill value is not understood

Example

>>> ex = mth5_obj.get_channel('MT001', 'MT001a', 'Ex')
>>> ex.n_samples
4096
>>> ex.end
2015-01-08T19:32:09.500000+00:00
>>> t = timeseries.ChannelTS('electric',
...                          data=2 * np.cos(4 * np.pi * 0.05 *
...                                          np.linspace(0, 4096, num=4096) *
...                                          0.01),
...                          channel_metadata={'electric':{
...                              'component': 'ex',
...                              'sample_rate': 8,
...                              'time_period.start': (ex.end + 1).iso_str}})
>>> ex.extend_dataset(t.ts, t.start, t.sample_rate, fill='median',
...                   max_gap_seconds=2)
2020-07-02T18:02:47 - mth5.groups.Electric.extend_dataset - INFO -
filling data gap with 1.0385180759767025
>>> ex.n_samples
8200
>>> ex.end
2015-01-08T19:40:42.500000+00:00
from_channel_ts(channel_ts_obj, how='replace', fill=None, max_gap_seconds=1, fill_window=10)[source]

fill data set from a mth5.timeseries.ChannelTS object.

Will check for time alignment and metadata.

Parameters
  • channel_ts_obj (mth5.timeseries.ChannelTS) – time series object

  • how

    how the new array will be input to the existing dataset:

    • ’replace’ -> replace the entire dataset; nothing is left over.

    • ’extend’ -> add onto the existing dataset, any overlapping values will be rewritten, if there are gaps between data sets those will be handled depending on the value of fill.

  • fill (string, None, float, integer) –

    If there is a data gap how do you want to fill the gap:

    • None -> will raise an mth5.utils.exceptions.MTH5Error

    • ’mean’ -> will fill with the mean of each data set within the fill window

    • ’median’ -> will fill with the median of each data set within the fill window

    • value -> can be an integer or float to fill the gap

    • ’nan’ -> will fill the gap with NaN

  • max_gap_seconds (float or integer) – sets a maximum number of seconds the gap can be. Anything over this number will raise a mth5.utils.exceptions.MTH5Error.

  • fill_window (integer) – number of points from the end of each data set to estimate fill value from.

from_xarray(data_array, how='replace', fill=None, max_gap_seconds=1, fill_window=10)[source]

fill data set from a xarray.DataArray object.

Will check for time alignment and metadata.

Parameters
  • data_array_obj – Xarray data array

  • how

    how the new array will be input to the existing dataset:

    • ’replace’ -> replace the entire dataset; nothing is left over.

    • ’extend’ -> add onto the existing dataset, any overlapping values will be rewritten, if there are gaps between data sets those will be handled depending on the value of fill.

  • fill (string, None, float, integer) –

    If there is a data gap how do you want to fill the gap:

    • None -> will raise an mth5.utils.exceptions.MTH5Error

    • ’mean’ -> will fill with the mean of each data set within the fill window

    • ’median’ -> will fill with the median of each data set within the fill window

    • value -> can be an integer or float to fill the gap

    • ’nan’ -> will fill the gap with NaN

  • max_gap_seconds (float or integer) – sets a maximum number of seconds the gap can be. Anything over this number will raise a mth5.utils.exceptions.MTH5Error.

  • fill_window (integer) – number of points from the end of each data set to estimate fill value from.
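The ’extend’ mode with a median fill can be sketched in plain numpy. This is a simplified analogue of the gap handling (working in samples rather than seconds), not the actual implementation:

```python
import numpy as np

def extend_with_fill(existing, new, gap_samples, fill_window=10,
                     max_gap_samples=8):
    """Concatenate two arrays, filling the gap between them with the
    median of up to fill_window points on either side of the gap."""
    if gap_samples > max_gap_samples:
        raise ValueError("gap is larger than the maximum allowed")
    edges = np.concatenate([existing[-fill_window:], new[:fill_window]])
    fill_values = np.full(gap_samples, np.median(edges))
    return np.concatenate([existing, fill_values, new])

a = np.arange(8.0)          # existing data, ends at 7
b = np.arange(12.0, 16.0)   # new data, starts at 12
combined = extend_with_fill(a, b, gap_samples=4)
print(combined.size)  # 16
```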

get_index_from_time(given_time)[source]

get the appropriate index for a given time.

Parameters

given_time (string or mth5.utils.mttime.MTime) – time to find the index of

Returns

index of the sample closest to the given time

Return type

integer
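A sketch of the index computation, assuming a known start time and sample rate (plain datetime in place of mth5.utils.mttime.MTime):

```python
from datetime import datetime

def index_from_time(given_time, start, sample_rate):
    """Return the sample index closest to given_time."""
    t0 = datetime.fromisoformat(start)
    t = datetime.fromisoformat(given_time)
    return round((t - t0).total_seconds() * sample_rate)

print(index_from_time('2020-01-01T00:00:02', '2020-01-01T00:00:00', 8))  # 16
```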

property master_station_group

shortcut to master station group

property n_samples
read_metadata()[source]

Read metadata from the HDF5 file into the metadata container, that way it can be validated.

replace_dataset(new_data_array)[source]

replace the entire dataset with a new one, nothing left behind

Parameters

new_data_array (numpy.ndarray) – new data array shape (npts, )

property run_group

shortcut to run group

property sample_rate
property start
property station_group

shortcut to station group

property table_entry

Create a table entry to put into the run summary table.

property time_index

Time index built from the parameters in the metadata (start time, sample rate, number of samples).

Return type

pandas.DatetimeIndex
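The index can be built from the start time, sample rate, and number of samples. A sketch with pandas:

```python
import pandas as pd

def make_time_index(start, sample_rate, n_samples):
    """Build a DatetimeIndex from start time, sample rate, and length."""
    step = pd.Timedelta(seconds=1.0 / sample_rate)
    return pd.date_range(start=start, periods=n_samples, freq=step)

index = make_time_index('2020-01-01T00:00:00', 8, 4096)
print(index[0])   # 2020-01-01 00:00:00
print(index[-1])  # 2020-01-01 00:08:31.875000
```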

time_slice(start_time, end_time=None, n_samples=None, return_type='channel_ts')[source]

Get a time slice from the channel and return the appropriate type

  • numpy array with metadata

  • pandas.Dataframe with metadata

  • xarray.DataArray with metadata

  • mth5.timeseries.ChannelTS ‘default’

  • dask.DataFrame with metadata ‘not yet’

Parameters
  • start_time (string or mth5.utils.mttime.MTime) – start time of the slice

  • end_time (string or mth5.utils.mttime.MTime, optional) – end time of the slice

  • n_samples (integer, optional) – number of samples to read in

Returns

the correct container for the time series.

Return type

[ xarray.DataArray | pandas.DataFrame | mth5.timeseries.ChannelTS | numpy.ndarray ]

Raises

ValueError – if end_time and n_samples are both None or both given

Example with number of samples

>>> ex = mth5_obj.get_channel('FL001', 'FL001a', 'Ex')
>>> ex_slice = ex.time_slice("2015-01-08T19:49:15", n_samples=4096)
>>> ex_slice
<xarray.DataArray (time: 4096)>
array([0.93115046, 0.14233688, 0.87917119, ..., 0.26073634, 0.7137319 ,
       0.88154395])
Coordinates:
  * time     (time) datetime64[ns] 2015-01-08T19:49:15 ... 2015-01-08T19:57:46.875000
Attributes:
    ac.end:                      None
    ac.start:                    None
    ...

>>> type(ex_slice)
mth5.timeseries.ChannelTS

# plot the time series
>>> ex_slice.ts.plot()
Example with start and end time

>>> ex_slice = ex.time_slice("2015-01-08T19:49:15",
...                          end_time="2015-01-09T19:49:15")
Raises Example

>>> ex_slice = ex.time_slice("2015-01-08T19:49:15",
...                          end_time="2015-01-09T19:49:15",
...                          n_samples=4096)
ValueError: Must input either end_time or n_samples, not both.
to_channel_ts()[source]
Returns

a Timeseries with the appropriate time index and metadata

Return type

mth5.timeseries.ChannelTS

loads into RAM (nearly half the size of the xarray alone, not sure why)

to_dataframe()[source]
Returns

a dataframe where data is stored in the ‘data’ column and attributes are stored in the experimental attrs attribute

Return type

pandas.DataFrame

Note

Metadata will not be validated if changed in the returned DataFrame.

loads into RAM

to_numpy()[source]
Returns

a numpy structured array with 2 columns (time, channel_data)

Return type

numpy.core.records

data is a builtin name in numpy and cannot be used as a column name, so channel_data is used instead

loads into RAM
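The two-column layout can be sketched directly with numpy record arrays (illustrative values; the actual dtype comes from the channel data):

```python
import numpy as np

n = 4
# Build a time column at 8 samples/second (125 ms spacing).
times = np.datetime64('2020-01-01T00:00:00') + \
    np.arange(n) * np.timedelta64(125, 'ms')
data = np.array([0.1, 0.2, 0.3, 0.4], dtype=np.float32)

# 'data' collides with a builtin numpy name, hence 'channel_data'.
records = np.rec.fromarrays([times, data], names='time,channel_data')
print(records.dtype.names)  # ('time', 'channel_data')
```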

to_xarray()[source]
Returns

an xarray DataArray with appropriate metadata and the appropriate time index.

Return type

xarray.DataArray

Note

Metadata will not be validated if changed in the returned DataArray.

loads into RAM

write_metadata()[source]

Write metadata from the metadata container to the HDF5 attrs dictionary.

class mth5.groups.ElectricDataset(group, **kwargs)[source]

Bases: mth5.groups.master_station_run_channel.ChannelDataset

Holds a channel dataset. This is a simple container for the data to make sure that the user has the flexibility to turn the channel into an object they want to deal with.

For now all the numpy type slicing can be used on hdf5_dataset

Parameters
  • dataset (h5py.Dataset) – dataset object for the channel

  • dataset_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If the dataset is not of the correct type

Utilities will be written to create some common objects like:

  • xarray.DataArray

  • pandas.DataFrame

  • zarr

  • dask.Array

The benefit of these other objects is that they can be indexed by time, and they have much more built-in functionality.

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
            component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
class mth5.groups.FiltersGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

Not implemented yet

add_filter(filter_object)[source]

Add a filter dataset based on type

current types are:
  • zpk –> zeros, poles, gain

  • fap –> frequency look up table

  • time_delay –> time delay filter

  • coefficient –> coefficient filter

Parameters

filter_object (mt_metadata.timeseries.filters) – An MT metadata filter object
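A zpk filter is fully described by its zeros, poles, and gain. A minimal sketch of evaluating its frequency response in plain numpy (not the mt_metadata filter API):

```python
import numpy as np

def zpk_response(zeros, poles, gain, frequencies):
    """Evaluate H(s) = gain * prod(s - z) / prod(s - p) at s = 2j*pi*f."""
    s = 2j * np.pi * np.asarray(frequencies, dtype=float)
    num = np.prod([s - z for z in zeros], axis=0) if zeros else 1.0
    den = np.prod([s - p for p in poles], axis=0)
    return gain * num / den

# Single-pole low-pass with a 1 Hz corner: H = 1 / (1 + j*f).
h = zpk_response([], [-2 * np.pi], 2 * np.pi, [0.01, 1.0, 100.0])
print(np.abs(h).round(3))  # [1.    0.707 0.01 ]
```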

property filter_dict
get_filter(name)[source]

Get a filter by name

to_filter_object(name)[source]

return the MT metadata representation of the filter

class mth5.groups.MagneticDataset(group, **kwargs)[source]

Bases: mth5.groups.master_station_run_channel.ChannelDataset

Holds a channel dataset. This is a simple container for the data to make sure that the user has the flexibility to turn the channel into an object they want to deal with.

For now all the numpy type slicing can be used on hdf5_dataset

Parameters
  • dataset (h5py.Dataset) – dataset object for the channel

  • dataset_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If the dataset is not of the correct type

Utilities will be written to create some common objects like:

  • xarray.DataArray

  • pandas.DataFrame

  • zarr

  • dask.Array

The benefit of these other objects is that they can be indexed by time, and they have much more built-in functionality.

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
            component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
class mth5.groups.MasterStationGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

Utility class that holds information about the stations within a survey and accompanying metadata. This class is the next level down from Survey –> /Survey/Stations. This class provides methods to add and get stations. A summary table of all existing stations is also provided as a convenience look-up table to make searching easier.

To access MasterStationGroup from an open MTH5 file:

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> stations = mth5_obj.stations_group

To check what stations exist

>>> stations.groups_list
['summary', 'MT001', 'MT002', 'MT003']

To access the hdf5 group directly use MasterStationGroup.hdf5_group.

>>> stations.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object, that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the MasterStationGroup.write_metadata() method. This is a temporary solution; an automatic updater for changed metadata is in the works.

>>> stations.metadata.existing_attribute = 'update_existing_attribute'
>>> stations.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> stations.metadata.add_base_attribute('new_attribute',
...                                      'new_attribute_value',
...                                      {'type':str,
...                                       'required':True,
...                                       'style':'free form',
...                                       'description': 'new attribute desc.',
...                                       'units':None,
...                                       'options':[],
...                                       'alias':[],
...                                       'example':'new attribute'})

To add a station:

>>> new_station = stations.add_station('new_station')
>>> stations
/Survey/Stations:
====================
    --> Dataset: summary
    ......................
    |- Group: new_station
    ---------------------
        --> Dataset: summary
        ......................

Add a station with metadata:

>>> from mth5.metadata import Station
>>> station_metadata = Station()
>>> station_metadata.id = 'MT004'
>>> station_metadata.time_period.start = '2020-01-01T12:30:00'
>>> station_metadata.location.latitude = 40.000
>>> station_metadata.location.longitude = -120.000
>>> new_station = stations.add_station('MT004', station_metadata)
>>> # to look at the metadata
>>> new_station.metadata
{
    "station": {
        "acquired_by.author": null,
        "acquired_by.comments": null,
        "id": "MT004",
        ...
        }
}

See also

mth5.metadata for details on how to add metadata from various files and python objects.

To remove a station:

>>> stations.remove_station('new_station')
>>> stations
/Survey/Stations:
====================
    --> Dataset: summary
    ......................

Note

Deleting a station is not as simple as del(station). In HDF5 this does not free up memory, it simply removes the reference to that station. The common way to get around this is to copy what you want into a new file, or overwrite the station.

To get a station:

>>> existing_station = stations.get_station('existing_station_name')
>>> existing_station
/Survey/Stations/existing_station_name:
=======================================
    --> Dataset: summary
    ......................
    |- Group: run_01
    ----------------
        --> Dataset: summary
        ......................
        --> Dataset: Ex
        ......................
        --> Dataset: Ey
        ......................
        --> Dataset: Hx
        ......................
        --> Dataset: Hy
        ......................
        --> Dataset: Hz
        ......................

A summary table is provided to make searching easier. The table summarizes all stations within a survey. To see what names are in the summary table:

>>> stations.summary_table.dtype.descr
[('id', ('|S5', {'h5py_encoding': 'ascii'})),
 ('start', ('|S32', {'h5py_encoding': 'ascii'})),
 ('end', ('|S32', {'h5py_encoding': 'ascii'})),
 ('components', ('|S100', {'h5py_encoding': 'ascii'})),
 ('measurement_type', ('|S12', {'h5py_encoding': 'ascii'})),
 ('sample_rate', '<f8')]

Note

When a station is added an entry is added to the summary table, where the information is pulled from the metadata.

>>> stations.summary_table
index |   id    |           start           |            end            |   components   | measurement_type | sample_rate
-------------------------------------------------------------------------------------------------------------------------
  0   | Test_01 | 1980-01-01T00:00:00+00:00 | 1980-01-01T00:00:00+00:00 | Ex,Ey,Hx,Hy,Hz |       BBMT       |     100
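The look-up role of the summary table can be sketched in plain Python; the rows and the find_station helper below are illustrative only, not part of the MTH5 API:

```python
# Hypothetical in-memory stand-in for the stations summary table; the
# real table is an HDF5 dataset with the dtype shown above.
rows = [
    {"id": "MT001", "start": "2020-01-01T00:00:00+00:00", "sample_rate": 256.0},
    {"id": "MT002", "start": "2020-01-05T00:00:00+00:00", "sample_rate": 4096.0},
]

def find_station(rows, station_id):
    """Return the first summary row whose id matches, else None."""
    for row in rows:
        if row["id"] == station_id:
            return row
    return None

match = find_station(rows, "MT002")
```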
add_station(station_name, station_metadata=None)[source]
Add a station with metadata, if given, at the path:

/Survey/Stations/station_name

If the station already exists, that station is returned and nothing is added.

Parameters
  • station_name (string) – Name of the station, should be the same as metadata.id

  • station_metadata (mth5.metadata.Station, optional) – Station metadata container, defaults to None

Returns

A convenience class for the added station

Return type

mth5_groups.StationGroup

Example
>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> stations = mth5_obj.stations_group
>>> new_station = stations.add_station('MT001')
>>> # another option
>>> new_station = mth5_obj.stations_group.add_station('MT001')
property channel_summary

Summary of all channels in the file.

get_station(station_name)[source]

Get a station with the same name as station_name

Parameters

station_name (string) – existing station name

Returns

convenience station class

Return type

mth5.mth5_groups.StationGroup

Raises

MTH5Error – if the station name is not found.

Example

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> stations = mth5_obj.stations_group
>>> existing_station = stations.get_station('MT001')
>>> # another option
>>> existing_station = mth5_obj.stations_group.get_station('MT001')
MTH5Error: MT001 does not exist, check station_list for existing names
remove_station(station_name)[source]

Remove a station from the file.

Note

Deleting a station is not as simple as del(station). In HDF5 this does not free up memory, it simply removes the reference to that station. The common way to get around this is to copy what you want into a new file, or overwrite the station.

Parameters

station_name (string) – existing station name

Example
>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> stations = mth5_obj.stations_group
>>> stations.remove_station('MT001')
>>> # another option
>>> mth5_obj.stations_group.remove_station('MT001')
property station_summary

Summary of stations in the file

Returns

DESCRIPTION

Return type

TYPE

class mth5.groups.ReportsGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

Not sure how to handle this yet

add_report(report_name, report_metadata=None, report_data=None)[source]
Parameters
  • report_name (TYPE) – DESCRIPTION

  • report_metadata (TYPE, optional) – DESCRIPTION, defaults to None

  • report_data (TYPE, optional) – DESCRIPTION, defaults to None

Returns

DESCRIPTION

Return type

TYPE

class mth5.groups.RunGroup(group, run_metadata=None, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

RunGroup is a utility class to hold information about a single run and accompanying metadata. This class is the next level down from Stations –> /Survey/Stations/station/station{a-z}.

This class provides methods to add and get channels. A summary table of all existing channels in the run is also provided as a convenience look up table to make searching easier.

Parameters
  • group (h5py.Group) – HDF5 group for a station, should have a path /Survey/Stations/station_name/run_name

  • run_metadata (mth5.metadata.Run, optional) – metadata container, defaults to None

Access RunGroup from an open MTH5 file

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
Check what channels exist

>>> run.groups_list
['Ex', 'Ey', 'Hx', 'Hy']

To access the hdf5 group directly use RunGroup.hdf5_group

>>> run.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object, that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the RunGroup.write_metadata() method. This is a temporary solution; an automatic updater for changed metadata is in the works.

>>> run.metadata.existing_attribute = 'update_existing_attribute'
>>> run.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> run.metadata.add_base_attribute('new_attribute',
...                                 'new_attribute_value',
...                                 {'type':str,
...                                  'required':True,
...                                  'style':'free form',
...                                  'description': 'new attribute desc.',
...                                  'units':None,
...                                  'options':[],
...                                  'alias':[],
...                                  'example':'new attribute'})
Add a channel

>>> new_channel = run.add_channel('Ex', 'electric',
...                               data=numpy.random.rand(4096))
>>> run
/Survey/Stations/MT001/MT001a:
=======================================
    --> Dataset: summary
    ......................
    --> Dataset: Ex
    ......................
    --> Dataset: Ey
    ......................
    --> Dataset: Hx
    ......................
    --> Dataset: Hy
    ......................
Add a channel with metadata

>>> from mth5.metadata import Electric
>>> ex_metadata = Electric()
>>> ex_metadata.time_period.start = '2020-01-01T12:30:00'
>>> ex_metadata.time_period.end = '2020-01-03T16:30:00'
>>> new_ex = run.add_channel('Ex', 'electric',
...                          channel_metadata=ex_metadata)
>>> # to look at the metadata
>>> new_ex.metadata
{
     "electric": {
        "ac.end": 1.2,
        "ac.start": 2.3,
        ...
        }
}

See also

mth5.metadata for details on how to add metadata from various files and python objects.

Remove a channel

>>> run.remove_channel('Ex')
>>> run
/Survey/Stations/MT001/MT001a:
=======================================
    --> Dataset: summary
    ......................
    --> Dataset: Ey
    ......................
    --> Dataset: Hx
    ......................
    --> Dataset: Hy
    ......................

Note

Deleting a channel is not as simple as del(channel). In HDF5 this does not free up memory, it simply removes the reference to that channel. The common way to get around this is to copy what you want into a new file, or overwrite the channel.

Get a channel

>>> existing_ex = run.get_channel('Ex')
>>> existing_ex
Channel Electric:
-------------------
    component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:08:32+00:00
    sample rate:      8
Summary Table

A summary table is provided to make searching easier. The table summarizes all channels within the run. To see what names are in the summary table:

>>> run.summary_table.dtype.descr
[('component', ('|S5', {'h5py_encoding': 'ascii'})),
 ('start', ('|S32', {'h5py_encoding': 'ascii'})),
 ('end', ('|S32', {'h5py_encoding': 'ascii'})),
 ('n_samples', '<i4'),
 ('measurement_type', ('|S12', {'h5py_encoding': 'ascii'})),
 ('units', ('|S25', {'h5py_encoding': 'ascii'})),
 ('hdf5_reference', ('|O', {'ref': h5py.h5r.Reference}))]

Note

When a run is added an entry is added to the summary table, where the information is pulled from the metadata.

>>> new_run.summary_table
index | component | start | end | n_samples | measurement_type | units | hdf5_reference
---------------------------------------------------------------------------------------
add_channel(channel_name, channel_type, data, channel_dtype='int32', max_shape=(None), chunks=True, channel_metadata=None, **kwargs)[source]

add a channel to the run

Parameters
  • channel_name (string) – name of the channel

  • channel_type (string) – [ electric | magnetic | auxiliary ]

  • channel_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If channel type is not correct

Returns

Channel container

Return type

[ mth5.mth5_groups.ElectricDataset | mth5.mth5_groups.MagneticDataset | mth5.mth5_groups.AuxiliaryDataset ]

>>> new_channel = run.add_channel('Ex', 'electric', None)
>>> new_channel
Channel Electric:
-------------------
    component:        None
    data type:        electric
    data format:      float32
    data shape:       (1,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:00+00:00
    sample rate:      None
property channel_summary

Summary of channels in the run.

from_channel_ts(channel_ts_obj)[source]

create a channel data set from a mth5.timeseries.ChannelTS object and update metadata.

Parameters

channel_ts_obj (mth5.timeseries.ChannelTS) – a single time series object

Returns

new channel dataset

Return type

mth5.groups.ChannelDataset

from_runts(run_ts_obj, **kwargs)[source]

create channel datasets from a mth5.timeseries.RunTS object and update metadata.

Parameters

run_ts_obj (mth5.timeseries.RunTS) – Run object with all the appropriate channels and metadata.

Will create a run group and appropriate channel datasets.

get_channel(channel_name)[source]

Get a channel from an existing name. Returns the appropriate container.

Parameters

channel_name (string) – name of the channel

Returns

Channel container

Return type

[ mth5.mth5_groups.ElectricDataset | mth5.mth5_groups.MagneticDataset | mth5.mth5_groups.AuxiliaryDataset ]

Raises

MTH5Error – If no channel is found

Example

>>> existing_channel = run.get_channel('Ex')
MTH5Error: Ex does not exist, check groups_list for existing names
>>> run.groups_list
['Ey', 'Hx', 'Hz']
>>> existing_channel = run.get_channel('Ey')
>>> existing_channel
Channel Electric:
-------------------
    component:        Ey
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
property master_station_group

shortcut to master station group

property metadata

Overwrite get metadata to include channel information in the runs

remove_channel(channel_name)[source]

Remove a channel from the run.

Note

Deleting a channel is not as simple as del(channel). In HDF5 this does not free up memory, it simply removes the reference to that channel. The common way to get around this is to copy what you want into a new file, or overwrite the channel.

Parameters

channel_name (string) – existing channel name

Example

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> run.remove_channel('Ex')
property station_group

shortcut to station group

property table_entry

Get a run table entry

Returns

a properly formatted run table entry

Return type

numpy.ndarray with dtype:

>>> dtype([('id', 'S20'),
         ('start', 'S32'),
         ('end', 'S32'),
         ('components', 'S100'),
         ('measurement_type', 'S12'),
         ('sample_rate', float),
         ('hdf5_reference', h5py.ref_dtype)])
to_runts()[source]

create a mth5.timeseries.RunTS object from channels of the run

Returns

DESCRIPTION

Return type

TYPE

validate_run_metadata()[source]

Update metadata and table entries to ensure consistency

Returns

DESCRIPTION

Return type

TYPE

write_metadata()[source]

Overwrite BaseGroup.write_metadata to also update the table entry, then write HDF5 metadata from the metadata object.

class mth5.groups.StandardsGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

The StandardsGroup is a convenience group that stores the metadata standards that were used to make the current file. This is to help a user understand the metadata directly from the file and not have to look up documentation that might not be updated.

The metadata standards are stored in the summary table /Survey/Standards/summary

>>> standards = mth5_obj.standards_group
>>> standards.summary_table
index | attribute | type | required | style | units | description | options | alias | example
---------------------------------------------------------------------------------------------
get_attribute_information(attribute_name)[source]

get information about an attribute

The attribute name should be in the summary table.

Parameters

attribute_name (string) – attribute name

Returns

prints a description of the attribute

Raises

MTH5TableError – if attribute is not found

>>> standards = mth5_obj.standards_group
>>> standards.get_attribute_information('survey.release_license')
survey.release_license
--------------------------
        type          : string
        required      : True
        style         : controlled vocabulary
        units         :
        description   : How the data can be used. The options are based on
                 Creative Commons licenses. For details visit
                 https://creativecommons.org/licenses/
        options       : CC-0,CC-BY,CC-BY-SA,CC-BY-ND,CC-BY-NC-SA,CC-BY-NC-ND
        alias         :
        example       : CC-0
initialize_group()[source]

Initialize the group by making a summary table that summarizes the metadata standards used to describe the data.

Also, write generic metadata information.

property summary_table
summary_table_from_dict(summary_dict)[source]

Fill summary table from a dictionary that summarizes the metadata for the entire survey.

Parameters

summary_dict (dictionary) – Flattened dictionary of all metadata standards within the survey.

class mth5.groups.StationGroup(group, station_metadata=None, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

StationGroup is a utility class to hold information about a single station and accompanying metadata. This class is the next level down from Stations –> /Survey/Stations/station_name.

This class provides methods to add and get runs. A summary table of all existing runs in the station is also provided as a convenience look up table to make searching easier.

Parameters
  • group (h5py.Group) – HDF5 group for a station, should have a path /Survey/Stations/station_name

  • station_metadata (mth5.metadata.Station, optional) – metadata container, defaults to None

Usage

Access StationGroup from an open MTH5 file

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> station = mth5_obj.stations_group.get_station('MT001')
Check what runs exist

>>> station.groups_list
['MT001a', 'MT001b', 'MT001c', 'MT001d']

To access the hdf5 group directly use StationGroup.hdf5_group.

>>> station.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object, that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the StationGroup.write_metadata() method. This is a temporary solution; an automatic updater for changed metadata is in the works.

>>> station.metadata.existing_attribute = 'update_existing_attribute'
>>> station.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> station.metadata.add_base_attribute('new_attribute',
...                                     'new_attribute_value',
...                                     {'type':str,
...                                      'required':True,
...                                      'style':'free form',
...                                      'description': 'new attribute desc.',
...                                      'units':None,
...                                      'options':[],
...                                      'alias':[],
...                                      'example':'new attribute'})
To add a run

>>> new_run = station.add_run('MT001e')
>>> new_run
/Survey/Stations/Test_01:
=========================
    |- Group: MT001e
    -----------------
        --> Dataset: summary
        ......................
    --> Dataset: summary
    ......................
Add a run with metadata

>>> from mth5.metadata import Run
>>> run_metadata = Run()
>>> run_metadata.time_period.start = '2020-01-01T12:30:00'
>>> run_metadata.time_period.end = '2020-01-03T16:30:00'
>>> new_run = station.add_run('MT001e', run_metadata)
>>> # to look at the metadata
>>> new_run.metadata
{
    "run": {
        "acquired_by.author": "new_user",
        "acquired_by.comments": "First time",
        "channels_recorded_auxiliary": ['T'],
        ...
        }
}

See also

mth5.metadata for details on how to add metadata from various files and python objects.

Remove a run

>>> station.remove_run('new_run')
>>> station
/Survey/Stations/Test_01:
=========================
    --> Dataset: summary
    ......................

Note

Deleting a run is not as simple as del(run). In HDF5 this does not free up memory, it simply removes the reference to that run. The common way to get around this is to copy what you want into a new file, or overwrite the run.

Get a run

>>> existing_run = station.get_run('existing_run')
>>> existing_run
/Survey/Stations/MT001/MT001a:
=======================================
    --> Dataset: summary
    ......................
    --> Dataset: Ex
    ......................
    --> Dataset: Ey
    ......................
    --> Dataset: Hx
    ......................
    --> Dataset: Hy
    ......................
    --> Dataset: Hz
    ......................
Summary Table

A summary table is provided to make searching easier. The table summarizes all runs within the station. To see what names are in the summary table:

>>> new_run.summary_table.dtype.descr
[('id', ('|S20', {'h5py_encoding': 'ascii'})),
 ('start', ('|S32', {'h5py_encoding': 'ascii'})),
 ('end', ('|S32', {'h5py_encoding': 'ascii'})),
 ('components', ('|S100', {'h5py_encoding': 'ascii'})),
 ('measurement_type', ('|S12', {'h5py_encoding': 'ascii'})),
 ('sample_rate', '<f8'),
 ('hdf5_reference', ('|O', {'ref': h5py.h5r.Reference}))]

Note

When a run is added an entry is added to the summary table, where the information is pulled from the metadata.

>>> station.summary_table
index | id | start | end | components | measurement_type | sample_rate | hdf5_reference
----------------------------------------------------------------------------------------
add_run(run_name, run_metadata=None)[source]

Add a run to a station.

Parameters
  • run_name (string) – run name, should be id{a-z}

  • run_metadata (mth5.metadata.Run, optional) – metadata container, defaults to None

need to be able to fill an entry in the summary table.

get_run(run_name)[source]

get a run from run name

Parameters

run_name (string) – existing run name

Returns

Run object

Return type

mth5.mth5_groups.RunGroup

>>> existing_run = station.get_run('MT001a')
locate_run(sample_rate, start)[source]

Locate a run based on sample rate and start time from the summary table

Parameters
  • sample_rate (float) – sample rate in samples/seconds

  • start (string or mth5.utils.mttime.MTime) – start time

Returns

appropriate run name, None if not found

Return type

string or None
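The matching logic can be sketched as a scan over summary-table rows; the run_rows data and the helper below are illustrative, not the MTH5 implementation:

```python
# Hypothetical summary rows: (run name, start time, sample rate).
run_rows = [
    ("MT001a", "2020-01-01T12:30:00+00:00", 256.0),
    ("MT001b", "2020-01-02T12:30:00+00:00", 4096.0),
]

def locate_run(rows, sample_rate, start):
    """Return the first run name matching both criteria, None if absent."""
    for run_id, run_start, run_rate in rows:
        if run_rate == sample_rate and run_start == start:
            return run_id
    return None

found = locate_run(run_rows, 4096.0, "2020-01-02T12:30:00+00:00")
```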

make_run_name()[source]

Make a run name that will be the next alphabet letter extracted from the run list. Expects that all runs are labeled as id{a-z}.

Returns

metadata.id + next letter

Return type

string

>>> station.metadata.id = 'MT001'
>>> station.make_run_name()
'MT001a'
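The id{a-z} naming scheme can be sketched as follows; next_run_name is an illustrative helper, not part of the MTH5 API:

```python
import string

def next_run_name(station_id, existing_runs):
    """Return station_id plus the first unused letter a-z."""
    used = {name[len(station_id):] for name in existing_runs
            if name.startswith(station_id)}
    for letter in string.ascii_lowercase:
        if letter not in used:
            return station_id + letter
    raise ValueError("all run letters a-z are in use")

print(next_run_name("MT001", ["MT001a", "MT001b"]))  # MT001c
```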
property master_station_group

shortcut to master station group

property metadata

Overwrite get metadata to include run information in the station

property name
remove_run(run_name)[source]

Remove a run from the station.

Note

Deleting a run is not as simple as del(run). In HDF5 this does not free up memory, it simply removes the reference to that run. The common way to get around this is to copy what you want into a new file, or overwrite the run.

Parameters

run_name (string) – existing run name

Example
>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> station = mth5_obj.stations_group.get_station('MT001')
>>> station.remove_run('MT001a')
property run_summary

Summary of runs in the station

Returns

DESCRIPTION

Return type

TYPE

property table_entry

make table entry

validate_station_metadata()[source]

Check metadata from the runs and make sure it matches the station metadata

Returns

DESCRIPTION

Return type

TYPE

class mth5.groups.SurveyGroup(group, **kwargs)[source]

Bases: mth5.groups.base.BaseGroup

Utility class that holds general information about the survey and accompanying metadata for an MT survey.

To access the hdf5 group directly use SurveyGroup.hdf5_group.

>>> survey = SurveyGroup(hdf5_group)
>>> survey.hdf5_group.ref
<HDF5 Group Reference>

Note

All attributes should be input into the metadata object, that way all input will be validated against the metadata standards. If you change attributes in the metadata object, you should run the SurveyGroup.write_metadata() method. This is a temporary solution; an automatic updater for changed metadata is in the works.

>>> survey.metadata.existing_attribute = 'update_existing_attribute'
>>> survey.write_metadata()

If you want to add a new attribute this should be done using the metadata.add_base_attribute method.

>>> survey.metadata.add_base_attribute('new_attribute',
...                                    'new_attribute_value',
...                                    {'type':str,
...                                     'required':True,
...                                     'style':'free form',
...                                     'description': 'new attribute desc.',
...                                     'units':None,
...                                     'options':[],
...                                     'alias':[],
...                                     'example':'new attribute'})

Tip

If you want to add stations, reports, etc. to the survey, this should be done from the MTH5 object. This is to avoid duplication, at least for now.

To look at what the structure of /Survey looks like:

>>> survey
/Survey:
====================
    |- Group: Filters
    -----------------
        --> Dataset: summary
    -----------------
    |- Group: Reports
    -----------------
        --> Dataset: summary
        -----------------
    |- Group: Standards
    -------------------
        --> Dataset: summary
        -----------------
    |- Group: Stations
    ------------------
        --> Dataset: summary
        -----------------
property metadata

Overwrite get metadata to include station information

property stations_group
update_survey_metadata(survey_dict=None)[source]

Update start/end dates and location corners from stations_group.summary_table.