mth5.groups.features

Created on Fri Dec 13 12:40:34 2024

@author: jpeacock

Attributes

TIME_DOMAIN

FREQUENCY_DOMAIN

Classes

MasterFeaturesGroup

Master group container for features associated with Fourier Coefficients or time series.

FeatureGroup

Container for a single feature set with all associated runs and decimation levels.

FeatureTSRunGroup

Container for time series features from a processing or analysis run.

FeatureFCRunGroup

Container for Fourier Coefficient features from a processing run.

FeatureDecimationGroup

Container for a single decimation level with multiple Fourier Coefficient channels.

Module Contents

mth5.groups.features.TIME_DOMAIN = ['ts', 'time', 'time series', 'time_series']
mth5.groups.features.FREQUENCY_DOMAIN = ['fc', 'frequency', 'fourier', 'fourier_domain']
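The two lists above hold the accepted aliases for each domain. The following is a minimal sketch of how a user-supplied domain string could be normalized against them. normalize_domain is a hypothetical helper, not part of mth5, and the case-insensitive matching is an assumption.

```python
TIME_DOMAIN = ["ts", "time", "time series", "time_series"]
FREQUENCY_DOMAIN = ["fc", "frequency", "fourier", "fourier_domain"]


def normalize_domain(domain: str) -> str:
    """Map an accepted alias to 'fc' or 'ts' (hypothetical helper)."""
    key = domain.strip().lower()  # assumption: matching ignores case and whitespace
    if key in FREQUENCY_DOMAIN:
        return "fc"
    if key in TIME_DOMAIN:
        return "ts"
    raise ValueError(
        f"Unrecognized domain {domain!r}; expected one of "
        f"{FREQUENCY_DOMAIN + TIME_DOMAIN}"
    )


print(normalize_domain("Fourier"))      # fc
print(normalize_domain("time series"))  # ts
```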
class mth5.groups.features.MasterFeaturesGroup(group: h5py.Group, **kwargs)

Bases: mth5.groups.BaseGroup

Master group container for features associated with Fourier Coefficients or time series.

This class is the top-level container for geophysical feature data and organizes that data into one group per feature. Features can include various frequency- or time-domain analyses.

Hierarchy

MasterFeaturesGroup -> FeatureGroup ->

  • FC: FeatureFCRunGroup -> FeatureDecimationGroup -> FeatureChannelDataset

  • Time Series: FeatureTSRunGroup -> FeatureChannelDataset

Parameters:
  • group (h5py.Group) – HDF5 group object for this MasterFeaturesGroup.

  • **kwargs – Additional keyword arguments passed to BaseGroup.

Examples

>>> import h5py
>>> from mth5.groups.features import MasterFeaturesGroup
>>> with h5py.File('data.h5', 'r') as f:
...     master = MasterFeaturesGroup(f['features'])
...     feature_list = master.groups_list
add_feature_group(feature_name: str, feature_metadata: mt_metadata.features.FeatureDecimationChannel | None = None) -> FeatureGroup

Add a feature group to the master features container.

Creates a new FeatureGroup with the specified name and optional metadata. Feature groups organize all runs and decimation levels for a particular feature.

Parameters:
  • feature_name (str) – Name for the feature group. Will be validated and formatted.

  • feature_metadata (FeatureDecimationChannel, optional) – Metadata describing the feature. Default is None.

Returns:

Newly created feature group object.

Return type:

FeatureGroup

Examples

>>> master = MasterFeaturesGroup(h5_group)
>>> feature = master.add_feature_group('coherency')
>>> print(feature.name)
coherency
get_feature_group(feature_name: str) -> FeatureGroup

Retrieve a feature group by name.

Parameters:

feature_name (str) – Name of the feature group to retrieve.

Returns:

The requested feature group.

Return type:

FeatureGroup

Raises:

MTH5Error – If the feature group does not exist.

Examples

>>> master = MasterFeaturesGroup(h5_group)
>>> feature = master.get_feature_group('coherency')
>>> print(feature.name)
coherency
remove_feature_group(feature_name: str) -> None

Remove a feature group from the master container.

Deletes the specified feature group and its associated data from the HDF5 file. Note that this operation removes the reference but does not reduce the file size; copy desired data to a new file for size reduction.

Parameters:

feature_name (str) – Name of the feature group to remove.

Raises:

MTH5Error – If the feature group does not exist.

Examples

>>> master = MasterFeaturesGroup(h5_group)
>>> master.remove_feature_group('coherency')
class mth5.groups.features.FeatureGroup(group: h5py.Group, feature_metadata: object | None = None, **kwargs)

Bases: mth5.groups.BaseGroup

Container for a single feature set with all associated runs and decimation levels.

This class manages feature-specific data including all processing runs and decimation levels. Features can include both Fourier Coefficient and time series data.

Hierarchy

FeatureGroup ->

  • FC: FeatureFCRunGroup -> FeatureDecimationGroup -> FeatureChannelDataset

  • TS: FeatureTSRunGroup -> FeatureChannelDataset

Parameters:
  • group (h5py.Group) – HDF5 group object for this FeatureGroup.

  • feature_metadata (optional) – Metadata specific to this feature. Should include a description of the feature and the parameters used to compute it. Default is None.

  • **kwargs – Additional keyword arguments passed to BaseGroup.

Notes

Feature metadata should be specific to the feature and include descriptions of the feature and any parameters used in its computation.

Examples

>>> feature = FeatureGroup(h5_group, feature_metadata=metadata)
>>> run_group = feature.add_feature_run_group('run_1', domain='fc')
add_feature_run_group(feature_name: str, feature_run_metadata: object | None = None, domain: str = 'fc') -> object

Add a feature run group for a single feature.

Creates either a Fourier Coefficient run group or a time series run group, depending on the domain. If feature_run_metadata is provided, the domain is taken from its domain attribute; otherwise the domain argument is used.

Parameters:
  • feature_name (str) – Name for the feature run group.

  • feature_run_metadata (optional) – Metadata for the feature run. If provided, domain is extracted from metadata.domain attribute. Default is None.

  • domain (str, default='fc') –

    Domain type for the data. Must be one of:

    • ‘fc’, ‘frequency’, ‘fourier’, ‘fourier_domain’: Fourier Coefficients

    • ‘ts’, ‘time’, ‘time series’, ‘time_series’: Time series

Returns:

Newly created feature run group.

Return type:

FeatureFCRunGroup or FeatureTSRunGroup

Raises:
  • ValueError – If domain is not recognized.

  • AttributeError – If metadata does not have a domain attribute when metadata is provided.

Examples

>>> feature = FeatureGroup(h5_group)
>>> fc_run = feature.add_feature_run_group('processing_run_1', domain='fc')
>>> ts_run = feature.add_feature_run_group('ts_analysis', domain='ts')
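The domain selection described above can be sketched as follows, assuming, as the parameter description states, that metadata.domain takes precedence over the domain argument. pick_run_class is hypothetical: it only returns the name of the class that would be created, and SimpleNamespace stands in for a real metadata object.

```python
from types import SimpleNamespace

TIME_DOMAIN = ["ts", "time", "time series", "time_series"]
FREQUENCY_DOMAIN = ["fc", "frequency", "fourier", "fourier_domain"]


def pick_run_class(domain="fc", feature_run_metadata=None):
    """Return the name of the run-group class to create (hypothetical sketch)."""
    if feature_run_metadata is not None:
        # Metadata takes precedence over the keyword argument. If it has no
        # `domain` attribute, Python raises AttributeError on this line.
        domain = feature_run_metadata.domain
    if domain in FREQUENCY_DOMAIN:
        return "FeatureFCRunGroup"
    if domain in TIME_DOMAIN:
        return "FeatureTSRunGroup"
    raise ValueError(f"Domain {domain!r} is not recognized")


print(pick_run_class(domain="ts"))  # FeatureTSRunGroup
meta = SimpleNamespace(domain="frequency")  # stand-in for real run metadata
print(pick_run_class(domain="ts", feature_run_metadata=meta))  # FeatureFCRunGroup
```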
get_feature_run_group(feature_name: str, domain: str = 'frequency') -> object

Retrieve a feature run group by name and domain type.

Parameters:
  • feature_name (str) – Name of the feature run group to retrieve.

  • domain (str, default='frequency') –

    Domain type. Must be one of:

    • ‘fc’, ‘frequency’, ‘fourier’, ‘fourier_domain’: Fourier Coefficients

    • ‘ts’, ‘time’, ‘time series’, ‘time_series’: Time series

Returns:

The requested feature run group.

Return type:

FeatureFCRunGroup or FeatureTSRunGroup

Raises:
  • ValueError – If domain is not recognized.

  • MTH5Error – If the feature run group does not exist.

Examples

>>> feature = FeatureGroup(h5_group)
>>> fc_run = feature.get_feature_run_group('processing_run_1', domain='fc')
remove_feature_run_group(feature_name: str) -> None

Remove a feature run group.

Deletes the specified feature run group and all its associated data. Note that deletion removes the reference but does not reduce HDF5 file size.

Parameters:

feature_name (str) – Name of the feature run group to remove.

Raises:

MTH5Error – If the feature run group does not exist.

Examples

>>> feature = FeatureGroup(h5_group)
>>> feature.remove_feature_run_group('processing_run_1')
class mth5.groups.features.FeatureTSRunGroup(group: h5py.Group, feature_run_metadata: object | None = None, **kwargs)

Bases: mth5.groups.BaseGroup

Container for time series features from a processing or analysis run.

This class wraps a RunGroup to manage time series data features while maintaining compatibility with the feature hierarchy structure.

Parameters:
  • group (h5py.Group) – HDF5 group object for this FeatureTSRunGroup.

  • feature_run_metadata (optional) – Metadata for the feature run (same type as timeseries.Run).

  • **kwargs – Additional keyword arguments passed to BaseGroup.

Notes

This class uses methods from RunGroup for channel management, which may have performance implications due to multiple RunGroup instantiations.

Examples

>>> ts_run = FeatureTSRunGroup(h5_group, feature_run_metadata=metadata)
>>> channel = ts_run.add_feature_channel('Ex', 'electric', data)
add_feature_channel(channel_name: str, channel_type: str, data: numpy.ndarray | None = None, channel_dtype: str = 'int32', shape: tuple | None = None, max_shape: tuple = (None,), chunks: bool = True, channel_metadata: object | None = None, **kwargs) -> object

Add a time series channel to the feature run group.

Creates a new channel for time series data with the specified properties and optional metadata. Channel metadata should be a timeseries.Channel object.

Parameters:
  • channel_name (str) – Name for the channel.

  • channel_type (str) – Type of channel (e.g., ‘electric’, ‘magnetic’).

  • data (np.ndarray, optional) – Initial data for the channel. Default is None.

  • channel_dtype (str, default='int32') – Data type for the channel.

  • shape (tuple, optional) – Shape of the channel data. Default is None.

  • max_shape (tuple, default=(None,)) – Maximum shape for expandable dimensions.

  • chunks (bool, default=True) – Whether to use chunking for the dataset.

  • channel_metadata (optional) – Metadata object (timeseries.Channel type). Default is None.

  • **kwargs – Additional keyword arguments for dataset creation.

Returns:

Channel object from RunGroup.

Return type:

object

Examples

>>> ts_run = FeatureTSRunGroup(h5_group)
>>> channel = ts_run.add_feature_channel(
...     'Ex', 'electric', data=np.arange(1000))
get_feature_channel(channel_name: str) -> object

Retrieve a feature channel by name.

Parameters:

channel_name (str) – Name of the channel to retrieve.

Returns:

Channel object from RunGroup.

Return type:

object

Raises:

MTH5Error – If the channel does not exist.

Examples

>>> ts_run = FeatureTSRunGroup(h5_group)
>>> channel = ts_run.get_feature_channel('Ex')
remove_feature_channel(channel_name: str) -> None

Remove a feature channel from the run group.

Parameters:

channel_name (str) – Name of the channel to remove.

Raises:

MTH5Error – If the channel does not exist.

Examples

>>> ts_run = FeatureTSRunGroup(h5_group)
>>> ts_run.remove_feature_channel('Ex')
class mth5.groups.features.FeatureFCRunGroup(group: h5py.Group, feature_run_metadata: mt_metadata.processing.fourier_coefficients.decimation.Decimation | None = None, **kwargs)

Bases: mth5.groups.BaseGroup

Container for Fourier Coefficient features from a processing run.

This class manages Fourier Coefficient data organized by decimation level; each decimation level holds one or more channels of time-frequency Fourier Coefficient data.

Hierarchy

FeatureFCRunGroup -> FeatureDecimationGroup -> FeatureChannelDataset

metadata

Metadata including:

  • list of decimation levels

  • start time (earliest)

  • end time (latest)

  • method (fft, wavelet, …)

  • list of channels used

  • starting sample rate

  • bands used

  • type (TS or FC)

Type:

Decimation

Parameters:
  • group (h5py.Group) – HDF5 group object for this FeatureFCRunGroup.

  • feature_run_metadata (Decimation, optional) – Decimation metadata for the feature run. Default is None.

  • **kwargs – Additional keyword arguments passed to BaseGroup.

Examples

>>> fc_run = FeatureFCRunGroup(h5_group, feature_run_metadata=metadata)
>>> decimation = fc_run.add_decimation_level('level_0', dec_metadata)
metadata() -> mt_metadata.processing.fourier_coefficients.decimation.Decimation

Overrides the base-class metadata getter so that the returned metadata includes channel information.

property decimation_level_summary: pandas.DataFrame

Get a summary of all decimation levels in the run.

Returns a pandas DataFrame with one row per decimation level, listing its name, time range, and HDF5 reference.

Returns:

DataFrame with columns:

  • name : str

    Decimation level name

  • start : datetime64[ns]

    Start time of the decimation level

  • end : datetime64[ns]

    End time of the decimation level

  • hdf5_reference : h5py.ref_dtype

    HDF5 reference to the decimation level group

Return type:

pd.DataFrame

Examples

>>> fc_run = FeatureFCRunGroup(h5_group)
>>> summary = fc_run.decimation_level_summary
>>> print(summary[['name', 'start', 'end']])
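To give a feel for the table this property returns, the sketch below builds a comparable DataFrame by hand with pandas from hypothetical level names and times. The real property reads these values from the HDF5 file and also fills the hdf5_reference column, which is omitted here.

```python
import pandas as pd

# Hypothetical decimation levels: (name, start, end)
levels = [
    ("level_0", "2023-01-01T00:00:00", "2023-01-02T00:00:00"),
    ("level_1", "2023-01-01T00:00:00", "2023-01-01T23:59:00"),
]

summary = pd.DataFrame(levels, columns=["name", "start", "end"])
for col in ("start", "end"):
    # Parse the strings and store them at the documented datetime64[ns] resolution
    summary[col] = pd.to_datetime(summary[col]).astype("datetime64[ns]")

print(summary[["name", "start", "end"]])
print(summary["end"].max() - summary["start"].min())  # 1 days 00:00:00
```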
add_decimation_level(decimation_level_name: str, feature_decimation_level_metadata: object | None = None) -> FeatureDecimationGroup

Add a decimation level group to the feature run.

Parameters:
  • decimation_level_name (str) – Name for the decimation level.

  • feature_decimation_level_metadata (optional) – Metadata for the decimation level. Default is None.

Returns:

Newly created decimation level group.

Return type:

FeatureDecimationGroup

Examples

>>> fc_run = FeatureFCRunGroup(h5_group)
>>> decimation = fc_run.add_decimation_level('level_0', dec_metadata)
>>> print(decimation.name)
level_0
get_decimation_level(decimation_level_name: str) -> FeatureDecimationGroup

Retrieve a decimation level group by name.

Parameters:

decimation_level_name (str) – Name of the decimation level to retrieve.

Returns:

The requested decimation level group.

Return type:

FeatureDecimationGroup

Raises:

MTH5Error – If the decimation level does not exist.

Examples

>>> fc_run = FeatureFCRunGroup(h5_group)
>>> decimation = fc_run.get_decimation_level('level_0')
remove_decimation_level(decimation_level_name: str) -> None

Remove a decimation level from the feature run.

Parameters:

decimation_level_name (str) – Name of the decimation level to remove.

Raises:

MTH5Error – If the decimation level does not exist.

Examples

>>> fc_run = FeatureFCRunGroup(h5_group)
>>> fc_run.remove_decimation_level('level_0')
update_metadata() -> None

Update metadata from all decimation levels.

Scans all decimation levels and updates the run-level metadata with aggregated information including time ranges.

Examples

>>> fc_run = FeatureFCRunGroup(h5_group)
>>> fc_run.update_metadata()
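For time ranges, this aggregation amounts to taking the earliest start and the latest end across all decimation levels, matching the "start time (earliest)" and "end time (latest)" entries in the run metadata. A minimal sketch with plain datetime objects; the levels dictionary is hypothetical, and the real method writes the result into the Decimation metadata instead of returning it.

```python
from datetime import datetime

# Hypothetical per-level time ranges: name -> (start, end)
levels = {
    "level_0": (datetime(2023, 1, 1, 0, 0), datetime(2023, 1, 2, 0, 0)),
    "level_1": (datetime(2023, 1, 1, 0, 5), datetime(2023, 1, 2, 0, 10)),
}


def aggregate_time_range(ranges):
    """Return (earliest start, latest end) over all children."""
    starts = [start for start, _ in ranges.values()]
    ends = [end for _, end in ranges.values()]
    return min(starts), max(ends)


start, end = aggregate_time_range(levels)
print(start, end)  # 2023-01-01 00:00:00 2023-01-02 00:10:00
```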
class mth5.groups.features.FeatureDecimationGroup(group: h5py.Group, decimation_level_metadata: object | None = None, **kwargs)

Bases: mth5.groups.BaseGroup

Container for a single decimation level with multiple Fourier Coefficient channels.

This class manages Fourier Coefficient data organized by frequency, time, and channel. Data is assumed to be uniformly sampled in both frequency and time domains.

Hierarchy

FeatureDecimationGroup -> FeatureChannelDataset (multiple channels)

Data Assumptions

  1. Data are uniformly sampled in the frequency domain.

  2. Data are uniformly sampled in the time domain.

  3. The FFT moving window advances with a uniform step size.
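These assumptions can be checked on the time and frequency axes before loading data. The numpy sketch below assumes window start times are datetime64 values and frequencies are floats; check_uniform is a hypothetical helper, and returning 1 / (window step in seconds) as a window-step rate is an assumption.

```python
import numpy as np


def check_uniform(times, freqs, rtol=1e-9):
    """Check the uniform-sampling assumptions; return the window-step rate in Hz (sketch)."""
    dt = np.diff(times) / np.timedelta64(1, "s")  # window step in seconds
    df = np.diff(freqs)
    if not np.allclose(dt, dt[0], rtol=rtol):
        raise ValueError("window start times are not uniformly spaced")
    if not np.allclose(df, df[0], rtol=rtol):
        raise ValueError("frequencies are not uniformly spaced")
    return 1.0 / dt[0]


# Hypothetical axes: 60 windows stepping every 60 s, 256 evenly spaced frequencies
times = np.datetime64("2023-01-01T00:00:00") + np.arange(60) * np.timedelta64(60, "s")
freqs = np.linspace(0.001, 0.25, 256)
print(check_uniform(times, freqs))  # one window every 60 s, i.e. about 0.0167 Hz
```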

start time

Start time of the decimation level

Type:

datetime

end time

End time of the decimation level

Type:

datetime

channels

List of channel names in this decimation level

Type:

list

decimation_factor

Factor by which data was decimated

Type:

int

decimation_level

Level index in decimation hierarchy

Type:

int

decimation_sample_rate

Sample rate after decimation (Hz)

Type:

float

method

Method used (FFT, wavelet, etc.)

Type:

str

anti_alias_filter

Anti-aliasing filter used

Type:

optional

prewhitening_type

Type of prewhitening applied

Type:

optional

harmonics_kept

Harmonic indices kept in the data

Type:

list or ‘all’

window

Window parameters (length, overlap, type, sample rate)

Type:

dict

bands

Frequency bands in the data

Type:

list
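Several of these attributes are linked by standard FFT arithmetic: for a window of N samples at the decimation sample rate f_s, harmonic index k corresponds to k * f_s / N Hz, and a window overlap of M samples gives a step of N - M samples between window starts. The sketch below works through that arithmetic; the keys in the window dictionary are illustrative only and are not taken from the mth5 metadata schema.

```python
import numpy as np

# Hypothetical settings; the key names are illustrative, not mth5's schema
window = {"num_samples": 128, "overlap": 32, "type": "hamming"}
decimation_sample_rate = 1.0  # Hz (assumed value)
harmonics_kept = [1, 2, 3, 5, 8]  # may also be the string "all"


def harmonic_frequencies(harmonics, n_samples, sample_rate):
    """Frequency in Hz of each FFT harmonic index k: k * sample_rate / n_samples."""
    if isinstance(harmonics, str) and harmonics == "all":
        # One-sided spectrum of a real signal: indices 0 .. n_samples // 2
        harmonics = np.arange(n_samples // 2 + 1)
    return np.asarray(harmonics) * sample_rate / n_samples


freqs = harmonic_frequencies(harmonics_kept, window["num_samples"], decimation_sample_rate)
step_samples = window["num_samples"] - window["overlap"]  # samples between window starts
window_step_rate = decimation_sample_rate / step_samples  # windows per second

print(freqs)
print(window_step_rate)
```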

Parameters:
  • group (h5py.Group) – HDF5 group object for this FeatureDecimationGroup.

  • decimation_level_metadata (optional) – Metadata for the decimation level. Default is None.

  • **kwargs – Additional keyword arguments passed to BaseGroup.

Examples

>>> decimation = FeatureDecimationGroup(h5_group, metadata)
>>> channel = decimation.add_channel('Ex', fc_data=fc_array, fc_metadata=ch_metadata)
metadata()

Overrides the base-class metadata getter so that the returned metadata includes channel information.

property channel_summary: pandas.DataFrame

Get a summary of all channels in this decimation level.

Returns a pandas DataFrame with detailed information about each Fourier Coefficient channel including time ranges, dimensions, and sampling rates.

Returns:

DataFrame with columns:

  • name : str

    Channel name

  • start : datetime64[ns]

    Start time of the channel data

  • end : datetime64[ns]

    End time of the channel data

  • n_frequency : int64

    Number of frequency bins

  • n_windows : int64

    Number of time windows

  • sample_rate_decimation_level : float64

    Decimation level sample rate (Hz)

  • sample_rate_window_step : float64

    Sample rate of window stepping (Hz)

  • units : str

    Physical units of the data

  • hdf5_reference : h5py.ref_dtype

    HDF5 reference to the channel dataset

Return type:

pd.DataFrame

Examples

>>> decimation = FeatureDecimationGroup(h5_group)
>>> summary = decimation.channel_summary
>>> print(summary[['name', 'n_frequency', 'n_windows']])
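Several numeric columns of this summary follow directly from a channel's (time, frequency) array and its window start times. The numpy sketch below derives one summary row; the array, start time, and 60 s step are hypothetical. Taking end as the last window start and sample_rate_window_step as the inverse of the window step are assumptions, and mth5 may define them differently.

```python
import numpy as np

# Hypothetical channel: 100 windows x 256 frequencies, (time, frequency) layout
rng = np.random.default_rng(0)
fc = rng.standard_normal((100, 256)) + 1j * rng.standard_normal((100, 256))
window_starts = np.datetime64("2023-01-01T00:00:00") + np.arange(100) * np.timedelta64(60, "s")

n_windows, n_frequency = fc.shape
step_s = (window_starts[1] - window_starts[0]) / np.timedelta64(1, "s")

row = {
    "name": "Ex",
    "start": window_starts[0],
    "end": window_starts[-1],  # last window start; mth5 may define end differently
    "n_frequency": n_frequency,
    "n_windows": n_windows,
    "sample_rate_window_step": 1.0 / step_s,  # assumed: inverse of the window step
}
print(row["n_windows"], row["n_frequency"], row["end"])
```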
from_dataframe(df: pandas.DataFrame, channel_key: str, time_key: str = 'time', frequency_key: str = 'frequency') -> None

Load Fourier Coefficient data from a pandas DataFrame.

Assumes the channel_key column contains complex coefficient values organized with time and frequency dimensions.

Parameters:
  • df (pd.DataFrame) – Input DataFrame containing the coefficient data.

  • channel_key (str) – Name of the column containing coefficient values.

  • time_key (str, default='time') – Name of the time coordinate column.

  • frequency_key (str, default='frequency') – Name of the frequency coordinate column.

Raises:

TypeError – If df is not a pandas DataFrame.

Examples

>>> decimation = FeatureDecimationGroup(h5_group)
>>> decimation.from_dataframe(df, channel_key='Ex', time_key='time')
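If the DataFrame is in long format, with one row per time-frequency pair, reshaping it into a (time, frequency) array is a pivot. The sketch below shows that step with pandas. The long-format layout and the helper long_to_array are assumptions for illustration; mth5 may organize the conversion differently.

```python
import numpy as np
import pandas as pd


def long_to_array(df, channel_key, time_key="time", frequency_key="frequency"):
    """Pivot long-format coefficients into a (time, frequency) array (sketch)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError(f"expected a pandas DataFrame, got {type(df).__name__}")
    wide = df.pivot(index=time_key, columns=frequency_key, values=channel_key)
    return wide.index.to_numpy(), wide.columns.to_numpy(), wide.to_numpy()


# Hypothetical input: one row per (time, frequency) pair
grid = pd.MultiIndex.from_product(
    [pd.date_range("2023-01-01", periods=3, freq="min"), [0.01, 0.02]],
    names=["time", "frequency"],
)
df = grid.to_frame(index=False)
df["Ex"] = np.arange(len(df)) * (1 + 1j)

times, freqs, values = long_to_array(df, channel_key="Ex")
print(values.shape)  # (3, 2)
```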
from_xarray(data_array: xarray.DataArray | xarray.Dataset, sample_rate_decimation_level: float) -> None

Load Fourier Coefficient data from an xarray DataArray or Dataset.

Automatically extracts metadata (time, frequency, units) from the xarray object and creates appropriate FeatureChannelDataset instances for each variable or the single DataArray.

Parameters:
  • data_array (xr.DataArray or xr.Dataset) – Input xarray object with ‘time’ and ‘frequency’ coordinates and dimensions [‘time’, ‘frequency’] (or transposed variant).

  • sample_rate_decimation_level (float) – Sample rate of the decimation level (Hz).

Raises:

TypeError – If data_array is not an xarray Dataset or DataArray.

Notes

Automatically handles both (time, frequency) and (frequency, time) dimension ordering. Units are extracted from xarray attributes if available.

Examples

>>> import xarray as xr
>>> import numpy as np
>>> decimation = FeatureDecimationGroup(h5_group)

Create sample xarray data:

>>> times = np.arange('2023-01-01', '2023-01-02', dtype='datetime64[m]')  # 1-minute window step
>>> freqs = np.linspace(0.001, 0.25, 256)  # at or below Nyquist for a 0.5 Hz decimation level
>>> data_array = np.random.randn(len(times), len(freqs)) + \
...              1j * np.random.randn(len(times), len(freqs))
>>> xr_data = xr.DataArray(
...     data_array,
...     dims=['time', 'frequency'],
...     coords={'time': times, 'frequency': freqs},
...     name='Ex',
...     attrs={'units': 'mV/km'}
... )

Load into decimation group:

>>> decimation.from_xarray(xr_data, sample_rate_decimation_level=0.5)
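The Notes state that both (time, frequency) and (frequency, time) orderings are accepted. With numpy alone, that normalization can be sketched as a transpose driven by the dimension names. to_time_frequency is a hypothetical helper; with xarray itself, DataArray.transpose("time", "frequency") serves the same purpose.

```python
import numpy as np


def to_time_frequency(values, dims):
    """Return `values` ordered as (time, frequency), given its dimension names."""
    dims = tuple(dims)
    if dims == ("time", "frequency"):
        return values
    if dims == ("frequency", "time"):
        return values.T  # swap the two axes; numpy returns a view, not a copy
    raise ValueError(f"unsupported dimensions {dims}")


arr = np.zeros((256, 1440), dtype=complex)  # stored as (frequency, time)
print(to_time_frequency(arr, ["frequency", "time"]).shape)  # (1440, 256)
```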
to_xarray(channels: list | None = None) -> xarray.Dataset

Create an xarray Dataset from Fourier Coefficient channels.

If no channels are specified, all channels in the decimation level are included. Each channel becomes a data variable in the resulting Dataset.

Parameters:

channels (list, optional) – List of channel names to include. If None, all channels are used. Default is None.

Returns:

xarray Dataset with channels as data variables and ‘time’ and ‘frequency’ as shared coordinates.

Return type:

xr.Dataset

Examples

>>> decimation = FeatureDecimationGroup(h5_group)
>>> xr_data = decimation.to_xarray()
>>> print(xr_data.data_vars)
Data variables:
    Ex  (time, frequency) complex128
    Ey  (time, frequency) complex128

Get specific channels:

>>> subset = decimation.to_xarray(channels=['Ex', 'Ey'])
from_numpy_array(nd_array: numpy.ndarray, ch_name: str | list) -> None

Load Fourier Coefficient data from a numpy array.

Assumes array shape is either (n_frequencies, n_windows) for a single channel or (n_channels, n_frequencies, n_windows) for multiple channels.

Parameters:
  • nd_array (np.ndarray) – Input numpy array containing coefficient data.

  • ch_name (str or list) – Channel name (for 2D array) or list of channel names (for 3D array).

Raises:
  • TypeError – If nd_array is not a numpy ndarray.

  • ValueError – If array shape is not (n_frequencies, n_windows) or (n_channels, n_frequencies, n_windows).

Examples

>>> decimation = FeatureDecimationGroup(h5_group)

Load single channel:

>>> data_2d = np.random.randn(256, 100) + 1j * np.random.randn(256, 100)
>>> decimation.from_numpy_array(data_2d, ch_name='Ex')

Load multiple channels:

>>> data_3d = np.random.randn(2, 256, 100) + 1j * np.random.randn(2, 256, 100)
>>> decimation.from_numpy_array(data_3d, ch_name=['Ex', 'Ey'])
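The shape rules above can be expressed as a small validation step that splits a 3D array into per-channel 2D arrays. This is only a sketch: split_channels is hypothetical, and the returned dictionary stands in for the datasets mth5 would create.

```python
import numpy as np


def split_channels(nd_array, ch_name):
    """Map channel names to 2D (n_frequencies, n_windows) arrays (sketch)."""
    if not isinstance(nd_array, np.ndarray):
        raise TypeError(f"expected numpy.ndarray, got {type(nd_array).__name__}")
    if nd_array.ndim == 2:
        if not isinstance(ch_name, str):
            raise ValueError("a 2D array needs a single channel name")
        return {ch_name: nd_array}
    if nd_array.ndim == 3:
        names = [ch_name] if isinstance(ch_name, str) else list(ch_name)
        if len(names) != nd_array.shape[0]:
            raise ValueError(f"got {len(names)} names for {nd_array.shape[0]} channels")
        # Index the first axis to get one (n_frequencies, n_windows) array per channel
        return {name: nd_array[i] for i, name in enumerate(names)}
    raise ValueError(
        "array must be (n_frequencies, n_windows) or "
        "(n_channels, n_frequencies, n_windows)"
    )


data_3d = np.zeros((2, 256, 100), dtype=complex)
channels = split_channels(data_3d, ["Ex", "Ey"])
print({name: arr.shape for name, arr in channels.items()})  # {'Ex': (256, 100), 'Ey': (256, 100)}
```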
add_channel(fc_name: str, fc_data: numpy.ndarray | xarray.DataArray | xarray.Dataset | pandas.DataFrame | None = None, fc_metadata: mt_metadata.features.FeatureDecimationChannel | None = None, max_shape: tuple = (None, None), chunks: bool = True, dtype: type = complex, **kwargs) -> mth5.groups.FeatureChannelDataset

Add a Fourier Coefficient channel to the decimation level.

Creates a new FeatureChannelDataset for a single channel at a single decimation level. Input data can be provided as numpy array, xarray, DataFrame, or created empty.

Parameters:
  • fc_name (str) – Name for the Fourier Coefficient channel.

  • fc_data (np.ndarray, xr.DataArray, xr.Dataset, pd.DataFrame, optional) – Input data. Can be numpy array (time, frequency) or xarray/DataFrame format. Default is None (creates empty dataset).

  • fc_metadata (FeatureDecimationChannel, optional) – Metadata for the channel. Default is None.

  • max_shape (tuple, default=(None, None)) – Maximum shape of the HDF5 dataset; a None entry makes that dimension expandable.

  • chunks (bool, default=True) – Whether to use HDF5 chunking.

  • dtype (type, default=complex) – Data type for the dataset (e.g., complex, float, int).

  • **kwargs – Additional keyword arguments for HDF5 dataset creation.

Returns:

Newly created FeatureChannelDataset object.

Return type:

FeatureChannelDataset

Raises:
  • TypeError – If fc_data type is not supported or metadata type mismatch.

  • RuntimeError or OSError – Triggered when the channel already exists; in that case the existing channel is returned instead of a new one.

Notes

Data layout assumes (time, frequency) organization:

  • time index: window start times

  • frequency index: harmonic indices or float values

  • data: complex Fourier coefficients

Examples

>>> decimation = FeatureDecimationGroup(h5_group)
>>> metadata = FeatureDecimationChannel(name='Ex')

Create from numpy array:

>>> fc_data = np.random.randn(100, 256) + 1j * np.random.randn(100, 256)
>>> channel = decimation.add_channel('Ex', fc_data=fc_data, fc_metadata=metadata)

Create empty channel (expandable):

>>> channel = decimation.add_channel('Ey', fc_metadata=FeatureDecimationChannel(name='Ey'))
get_channel(fc_name: str) -> mth5.groups.FeatureChannelDataset

Retrieve a Fourier Coefficient channel by name.

Parameters:

fc_name (str) – Name of the channel to retrieve.

Returns:

The requested FeatureChannelDataset object.

Return type:

FeatureChannelDataset

Raises:

MTH5Error – If the channel does not exist.

Examples

>>> decimation = FeatureDecimationGroup(h5_group)
>>> channel = decimation.get_channel('Ex')
>>> data = channel.to_numpy()
remove_channel(fc_name: str) -> None

Remove a Fourier Coefficient channel from the decimation level.

Deletes the channel from the HDF5 file. Note that this removes the reference but does not reduce file size.

Parameters:

fc_name (str) – Name of the channel to remove.

Raises:

MTH5Error – If the channel does not exist.

Notes

To reduce HDF5 file size, copy desired data to a new file.

Examples

>>> decimation = FeatureDecimationGroup(h5_group)
>>> decimation.remove_channel('Ex')
update_metadata() -> None

Update metadata from all channels in the decimation level.

Scans all channels and updates the decimation-level metadata with aggregated information including time ranges and sampling rates.

Examples

>>> decimation = FeatureDecimationGroup(h5_group)
>>> decimation.update_metadata()
add_weights(weight_name: str, weight_data: numpy.ndarray | None = None, weight_metadata: object | None = None, max_shape: tuple = (None, None, None), chunks: bool = True, **kwargs) -> None

Add weight or masking data for Fourier Coefficients.

Creates a dataset to store weights or masks for quality control, frequency band selection, or time window filtering.

Parameters:
  • weight_name (str) – Name for the weight dataset.

  • weight_data (np.ndarray, optional) – Weight values. Default is None.

  • weight_metadata (optional) – Metadata for the weight dataset. Default is None.

  • max_shape (tuple, default=(None, None, None)) – Maximum shape for expandable dimensions.

  • chunks (bool, default=True) – Whether to use HDF5 chunking.

  • **kwargs – Additional keyword arguments for HDF5 dataset creation.

Notes

Weight datasets can track:

  • weight_channel: Per-channel weights

  • weight_band: Per-frequency-band weights

  • weight_time: Per-time-window weights

This method is a placeholder for future implementation.

Examples

>>> decimation = FeatureDecimationGroup(h5_group)
>>> decimation.add_weights('coherency_weights', weight_data=weights)
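Because add_weights is currently a placeholder, the following only illustrates how per-time-window and per-band weights could be combined with a (time, frequency) coefficient array through numpy broadcasting. The weight names follow the Notes; the multiplicative combination and the 0/1 band mask are assumptions, not mth5 behavior.

```python
import numpy as np

# Hypothetical (time, frequency) coefficients: 4 windows x 6 frequencies
n_windows, n_freqs = 4, 6
fc = np.ones((n_windows, n_freqs), dtype=complex)

weight_time = np.array([1.0, 0.0, 1.0, 0.5])  # per-time-window weights
weight_band = (np.arange(n_freqs) < 3).astype(float)  # 0/1 mask keeping the lowest three frequencies

# Broadcasting (n_windows, 1) * (1, n_freqs) forms the combined 2D weight grid
weights = weight_time[:, np.newaxis] * weight_band[np.newaxis, :]
weighted_fc = fc * weights

print(weights.shape)  # (4, 6)
```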