mth5.groups.features
Created on Fri Dec 13 12:40:34 2024
@author: jpeacock
Attributes
Classes
MasterFeaturesGroup – Master group container for features associated with Fourier Coefficients or time series.
FeatureGroup – Container for a single feature set with all associated runs and decimation levels.
FeatureTSRunGroup – Container for time series features from a processing or analysis run.
FeatureFCRunGroup – Container for Fourier Coefficient features from a processing run.
FeatureDecimationGroup – Container for a single decimation level with multiple Fourier Coefficient channels.
Module Contents
- class mth5.groups.features.MasterFeaturesGroup(group: h5py.Group, **kwargs)
Bases: mth5.groups.BaseGroup
Master group container for features associated with Fourier Coefficients or time series.
This class manages the top-level organization of geophysical feature data, organizing it into feature-specific groups. Features can include various frequency or time-domain analyses.
Hierarchy
MasterFeaturesGroup -> FeatureGroup -> FeatureRunGroup ->
FC: FeatureDecimationGroup -> FeatureChannelDataset
Time Series: FeatureChannelDataset
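The two branches of the hierarchy can be sketched with a small path builder. This is an illustration only: `feature_path`, its arguments, and the slash-joined layout are hypothetical and not part of mth5. The one thing taken from the hierarchy above is that Fourier Coefficient features carry an extra decimation-level layer, while time series features do not.

```python
def feature_path(feature, run, channel, domain="fc", decimation_level=None):
    """Build an illustrative path through the feature hierarchy.

    The slash-joined layout is an assumption made for illustration and
    is not claimed to be the on-disk layout used by mth5.
    """
    parts = [feature, run]
    if domain == "fc":
        # Fourier Coefficient runs have an extra decimation-level layer.
        if decimation_level is None:
            raise ValueError("FC features need a decimation level")
        parts.append(decimation_level)
    elif domain != "ts":
        raise ValueError(f"unknown domain: {domain!r}")
    parts.append(channel)
    return "/".join(parts)


fc_path = feature_path("coherency", "run_1", "ex", "fc", "level_0")
ts_path = feature_path("coherency", "run_1", "ex", "ts")
print(fc_path)  # coherency/run_1/level_0/ex
print(ts_path)  # coherency/run_1/ex
```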
- param group:
HDF5 group object for this MasterFeaturesGroup.
- type group:
h5py.Group
- param **kwargs:
Additional keyword arguments passed to BaseGroup.
Examples
>>> import h5py
>>> from mth5.groups.features import MasterFeaturesGroup
>>> with h5py.File('data.h5', 'r') as f:
...     master = MasterFeaturesGroup(f['features'])
...     feature_list = master.groups_list
- add_feature_group(feature_name: str, feature_metadata: mt_metadata.features.FeatureDecimationChannel | None = None) -> FeatureGroup
Add a feature group to the master features container.
Creates a new FeatureGroup with the specified name and optional metadata. Feature groups organize all runs and decimation levels for a particular feature.
- Parameters:
feature_name (str) – Name for the feature group. Will be validated and formatted.
feature_metadata (FeatureDecimationChannel, optional) – Metadata describing the feature. Default is None.
- Returns:
Newly created feature group object.
- Return type:
FeatureGroup
Examples
>>> master = MasterFeaturesGroup(h5_group)
>>> feature = master.add_feature_group('coherency')
>>> print(feature.name)
'coherency'
- get_feature_group(feature_name: str) -> FeatureGroup
Retrieve a feature group by name.
- Parameters:
feature_name (str) – Name of the feature group to retrieve.
- Returns:
The requested feature group.
- Return type:
FeatureGroup
- Raises:
MTH5Error – If the feature group does not exist.
Examples
>>> master = MasterFeaturesGroup(h5_group)
>>> feature = master.get_feature_group('coherency')
>>> print(feature.name)
'coherency'
- remove_feature_group(feature_name: str) -> None
Remove a feature group from the master container.
Deletes the specified feature group and its associated data from the HDF5 file. Note that this operation removes the reference but does not reduce the file size; copy desired data to a new file for size reduction.
- Parameters:
feature_name (str) – Name of the feature group to remove.
- Raises:
MTH5Error – If the feature group does not exist.
Examples
>>> master = MasterFeaturesGroup(h5_group)
>>> master.remove_feature_group('coherency')
- class mth5.groups.features.FeatureGroup(group: h5py.Group, feature_metadata: object | None = None, **kwargs)
Bases: mth5.groups.BaseGroup
Container for a single feature set with all associated runs and decimation levels.
This class manages feature-specific data including all processing runs and decimation levels. Features can include both Fourier Coefficient and time series data.
Hierarchy
FeatureGroup -> FeatureRunGroup ->
FC: FeatureDecimationGroup -> FeatureChannelDataset
TS: FeatureChannelDataset
- param group:
HDF5 group object for this FeatureGroup.
- type group:
h5py.Group
- param feature_metadata:
Metadata specific to this feature. Should include description and parameters.
- type feature_metadata:
optional
- param **kwargs:
Additional keyword arguments passed to BaseGroup.
Notes
Feature metadata should be specific to the feature and include descriptions of the feature and any parameters used in its computation.
Examples
>>> feature = FeatureGroup(h5_group, feature_metadata=metadata)
>>> run_group = feature.add_feature_run_group('run_1', domain='fc')
- add_feature_run_group(feature_name: str, feature_run_metadata: object | None = None, domain: str = 'fc') -> object
Add a feature run group for a single feature.
Creates either a Fourier Coefficient run group or a time series run group based on the specified domain. The domain can be determined from the metadata or explicitly provided.
- Parameters:
feature_name (str) – Name for the feature run group.
feature_run_metadata (optional) – Metadata for the feature run. If provided, domain is extracted from metadata.domain attribute. Default is None.
domain (str, default='fc') –
Domain type for the data. Must be one of:
'fc', 'frequency', 'fourier', 'fourier_domain': Fourier Coefficients
'ts', 'time', 'time series', 'time_series': Time series
- Returns:
Newly created feature run group.
- Return type:
- Raises:
ValueError – If domain is not recognized.
AttributeError – If metadata does not have a domain attribute when metadata is provided.
Examples
>>> feature = FeatureGroup(h5_group)
>>> fc_run = feature.add_feature_run_group('processing_run_1', domain='fc')
>>> ts_run = feature.add_feature_run_group('ts_analysis', domain='ts')
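The domain aliases listed above can be thought of as collapsing to two canonical values. The following is a minimal sketch of that mapping, not the library's implementation: `normalize_domain`, `FC_ALIASES`, and `TS_ALIASES` are hypothetical names, and the whitespace stripping and lower-casing are assumptions. Only the alias strings and the ValueError for unknown domains come from the documentation above.

```python
# Alias sets copied from the documented domain values.
FC_ALIASES = {"fc", "frequency", "fourier", "fourier_domain"}
TS_ALIASES = {"ts", "time", "time series", "time_series"}


def normalize_domain(domain):
    """Map a documented domain alias to 'fc' or 'ts' (hypothetical helper)."""
    # Stripping and lower-casing are assumptions, not documented behavior.
    key = domain.strip().lower()
    if key in FC_ALIASES:
        return "fc"
    if key in TS_ALIASES:
        return "ts"
    raise ValueError(f"Domain {domain!r} is not recognized")


print(normalize_domain("fourier_domain"))  # fc
print(normalize_domain("time series"))     # ts
```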
- get_feature_run_group(feature_name: str, domain: str = 'frequency') -> object
Retrieve a feature run group by name and domain type.
- Parameters:
feature_name (str) – Name of the feature run group to retrieve.
domain (str, default='frequency') –
Domain type. Must be one of:
'fc', 'frequency', 'fourier', 'fourier_domain': Fourier Coefficients
'ts', 'time', 'time series', 'time_series': Time series
- Returns:
The requested feature run group.
- Return type:
- Raises:
ValueError – If domain is not recognized.
MTH5Error – If the feature run group does not exist.
Examples
>>> feature = FeatureGroup(h5_group)
>>> fc_run = feature.get_feature_run_group('processing_run_1', domain='fc')
- remove_feature_run_group(feature_name: str) -> None
Remove a feature run group.
Deletes the specified feature run group and all its associated data. Note that deletion removes the reference but does not reduce HDF5 file size.
- Parameters:
feature_name (str) – Name of the feature run group to remove.
- Raises:
MTH5Error – If the feature run group does not exist.
Examples
>>> feature = FeatureGroup(h5_group)
>>> feature.remove_feature_run_group('processing_run_1')
- class mth5.groups.features.FeatureTSRunGroup(group: h5py.Group, feature_run_metadata: object | None = None, **kwargs)
Bases: mth5.groups.BaseGroup
Container for time series features from a processing or analysis run.
This class wraps a RunGroup to manage time series data features while maintaining compatibility with the feature hierarchy structure.
- Parameters:
group (h5py.Group) – HDF5 group object for this FeatureTSRunGroup.
feature_run_metadata (optional) – Metadata for the feature run (same type as timeseries.Run).
**kwargs – Additional keyword arguments passed to BaseGroup.
Notes
This class uses methods from RunGroup for channel management, which may have performance implications due to multiple RunGroup instantiations.
Examples
>>> ts_run = FeatureTSRunGroup(h5_group, feature_run_metadata=metadata)
>>> channel = ts_run.add_feature_channel('Ex', 'electric', data)
- add_feature_channel(channel_name: str, channel_type: str, data: numpy.ndarray | None = None, channel_dtype: str = 'int32', shape: tuple | None = None, max_shape: tuple = (None,), chunks: bool = True, channel_metadata: object | None = None, **kwargs) -> object
Add a time series channel to the feature run group.
Creates a new channel for time series data with the specified properties and optional metadata. Channel metadata should be a timeseries.Channel object.
- Parameters:
channel_name (str) – Name for the channel.
channel_type (str) – Type of channel (e.g., ‘electric’, ‘magnetic’).
data (np.ndarray, optional) – Initial data for the channel. Default is None.
channel_dtype (str, default='int32') – Data type for the channel.
shape (tuple, optional) – Shape of the channel data. Default is None.
max_shape (tuple, default=(None,)) – Maximum shape for expandable dimensions.
chunks (bool, default=True) – Whether to use chunking for the dataset.
channel_metadata (optional) – Metadata object (timeseries.Channel type). Default is None.
**kwargs – Additional keyword arguments for dataset creation.
- Returns:
Channel object from RunGroup.
- Return type:
object
Examples
>>> import numpy as np
>>> ts_run = FeatureTSRunGroup(h5_group)
>>> channel = ts_run.add_feature_channel(
...     'Ex', 'electric', data=np.arange(1000))
- get_feature_channel(channel_name: str) -> object
Retrieve a feature channel by name.
- Parameters:
channel_name (str) – Name of the channel to retrieve.
- Returns:
Channel object from RunGroup.
- Return type:
object
- Raises:
MTH5Error – If the channel does not exist.
Examples
>>> ts_run = FeatureTSRunGroup(h5_group)
>>> channel = ts_run.get_feature_channel('Ex')
- remove_feature_channel(channel_name: str) -> None
Remove a feature channel from the run group.
- Parameters:
channel_name (str) – Name of the channel to remove.
- Raises:
MTH5Error – If the channel does not exist.
Examples
>>> ts_run = FeatureTSRunGroup(h5_group)
>>> ts_run.remove_feature_channel('Ex')
- class mth5.groups.features.FeatureFCRunGroup(group: h5py.Group, feature_run_metadata: mt_metadata.processing.fourier_coefficients.decimation.Decimation | None = None, **kwargs)
Bases: mth5.groups.BaseGroup
Container for Fourier Coefficient features from a processing run.
This class manages Fourier Coefficient data organized by decimation levels, each containing multiple frequency channels with time-frequency data.
Hierarchy
FeatureFCRunGroup -> FeatureDecimationGroup -> FeatureChannelDataset
- metadata
Metadata including:
list of decimation levels
start time (earliest)
end time (latest)
method (fft, wavelet, …)
list of channels used
starting sample rate
bands used
type (TS or FC)
- Type:
Decimation
- param group:
HDF5 group object for this FeatureFCRunGroup.
- type group:
h5py.Group
- param feature_run_metadata:
Decimation metadata for the feature run. Default is None.
- type feature_run_metadata:
optional
- param **kwargs:
Additional keyword arguments passed to BaseGroup.
Examples
>>> fc_run = FeatureFCRunGroup(h5_group, feature_run_metadata=metadata)
>>> decimation = fc_run.add_decimation_level('level_0', dec_metadata)
- metadata() -> mt_metadata.processing.fourier_coefficients.decimation.Decimation
Override the metadata getter to include channel information in the runs.
- property decimation_level_summary: pandas.DataFrame
Get a summary of all decimation levels in the run.
Returns a pandas DataFrame with information about each decimation level including decimation factor, time range, and HDF5 reference.
- Returns:
DataFrame with columns:
name (str) – Decimation level name
start (datetime64[ns]) – Start time of the decimation level
end (datetime64[ns]) – End time of the decimation level
hdf5_reference (h5py.ref_dtype) – HDF5 reference to the decimation level group
- Return type:
pd.DataFrame
Examples
>>> fc_run = FeatureFCRunGroup(h5_group)
>>> summary = fc_run.decimation_level_summary
>>> print(summary[['name', 'start', 'end']])
- add_decimation_level(decimation_level_name: str, feature_decimation_level_metadata: object | None = None) -> FeatureDecimationGroup
Add a decimation level group to the feature run.
- Parameters:
decimation_level_name (str) – Name for the decimation level.
feature_decimation_level_metadata (optional) – Metadata for the decimation level. Default is None.
- Returns:
Newly created decimation level group.
- Return type:
FeatureDecimationGroup
Examples
>>> fc_run = FeatureFCRunGroup(h5_group)
>>> decimation = fc_run.add_decimation_level('level_0', dec_metadata)
>>> print(decimation.name)
'level_0'
- get_decimation_level(decimation_level_name: str) -> FeatureDecimationGroup
Retrieve a decimation level group by name.
- Parameters:
decimation_level_name (str) – Name of the decimation level to retrieve.
- Returns:
The requested decimation level group.
- Return type:
FeatureDecimationGroup
- Raises:
MTH5Error – If the decimation level does not exist.
Examples
>>> fc_run = FeatureFCRunGroup(h5_group)
>>> decimation = fc_run.get_decimation_level('level_0')
- remove_decimation_level(decimation_level_name: str) -> None
Remove a decimation level from the feature run.
- Parameters:
decimation_level_name (str) – Name of the decimation level to remove.
- Raises:
MTH5Error – If the decimation level does not exist.
Examples
>>> fc_run = FeatureFCRunGroup(h5_group)
>>> fc_run.remove_decimation_level('level_0')
- class mth5.groups.features.FeatureDecimationGroup(group: h5py.Group, decimation_level_metadata: object | None = None, **kwargs)
Bases: mth5.groups.BaseGroup
Container for a single decimation level with multiple Fourier Coefficient channels.
This class manages Fourier Coefficient data organized by frequency, time, and channel. Data is assumed to be uniformly sampled in both frequency and time domains.
Hierarchy
FeatureDecimationGroup -> FeatureChannelDataset (multiple channels)
Data Assumptions
Data are uniformly sampled in frequency domain
Data are uniformly sampled in time domain
FFT moving window has uniform step size
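To make the uniform-step assumption concrete, the sketch below derives window start times and the window-stepping rate from a sample rate, window length, and overlap. It is only an illustration: `window_start_times` and its parameter names are hypothetical, and the library may compute these quantities differently. The arithmetic itself is the usual relation for a sliding window, where the step is `window_length - overlap` samples.

```python
from datetime import datetime, timedelta


def window_start_times(start, sample_rate, window_length, overlap, n_windows):
    """Return uniformly spaced window start times and the window-step rate.

    sample_rate is in Hz; window_length and overlap are in samples.
    Hypothetical helper for illustration, not an mth5 function.
    """
    step_samples = window_length - overlap
    if step_samples <= 0:
        raise ValueError("overlap must be smaller than window_length")
    step_seconds = step_samples / sample_rate
    times = [
        start + timedelta(seconds=i * step_seconds) for i in range(n_windows)
    ]
    # One window every step_seconds seconds, expressed as a rate in Hz.
    return times, sample_rate / step_samples


times, window_rate = window_start_times(
    datetime(2023, 1, 1), sample_rate=1.0, window_length=128, overlap=32,
    n_windows=3,
)
print(times[1] - times[0])  # 0:01:36
```

With a 1 Hz sample rate, a 128-sample window, and a 32-sample overlap, consecutive windows start 96 seconds apart, so the stepping rate is 1/96 Hz.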
- start time
Start time of the decimation level
- Type:
datetime
- end time
End time of the decimation level
- Type:
datetime
- channels
List of channel names in this decimation level
- Type:
list
- decimation_factor
Factor by which data was decimated
- Type:
int
- decimation_level
Level index in decimation hierarchy
- Type:
int
- decimation_sample_rate
Sample rate after decimation (Hz)
- Type:
float
- method
Method used (FFT, wavelet, etc.)
- Type:
str
- anti_alias_filter
Anti-aliasing filter used
- Type:
optional
- prewhitening_type
Type of prewhitening applied
- Type:
optional
- harmonics_kept
Harmonic indices kept in the data
- Type:
list or ‘all’
- window
Window parameters (length, overlap, type, sample rate)
- Type:
dict
- bands
Frequency bands in the data
- Type:
list
- param group:
HDF5 group object for this FeatureDecimationGroup.
- type group:
h5py.Group
- param decimation_level_metadata:
Metadata for the decimation level. Default is None.
- type decimation_level_metadata:
optional
- param **kwargs:
Additional keyword arguments passed to BaseGroup.
Examples
>>> decimation = FeatureDecimationGroup(h5_group, metadata)
>>> channel = decimation.add_channel('Ex', fc_data=fc_array, fc_metadata=ch_metadata)
- property channel_summary: pandas.DataFrame
Get a summary of all channels in this decimation level.
Returns a pandas DataFrame with detailed information about each Fourier Coefficient channel including time ranges, dimensions, and sampling rates.
- Returns:
DataFrame with columns:
name (str) – Channel name
start (datetime64[ns]) – Start time of the channel data
end (datetime64[ns]) – End time of the channel data
n_frequency (int64) – Number of frequency bins
n_windows (int64) – Number of time windows
sample_rate_decimation_level (float64) – Decimation level sample rate (Hz)
sample_rate_window_step (float64) – Sample rate of window stepping (Hz)
units (str) – Physical units of the data
hdf5_reference (h5py.ref_dtype) – HDF5 reference to the channel dataset
- Return type:
pd.DataFrame
Examples
>>> decimation = FeatureDecimationGroup(h5_group)
>>> summary = decimation.channel_summary
>>> print(summary[['name', 'n_frequency', 'n_windows']])
- from_dataframe(df: pandas.DataFrame, channel_key: str, time_key: str = 'time', frequency_key: str = 'frequency') -> None
Load Fourier Coefficient data from a pandas DataFrame.
Assumes the channel_key column contains complex coefficient values organized with time and frequency dimensions.
- Parameters:
df (pd.DataFrame) – Input DataFrame containing the coefficient data.
channel_key (str) – Name of the column containing coefficient values.
time_key (str, default='time') – Name of the time coordinate column.
frequency_key (str, default='frequency') – Name of the frequency coordinate column.
- Raises:
TypeError – If df is not a pandas DataFrame.
Examples
>>> decimation = FeatureDecimationGroup(h5_group)
>>> decimation.from_dataframe(df, channel_key='Ex', time_key='time')
- from_xarray(data_array: xarray.DataArray | xarray.Dataset, sample_rate_decimation_level: float) -> None
Load Fourier Coefficient data from an xarray DataArray or Dataset.
Automatically extracts metadata (time, frequency, units) from the xarray object and creates appropriate FeatureChannelDataset instances for each variable or the single DataArray.
- Parameters:
data_array (xr.DataArray or xr.Dataset) – Input xarray object with ‘time’ and ‘frequency’ coordinates and dimensions [‘time’, ‘frequency’] (or transposed variant).
sample_rate_decimation_level (float) – Sample rate of the decimation level (Hz).
- Raises:
TypeError – If data_array is not an xarray Dataset or DataArray.
Notes
Automatically handles both (time, frequency) and (frequency, time) dimension ordering. Units are extracted from xarray attributes if available.
Examples
>>> import xarray as xr
>>> import numpy as np
>>> decimation = FeatureDecimationGroup(h5_group)
Create sample xarray data:
>>> times = np.arange('2023-01-01', '2023-01-02', dtype='datetime64[s]')
>>> freqs = np.linspace(0.01, 100, 256)
>>> data_array = np.random.randn(len(times), len(freqs)) + \
...     1j * np.random.randn(len(times), len(freqs))
>>> xr_data = xr.DataArray(
...     data_array,
...     dims=['time', 'frequency'],
...     coords={'time': times, 'frequency': freqs},
...     name='Ex',
...     attrs={'units': 'mV/km'}
... )
Load into decimation group:
>>> decimation.from_xarray(xr_data, sample_rate_decimation_level=0.5)
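The note about accepting both dimension orderings can be illustrated without xarray. The sketch below normalizes a 2D nested list to (time, frequency) order based on a tuple of dimension names. `to_time_frequency` is a hypothetical helper written for this illustration; how from_xarray performs the equivalent step internally is not documented here.

```python
def to_time_frequency(values, dims):
    """Return values as rows of time and columns of frequency.

    values is a 2D nested list and dims names its two axes.
    Hypothetical helper for illustration, not an mth5 function.
    """
    dims = tuple(dims)
    if dims == ("time", "frequency"):
        # Already in the target order; copy the rows.
        return [list(row) for row in values]
    if dims == ("frequency", "time"):
        # Transpose so that each row corresponds to one time window.
        return [list(column) for column in zip(*values)]
    raise ValueError(f"unsupported dimensions: {dims!r}")


# Two frequencies by three time windows.
freq_major = [[1 + 1j, 2 + 2j, 3 + 3j], [4j, 5j, 6j]]
time_major = to_time_frequency(freq_major, ("frequency", "time"))
print(len(time_major), len(time_major[0]))  # 3 2
```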
- to_xarray(channels: list | None = None) -> xarray.Dataset
Create an xarray Dataset from Fourier Coefficient channels.
If no channels are specified, all channels in the decimation level are included. Each channel becomes a data variable in the resulting Dataset.
- Parameters:
channels (list, optional) – List of channel names to include. If None, all channels are used. Default is None.
- Returns:
xarray Dataset with channels as data variables and ‘time’ and ‘frequency’ as shared coordinates.
- Return type:
xr.Dataset
Examples
>>> decimation = FeatureDecimationGroup(h5_group)
>>> xr_data = decimation.to_xarray()
>>> print(xr_data.data_vars)
Data variables:
    Ex  (time, frequency) complex128
    Ey  (time, frequency) complex128
Get specific channels:
>>> subset = decimation.to_xarray(channels=['Ex', 'Ey'])
- from_numpy_array(nd_array: numpy.ndarray, ch_name: str | list) -> None
Load Fourier Coefficient data from a numpy array.
Assumes array shape is either (n_frequencies, n_windows) for a single channel or (n_channels, n_frequencies, n_windows) for multiple channels.
- Parameters:
nd_array (np.ndarray) – Input numpy array containing coefficient data.
ch_name (str or list) – Channel name (for 2D array) or list of channel names (for 3D array).
- Raises:
TypeError – If nd_array is not a numpy ndarray.
ValueError – If array shape is not (n_frequencies, n_windows) or (n_channels, n_frequencies, n_windows).
Examples
>>> decimation = FeatureDecimationGroup(h5_group)
Load single channel:
>>> import numpy as np
>>> data_2d = np.random.randn(256, 100) + 1j * np.random.randn(256, 100)
>>> decimation.from_numpy_array(data_2d, ch_name='Ex')
Load multiple channels:
>>> data_3d = np.random.randn(2, 256, 100) + 1j * np.random.randn(2, 256, 100)
>>> decimation.from_numpy_array(data_3d, ch_name=['Ex', 'Ey'])
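The shape rule described for this method can be sketched as a small dispatch on the number of dimensions. This is not the library's code: `plan_channels` is a hypothetical helper that takes a shape tuple (for example `nd_array.shape`). Requiring a single string name for 2D input and one name per channel for 3D input is an assumption about how strictly names are matched.

```python
def plan_channels(shape, ch_name):
    """Map channel names to the leading-axis index they would be read from.

    For a 2D shape the single channel maps to None, meaning the whole array.
    Hypothetical helper for illustration, not an mth5 function.
    """
    if len(shape) == 2:
        # (n_frequencies, n_windows): one channel, one name (assumed).
        if not isinstance(ch_name, str):
            raise ValueError("a 2D array needs a single channel name")
        return {ch_name: None}
    if len(shape) == 3:
        # (n_channels, n_frequencies, n_windows): one name per channel.
        names = [ch_name] if isinstance(ch_name, str) else list(ch_name)
        if len(names) != shape[0]:
            raise ValueError("need exactly one name per channel")
        return {name: index for index, name in enumerate(names)}
    raise ValueError(
        "shape must be (n_frequencies, n_windows) or "
        "(n_channels, n_frequencies, n_windows)"
    )


print(plan_channels((256, 100), "Ex"))              # {'Ex': None}
print(plan_channels((2, 256, 100), ["Ex", "Ey"]))   # {'Ex': 0, 'Ey': 1}
```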
- add_channel(fc_name: str, fc_data: numpy.ndarray | xarray.DataArray | xarray.Dataset | pandas.DataFrame | None = None, fc_metadata: mt_metadata.features.FeatureDecimationChannel | None = None, max_shape: tuple = (None, None), chunks: bool = True, dtype: type = complex, **kwargs) -> mth5.groups.FeatureChannelDataset
Add a Fourier Coefficient channel to the decimation level.
Creates a new FeatureChannelDataset for a single channel at a single decimation level. Input data can be provided as numpy array, xarray, DataFrame, or created empty.
- Parameters:
fc_name (str) – Name for the Fourier Coefficient channel.
fc_data (np.ndarray, xr.DataArray, xr.Dataset, pd.DataFrame, optional) – Input data. Can be numpy array (time, frequency) or xarray/DataFrame format. Default is None (creates empty dataset).
fc_metadata (FeatureDecimationChannel, optional) – Metadata for the channel. Default is None.
max_shape (tuple, default=(None, None)) – Maximum shape for HDF5 dataset dimensions (expandable if None).
chunks (bool, default=True) – Whether to use HDF5 chunking.
dtype (type, default=complex) – Data type for the dataset (e.g., complex, float, int).
**kwargs – Additional keyword arguments for HDF5 dataset creation.
- Returns:
Newly created FeatureChannelDataset object.
- Return type:
FeatureChannelDataset
- Raises:
TypeError – If the fc_data type is not supported or the metadata type does not match.
RuntimeError or OSError – If the channel already exists; in that case the existing channel is returned.
Notes
Data layout assumes (time, frequency) organization:
time index: window start times
frequency index: harmonic indices or float values
data: complex Fourier coefficients
Examples
>>> from mt_metadata.features import FeatureDecimationChannel
>>> decimation = FeatureDecimationGroup(h5_group)
>>> metadata = FeatureDecimationChannel(name='Ex')
Create from numpy array:
>>> import numpy as np
>>> fc_data = np.random.randn(100, 256) + 1j * np.random.randn(100, 256)
>>> channel = decimation.add_channel('Ex', fc_data=fc_data, fc_metadata=metadata)
Create empty channel (expandable):
>>> channel = decimation.add_channel('Ex', fc_metadata=metadata)
- get_channel(fc_name: str) -> mth5.groups.FeatureChannelDataset
Retrieve a Fourier Coefficient channel by name.
- Parameters:
fc_name (str) – Name of the channel to retrieve.
- Returns:
The requested FeatureChannelDataset object.
- Return type:
FeatureChannelDataset
- Raises:
MTH5Error – If the channel does not exist.
Examples
>>> decimation = FeatureDecimationGroup(h5_group)
>>> channel = decimation.get_channel('Ex')
>>> data = channel.to_numpy()
- remove_channel(fc_name: str) -> None
Remove a Fourier Coefficient channel from the decimation level.
Deletes the channel from the HDF5 file. Note that this removes the reference but does not reduce file size.
- Parameters:
fc_name (str) – Name of the channel to remove.
- Raises:
MTH5Error – If the channel does not exist.
Notes
To reduce HDF5 file size, copy desired data to a new file.
Examples
>>> decimation = FeatureDecimationGroup(h5_group)
>>> decimation.remove_channel('Ex')
- update_metadata() -> None
Update metadata from all channels in the decimation level.
Scans all channels and updates the decimation-level metadata with aggregated information including time ranges and sampling rates.
Examples
>>> decimation = FeatureDecimationGroup(h5_group)
>>> decimation.update_metadata()
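The time-range part of this aggregation amounts to taking the earliest start and the latest end across channels. The sketch below shows that idea on plain dictionaries. `aggregate_time_range` and the `start`/`end` keys are hypothetical and chosen to mirror the channel summary columns; the method's actual internals are not described in this page.

```python
from datetime import datetime


def aggregate_time_range(channels):
    """Return (earliest start, latest end) over a list of channel records.

    Each record is a dict with 'start' and 'end' datetime values.
    Hypothetical helper for illustration, not an mth5 function.
    """
    if not channels:
        raise ValueError("at least one channel is required")
    start = min(channel["start"] for channel in channels)
    end = max(channel["end"] for channel in channels)
    return start, end


channels = [
    {"name": "Ex", "start": datetime(2023, 1, 1, 0, 0), "end": datetime(2023, 1, 1, 12, 0)},
    {"name": "Ey", "start": datetime(2023, 1, 1, 1, 0), "end": datetime(2023, 1, 2, 0, 0)},
]
level_start, level_end = aggregate_time_range(channels)
print(level_start, level_end)  # 2023-01-01 00:00:00 2023-01-02 00:00:00
```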
- add_weights(weight_name: str, weight_data: numpy.ndarray | None = None, weight_metadata: object | None = None, max_shape: tuple = (None, None, None), chunks: bool = True, **kwargs) -> None
Add weight or masking data for Fourier Coefficients.
Creates a dataset to store weights or masks for quality control, frequency band selection, or time window filtering.
- Parameters:
weight_name (str) – Name for the weight dataset.
weight_data (np.ndarray, optional) – Weight values. Default is None.
weight_metadata (optional) – Metadata for the weight dataset. Default is None.
max_shape (tuple, default=(None, None, None)) – Maximum shape for expandable dimensions.
chunks (bool, default=True) – Whether to use HDF5 chunking.
**kwargs – Additional keyword arguments for HDF5 dataset creation.
Notes
Weight datasets can track:
weight_channel: Per-channel weights
weight_band: Per-frequency-band weights
weight_time: Per-time-window weights
This method is a placeholder for future implementation.
Examples
>>> decimation = FeatureDecimationGroup(h5_group)
>>> decimation.add_weights('coherency_weights', weight_data=weights)