mth5.groups.fc_dataset

Created on Thu Mar 10 09:02:16 2022

@author: jpeacock

Classes

FCChannelDataset

Container for Fourier coefficients (FC) from windowed FFT analysis.

Module Contents

class mth5.groups.fc_dataset.FCChannelDataset(dataset: h5py.Dataset, dataset_metadata: mt_metadata.processing.fourier_coefficients.FCChannel | None = None, **kwargs: Any)

Container for Fourier coefficients (FC) from windowed FFT analysis.

Holds multi-dimensional Fourier coefficient data representing time-frequency analysis results. Data is uniformly sampled in both frequency (via harmonic index) and time (via uniform FFT window step size).

Parameters:
  • dataset (h5py.Dataset) – HDF5 dataset containing the Fourier coefficient data.

  • dataset_metadata (FCChannel | None, optional) – Metadata object containing FC channel properties like start time, end time, sample rates, units, and frequency method. If provided, metadata will be written to HDF5 attributes. Defaults to None.

  • **kwargs (Any) – Additional keyword arguments (reserved for future use).

hdf5_dataset

Weak reference to the HDF5 dataset.

Type:

h5py.Dataset

metadata

Metadata container for the Fourier coefficients.

Type:

FCChannel

logger

Logger instance for reporting messages.

Type:

loguru.logger

Raises:
  • MTH5Error – If dataset_metadata is provided but is not of type FCChannel.

  • TypeError – If input data cannot be converted to numpy array or has incompatible dtype/shape.

Notes

The data array has shape (n_windows, n_frequencies), where:
  • n_windows – Number of time windows in the FFT moving window analysis.

  • n_frequencies – Number of frequency bins, determined by the window size.

Data is typically complex-valued representing Fourier coefficients. Time windows are uniformly spaced with interval 1/sample_rate_window_step. Frequencies are uniformly spaced from frequency_min to frequency_max.

Metadata includes:
  • Time period (start and end)

  • Acquisition and decimated sample rates

  • Window sample rate (delta_t within window)

  • Units

  • Frequency method (integer harmonic index calculation)

  • Component name (channel designation)
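
To make the (n_windows, n_frequencies) layout concrete, the following is a minimal NumPy sketch of a windowed FFT that produces an array of that shape. It is not the routine used to generate the stored coefficients; the window length, step, taper, and sample rate are hypothetical values chosen only for illustration.

```python
import numpy as np

# Hypothetical parameters, chosen only for illustration.
sample_rate = 128.0      # samples per second of the input series
window_length = 64       # samples per FFT window
window_step = 32         # samples between consecutive window starts

rng = np.random.default_rng(0)
series = rng.standard_normal(1024)

# Start index of every full window that fits inside the series.
starts = np.arange(0, series.size - window_length + 1, window_step)

# Stack the windows as rows, taper each one, and transform along the rows.
windows = np.stack([series[s:s + window_length] for s in starts])
fc = np.fft.rfft(windows * np.hanning(window_length), axis=1)

n_windows, n_frequencies = fc.shape

# Window starts are window_step / sample_rate seconds apart, so the
# rate at which windows occur is the reciprocal of that interval.
sample_rate_window_step = sample_rate / window_step
```

With these values the result has 31 windows and 33 frequency bins (window_length // 2 + 1 for a real-input FFT), and consecutive windows start 0.25 s apart.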

Examples

Create an FC dataset from an HDF5 dataset:

>>> import h5py
>>> import numpy as np
>>> from mt_metadata.processing.fourier_coefficients import FCChannel
>>> from mth5.groups.fc_dataset import FCChannelDataset
>>> with h5py.File('fc.h5', 'w') as f:
...     # Create 2D array: 50 time windows, 256 frequencies
...     data = np.random.rand(50, 256) + 1j * np.random.rand(50, 256)
...     dset = f.create_dataset('Ex', data=data, dtype=np.complex128)
...     # Create FCChannelDataset
...     fc = FCChannelDataset(dset, write_metadata=True)

Convert to xarray and access time-frequency data:

>>> xr_data = fc.to_xarray()
>>> print(xr_data.dims)  # ('time', 'frequency')
>>> # Access data at specific time and frequency
>>> subset = xr_data.sel(time='2023-01-01T12:00:00', method='nearest')

Inspect properties:

>>> print(f"Windows: {fc.n_windows}, Frequencies: {fc.n_frequencies}")
>>> print(f"Frequency range: {fc.frequency.min():.2f}-{fc.frequency.max():.2f} Hz")
read_metadata() → None

Read metadata from HDF5 attributes into metadata container.

Reads all attributes from the HDF5 dataset and loads them into the internal metadata object for validation and access.

Return type:

None

Notes

This is automatically called during initialization if ‘mth5_type’ attribute exists in the HDF5 dataset.

Examples

Reload metadata from HDF5 after external modification:

>>> # Metadata was modified in HDF5
>>> fc.read_metadata()  # Reload changes
>>> print(fc.metadata.component)  # Access updated component
write_metadata() → None

Write metadata from container to HDF5 dataset attributes.

Converts the pydantic metadata model to a dictionary and writes each field as an HDF5 attribute. Values are converted to appropriate numpy types for compatibility. Always ensures ‘mth5_type’ attribute is set to ‘FCChannel’.

Return type:

None

Notes

All existing attributes with the same names will be overwritten. This is called automatically during initialization and after metadata updates. Read-only files will silently skip writes.

Examples

Save updated metadata to HDF5:

>>> fc.metadata.component = "Ey"
>>> fc.write_metadata()  # Persist to file
>>> # Verify write
>>> print(fc.hdf5_dataset.attrs['component'])
b'Ey'
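
The conversion described above (model to dictionary, values to NumPy types, one attribute per field) can be sketched without HDF5 at all. In the sketch below a plain dict stands in for the dataset attributes, and `flatten` and `to_attr_value` are hypothetical helpers, not mth5 functions. The dotted key names, the field values, and the handling of None are assumptions made for illustration.

```python
import numpy as np


def to_attr_value(value):
    """Convert a Python value to an attribute-friendly type (hypothetical helper)."""
    if value is None:
        return "none"  # assumption: None is stored as a string placeholder
    if isinstance(value, bool):  # check bool before int, since bool is an int subclass
        return np.bool_(value)
    if isinstance(value, int):
        return np.int64(value)
    if isinstance(value, float):
        return np.float64(value)
    if isinstance(value, (list, tuple)):
        return np.array(value)
    return str(value)


def flatten(nested, prefix=""):
    """Flatten nested dicts into dotted keys such as 'time_period.start'."""
    flat = {}
    for key, value in nested.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# Hypothetical metadata values, shaped like a dumped metadata model.
metadata_dict = {
    "component": "ex",
    "units": "mV/km",
    "sample_rate_window_step": 4.0,
    "frequency_min": 0.0,
    "frequency_max": 64.0,
    "time_period": {
        "start": "2023-01-01T00:00:00+00:00",
        "end": "2023-01-01T00:01:38+00:00",
    },
}

attrs = {}  # stand-in for the HDF5 dataset attributes
for key, value in flatten(metadata_dict).items():
    attrs[key] = to_attr_value(value)  # existing keys are overwritten
attrs["mth5_type"] = "FCChannel"       # always set, as described above
```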
property n_windows: int

Number of time windows in the FFT analysis.

Returns:

Number of time windows (first dimension of data array).

Return type:

int

Notes

This corresponds to the number of rows in the 2D spectrogram data. Each window represents a uniform time interval determined by the window step size (1/sample_rate_window_step).

Examples

>>> print(f"Time windows: {fc.n_windows}")
Time windows: 50
property time: numpy.ndarray

Time array containing the start of each time window.

Generates uniformly spaced time coordinates based on the start time, window step rate, and number of windows. Uses metadata time period to determine bounds.

Returns:

Array of datetime64 values for each window start time.

Return type:

np.ndarray

Notes

Time coordinates are generated using make_dt_coordinates, which ensures consistency between specified start/end times and the number of windows.

Examples

Access time array for time-based indexing:

>>> time_array = fc.time
>>> print(time_array.shape)  # (n_windows,)
>>> print(time_array[0])  # First window time
2023-01-01T00:00:00.000000
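
As a rough illustration of how uniformly spaced window start times follow from a start time, a window step rate, and a window count, here is a NumPy-only sketch. It is a stand-in for, not a reimplementation of, make_dt_coordinates, and the start time and rate are hypothetical.

```python
import numpy as np

# Hypothetical values for illustration.
start = np.datetime64("2023-01-01T00:00:00", "ns")
sample_rate_window_step = 0.5   # windows per second, i.e. one window every 2 s
n_windows = 50

# Interval between window starts, in integer nanoseconds.
step_ns = int(round(1e9 / sample_rate_window_step))

# One datetime64 value per window start.
time = start + np.arange(n_windows) * np.timedelta64(step_ns, "ns")
```

The last coordinate is start + (n_windows - 1) * step, here 98 s after the start, so the end time implied by the array is the start of the final window rather than its end.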
property n_frequencies: int

Number of frequency bins in the Fourier analysis.

Returns:

Number of frequency bins (second dimension of data array).

Return type:

int

Notes

This corresponds to the number of columns in the 2D spectrogram data. Determined by the FFT window size and relates to the frequency resolution of the analysis.

Examples

>>> print(f"Frequency bins: {fc.n_frequencies}")
Frequency bins: 256
property frequency: numpy.ndarray

Frequency array from metadata frequency bounds.

Generates uniformly spaced frequency coordinates based on the metadata frequency range and number of frequency bins.

Returns:

Array of frequency values, linearly spaced from frequency_min to frequency_max.

Return type:

np.ndarray

Notes

Frequencies represent harmonic indices or actual frequency values depending on the frequency method specified in metadata. Spacing is determined by n_frequencies bins over the range.

Examples

Access frequency array for frequency-based indexing:

>>> freq_array = fc.frequency
>>> print(freq_array.shape)  # (n_frequencies,)
>>> print(f"Frequency range: {freq_array.min():.2f} to {freq_array.max():.2f} Hz")
Frequency range: 0.00 to 64.00 Hz
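
A linearly spaced axis of this kind can be reproduced with numpy.linspace. The sketch below assumes both endpoints are included, which is the linspace default; the bounds and bin count are taken from the example output above, and whether the property is computed exactly this way is an assumption.

```python
import numpy as np

frequency_min = 0.0
frequency_max = 64.0
n_frequencies = 256

# Both endpoints are included, so there are n_frequencies - 1 intervals.
frequency = np.linspace(frequency_min, frequency_max, n_frequencies)
spacing = (frequency_max - frequency_min) / (n_frequencies - 1)

# Index of the bin closest to a target frequency (here 10 Hz).
target = 10.0
nearest_bin = int(np.argmin(np.abs(frequency - target)))
```

Note that the bin spacing is 64/255 Hz rather than 64/256 Hz, because the range is divided into n_frequencies - 1 intervals.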
replace_dataset(new_data_array: numpy.ndarray) → None

Replace entire dataset with new data.

Resizes the HDF5 dataset if necessary and replaces all data. Converts input to numpy array if needed.

Parameters:

new_data_array (np.ndarray) – New FC data to store. Should have shape (n_windows, n_frequencies) and typically complex-valued.

Return type:

None

Raises:

TypeError – If input cannot be converted to numpy array.

Notes

If new data has different shape, HDF5 dataset will be resized. This is generally safe but may fragment the HDF5 file.

Examples

Replace FC data with new analysis results:

>>> import numpy as np
>>> new_fc = np.random.rand(30, 256) + 1j * np.random.rand(30, 256)
>>> fc.replace_dataset(new_fc)
>>> print(fc.to_numpy().shape)
(30, 256)

Replace with data from list (auto-converted to array):

>>> data_list = [[1+1j, 2+2j], [3+3j, 4+4j]] * 15
>>> fc.replace_dataset(data_list)
>>> fc.to_numpy().shape
(30, 2)
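
When passing lists, the nesting depth determines the array shape after conversion. The sketch below uses a hypothetical helper, `as_fc_array`, to show the conversion and a 2D check; it is not part of mth5, and the ValueError for non-2D input is an assumption of this sketch rather than documented behavior.

```python
import numpy as np


def as_fc_array(data):
    """Coerce input to a 2D array (hypothetical helper, not part of mth5)."""
    try:
        array = np.asarray(data)
    except (TypeError, ValueError) as error:
        raise TypeError(f"Cannot convert {type(data)} to a numpy array") from error
    if array.ndim != 2:
        raise ValueError(
            f"Expected shape (n_windows, n_frequencies), got {array.shape}"
        )
    return array


# Two rows of two values, repeated 15 times: 30 rows, 2 columns.
flat_rows = [[1 + 1j, 2 + 2j], [3 + 3j, 4 + 4j]] * 15

# One extra level of nesting adds an axis: the shape becomes (30, 1, 2).
nested_rows = [[[1 + 1j, 2 + 2j]], [[3 + 3j, 4 + 4j]]] * 15

fc_array = as_fc_array(flat_rows)
nested_shape = np.asarray(nested_rows).shape
```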
to_xarray() → xarray.DataArray

Convert FC data to xarray DataArray.

Creates an xarray DataArray with proper coordinates for time and frequency. Includes metadata as attributes.

Returns:

DataArray with dimensions (time, frequency) and coordinates from metadata and computed properties.

Return type:

xr.DataArray

Notes

Metadata changes made on the xarray object are not validated and are not synchronized back to HDF5 without an explicit call to from_xarray(). Data is loaded entirely into memory.

Examples

Convert to xarray with automatic coordinates:

>>> xr_data = fc.to_xarray()
>>> print(xr_data.dims)
('time', 'frequency')
>>> print(xr_data.shape)
(50, 256)

Select data by time and frequency range:

>>> subset = xr_data.sel(
...     time=slice('2023-01-01T00:00:00', '2023-01-01T12:00:00'),
...     frequency=slice(0, 10)
... )
>>> print(subset.shape)  # Subset shape
to_numpy() → numpy.ndarray

Convert FC data to numpy array.

Returns the HDF5 dataset as a numpy array. Data is loaded entirely into memory.

Returns:

2D complex array with shape (n_windows, n_frequencies).

Return type:

np.ndarray

Notes

For large spectrograms, this loads all data into RAM. Consider using HDF5 slicing for memory-efficient access to subsets.

Examples

Get full FC data as numpy array:

>>> data = fc.to_numpy()
>>> print(data.shape)
(50, 256)
>>> print(data.dtype)
complex128

Access specific time window and frequency:

>>> data = fc.to_numpy()
>>> # Get first 10 windows, frequency bin 100
>>> subset = data[:10, 100]
>>> print(subset.shape)
(10,)
from_numpy(new_estimate: numpy.ndarray) → None

Load FC data from numpy array.

Validates dtype and shape compatibility, resizes dataset if needed, and stores the data.

Parameters:

new_estimate (np.ndarray) – FC data to load. Should have shape (n_windows, n_frequencies). Typically complex-valued array.

Return type:

None

Raises:

TypeError – If dtype doesn’t match existing dataset or input cannot be converted to numpy array.

Notes

The parameter is named new_estimate rather than data. The dataset is resized if the shape of the input does not match. Dtype compatibility is strictly enforced.

Examples

Load FC data from numpy array:

>>> import numpy as np
>>> new_data = np.random.rand(25, 128) + 1j * np.random.rand(25, 128)
>>> fc.from_numpy(new_data)
>>> print(fc.to_numpy().shape)
(25, 128)

Load data built from separate magnitude and phase arrays:

>>> magnitude = np.random.rand(20, 256)
>>> phase = np.random.rand(20, 256) * 2 * np.pi
>>> fc_data = magnitude * np.exp(1j * phase)
>>> fc.from_numpy(fc_data)
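
Because the dtype must match the existing dataset, it can help to check or cast an array before loading it. The sketch below illustrates such a check with a hypothetical helper, `check_dtype`; the exact comparison performed by from_numpy is not shown in this documentation, so treat the strict equality test as an assumption.

```python
import numpy as np


def check_dtype(new_estimate, expected_dtype):
    """Return the input as an array, or raise TypeError on a dtype mismatch.

    Hypothetical helper for illustration; not part of mth5.
    """
    array = np.asarray(new_estimate)
    if array.dtype != np.dtype(expected_dtype):
        raise TypeError(
            f"Input dtype {array.dtype} does not match dataset dtype {expected_dtype}"
        )
    return array


rng = np.random.default_rng(0)
magnitude = rng.random((20, 256))
phase = rng.random((20, 256)) * 2 * np.pi

# magnitude * exp(1j * phase) is complex128, so it passes the check.
fc_data = check_dtype(magnitude * np.exp(1j * phase), np.complex128)

# A real-valued array fails the check unless it is cast explicitly.
try:
    check_dtype(magnitude, np.complex128)
    mismatch_detected = False
except TypeError:
    mismatch_detected = True

cast_ok = check_dtype(magnitude.astype(np.complex128), np.complex128)
```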
from_xarray(data: xarray.DataArray, sample_rate_decimation_level: int | float) → None

Load FC data from xarray DataArray.

Updates metadata from xarray coordinates and attributes, then stores the data. Computes frequency and time parameters from the provided xarray object.

Parameters:
  • data (xr.DataArray) – DataArray containing FC data. Expected dimensions: (time, frequency).

  • sample_rate_decimation_level (int | float) – Decimation level applied to original sample rate. Used to track processing history.

Return type:

None

Notes

This will update time_period (start/end), frequency bounds, window step rate, decimation level, component name, and units from the xarray object. All changes are persisted to HDF5.

Examples

Load FC data from modified xarray:

>>> xr_data = fc.to_xarray()
>>> # Modify data (e.g., apply filter)
>>> modified = xr_data * np.hamming(256)  # Apply frequency window
>>> fc.from_xarray(modified, sample_rate_decimation_level=4)
>>> print(fc.metadata.sample_rate_decimation_level)
4

Load with updated metadata from another analysis:

>>> import xarray as xr
>>> import pandas as pd
>>> time_coords = pd.date_range('2023-01-01', periods=30, freq='1h')
>>> freq_coords = np.arange(0, 128)
>>> new_fc = xr.DataArray(
...     data=np.random.rand(30, 128) + 1j * np.random.rand(30, 128),
...     coords={'time': time_coords, 'frequency': freq_coords},
...     dims=['time', 'frequency'],
...     name='Ey',
...     attrs={'units': 'mV/km'}
... )
>>> fc.from_xarray(new_fc, sample_rate_decimation_level=1)
>>> print(fc.metadata.component)
Ey
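
The quantities listed in the Notes (time period, frequency bounds, window step rate) can all be derived from the coordinate arrays alone. The sketch below does this with plain NumPy arrays standing in for the xarray coordinates; the dictionary keys are hypothetical labels for illustration, and the derivation is an assumption about what such an update involves, not the method's actual code.

```python
import numpy as np

# Stand-ins for the 'time' and 'frequency' coordinates of a DataArray:
# 30 windows one hour apart and 128 frequency bins.
time = np.datetime64("2023-01-01T00:00:00", "ns") + np.arange(30) * np.timedelta64(1, "h")
frequency = np.arange(0, 128, dtype=float)

# Interval between consecutive window starts, in seconds.
step_seconds = (time[1] - time[0]) / np.timedelta64(1, "s")

derived = {
    "start": str(time[0]),
    "end": str(time[-1]),
    "sample_rate_window_step": 1.0 / step_seconds,  # windows per second
    "frequency_min": float(frequency.min()),
    "frequency_max": float(frequency.max()),
}
```

For hourly windows the step rate is 1/3600 windows per second, and the derived end time is the start of the last window.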