mth5.groups.channel_dataset
Created on Sat May 27 10:03:23 2023
@author: jpeacock
Attributes
Classes
ChannelDataset: A container for channel time series data stored in HDF5 format.
ElectricDataset: Specialized container for electric field channel data.
MagneticDataset: Specialized container for magnetic field channel data.
AuxiliaryDataset: Specialized container for auxiliary channel data.
Module Contents
- class mth5.groups.channel_dataset.ChannelDataset(dataset: h5py.Dataset | None, dataset_metadata: mt_metadata.base.MetadataBase | None = None, write_metadata: bool = True, **kwargs: Any)
A container for channel time series data stored in HDF5 format.
This class provides a flexible interface to work with magnetotelluric channel data, allowing conversion to various formats (xarray, pandas, numpy) while maintaining metadata integrity.
- Parameters:
dataset (h5py.Dataset or None) – HDF5 dataset object containing the channel time series data.
dataset_metadata (MetadataBase, optional) – Metadata container for Electric, Magnetic, or Auxiliary channel types. Default is None.
write_metadata (bool, optional) – Whether to write metadata to the HDF5 dataset on initialization. Default is True.
**kwargs (dict) – Additional keyword arguments to set as instance attributes.
- hdf5_dataset
Weak reference to the underlying HDF5 dataset.
- Type:
h5py.Dataset
- Raises:
MTH5Error – If the dataset is not of the correct type or metadata validation fails.
See also
ElectricDataset: Specialized container for electric field channels.
MagneticDataset: Specialized container for magnetic field channels.
AuxiliaryDataset: Specialized container for auxiliary channels.
Examples
>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> run = mth5_obj.stations_group.get_station('MT001').get_run('MT001a')
>>> channel = run.get_channel('Ex')
>>> channel
Channel Electric:
-------------------
    component:        Ex
    data type:        electric
    data format:      float32
    data shape:       (4096,)
    start:            1980-01-01T00:00:00+00:00
    end:              1980-01-01T00:00:01+00:00
    sample rate:      4096
Access time series data
>>> ts_data = channel.to_channel_ts()
>>> print(f"Mean: {ts_data.ts.mean():.2f}, Std: {ts_data.ts.std():.2f}")
Convert to xarray for time-based indexing
>>> xr_data = channel.to_xarray()
>>> subset = xr_data.sel(time=slice('1980-01-01T00:00:00', '1980-01-01T00:00:10'))
- property run_metadata: mt_metadata.timeseries.Run
Get the run-level metadata containing this channel.
- Returns:
Run metadata object with channel information included.
- Return type:
metadata.Run
Examples
>>> run_meta = channel.run_metadata
>>> print(run_meta.id)
MT001a
>>> print(run_meta.channels_recorded_electric)
['Ex', 'Ey']
- property station_metadata: mt_metadata.timeseries.Station
Get the station-level metadata containing this channel.
- Returns:
Station metadata object with run and channel information.
- Return type:
metadata.Station
Examples
>>> station_meta = channel.station_metadata
>>> print(f"{station_meta.id}: {station_meta.location.latitude}, {station_meta.location.longitude}")
MT001: 40.5, -112.3
- property survey_metadata: mt_metadata.timeseries.Survey
Get the survey-level metadata containing this channel.
- Returns:
Complete survey metadata hierarchy including this channel.
- Return type:
metadata.Survey
Examples
>>> survey_meta = channel.survey_metadata
>>> print(survey_meta.id)
MT Survey 2023
>>> print(f"Stations: {len(survey_meta.stations)}")
Stations: 15
- property survey_id: str
Get the survey identifier.
- Returns:
Survey ID string.
- Return type:
str
Examples
>>> print(channel.survey_id)
MT_Survey_2023
- property channel_response: mt_metadata.timeseries.filters.ChannelResponse
Get the complete channel response from applied filters.
Constructs a ChannelResponse object by retrieving all filters referenced in the channel metadata from the survey’s Filters group.
- Returns:
Channel response object containing all applied filters in sequence.
- Return type:
ChannelResponse
Notes
Filters are applied in the order specified by their sequence_number. Filter names are normalized by replacing ‘/’ with ‘ per ‘ and converting to lowercase.
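The ordering and name normalization described in this note can be sketched in plain Python. This is an illustration only: FilterRef, normalize_filter_name, and ordered_filter_names are hypothetical names, not part of mth5 or mt_metadata, and the sample filter names are made up.

```python
from dataclasses import dataclass


@dataclass
class FilterRef:
    """Hypothetical stand-in for a filter reference held in channel metadata."""
    name: str
    sequence_number: int


def normalize_filter_name(name: str) -> str:
    """Replace '/' with ' per ' and lowercase the result."""
    return name.replace("/", " per ").lower()


def ordered_filter_names(filters: list) -> list:
    """Return normalized filter names sorted by sequence_number."""
    ordered = sorted(filters, key=lambda f: f.sequence_number)
    return [normalize_filter_name(f.name) for f in ordered]


# Made-up filter references, deliberately listed out of order
refs = [
    FilterRef("Time Delay", 3),
    FilterRef("Counts/mV", 1),
    FilterRef("mV/km Conversion", 2),
]
names = ordered_filter_names(refs)
print(names)  # ['counts per mv', 'mv per km conversion', 'time delay']
```

The normalized name is what would be used to look a filter up by key, so a name containing '/' never collides with a path separator.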
Examples
>>> response = channel.channel_response
>>> print(f"Number of filters: {len(response.filters_list)}")
Number of filters: 3
>>> for filt in response.filters_list:
...     print(f"{filt.name}: {filt.type}")
zpk: zpk
coefficient: coefficient
time delay: time_delay
- property start: mt_metadata.common.mttime.MTime
Get the start time of the channel data.
- Returns:
Start time from metadata.time_period.start.
- Return type:
MTime
Examples
>>> print(channel.start)
1980-01-01T00:00:00+00:00
>>> print(channel.start.iso_str)
1980-01-01T00:00:00.000000+00:00
- property end: mt_metadata.common.mttime.MTime
Calculate the end time based on start time, sample rate, and number of samples.
- Returns:
Calculated end time of the data.
- Return type:
MTime
Notes
End time is calculated as: start + (n_samples - 1) / sample_rate
The -1 ensures the last sample falls exactly at the end time.
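As a worked instance of this formula, the sketch below computes the end time with the standard library only. compute_end is a hypothetical helper, not the mth5 implementation, and returning the start time for an empty dataset is an assumption. Python datetimes carry microsecond resolution, so the result is rounded more coarsely than a nanosecond-resolution pandas index would be.

```python
from datetime import datetime, timedelta, timezone


def compute_end(start: datetime, n_samples: int, sample_rate: float) -> datetime:
    """Time of the last sample: start + (n_samples - 1) / sample_rate."""
    if n_samples < 1:
        # Assumption: an empty dataset ends where it starts
        return start
    return start + timedelta(seconds=(n_samples - 1) / sample_rate)


start = datetime(1980, 1, 1, tzinfo=timezone.utc)

# One hour of data at 256 Hz is 921600 samples; the last sample sits
# one sample interval (1/256 s) before the full hour.
end = compute_end(start, n_samples=921_600, sample_rate=256.0)
print(end.isoformat())  # 1980-01-01T00:59:59.996094+00:00
```

A single-sample dataset therefore has identical start and end times.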
Examples
>>> print(f"Duration: {channel.end - channel.start} seconds")
Duration: 3600.0 seconds
>>> print(channel.end.iso_str)
1980-01-01T01:00:00.000000+00:00
- property sample_rate: float
Get the sample rate in samples per second.
- Returns:
Sample rate in Hz.
- Return type:
float
Examples
>>> print(f"Sample rate: {channel.sample_rate} Hz")
Sample rate: 256.0 Hz
- property n_samples: int
Get the total number of samples in the dataset.
- Returns:
Number of data points in the time series.
- Return type:
int
Examples
>>> print(f"Total samples: {channel.n_samples:,}")
Total samples: 921,600
>>> duration = channel.n_samples / channel.sample_rate
>>> print(f"Duration: {duration/3600:.1f} hours")
Duration: 1.0 hours
- property time_index: pandas.DatetimeIndex
Create a time index for the dataset based on metadata.
- Returns:
Pandas datetime index spanning the entire dataset.
- Return type:
pd.DatetimeIndex
Notes
The time index is useful for time-based queries and slicing operations. It is generated dynamically from start time, sample rate, and number of samples.
Examples
>>> time_idx = channel.time_index
>>> print(time_idx[0], time_idx[-1])
1980-01-01 00:00:00 1980-01-01 00:59:59.996093750
>>> print(f"Index length: {len(time_idx)}")
Index length: 921600
- read_metadata() -> None
Read metadata from HDF5 attributes into the metadata container.
Loads all HDF5 attributes from the dataset and converts them to the appropriate Python types before populating the metadata object.
For older MTH5 files, this method attempts to coerce values to the expected types based on the metadata schema to maintain backwards compatibility.
Notes
This method automatically validates metadata through the metadata container’s validators. Type coercion is applied to handle older file formats that may have stored metadata with different types.
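The kind of type coercion described here can be illustrated with a small standalone sketch. It is not the mth5 implementation: coerce_attributes and the plain-dict schema are hypothetical, and the real method works from the mt_metadata schema rather than a dict of Python types.

```python
def coerce_attributes(raw: dict, schema: dict) -> dict:
    """Coerce raw attribute values to the types named in a schema.

    Hypothetical helper: `raw` maps attribute names to stored values and
    `schema` maps attribute names to the expected Python type.
    """
    coerced = {}
    for key, value in raw.items():
        expected = schema.get(key)
        if expected is None or isinstance(value, expected):
            # Unknown key or already the right type: keep as stored
            coerced[key] = value
            continue
        if isinstance(value, bytes):
            # HDF5 string attributes may come back as bytes
            value = value.decode("utf-8")
        try:
            coerced[key] = expected(value)
        except (TypeError, ValueError):
            # Leave the value untouched so a later validator can reject it
            coerced[key] = value
    return coerced


schema = {"sample_rate": float, "component": str, "measurement_azimuth": float}
old_file_attrs = {"sample_rate": "256.0", "component": b"ex", "measurement_azimuth": 90}

attrs = coerce_attributes(old_file_attrs, schema)
print(attrs)  # {'sample_rate': 256.0, 'component': 'ex', 'measurement_azimuth': 90.0}
```

Values that cannot be converted are passed through unchanged in this sketch, which keeps coercion and validation as separate steps.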
Examples
>>> channel.read_metadata()
>>> print(channel.metadata.component)
Ex
>>> print(channel.metadata.sample_rate)
256.0
Handles type coercion for older files
>>> # If sample_rate was stored as string '256.0' in old file
>>> channel.read_metadata()
>>> print(type(channel.metadata.sample_rate))
<class 'float'>
- write_metadata() -> None
Write metadata from the container to HDF5 dataset attributes.
Converts all metadata values to numpy-compatible types before writing to HDF5 attributes. Falls back to string conversion if direct conversion fails.
Notes
This method is called automatically during initialization (when write_metadata is True) and when metadata is updated.
Examples
>>> channel.metadata.component = 'Ey'
>>> channel.metadata.measurement_azimuth = 90.0
>>> channel.write_metadata()
- replace_dataset(new_data_array: numpy.ndarray) -> None
Replace the entire dataset with new data.
- Parameters:
new_data_array (np.ndarray) – New data array with shape (npts,). Must be 1-dimensional.
- Raises:
TypeError – If new_data_array cannot be converted to numpy array.
Notes
The HDF5 dataset will be resized if the new array has a different shape. All existing data will be overwritten.
Examples
Replace with synthetic data
>>> import numpy as np
>>> new_data = np.sin(2 * np.pi * 1.0 * np.linspace(0, 10, 2560))
>>> channel.replace_dataset(new_data)
>>> print(f"New shape: {channel.hdf5_dataset.shape}")
New shape: (2560,)
Replace with processed data
>>> original = channel.hdf5_dataset[:]
>>> filtered = np.convolve(original, np.ones(5)/5, mode='same')
>>> channel.replace_dataset(filtered)
- extend_dataset(new_data_array: numpy.ndarray, start_time: str | mt_metadata.common.mttime.MTime, sample_rate: float, fill: str | float | int | None = None, max_gap_seconds: float | int = 1, fill_window: int = 10) -> None
Extend or prepend data to the existing dataset with gap handling.
Adds new data before, after, or within the existing time series, handling time alignment, overlaps, and gaps according to the configured fill strategy.
- Parameters:
new_data_array (np.ndarray) – New data array with shape (npts,).
start_time (str or MTime) – Start time of the new data array in UTC.
sample_rate (float) – Sample rate of the new data array in Hz. Must match existing sample rate.
fill (str, float, int, or None, optional) –
Strategy for filling data gaps:
None : Raise MTH5Error if gap exists (default)
’mean’ : Fill with mean of both datasets within fill_window
’median’ : Fill with median of both datasets within fill_window
’nan’ : Fill with NaN values
numeric value : Fill with specified constant
max_gap_seconds (float or int, optional) – Maximum allowed gap in seconds. Exceeding this raises MTH5Error. Default is 1 second.
fill_window (int, optional) – Number of points from each dataset edge to estimate fill values. Default is 10 points.
- Raises:
MTH5Error – If sample rates don’t match, gap exceeds max_gap_seconds, or fill strategy is invalid.
TypeError – If new_data_array cannot be converted to numpy array.
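One plausible reading of the fill strategies is sketched below for the append case, where the new data follows the existing data. estimate_fill_value is a hypothetical helper, not the mth5 code. Pooling the last fill_window existing samples with the first fill_window new samples is an assumption about what "both datasets within fill_window" means, and ValueError stands in for MTH5Error.

```python
from statistics import mean, median


def estimate_fill_value(existing, new, fill, fill_window=10):
    """Pick the value used to fill a gap between `existing` and `new`.

    Assumes `new` comes after `existing` and that fill_window is positive.
    """
    if fill is None:
        # Stand-in for MTH5Error: a gap exists and no strategy was given
        raise ValueError("gap found and no fill strategy given")
    if isinstance(fill, (int, float)) and not isinstance(fill, bool):
        return float(fill)  # constant fill
    if fill == "nan":
        return float("nan")
    # Pool the samples on either side of the gap
    edges = list(existing[-fill_window:]) + list(new[:fill_window])
    if fill == "mean":
        return mean(edges)
    if fill == "median":
        return median(edges)
    raise ValueError(f"unknown fill strategy: {fill!r}")


existing = [float(i) for i in range(20)]  # ends with ..., 18.0, 19.0
new = [100.0] * 20

# With fill_window=2 the pooled edge samples are [18.0, 19.0, 100.0, 100.0]
print(estimate_fill_value(existing, new, "mean", fill_window=2))    # 59.25
print(estimate_fill_value(existing, new, "median", fill_window=2))  # 59.5
print(estimate_fill_value(existing, new, 0))                        # 0.0
```

A small fill_window keeps the fill value close to the data at the gap edges, while a larger one smooths over short-lived spikes.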
Notes
Prepend: New data start < existing start
Append: New data start > existing end
Overwrite: New data overlaps existing data
The dataset is automatically resized to accommodate new data.
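The three cases listed in these notes reduce to two comparisons on the new start time. The sketch below is illustrative only; classify_extension is a hypothetical name and the timestamps are made up.

```python
from datetime import datetime, timedelta, timezone


def classify_extension(existing_start, existing_end, new_start):
    """Classify where new data lands relative to the existing time series."""
    if new_start < existing_start:
        return "prepend"
    if new_start > existing_end:
        return "append"
    # New start falls inside [existing_start, existing_end]
    return "overwrite"


t0 = datetime(2015, 1, 8, 19, 0, 0, tzinfo=timezone.utc)
t1 = t0 + timedelta(minutes=30)

print(classify_extension(t0, t1, t0 - timedelta(seconds=8)))    # prepend
print(classify_extension(t0, t1, t1 + timedelta(seconds=0.5)))  # append
print(classify_extension(t0, t1, t0 + timedelta(seconds=1)))    # overwrite
```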
Examples
Append data with a small gap
>>> import numpy as np
>>> ex = mth5_obj.get_channel('MT001', 'MT001a', 'Ex')
>>> print(f"Original: {ex.n_samples} samples, ends {ex.end}")
Original: 4096 samples, ends 2015-01-08T19:32:09.500000+00:00
>>> new_data = np.random.randn(4096)
>>> new_start = (ex.end + 0.5).isoformat()  # 0.5s gap
>>> ex.extend_dataset(new_data, new_start, ex.sample_rate,
...                   fill='median', max_gap_seconds=2)
>>> print(f"Extended: {ex.n_samples} samples, ends {ex.end}")
Extended: 8200 samples, ends 2015-01-08T19:40:42.500000+00:00
Prepend data seamlessly
>>> prepend_data = np.random.randn(2048)
>>> prepend_start = (ex.start - 2048/ex.sample_rate).isoformat()
>>> ex.extend_dataset(prepend_data, prepend_start, ex.sample_rate)
>>> print(f"New start: {ex.start}")
Overwrite section of existing data
>>> replacement_data = np.zeros(1024)
>>> replace_start = (ex.start + 1.0).isoformat()  # 1s after start
>>> ex.extend_dataset(replacement_data, replace_start, ex.sample_rate)
- has_data() -> bool
Check if the channel contains non-zero data.
- Returns:
True if dataset has non-zero values, False if all zeros or empty.
- Return type:
bool
Examples
>>> if channel.has_data():
...     print("Channel has valid data")
... else:
...     print("Channel is empty or all zeros")
Channel has valid data
>>> empty_channel.has_data()
False
- to_channel_ts() -> mth5.timeseries.ChannelTS
Convert the dataset to a ChannelTS object with full metadata.
- Returns:
Time series object with data, metadata, and channel response.
- Return type:
ChannelTS
Notes
Data is loaded into memory. The resulting ChannelTS object is independent of the HDF5 file and can be modified without affecting the original dataset.
Examples
>>> ts = channel.to_channel_ts()
>>> print(f"Type: {type(ts)}")
Type: <class 'mth5.timeseries.channel_ts.ChannelTS'>
>>> print(f"Shape: {ts.ts.shape}, Mean: {ts.ts.mean():.2f}")
Shape: (4096,), Mean: 0.15
Process the time series
>>> filtered_ts = ts.low_pass_filter(cutoff=10.0)
>>> detrended_ts = ts.detrend('linear')
>>> ts.plot()
- to_xarray() -> xarray.DataArray
Convert the dataset to an xarray DataArray with time coordinates.
- Returns:
DataArray with time index and metadata as attributes.
- Return type:
xr.DataArray
Notes
Data is loaded into memory. Metadata is stored in the attrs dictionary and will not be validated if modified.
Examples
>>> xr_data = channel.to_xarray()
>>> print(xr_data)
<xarray.DataArray (time: 4096)>
array([0.931, 0.142, ..., 0.882])
Coordinates:
  * time     (time) datetime64[ns] 1980-01-01 ... 1980-01-01T00:00:15.996
Attributes:
    component:  Ex
    ...
Use xarray’s powerful selection
>>> morning = xr_data.sel(time=slice('1980-01-01T06:00', '1980-01-01T12:00'))
>>> daily_mean = xr_data.resample(time='1D').mean()
>>> xr_data.plot()
- to_dataframe() -> pandas.DataFrame
Convert the dataset to a pandas DataFrame with time index.
- Returns:
DataFrame with ‘data’ column and time index. Metadata stored in attrs.
- Return type:
pd.DataFrame
Notes
Data is loaded into memory. Metadata is stored in the experimental attrs attribute and will not be validated if modified.
Examples
>>> df = channel.to_dataframe()
>>> print(df.head())
                      data
time
1980-01-01 00:00:00  0.931
1980-01-01 00:00:00  0.142
...
Use pandas operations
>>> df['data'].describe()
>>> df.resample('1H').mean()
>>> df.plot(y='data', figsize=(12, 4))
Access metadata
>>> print(df.attrs['component'])
Ex
>>> print(df.attrs['sample_rate'])
256.0
- to_numpy() -> numpy.recarray
Convert the dataset to a numpy structured array with time and data columns.
- Returns:
Record array with ‘time’ and ‘channel_data’ fields.
- Return type:
np.recarray
Notes
Data is loaded into memory. The field is named ‘channel_data’ rather than ‘data’ because data is already an attribute of numpy arrays.
Examples
>>> arr = channel.to_numpy()
>>> print(arr.dtype.names)
('time', 'channel_data')
>>> print(arr['time'][0])
1980-01-01T00:00:00.000000000
>>> print(arr['channel_data'].mean())
0.152
Access fields
>>> times = arr['time']
>>> data = arr['channel_data']
>>> import matplotlib.pyplot as plt
>>> plt.plot(times, data)
- from_channel_ts(channel_ts_obj: mth5.timeseries.ChannelTS, how: str = 'replace', fill: str | float | int | None = None, max_gap_seconds: float | int = 1, fill_window: int = 10) -> None
Populate the dataset from a ChannelTS object.
- Parameters:
channel_ts_obj (ChannelTS) – Time series object containing data and metadata.
how ({'replace', 'extend'}, optional) –
Method for adding data:
’replace’ : Replace entire dataset (default)
’extend’ : Append/prepend to existing data with gap handling
fill (str, float, int, or None, optional) –
Gap filling strategy (only used with how=’extend’):
None : Raise error on gaps (default)
’mean’ : Fill with mean of both datasets
’median’ : Fill with median of both datasets
’nan’ : Fill with NaN
numeric : Fill with constant value
max_gap_seconds (float or int, optional) – Maximum allowed gap in seconds. Default is 1.
fill_window (int, optional) – Points to use for estimating fill values. Default is 10.
- Raises:
TypeError – If channel_ts_obj is not a ChannelTS instance.
MTH5Error – If time alignment or metadata validation fails.
Examples
Replace entire dataset
>>> from mth5.timeseries import ChannelTS
>>> import numpy as np
>>> ts = ChannelTS(
...     channel_type='electric',
...     data=np.random.randn(1000),
...     channel_metadata={'electric': {
...         'component': 'ex',
...         'sample_rate': 256.0
...     }}
... )
>>> channel.from_channel_ts(ts, how='replace')
>>> print(channel.n_samples)
1000
Extend existing dataset
>>> new_ts = ChannelTS(
...     channel_type='electric',
...     data=np.random.randn(500),
...     channel_metadata={'electric': {
...         'component': 'ex',
...         'sample_rate': 256.0,
...         'time_period.start': channel.end.isoformat()
...     }}
... )
>>> channel.from_channel_ts(new_ts, how='extend', fill='median')
>>> print(channel.n_samples)
1500
- from_xarray(data_array: xarray.DataArray, how: str = 'replace', fill: str | float | int | None = None, max_gap_seconds: float | int = 1, fill_window: int = 10) -> None
Populate the dataset from an xarray DataArray.
- Parameters:
data_array (xr.DataArray) – DataArray with time coordinate and metadata in attrs.
how ({'replace', 'extend'}, optional) –
Method for adding data:
’replace’ : Replace entire dataset (default)
’extend’ : Append/prepend to existing data with gap handling
fill (str, float, int, or None, optional) –
Gap filling strategy (only used with how=’extend’):
None : Raise error on gaps (default)
’mean’ : Fill with mean of both datasets
’median’ : Fill with median of both datasets
’nan’ : Fill with NaN
numeric : Fill with constant value
max_gap_seconds (float or int, optional) – Maximum allowed gap in seconds. Default is 1.
fill_window (int, optional) – Points to use for estimating fill values. Default is 10.
- Raises:
TypeError – If data_array is not an xarray.DataArray.
MTH5Error – If time alignment fails.
Examples
Replace from xarray
>>> import xarray as xr
>>> import numpy as np
>>> import pandas as pd
>>> # 256 Hz sampling: one sample every 1/256 s
>>> time = pd.date_range('2020-01-01', periods=1000, freq=pd.Timedelta(seconds=1/256))
>>> data = xr.DataArray(
...     np.random.randn(1000),
...     coords=[('time', time)],
...     attrs={'component': 'ex', 'sample_rate': 256.0}
... )
>>> channel.from_xarray(data, how='replace')
>>> print(channel.n_samples)
1000
Extend from xarray with gap
>>> time2 = pd.date_range('2020-01-01T00:00:05', periods=500, freq=pd.Timedelta(seconds=1/256))
>>> data2 = xr.DataArray(np.random.randn(500), coords=[('time', time2)])
>>> # The gap here is slightly over 1 second, so raise the allowed maximum
>>> channel.from_xarray(data2, how='extend', fill='mean', max_gap_seconds=2)
- property channel_entry: numpy.ndarray
Create a structured array entry for channel summary tables.
- Returns:
Structured array with dtype=CHANNEL_DTYPE containing channel metadata and HDF5 references for survey-wide summaries.
- Return type:
np.ndarray
Notes
This entry includes survey ID, station ID, run ID, location, component, time period, sample rate, and HDF5 references for navigation.
Examples
>>> entry = channel.channel_entry
>>> print(entry['component'][0])
Ex
>>> print(entry['sample_rate'][0])
256.0
>>> print(entry['station'][0])
MT001
- time_slice(start: str | mt_metadata.common.mttime.MTime, end: str | mt_metadata.common.mttime.MTime | None = None, n_samples: int | None = None, return_type: str = 'channel_ts') -> mth5.timeseries.ChannelTS | xarray.DataArray | pandas.DataFrame | numpy.ndarray
Extract a time slice from the channel dataset.
- Parameters:
start (str or MTime) – Start time of the slice in UTC.
end (str or MTime, optional) – End time of the slice. Mutually exclusive with n_samples.
n_samples (int, optional) – Number of samples to extract. Mutually exclusive with end.
return_type ({'channel_ts', 'xarray', 'pandas', 'numpy'}, optional) – Format for returned data. Default is ‘channel_ts’.
- Returns:
Time slice in the requested format with appropriate metadata.
- Return type:
ChannelTS or xr.DataArray or pd.DataFrame or np.ndarray
- Raises:
ValueError – If both end and n_samples are provided or neither is provided.
Notes
If the requested slice extends beyond available data, it will be automatically truncated with a warning.
Regional HDF5 references are used when possible for efficiency.
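The argument check and the truncation behaviour described above can be sketched with the standard library. slice_indices is a hypothetical helper, not the mth5 implementation; it only works out which array indices a request maps to, and the 8 Hz, 100-sample dataset is made up.

```python
import warnings
from datetime import datetime, timedelta, timezone


def slice_indices(start, sample_rate, n_total, slice_start, end=None, n_samples=None):
    """Return (first, stop) array indices for a requested time slice.

    Exactly one of `end` or `n_samples` must be given. The result is
    clipped to the available data, with a warning when clipping happens.
    """
    if (end is None) == (n_samples is None):
        raise ValueError("provide exactly one of end or n_samples")
    first = int(round((slice_start - start).total_seconds() * sample_rate))
    if n_samples is not None:
        stop = first + n_samples
    else:
        # +1 so that the sample at `end` is included in the slice
        stop = int(round((end - start).total_seconds() * sample_rate)) + 1
    clipped = (max(first, 0), min(stop, n_total))
    if clipped != (first, stop):
        warnings.warn("requested slice extends beyond available data; truncating")
    return clipped


t0 = datetime(2015, 1, 8, 19, 49, 15, tzinfo=timezone.utc)
one_sec = timedelta(seconds=1)

# 100 samples at 8 Hz: request 16 samples starting 1 s into the data
print(slice_indices(t0, 8.0, 100, t0 + one_sec, n_samples=16))  # (8, 24)

# Requesting more samples than exist truncates and emits a warning
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(slice_indices(t0, 8.0, 100, t0, n_samples=1000))  # (0, 100)
    print(len(caught))  # 1
```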
Examples
Extract by number of samples
>>> ex = mth5_obj.get_channel('FL001', 'FL001a', 'Ex')
>>> ex_slice = ex.time_slice("2015-01-08T19:49:15", n_samples=4096)
>>> print(type(ex_slice))
<class 'mth5.timeseries.channel_ts.ChannelTS'>
>>> print(f"Slice shape: {ex_slice.ts.shape}")
Slice shape: (4096,)
>>> ex_slice.plot()
Extract by time range
>>> ex_slice = ex.time_slice(
...     "2015-01-08T19:49:15",
...     end="2015-01-08T20:49:15"
... )
>>> print(f"Duration: {ex_slice.end - ex_slice.start} seconds")
Duration: 3600.0 seconds
Return as xarray for analysis
>>> xr_slice = ex.time_slice(
...     "2015-01-08T19:49:15",
...     n_samples=1000,
...     return_type='xarray'
... )
>>> print(xr_slice.mean().values)
0.152
>>> xr_slice.plot()
Return as pandas for tabular ops
>>> df_slice = ex.time_slice(
...     "2015-01-08T19:49:15",
...     n_samples=500,
...     return_type='pandas'
... )
>>> df_slice['data'].describe()
>>> df_slice.resample('10S').mean()
Return as numpy for computation
>>> import numpy as np
>>> np_slice = ex.time_slice(
...     "2015-01-08T19:49:15",
...     n_samples=100,
...     return_type='numpy'
... )
>>> np.fft.fft(np_slice)
- get_index_from_time(given_time: str | mt_metadata.common.mttime.MTime) -> int
Calculate the array index for a given time.
- Parameters:
given_time (str or MTime) – Time to convert to index.
- Returns:
Array index corresponding to the given time.
- Return type:
int
Notes
Index is calculated as: (time - start_time) * sample_rate and rounded to nearest integer.
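The index formula in this note can be checked with a few lines of standard-library Python. index_from_time is a hypothetical helper that mirrors the formula; it is not the mth5 method, and it takes datetime objects rather than MTime.

```python
from datetime import datetime, timedelta, timezone


def index_from_time(start, given_time, sample_rate):
    """Index = (given_time - start) * sample_rate, rounded to the nearest integer."""
    elapsed = (given_time - start).total_seconds()
    return int(round(elapsed * sample_rate))


start = datetime(1980, 1, 1, tzinfo=timezone.utc)

# 10 s into a 256 Hz record: 10 * 256 = 2560
print(index_from_time(start, start + timedelta(seconds=10), 256.0))  # 2560

# A time between two samples maps to the nearest one (2560.512 -> 2561)
print(index_from_time(start, start + timedelta(seconds=10.002), 256.0))  # 2561
```

Times before the start give negative indices in this sketch, so a caller would need to clip them before indexing.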
Examples
>>> idx = channel.get_index_from_time('1980-01-01T00:00:10')
>>> print(f"Index for 10 seconds: {idx}")
Index for 10 seconds: 2560
>>> # With 256 Hz sample rate: 10 * 256 = 2560
>>> start_idx = channel.get_index_from_time(channel.start)
>>> print(start_idx)
0
- get_index_from_end_time(given_time: str | mt_metadata.common.mttime.MTime) -> int
Get the end index value (inclusive) for a given time.
- Parameters:
given_time (str or MTime) – Time to convert to end index.
- Returns:
Array index + 1 for inclusive slicing.
- Return type:
int
Notes
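Adds 1 to the calculated index so that the sample at the given time is included when the result is used as the stop value of a slice (e.g., array[start:end]), since Python slices exclude their stop index.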
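The effect of the extra 1 is easiest to see on a short list. The values and the index below are made up for illustration.

```python
samples = [10, 11, 12, 13, 14, 15]
idx = 3  # index of the sample at the requested end time

# A Python slice excludes its stop index, so the sample at idx is dropped
print(samples[0:idx])  # [10, 11, 12]

# Using idx + 1 as the stop keeps the sample at the requested end time
end_idx = idx + 1
print(samples[0:end_idx])  # [10, 11, 12, 13]
```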
Examples
>>> end_idx = channel.get_index_from_end_time('1980-01-01T00:00:10')
>>> data_slice = channel.hdf5_dataset[0:end_idx]
>>> # Includes sample at exactly 10 seconds
- class mth5.groups.channel_dataset.ElectricDataset(group: h5py.Dataset, **kwargs: Any)
Bases: ChannelDataset
Specialized container for electric field channel data.
Inherits all functionality from ChannelDataset with electric field specific metadata handling.
- Parameters:
group (h5py.Dataset) – HDF5 dataset containing electric field data.
**kwargs (dict) – Additional keyword arguments passed to ChannelDataset.
Examples
>>> ex_dataset = run_group.get_channel('Ex')
>>> print(type(ex_dataset))
<class 'mth5.groups.channel_dataset.ElectricDataset'>
>>> print(ex_dataset.metadata.type)
electric
>>> print(ex_dataset.metadata.units)
mV/km
- class mth5.groups.channel_dataset.MagneticDataset(group: h5py.Dataset, **kwargs: Any)
Bases: ChannelDataset
Specialized container for magnetic field channel data.
Inherits all functionality from ChannelDataset with magnetic field specific metadata handling.
- Parameters:
group (h5py.Dataset) – HDF5 dataset containing magnetic field data.
**kwargs (dict) – Additional keyword arguments passed to ChannelDataset.
Examples
>>> hx_dataset = run_group.get_channel('Hx')
>>> print(type(hx_dataset))
<class 'mth5.groups.channel_dataset.MagneticDataset'>
>>> print(hx_dataset.metadata.type)
magnetic
>>> print(hx_dataset.metadata.units)
nT
- class mth5.groups.channel_dataset.AuxiliaryDataset(group: h5py.Dataset, **kwargs: Any)
Bases: ChannelDataset
Specialized container for auxiliary channel data.
Inherits all functionality from ChannelDataset with auxiliary channel specific metadata handling. Used for temperature, battery voltage, etc.
- Parameters:
group (h5py.Dataset) – HDF5 dataset containing auxiliary data.
**kwargs (dict) – Additional keyword arguments passed to ChannelDataset.
Examples
>>> temp_dataset = run_group.get_channel('Temperature')
>>> print(type(temp_dataset))
<class 'mth5.groups.channel_dataset.AuxiliaryDataset'>
>>> print(temp_dataset.metadata.type)
auxiliary
>>> print(temp_dataset.metadata.units)
celsius