mth5.groups.estimate_dataset
Created on Thu Mar 10 09:02:16 2022
@author: jpeacock
Classes
Container for statistical estimates of transfer functions. |
Module Contents
- class mth5.groups.estimate_dataset.EstimateDataset(dataset: h5py.Dataset, dataset_metadata: mt_metadata.transfer_functions.tf.statistical_estimate.StatisticalEstimate | None = None, write_metadata: bool = True, **kwargs: Any)[source]
Container for statistical estimates of transfer functions.
This class holds multi-dimensional statistical estimates for transfer functions with full metadata management. Estimates are stored as HDF5 datasets with dimensions for period, output channels, and input channels.
- Parameters:
dataset (h5py.Dataset) – HDF5 dataset containing the statistical estimate data.
dataset_metadata (mt_metadata.transfer_functions.tf.StatisticalEstimate, optional) – Metadata object for the estimate. If provided and write_metadata is True, the metadata will be written to the HDF5 attributes. Defaults to None.
write_metadata (bool, optional) – If True, write metadata to the HDF5 dataset attributes. Defaults to True.
**kwargs (Any) – Additional keyword arguments (reserved for future use).
- hdf5_dataset
Weak reference to the HDF5 dataset.
- Type:
h5py.Dataset
- Raises:
MTH5Error – If dataset_metadata is provided but is not of type StatisticalEstimate or a compatible metadata class.
TypeError – If input data cannot be converted to numpy array or has wrong dtype/shape.
Notes
The estimate data is stored in 3D form with shape: (n_periods, n_output_channels, n_input_channels)
Metadata is automatically synchronized between the pydantic model and HDF5 attributes on initialization and after any modifications.
Examples
Create an estimate dataset from an HDF5 group:
>>> import h5py >>> import numpy as np >>> from mt_metadata.transfer_functions.tf.statistical_estimate import StatisticalEstimate >>> # Create HDF5 file with estimate dataset >>> with h5py.File('estimate.h5', 'w') as f: ... # Create dataset with shape (10 periods, 2 outputs, 2 inputs) ... data = np.random.rand(10, 2, 2) ... dset = f.create_dataset('estimate', data=data) ... # Create EstimateDataset ... est = EstimateDataset(dset, write_metadata=True)
Convert estimate to xarray and back:
>>> periods = np.logspace(-3, 3, 10) # 10 periods from 1e-3 to 1e3 s >>> xr_data = est.to_xarray(periods) >>> # Modify xarray coordinates >>> new_xr = xr_data.rename({'output': 'new_output', 'input': 'new_input'}) >>> est.from_xarray(new_xr) # Load modified data back
Access estimate data in different formats:
>>> # Get numpy array >>> np_data = est.to_numpy() >>> print(np_data.shape) # (10, 2, 2) >>> # Get xarray with proper coordinates >>> xr_data = est.to_xarray(periods) >>> print(xr_data.dims) # ('period', 'output', 'input')
- read_metadata() None[source]
Read metadata from HDF5 attributes into metadata container.
Reads all attributes from the HDF5 dataset and loads them into the internal metadata object for validation and access.
- Return type:
None
Notes
This is automatically called during initialization if ‘mth5_type’ attribute exists in the HDF5 dataset.
Examples
Reload metadata from HDF5 after external modification:
>>> # Metadata was modified in HDF5 >>> est.read_metadata() # Reload changes >>> print(est.metadata.name) # Access updated name
- write_metadata() None[source]
Write metadata from container to HDF5 dataset attributes.
Converts the pydantic metadata model to a dictionary and writes each field as an HDF5 attribute. Values are converted to appropriate numpy types for compatibility.
- Return type:
None
Notes
All existing attributes with the same names will be overwritten. This is called automatically during initialization and after metadata updates.
Examples
Save updated metadata to HDF5:
>>> est.metadata.name = "Updated Estimate" >>> est.write_metadata() # Persist to file >>> # Verify write >>> print(est.hdf5_dataset.attrs['name']) b'Updated Estimate'
- replace_dataset(new_data_array: numpy.ndarray) None[source]
Replace entire dataset with new data.
Resizes the HDF5 dataset if necessary and replaces all data. Converts input to numpy array if needed.
- Parameters:
new_data_array (np.ndarray) – New estimate data to store. Should have shape (n_periods, n_output_channels, n_input_channels).
- Return type:
None
- Raises:
TypeError – If input cannot be converted to numpy array.
Notes
If new data has different shape, HDF5 dataset will be resized. This is generally safe but may fragment the HDF5 file.
Examples
Replace estimate with new data:
>>> import numpy as np >>> new_estimate = np.random.rand(10, 2, 2) # 10 periods, 2 channels >>> est.replace_dataset(new_estimate) >>> print(est.to_numpy().shape) (10, 2, 2)
Replace with data from list (auto-converted to array):
>>> data_list = [[[1, 2], [3, 4]]] * 5 # 5 periods >>> est.replace_dataset(data_list) >>> est.to_numpy().shape (5, 2, 2)
- to_xarray(period: numpy.ndarray | list) xarray.DataArray[source]
Convert estimate to xarray DataArray.
Creates an xarray DataArray with proper coordinates for periods, output channels, and input channels. Includes metadata as attributes.
- Parameters:
period (np.ndarray | list) – Period values for coordinate. Should have length equal to estimate first dimension (n_periods).
- Returns:
DataArray with dimensions (period, output, input) and coordinates from metadata.
- Return type:
xr.DataArray
Notes
Metadata changes in xarray are not validated and will not be synchronized back to HDF5 without explicit call to from_xarray(). Data is loaded entirely into memory.
Examples
Convert to xarray with logarithmic period spacing:
>>> import numpy as np >>> periods = np.logspace(-2, 3, 10) # 10 periods from 0.01 to 1000 >>> xr_data = est.to_xarray(periods) >>> print(xr_data.dims) ('period', 'output', 'input') >>> print(xr_data.coords['period'].values) [1.00e-02 3.16e-02 ... 1.00e+03]
Select data by period range:
>>> subset = xr_data.sel(period=slice(0.1, 100)) >>> print(subset.shape) (8, 2, 2)
- to_numpy() numpy.ndarray[source]
Convert estimate to numpy array.
Returns the HDF5 dataset as a numpy array. Data is loaded entirely into memory.
- Returns:
3D array with shape (n_periods, n_output_channels, n_input_channels).
- Return type:
np.ndarray
Notes
For large estimates, this loads all data into RAM. Consider using HDF5 slicing for memory-efficient access.
Examples
Get full estimate as numpy array:
>>> data = est.to_numpy() >>> print(data.shape) (10, 2, 2) >>> print(data.dtype) float64
Access specific period and channels:
>>> data = est.to_numpy() >>> # Get first 5 periods, output channel 0, input channel 1 >>> subset = data[:5, 0, 1] >>> print(subset.shape) (5,)
- from_numpy(new_estimate: numpy.ndarray) None[source]
Load estimate data from numpy array.
Validates dtype and shape compatibility, resizes dataset if needed, and stores the data.
- Parameters:
new_estimate (np.ndarray) – Estimate data to load. Must be convertible to numpy array. Preferred shape: (n_periods, n_output_channels, n_input_channels).
- Return type:
None
- Raises:
TypeError – If dtype doesn’t match existing dataset or input cannot be converted to numpy array.
Notes
‘data’ is a built-in Python function and cannot be used as parameter name. The dataset will be resized if shape doesn’t match.
Examples
Load estimate from numpy array:
>>> import numpy as np >>> new_data = np.random.rand(5, 2, 2) >>> est.from_numpy(new_data) >>> print(est.to_numpy().shape) (5, 2, 2)
Load with automatic dtype conversion:
>>> float_data = np.array([[[1.0, 2.0]]], dtype=np.float64) >>> est.from_numpy(float_data)
- from_xarray(data: xarray.DataArray) None[source]
Load estimate data from xarray DataArray.
Updates metadata from xarray coordinates and attributes, then stores the data.
- Parameters:
data (xr.DataArray) – DataArray containing estimate. Expected dimensions: (period, output, input).
- Return type:
None
Notes
This will update output_channels, input_channels, name, and data_type from the xarray object. All changes are persisted to HDF5.
Examples
Load estimate from modified xarray:
>>> xr_data = est.to_xarray(periods) >>> # Modify data and metadata >>> modified = xr_data * 2 # Scale by 2 >>> est.from_xarray(modified) >>> print(est.to_numpy()[0, 0, 0]) # Verify scale
Rename channels and reload:
>>> xr_data = est.to_xarray(periods) >>> new_xr = xr_data.rename({ ... 'output': ['Ex', 'Ey'], ... 'input': ['Bx', 'By'] ... }) >>> est.from_xarray(new_xr) >>> print(est.metadata.output_channels) ['Ex', 'Ey']