mth5.timeseries.spectre package

Submodules

mth5.timeseries.spectre.helpers module

This is a placeholder module for functions that are used in testing and development of spectrograms.

mth5.timeseries.spectre.helpers.add_fcs_to_mth5(m: MTH5, fc_decimations: str | list | None = None, groupby_columns: List[str] = ['survey', 'station', 'sample_rate']) None[source]

Add Fourier Coefficient Levels ot an existing MTH5.

TODO: This method currently loops the heirarcy of the h5, and then calls an operator. How about making a single table that represents the loop up front and then looping once that table instead of this nested loop business? We would need a function that takes as input the groupby_columns.

Notes:

  • This module computes the FCs differently than the legacy aurora pipeline. It uses scipy.signal.spectrogram.

There is a test in Aurora to confirm that there are equivalent if we are not using fancy pre-whitening.

Parameters:
  • m (MTH5 object) – The mth5 file, open in append mode.

  • fc_decimations (Optional[Union[str, list]]) – This specifies the scheme to use for decimating the time series when building the FC layer. None: Just use default (something like four decimation levels, decimated by 4 each time say.) String: Controlled Vocabulary, values are a work in progress, that will allow custom definition of the fc_decimations for some common cases. For example, say you have stored already decimated time series, then you want simply the zeroth decimation for each run, because the decimated time series live under another run container, and that will get its own FCs. This is experimental. List: (UNTESTED) – This means that the user thought about the decimations that they want to create and is passing them explicitly. – probably will need to be a dictionary actually, since this would get redefined at each sample rate.

mth5.timeseries.spectre.helpers.calibrate_stft_obj(stft_obj: Dataset, run_obj: RunGroup, units: Literal['MT', 'SI'] = 'MT', channel_scale_factors: dict | None = None) Dataset[source]

Calibrates frequency domain data into MT units.

Development Notes:

The calibration often raises a runtime warning due to DC term in calibration response = 0. TODO: It would be nice to suppress this, maybe by only calibrating the non-dc terms and directly assigning np.nan to the dc component when DC-response is zero.

Parameters:
  • stft_obj (xarray.core.dataset.Dataset) – Time series of Fourier coefficients to be calibrated

  • run_obj (mth5.groups.master_station_run_channel.RunGroup) – Provides information about filters for calibration

  • units (string) – usually “MT”, contemplating supporting “SI”

  • scale_factors (dict or None) – keyed by channel, supports a single scalar to apply to that channels data Useful for debugging. Should not be used in production and should throw a warning if it is not None

Returns:

stft_obj – Time series of calibrated Fourier coefficients

Return type:

xarray.core.dataset.Dataset

mth5.timeseries.spectre.helpers.read_back_fcs(m: MTH5 | Path | str, mode: str = 'r', groupby_columns: List[str] = ['survey', 'station', 'sample_rate']) None[source]

Loops over stations in the channel summary of input (m) grouping by common sample_rate. Then loop over the runs in the corresponding FC Group. Finally, within an fc_group, loop decimation levels and read data to xarray. Log info about the shape of the xarray.

This is a helper function for tests. It was used as a sanity check while debugging the FC files, and also is a good example for how to access the data at each level for each channel.

Development Notes: The Time axis of the FC array changes from decimation_level to decimation_level. The frequency axis will shape will depend on the window length that was used to perform STFT. This is currently storing all (positive frequency) fcs by default, but future versions can also have selected bands within an FC container.

Parameters:
  • m (Union[MTH5, pathlib.Path, str]) – Either a path to an mth5, or an MTH5 object that the FCs will be read back from.

  • mode (str) – The mode to open the MTH5 file in. Defualts to (r)ead only.

mth5.timeseries.spectre.multiple_station module

Work In progress

This module is concerned with working with Fourier coefficient data

TODO: 2. Give MultivariateDataset a covariance() method

Tools include prototypes for - extracting portions of an FC Run Time Series - merging multiple stations runs together into an xarray - relabelling channels to avoid namespace clashes for multi-station data

class mth5.timeseries.spectre.multiple_station.FCRunChunk(survey_id: str = 'none', station_id: str = '', run_id: str = '', decimation_level_id: str = '0', start: str = '', end: str = '', channels: Tuple[str] = ())[source]

Bases: object

This class formalizes the required metadata to specify a chunk of a timeseries of Fourier coefficients.

This may move to mt_metadata – for now just use a dataclass as a prototype.

channels: Tuple[str] = ()[source]
decimation_level_id: str = '0'[source]
property duration: Timestamp[source]
end: str = ''[source]
property end_timestamp: Timestamp[source]
run_id: str = ''[source]
start: str = ''[source]
property start_timestamp: Timestamp[source]
station_id: str = ''[source]
survey_id: str = 'none'[source]
class mth5.timeseries.spectre.multiple_station.MultivariateDataset(dataset: Dataset, label_scheme: MultivariateLabelScheme | None = None)[source]

Bases: Spectrogram

Here is a container for a multivariate spectral dataset. The xarray is the main underlying item, but it will be useful to have functions that, for example returns a list of the associated stations, or that return a list of channels that are associated with a station, etc.

This is intended to be used as a multivariate spectral dotaset at one frequency band.

TODO: Consider making this an extension of Spectrogram TODO: Rename this class to MultivariateSpectrogram.

archive_cross_powers(tf_station: str, with_fcs: bool = True)[source]
tf_station: str

This tells us under which station we should store the output of this function. TODO: Consider moving this to another function which performs archiving in future.

with_fcs: bool

If True, the features are packed into the same hdf5-group as the FCs, as its own dataset. If False: the features are packed into the hdf5 features-group.

property channels: list[source]

returns a list of channels in the dataarray

cross_power(aweights: ndarray | None = None, bias: bool | None = True) DataArray[source]

Calculate the cross-power from a multivariate, complex-valued array of Fourier coefficients.

For a multivaraiate FC Dataset with n_time time windows, this returns an array with the same number of time windows. At each time _t_, the result is a covariance matrix.

Caveats and Notes:
  • This method calls numpy.cov, which means that the cross-power is computes as X@XH (rather than

XH@X). Sometimes X*XH is referred to as the Vozoff convention, whereas XH*X could be the Bendat & Piersol convention. - np.cov subtracts the meas before computing the cross terms. - This methos will use the entire band of the spectrogram.

Parameters:
  • X (xr.DataArray) – Multivariate time series as an xarray

  • aweights (Optional[np.ndarray]) – This is a “passthrough” parameter to numpy.cov These relative weights are typically large for observations considered “important” and smaller for observations considered less “important”. If ddof=0 the array of weights can be used to assign probabilities to observation vectors.

  • bias (bool) – bias=True normalizes by N instead of (N-1).

Return type:

xr.DataArray

Returns:

The covariance matrix of the data in xarray form.

property label_scheme: MultivariateLabelScheme[source]
property num_channels: int[source]

returns a count of the total number of channels in the dataset

station_channels(station: str) List[str][source]
This is a utility function that provides a way to access channel_names in a multivariate array associated

with a particular station.

The list is accessed via the self._station_channels attr, which gets set here if it has not

been initialized previously. self._station_channels is a dict keyed by station_id, with value is a list of channel names for that station.

Parameters:

station (str) – The name of the station.

Return type:

List[str]

Returns:

list of channel names for the input station.

property stations: List[str][source]

Parses the channel names, extracts the station names

return a unique list of stations preserving order.

class mth5.timeseries.spectre.multiple_station.MultivariateLabelScheme(label_elements: tuple = ('station', 'component'), join_char: str = '_')[source]

Bases: object

Class to store information about how a multivariate (MV) dataset will be lablelled.

Has a scheme to handle the how channels will be named.

This is just a place holder to manage possible future complexity.

It seemed like a good idea to formalize the fact that we take, by default f”{station}_{component}” as the MV channel label. It also seemed like a good idea to record what the join character is. In the event that we wind up with station names that have underscores in them, then we could, for example, set the join character to “__”.

TODO: Consider rename default to (“station”, “data_var”) instead of (“station”, “component”)

:param : :type : type label_elements: tuple :param : :type : param label_elements: This is meant to tell what information is being concatenated into an MV channel label. :param : :type : type join_char: str :param : :type : param join_char: The string that is used to join the label elements.

property id: str[source]
join(elements: list | tuple) str[source]

Join the label elements to a string

Parameters:

elements (tuple) – Expected to be the label elements, default are (station, component)

Returns:

The name of the channel (in a multiple-station context).

Return type:

str

join_char: str = '_'[source]
label_elements: tuple = ('station', 'component')[source]
split(mv_channel_name) dict[source]

Splits a multi-station channel name and returns a dict of strings, keyed by self.label_elements. This method is basically the reverse of self.join

Parameters:

mv_channel_name (str) – a multivariate channel name string

Returns:

Channel name as a dictionary.

Return type:

dict

mth5.timeseries.spectre.multiple_station.apply_masks_and_weights()[source]
mth5.timeseries.spectre.multiple_station.calculate_mask_from_feature(feature_series, threshold_obj)[source]
mth5.timeseries.spectre.multiple_station.calculate_weight_from_feature(feature_series, threshold_obj)[source]

This calculates a weighting function based on the thresholds and possibly some other info, such as the distribution of the features.

The weigth function is interpolated over the range of the feature values and then evaluated at the feature values.

Parameters:
  • feature_series

  • threshold_obj

mth5.timeseries.spectre.multiple_station.make_multistation_spectrogram(m: MTH5, fc_run_chunks: list, label_scheme: MultivariateLabelScheme | None = MultivariateLabelScheme(label_elements=('station', 'component'), join_char='_'), rtype: Literal['xrds'] | None = None) Dataset | MultivariateDataset[source]

See notes in mth5 issue #209. Takes a list of FCRunChunks and returns the largest contiguous block of multichannel FC data available.

|----------Station 1 ------------|

|----------Station 2 ------------|

|--------------------Station 3 ----------------------|

Handle additional runs in a separate call to this function and then concatenate time series afterwards.

Input must specify N (station-run-start-end-channel_list) tuples. If channel_list is not provided, get all channels. If start-end are not provided, read the whole run – warn if runs are not all synchronous, and truncate all to max(starts), min(ends) after the start and end times are sorted out.

Station IDs must be unique.

Parameters:
  • m (mth5.mth5.MTH5) – The mth5 object to get the FCs from.

  • fc_run_chunks (list) – Each element of this describes a chunk of a run to load from stored FCs.

  • label_scheme (Optional[MultivariateLabelScheme]) – Specifies how the channels are to be named in the multivariate xarray.

  • rtype – Specifies whether to return an xarray or a MultivariateDataset. Currently only supports “xrds”,

otherwise will return MultivariateDataset. :type rtype: Optional[Literal[“xrds”]]

Return type:

Union[xarray.Dataset, MultivariateDataset]:

Returns:

The multivariate dataset, either as an xarray or as a MultivariateDataset

mth5.timeseries.spectre.multiple_station.merge_masks()[source]

calcualtes a “final mask” that is loaded and applied to the data input to regression

mth5.timeseries.spectre.multiple_station.merge_weights()[source]
calcualtes a “final mask” that is loaded and applied to the data

input to regression

mth5.timeseries.spectre.spectrogram module

Module contains a class that represents a spectrogram. i.e. A 2D time series of Fourier coefficients with axes time and the other frequency. The datasets are xarray/dataframe and are fundmentally multivariate.

class mth5.timeseries.spectre.spectrogram.Spectrogram(dataset: Dataset | None = None)[source]

Bases: object

Class to contain methods for STFT objects.

TODO: Add OLS Z-estimates – actually, these are properties of cross powers, not direct properties of spectrograms. TODO: Add Sims/Vozoff Z-estimates – actually, these are properties of cross powers as well. Note Coherence is similarly, a property of cross powers. There are in fact, very few features that we would derive from an unaveraged spectrogram. Pretty much everything except statistical moments comes from cross powers.

Development Notes: - The spectrogram class is fundamental to MT Processing, and normally appears during the STFT operation. - The extract_band method returns another Spectrogram, having the same time axis as the parent object, but only a slice of the frequency range. Both of these have in common that their frequency axes are uniformly spaced, delta-f, where delta-f is dictated by the time series sample rate and the FFT window lenght. - There is a sibling spectral-time-series container that should be considered. Call it for now, a FrequencyChunkedSpectrogram (or an AveragedSpectrogram). This is a container similar to spectrogram, but the frequencies are not uniformly spaced (instead, often logartihmically spaced), they are made from one or more (possibly multivariate) spectrograms, and a FrequencyBands object. The key difference is that in a FrequencyChunkedSpectrogram object has a non-uniform spaced the Frequency axis which was prescribed by a metadata object. Most features, as well as TFs have a FrequencyChunkedSpectrogram representation, where final TFs are just time-averaged a FrequencyChunkedSpectrograms.

TODO: consider factoring a simpler class that does not make the uniform frequency axis assumption. Spectrogram would extend this class and add the _frequency_increment property (taken from the differece in the first two values of the frequency axis), and num_harmoincs in band.

covariance_matrix(band_data: Spectrogram | None = None, method: str = 'numpy_cov') DataArray[source]

TODO: Add tests for this WIP Work-in-progress method Compute full covariance matrix for spectrogram data.

For complex-valued data, the result is a Hermitian matrix where: - diagonal elements are real-valued variances - off-diagonal element [i,j] is E[ch_i * conj(ch_j)] - off-diagonal element [j,i] is the complex conjugate of [i,j]

Parameters:
  • band_data (Spectrogram, optional) – If provided, compute covariance for this data If None, use the full spectrogram

  • method (str) – Computation method. Currently only supports ‘numpy_cov’

Returns:

Hermitian covariance matrix with proper channel labeling For channels i,j: matrix[i,j] = E[ch_i * conj(ch_j)]

Return type:

xr.DataArray

cross_power_label(ch1: str, ch2: str, join_char: str = '_')[source]

joins channel names with join_char

cross_powers(frequency_bands: FrequencyBands, channel_pairs: List[Tuple[str, str]] | None = None)[source]

Compute cross powers between channel pairs for given frequency bands.

TODO: Add handling for case when band in frequency_bands is not contained in self.frequencies.

Parameters:
  • frequency_bands (FrequencyBands) – The frequency bands to compute cross powers for. Each element of this iterable tells the lower and upper bounds of the cross-power calculation bands. These may become objects with information about tapers as ewwll.

  • channel_pairs (list of tuples, optional) – List of channel pairs to compute cross powers for. If None, all possible pairs will be used.

Returns:

Dataset containing cross powers for all channel pairs. Each variable is named by the channel pair (e.g. ‘ex_hy’) and contains a 2D array with dimensions (frequency, time). All variables share common frequency and time coordinates.

Return type:

xr.Dataset

property dataarray[source]

returns the underlying xarray data

property dataset[source]

returns the underlying xarray data

extract_band(frequency_band: Band, channels: list | None = None, epsilon: float | None = None)[source]

Returns another instance of Spectrogram, with the frequency axis reduced to the input band.

Parameters:
  • frequency_band

  • channels

Returns:

spectrogram – Returns a Spectrogram object with only the extracted band for a dataset

Return type:

aurora.time_series.spectrogram.Spectrogram

flatten(chunk_by: Literal['time', 'frequency'] = 'time') Dataset[source]

Reshape the 2D spectrogram into a 1D flattened xarray (time-chunked by default).

Parameters:

chunk_by (Literal["time", "frequency"]) – Reshaping the 2D spectrogram can be done two ways, (basically “row-major”, or column-major). In xarray, but we either keep frequency constant and iterate over time, or keep time constant and iterate over frequency (in the inner loop).

Returns:

  • xarray.Dataset (The dataset from the band spectrogram, stacked.)

  • Development Notes

  • The flattening used in tf calculation by default is opposite to here

  • dataset.stack(observation=(“frequency”, “time”))

  • However, for feature extraction, it may make sense to swap the order

  • xrds = band_spectrogram.dataset.stack(observation=(“time”, “frequency”))

  • This is like chunking into time windows and allows individual features to be computed on each time window – if desired.

  • Still need to split the time series though–Splitting to time would be a reshape by (last_freq_index-first_freq_index).

  • Using pure xarray this may not matter but if we drop down into numpy it could be useful.

property frequency_axis[source]

returns the frequency axis of the underlying xarray

property frequency_band: Band[source]

returns a frequency band object representing the spectrograms band (assumes continuous)

property frequency_increment[source]

returns the “delta f” of the frequency axis - assumes uniformly sampled in frequency domain

num_harmonics_in_band(frequency_band: Band, epsilon: float = 1e-07) int[source]

Returns the number of harmonics within the frequency band in the underlying dataset

Parameters:
  • frequency_band

  • stft_obj

Returns:

num_harmonics – The number of harmonics in the underlying dataset within the given frequency band.

Return type:

int

property time_axis[source]

returns the time axis of the underlying xarray

mth5.timeseries.spectre.spectrogram.extract_band(frequency_band: Band, fft_obj: Dataset | DataArray, channels: list | None = None, epsilon: float = 1e-07) Dataset | DataArray[source]

Extracts a frequency band from xr.DataArray representing a spectrogram.

TODO: Update variable names.

Development Notes:

Base dataset object should be a xr.DataArray (not xr.Dataset) - drop=True does not play nice with h5py and Dataset, results in a type error. File “stringsource”, line 2, in h5py.h5r.Reference.__reduce_cython__ TypeError: no default __reduce__ due to non-trivial __cinit__ However, it works OK with DataArray.

Parameters:
  • frequency_band (mt_metadata.common.band.Band) – Specifies interval corresponding to a frequency band

  • fft_obj (xarray.core.dataset.Dataset) – Short-time-Fourier-transformed datat. Can be multichannel.

  • channels (list) – Channel names to extract.

  • epsilon (float) – Use this when you are worried about missing a frequency due to round off error. This is in general not needed if we use a df/2 pad around true harmonics.

Returns:

extracted_band – The frequencies within the band passed into this function

Return type:

xr.DataArray

Module contents

Allows access to classes that we want to import without full pathing to module.