mth5.data package

Submodules

mth5.data.make_mth5_from_asc module

Created on Fri Jun 25 16:03:21 2021

@author: jpeacock

This module creates mth5 files from the synthetic test data that originally came from EMTF: test1.asc and test2.asc. Each ascii file represents five channels of data sampled at 1Hz at a synthetic station.

TODO: Separate the handling of legacy EMTF data files, such as reading into a dataframe from oddly delimited data and flipping the polarities of the electric channels (possibly due to a baked-in sign convention error in the legacy data), so that a simple dataframe can be passed. That will make the methods here generalize more easily to other dataframes, which would be useful in future when creating synthetic data at arbitrary sample rates.

Development Notes:

Mirroring the original ascii files are: data/test1.h5, data/test2.h5, data/test12rr.h5

Also created are some files with the same data but other channel_nomenclature schemes: data/test12rr_LEMI34.h5, data/test1_LEMI12.h5

  • 20231103: Added an 8Hz up-sampled version of test1. No spectral content was added, so the band between the old and new Nyquist frequencies is bogus.
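As a minimal illustration of the note above: the Nyquist frequency is half the sample rate, so up-sampling from 1Hz to 8Hz without adding spectral content leaves the band from 0.5Hz to 4Hz without valid signal. The function name below is hypothetical and not part of mth5.

```python
def nyquist_frequency(sample_rate: float) -> float:
    """Return the Nyquist frequency (half the sample rate) in Hz."""
    return sample_rate / 2.0


# Sample rates taken from the note above: 1Hz original, 8Hz up-sampled.
old_nyquist = nyquist_frequency(1.0)
new_nyquist = nyquist_frequency(8.0)

# Band that holds no genuine spectral content after up-sampling.
invalid_band = (old_nyquist, new_nyquist)
print(f"No valid content between {invalid_band[0]} Hz and {invalid_band[1]} Hz")
```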

mth5.data.make_mth5_from_asc.create_mth5_synthetic_file(station_cfgs: List[SyntheticStation], mth5_name: Path | str, target_folder: str | Path | None = '', source_folder: Path | str = '', plot: bool = False, add_nan_values: bool = False, file_version: Literal['0.1.0', '0.2.0'] = '0.1.0', force_make_mth5: bool = True, survey_metadata: Survey | None = None)[source]

Creates an MTH5 from synthetic data.

Development Notes:

20250203: This function could be made more general, so that it operates on dataframes and legacy emtf ascii files.

Parameters:

  • station_cfgs (List[SyntheticStation]) – Iterable of objects of type SyntheticStation. These are one-off data structures used to hold information mth5 needs to initialize, specifically sample_rate, filters, etc.

  • mth5_name (Union[pathlib.Path, str]) – Where the mth5 will be stored. This is generated by the station_config, but may change in this method based on add_nan_values or channel_nomenclature

  • target_folder (Optional[Union[pathlib.Path, str]]) – Where the mth5 file will be stored

  • source_folder (Optional[Union[pathlib.Path, str]]) – Where the ascii source data are stored

  • plot (bool) – Set to False unless you want to look at a plot of the time series

  • add_nan_values (bool) – If True, some np.nan are sprinkled into the time series. Intended to be used for tests.

  • file_version (Literal[“0.1.0”, “0.2.0”]) – One of the supported mth5 file versions. This is the version of mth5 to create.

  • force_make_mth5 (bool) – If set to True, the file will be made, even if it already exists. If False and the file already exists, skip the make job.

  • survey_metadata (Survey) – Option to provide survey metadata, otherwise it will be created.

Returns:

The path to the stored h5 file.

Return type:

pathlib.Path
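The force_make_mth5 behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation; should_make_file is a hypothetical helper name, and empty files stand in for real mth5 output.

```python
import tempfile
from pathlib import Path


def should_make_file(mth5_path: Path, force_make_mth5: bool = True) -> bool:
    """Decide whether the file should be (re)built.

    Hypothetical helper: build when forced, or when no file exists yet.
    """
    return force_make_mth5 or not mth5_path.exists()


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "test1.h5"
    existing.touch()  # stand-in for a previously built file
    missing = Path(tmp) / "test2.h5"

    rebuild_existing = should_make_file(existing, force_make_mth5=True)
    skip_existing = should_make_file(existing, force_make_mth5=False)
    build_missing = should_make_file(missing, force_make_mth5=False)
```

With force_make_mth5=False, only the missing file would be built; the existing one is left alone.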

mth5.data.make_mth5_from_asc.create_run_ts_from_synthetic_run(run: SyntheticRun, df: DataFrame, channel_nomenclature: ChannelNomenclature) RunTS[source]

Loop over channels of synthetic data in df and make ChannelTS objects.

Parameters:
  • run (mth5.data.station_config.SyntheticRun) – One-off data structure with information mth5 needs to initialize. Specifically sample_rate, filters.

  • df (pandas.DataFrame) – time series data in columns labelled from [“ex”, “ey”, “hx”, “hy”, “hz”]

  • channel_nomenclature (str) – Keyword corresponding to a channel nomenclature mapping in the CHANNEL_MAPS variable from the channel_nomenclature.py module in mt_metadata. Supported values include [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]

Returns:

MTH5 run time series object, data and metadata bound into one.

Return type:

RunTS

mth5.data.make_mth5_from_asc.create_test12rr_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]

Creates an MTH5 file with data from two stations named “test1” and “test2”.

Parameters:
  • file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create

  • channel_nomenclature (Optional[str]) – Keyword corresponding to a channel nomenclature mapping in the CHANNEL_MAPS variable from the channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json

  • target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored

  • source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored

Returns:

The path to the mth5 file

Return type:

pathlib.Path

mth5.data.make_mth5_from_asc.create_test1_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]

Creates an MTH5 file for a single station named “test1”.

Parameters:
  • file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create

  • channel_nomenclature (Optional[str]) – Keyword corresponding to a channel nomenclature mapping in the CHANNEL_MAPS variable from the channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json

  • target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored

  • source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored

  • force_make_mth5 (bool) – If set to True, the file will be made, even if it already exists. If False and the file already exists, skip the make job.

Returns:

The path to the mth5 file

Return type:

pathlib.Path

mth5.data.make_mth5_from_asc.create_test1_h5_with_nan(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]

Creates an MTH5 file for a single station named “test1” with some nan values.

Parameters:
  • file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create

  • channel_nomenclature (Optional[str]) – Keyword corresponding to a channel nomenclature mapping in the CHANNEL_MAPS variable from the channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json

  • target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored

  • source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored

Returns:

The path to the mth5 file

Return type:

pathlib.Path

mth5.data.make_mth5_from_asc.create_test2_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]

Creates an MTH5 file for a single station named “test2”.

Parameters:
  • file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create

  • channel_nomenclature (Optional[str]) – Keyword corresponding to a channel nomenclature mapping in the CHANNEL_MAPS variable from the channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json

  • target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored

  • source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored

  • force_make_mth5 (bool) – If set to True, the file will be made, even if it already exists. If False and the file already exists, skip the make job.

Returns:

The path to the mth5 file

Return type:

pathlib.Path

mth5.data.make_mth5_from_asc.create_test3_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]

Creates an MTH5 file for a single station named “test3”. This example has several runs and can be used to test looping over runs.

Parameters:
  • file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create

  • channel_nomenclature (Optional[str]) – Keyword corresponding to a channel nomenclature mapping in the CHANNEL_MAPS variable from the channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json

  • target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored

  • source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored

  • force_make_mth5 (bool) – If set to True, the file will be made, even if it already exists. If False and the file already exists, skip the make job.

Returns:

The path to the mth5 file

Return type:

pathlib.Path

mth5.data.make_mth5_from_asc.create_test4_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]

Creates an MTH5 file for a single station named “test1”, with the data up-sampled to 8Hz from the original 1Hz.

Note: Because the 8Hz data are derived from the 1Hz, only frequencies below 0.5Hz will have valid TFs that yield the apparent resistivity of the synthetic data (100 Ohm-m).

Parameters:
  • file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create

  • channel_nomenclature (Optional[str]) – Keyword corresponding to a channel nomenclature mapping in the CHANNEL_MAPS variable from the channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json

  • target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored

  • source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored

Returns:

The path to the mth5 file

Return type:

pathlib.Path

mth5.data.make_mth5_from_asc.get_time_series_dataframe(run: SyntheticRun, source_folder: Path | str, add_nan_values: bool | None = False) DataFrame[source]

Returns time series data in a dataframe with columns named for EM field component.

Up-samples data to run.sample_rate, which is treated as an integer. Only tested for 8, to make 8Hz data for testing. If run.sample_rate is the default (1.0), then no up-sampling takes place.

TODO: Move noise and nan addition out of this method.

Parameters:
  • run (mth5.data.station_config.SyntheticRun) – Information needed to define/create the run

  • source_folder (Optional[Union[pathlib.Path, str]]) – Where to load the ascii time series from. This overwrites any previous value that may have been stored in the SyntheticRun

  • add_nan_values (bool) – If True, add some NaN; if False, do not add NaN.

Returns:

The time series data for the synthetic run

Return type:

pandas.DataFrame
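The add_nan_values option described above can be pictured with the short sketch below. It is an illustration only: sprinkle_nan is a hypothetical helper, plain lists stand in for dataframe columns, and the layout of the index mapping (channel name to integer sample positions) is an assumption rather than the structure mth5 actually uses.

```python
import math
from typing import Dict, Iterable, List


def sprinkle_nan(
    data: Dict[str, List[float]], nan_indices: Dict[str, Iterable[int]]
) -> Dict[str, List[float]]:
    """Return a copy of data with NaN written at the given sample indices.

    Assumed layout (illustration only): both arguments are keyed by
    channel name, and nan_indices holds integer sample positions.
    """
    out = {channel: list(samples) for channel, samples in data.items()}
    for channel, indices in nan_indices.items():
        for index in indices:
            out[channel][index] = float("nan")
    return out


# Made-up values for two channels.
data = {"hx": [1.0, 2.0, 3.0, 4.0], "ex": [0.1, 0.2, 0.3, 0.4]}
with_nan = sprinkle_nan(data, {"hx": [1], "ex": [0, 3]})
```

The input is copied first, so the original series stay free of NaN and can be reused for a clean build.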

mth5.data.make_mth5_from_asc.main(file_version='0.1.0')[source]

Allow the module to be called from the command line

mth5.data.paths module

Sets up paths for synthetic data testing.

class mth5.data.paths.SyntheticTestPaths(sandbox_path: Path | None = None, ascii_data_path: Path | None = None)[source]

Bases: object

This class was created to work around installations with read-only access to the folder containing mth5. Normally, the mth5 data/ folder can be used to store mth5 test data generated when running tests or examples. If data/ is read-only, then this class allows setting “sandbox_path”, a writable folder for tests or examples.

mkdirs() None[source]

Makes the directories that the tests will write results to.

writability_check() None[source]

Checks whether the path is writable. (Placeholder.)

Tried adding the second solution from here: https://stackoverflow.com/questions/2113427/determining-whether-a-directory-is-writeable

If the directories are not writable, consider:

    HOME = pathlib.Path().home()
    workaround_sandbox = HOME.joinpath(".cache", "aurora", "sandbox")
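One common way to test writability, shown here as a sketch and not necessarily the approach taken by the linked answer or by mth5, is to try creating a temporary file in the folder and fall back to a sandbox when that fails. The names is_writable and choose_sandbox are hypothetical.

```python
import tempfile
from pathlib import Path


def is_writable(folder: Path) -> bool:
    """Return True if a temporary file can be created inside folder."""
    try:
        with tempfile.TemporaryFile(dir=folder):
            pass
    except OSError:
        return False
    return True


def choose_sandbox(data_folder: Path, fallback: Path) -> Path:
    """Use data_folder when writable, otherwise create and use fallback."""
    if is_writable(data_folder):
        return data_folder
    fallback.mkdir(parents=True, exist_ok=True)
    return fallback


with tempfile.TemporaryDirectory() as tmp:
    writable = Path(tmp)
    missing = writable / "does_not_exist"  # cannot hold a temporary file
    fallback = writable / ".cache" / "aurora" / "sandbox"
    chosen_ok = choose_sandbox(writable, fallback)
    chosen_fallback = choose_sandbox(missing, fallback)
    fallback_created = fallback.is_dir()
```

Attempting a real write catches cases that a permission-bit check can miss, at the cost of briefly touching the file system.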

mth5.data.station_config module

This module contains tools for building MTH5 files from synthetic data.

Development Notes:
  • These tools are a work in progress and ideally will generalize to more than just the legacy EMTF ascii datasets that they initially served.

Definitions used in the creation of synthetic mth5 files.

Survey level: ‘mth5_path’, Path to output h5

Station level: mt_metadata Station() object with station info.
  • the id field (name of the station) is required.

  • other station metadata can be added

  • channel_nomenclature - The channel_nomenclature was previously stored at the run level. It makes more sense to store this info at the station level, as the only reason the nomenclature would change (that I can think of) would be if the acquisition system changed, in which case it would make the most sense to initialize a new station object.

Run level: ‘columns’, channel names as a list; [“hx”, “hy”, “hz”, “ex”, “ey”]

Run level: ‘raw_data_path’, Path to ascii data source

Run level: ‘noise_scalars’, dict keyed by channel, default is zero

Run level: ‘nan_indices’, iterable of integers, where to put nan

Run level: ‘filters’, dict of filters keyed by columns

Run level: ‘run_id’, name of the run

Run level: ‘sample_rate’, 1.0

class mth5.data.station_config.LegacyEMTFAsciiFile(file_path: Path)[source]

Bases: object

This class can be used to interact with the legacy synthetic data files that were originally in EMTF.

Development Notes:

As of 2025-02-03 the only LegacyEMTFAsciiFile data sources are sampled at 1Hz. One-off upsampling can be handled in this class if the requested sample rate differs.

IMPLICIT_SAMPLE_RATE = 1.0[source]
load_dataframe(channel_names: list, sample_rate: float) DataFrame[source]

Loads an EMTF legacy ASCII time series into a dataframe.

These files have an awkward whitespace separator, and the electric field channels also need to be inverted to fix a phase swap.

Parameters:
  • channel_names (list) – The names of the channels in the legacy EMTF file, in order.

  • sample_rate (float) – The sample rate of the output time series in Hz.

Returns:

The labelled time series from the legacy EMTF file.

Return type:

pd.DataFrame
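The two steps described above, splitting on irregular whitespace and inverting the electric channels, can be sketched with the standard library alone. This is an illustration, not the load_dataframe implementation: parse_legacy_ascii is a hypothetical name, a dict of lists stands in for the dataframe, and the sample values are made up.

```python
from typing import Dict, List


def parse_legacy_ascii(text: str, channel_names: List[str]) -> Dict[str, List[float]]:
    """Parse whitespace-delimited rows into per-channel lists.

    Electric channels (names starting with "e") are multiplied by -1.
    """
    columns: Dict[str, List[float]] = {name: [] for name in channel_names}
    for line in text.splitlines():
        fields = line.split()  # splits on any run of whitespace
        if not fields:
            continue  # skip blank lines
        for name, field in zip(channel_names, fields):
            value = float(field)
            if name.startswith("e"):
                value = -value  # flip polarity of ex and ey
            columns[name].append(value)
    return columns


# Two made-up rows with uneven spacing, in the channel order given above.
sample = "  1.0   2.0  3.0    4.0  -5.0\n 0.5  0.25   0.0  -1.5   2.5\n"
channels = ["hx", "hy", "hz", "ex", "ey"]
parsed = parse_legacy_ascii(sample, channels)
```

Magnetic channels pass through unchanged, while ex and ey come back with their signs flipped.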

class mth5.data.station_config.SyntheticRun(id: str, sample_rate: float, channels: List[str], raw_data_path: str | Path | None = None, noise_scalars: dict | None = None, nan_indices: dict | None = None, filters: dict | None = None, start: str | None = None, timeseries_dataframe: DataFrame | None = None, data_source: str = 'legacy emtf ascii')[source]

Bases: object

Place to store information that will be needed to initialize an MTH5 Run object.

Initially this class worked only with the synthetic ASCII data from legacy EMTF.

class mth5.data.station_config.SyntheticStation(station_metadata: Station, mth5_name: str | Path | None = None, channel_nomenclature_keyword: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default)[source]

Bases: object

Class used to contain information needed to generate MTH5 file from synthetic data.

TODO: could add channel_nomenclature to this obj (instead of run) but would need to decide that runs cannot change channel nomenclature first. If that were decided, the channel_map() could go here as well.

property channel_nomenclature[source]
mth5.data.station_config.main()[source]
mth5.data.station_config.make_filters(as_list: bool | None = False) dict | list[source]

Creates a collection of filters

Because the synthetic data from EMTF are already in mV/km and nT, no calibration filters are required.

The filters here are placeholders to show where instrument response function information would get assigned.

Parameters:

as_list (bool) – If True, return a list; if False, return a dict

Returns:

Filters for populating the filters lists of synthetic data

Return type:

Union[List, Dict]

mth5.data.station_config.make_station_01(channel_nomenclature: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default) SyntheticStation[source]

This method prepares the metadata needed to generate an mth5 with synthetic data.

Parameters:

channel_nomenclature (SupportedNomenclatureEnum) – Must be one of the nomenclatures defined in SupportedNomenclatureEnum

Returns:

Object with all info needed to generate MTH5 file from synthetic data.

Return type:

SyntheticStation

mth5.data.station_config.make_station_02(channel_nomenclature: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default) SyntheticStation[source]

Just like station 01, but the data are different.

Parameters:

channel_nomenclature (SupportedNomenclatureEnum) – Must be one of the nomenclatures defined in SupportedNomenclatureEnum

Returns:

Object with all info needed to generate MTH5 file from synthetic data.

Return type:

SyntheticStation

mth5.data.station_config.make_station_03(channel_nomenclature: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default) SyntheticStation[source]

Create a synthetic station with multiple runs. Rather than generate fresh synthetic data, we just reuse test1.asc for each run.

Parameters:

channel_nomenclature (SupportedNomenclatureEnum) – Literal; must be one of the nomenclatures defined in “channel_nomenclatures.json”

Return type:

SyntheticStation

Returns:

Object with all info needed to generate MTH5 file from synthetic data.

mth5.data.station_config.make_station_04(channel_nomenclature: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default) SyntheticStation[source]

Just like station 01, but data are resampled to 8Hz

Parameters:

channel_nomenclature (SupportedNomenclatureEnum) – Literal; must be one of the nomenclatures defined in “channel_nomenclatures.json”

Return type:

SyntheticStation

Returns:

Object with all info needed to generate MTH5 file from synthetic data.

Module contents