mth5.data package
Submodules
mth5.data.make_mth5_from_asc module
Created on Fri Jun 25 16:03:21 2021
@author: jpeacock
- This module is concerned with creating mth5 files from the synthetic test data
that originally came from EMTF – test1.asc and test2.asc. Each ascii file represents five channels of data sampled at 1Hz at a synthetic station.
- TODO: Separate the handling of legacy EMTF data files, such as
reading into a dataframe from oddly delimited data, as well as flipping polarities of the electric channels (possibly due to a baked-in sign convention error in the legacy data), so that a simple dataframe can be passed. That will make the methods here
generalize more easily to other dataframes, which will be useful in future when creating synthetic data at arbitrary sample rates.
- Development Notes:
Mirroring the original ascii files are: data/test1.h5 data/test2.h5 data/test12rr.h5
Also created are some files with the same data but other channel_nomenclature schemes: data/test12rr_LEMI34.h5 data/test1_LEMI12.h5
20231103: Added an 8Hz up-sampled version of test1. No spectral content was added
so the band between the old and new Nyquist frequencies is bogus.
- mth5.data.make_mth5_from_asc.create_mth5_synthetic_file(station_cfgs: List[SyntheticStation], mth5_name: Path | str, target_folder: str | Path | None = '', source_folder: Path | str = '', plot: bool = False, add_nan_values: bool = False, file_version: Literal['0.1.0', '0.2.0'] = '0.1.0', force_make_mth5: bool = True, survey_metadata: Survey | None = None)[source]
Creates an MTH5 from synthetic data.
- Development Notes:
20250203: This function could be made more general, so that it operates on dataframes and legacy emtf ascii files.
- Parameters:
station_cfgs (List[SyntheticStation]) – Iterable of SyntheticStation objects. These are one-off data structures used to hold the information mth5 needs to initialize, specifically sample_rate, filters, etc.
mth5_name (Union[pathlib.Path, str]) – Where the mth5 will be stored. This is generated by the station_config, but may change in this method based on add_nan_values or channel_nomenclature.
target_folder (Optional[Union[pathlib.Path, str]]) – Where the mth5 file will be stored.
source_folder (Optional[Union[pathlib.Path, str]]) – Where the ascii source data are stored. Defaults to “”.
plot (bool) – Set to False unless you want to look at a plot of the time series.
add_nan_values (bool) – If True, some np.nan values are sprinkled into the time series. Intended to be used for tests.
file_version (Literal[“0.1.0”, “0.2.0”]) – One of the supported mth5 file versions. This is the version of mth5 to create. Defaults to “0.1.0”.
force_make_mth5 (bool) – If True, the file will be made even if it already exists. If False and the file already exists, the make job is skipped.
survey_metadata (Survey) – Option to provide survey metadata; otherwise it will be created.
- Returns:
The path to the stored h5 file.
- Return type:
pathlib.Path
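The force_make_mth5 behaviour described above can be sketched without mth5 itself. The helper name make_file_if_needed and the stub builder below are hypothetical and are not part of the mth5 API; they only illustrate the "skip the make job if the file exists and the flag is False" rule.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def make_file_if_needed(
    path: Path, build: Callable[[Path], None], force_make: bool = True
) -> Path:
    """Hypothetical helper (not in mth5): build `path` unless it can be reused."""
    if path.exists() and not force_make:
        # File already exists and no rebuild was requested: skip the make job.
        return path
    build(path)
    return path


calls: List[Path] = []


def counting_build(path: Path) -> None:
    # Stand-in for the real work of writing an MTH5 file.
    calls.append(path)
    path.write_text("stub")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "test1.h5"
    make_file_if_needed(target, counting_build, force_make=False)  # built: missing
    make_file_if_needed(target, counting_build, force_make=False)  # skipped: exists
    make_file_if_needed(target, counting_build, force_make=True)   # rebuilt: forced
    build_count = len(calls)
```

With the flag set to False the second call reuses the existing file, which is the behaviour that keeps repeated test runs cheap.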
- mth5.data.make_mth5_from_asc.create_run_ts_from_synthetic_run(run: SyntheticRun, df: DataFrame, channel_nomenclature: ChannelNomenclature) RunTS[source]
Loop over channels of synthetic data in df and make ChannelTS objects.
- Parameters:
run (mth5.data.station_config.SyntheticRun) – One-off data structure with information mth5 needs to initialize. Specifically sample_rate, filters.
df (pandas.DataFrame) – time series data in columns labelled from [“ex”, “ey”, “hx”, “hy”, “hz”]
channel_nomenclature (str) – Keyword corresponding to channel nomenclature mapping in CHANNEL_MAPS variable from channel_nomenclature.py module in mt_metadata. Supported values include [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]
- Return runts:
MTH5 run time series object, data and metadata bound into one.
- Rtype runts:
RunTS
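As a rough illustration of the per-channel loop described above, the sketch below walks over named columns, relabels each one through a nomenclature mapping, and records whether it is electric or magnetic. Everything here is an assumption made for illustration: the mapping is not the real CHANNEL_MAPS content from mt_metadata, plain dicts stand in for the DataFrame and for ChannelTS objects, and build_channel_records is a hypothetical name.

```python
from typing import Dict, List

# Illustrative mapping only. The real tables live in mt_metadata's CHANNEL_MAPS
# and may differ from this one.
HYPOTHETICAL_CHANNEL_MAP: Dict[str, str] = {
    "ex": "e1",
    "ey": "e2",
    "hx": "bx",
    "hy": "by",
    "hz": "bz",
}


def build_channel_records(
    columns: Dict[str, List[float]],
    channel_map: Dict[str, str],
    sample_rate: float,
) -> List[dict]:
    """Loop over channels and build one small record per channel."""
    records = []
    for default_name, samples in columns.items():
        records.append(
            {
                # Name of the channel under the requested nomenclature.
                "component": channel_map[default_name],
                # Assumption: default electric channel names start with "e".
                "type": "electric" if default_name.startswith("e") else "magnetic",
                "sample_rate": sample_rate,
                "n_samples": len(samples),
            }
        )
    return records


columns = {
    "ex": [0.1, 0.2],
    "ey": [0.3, 0.4],
    "hx": [1.0, 1.1],
    "hy": [2.0, 2.1],
    "hz": [3.0, 3.1],
}
records = build_channel_records(columns, HYPOTHETICAL_CHANNEL_MAP, sample_rate=1.0)
```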
- mth5.data.make_mth5_from_asc.create_test12rr_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]
Creates an MTH5 file with data from two stations named “test1” and “test2”.
- Parameters:
file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create
channel_nomenclature (Optional[str]) – Keyword corresponding to channel nomenclature mapping in CHANNEL_MAPS variable from channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json
target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored
source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored
- Returns:
The path to the mth5 file
- Return type:
pathlib.Path
- mth5.data.make_mth5_from_asc.create_test1_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]
Creates an MTH5 file for a single station named “test1”.
- Parameters:
file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create
channel_nomenclature (Optional[str]) – Keyword corresponding to channel nomenclature mapping in CHANNEL_MAPS variable from channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json
target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored
source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored
force_make_mth5 (bool) – If True, the file will be made even if it already exists. If False and the file already exists, the make job is skipped.
- Returns:
The path to the mth5 file
- Return type:
pathlib.Path
- mth5.data.make_mth5_from_asc.create_test1_h5_with_nan(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]
Creates an MTH5 file for a single station named “test1” with some nan values.
- Parameters:
file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create
channel_nomenclature (Optional[str]) – Keyword corresponding to channel nomenclature mapping in CHANNEL_MAPS variable from channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json
target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored
source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored
- Returns:
The path to the mth5 file
- Return type:
pathlib.Path
- mth5.data.make_mth5_from_asc.create_test2_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]
Creates an MTH5 file for a single station named “test2”.
- Parameters:
file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create
channel_nomenclature (Optional[str]) – Keyword corresponding to channel nomenclature mapping in CHANNEL_MAPS variable from channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json
target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored
source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored
force_make_mth5 (bool) – If True, the file will be made even if it already exists. If False and the file already exists, the make job is skipped.
- Returns:
The path to the mth5 file
- Return type:
pathlib.Path
- mth5.data.make_mth5_from_asc.create_test3_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]
Creates an MTH5 file for a single station named “test3”. This example has several runs and can be used to test looping over runs.
- Parameters:
file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create
channel_nomenclature (Optional[str]) – Keyword corresponding to channel nomenclature mapping in CHANNEL_MAPS variable from channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json
target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored
source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored
force_make_mth5 (bool) – If True, the file will be made even if it already exists. If False and the file already exists, the make job is skipped.
- Returns:
The path to the mth5 file
- Return type:
pathlib.Path
- mth5.data.make_mth5_from_asc.create_test4_h5(file_version: str | None = '0.1.0', channel_nomenclature: str | None = 'default', target_folder: str | Path | None = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/mth5/checkouts/latest/mth5/data/mth5'), source_folder: str | Path | None = '', force_make_mth5: bool | None = True) Path[source]
Creates an MTH5 file for a single station named “test1”, with data up-sampled to 8Hz from the original 1Hz.
Note: Because the 8Hz data are derived from the 1Hz, only frequencies below 0.5Hz will have valid TFs that yield the apparent resistivity of the synthetic data (100 Ohm-m).
- Parameters:
file_version (str) – One of [“0.1.0”, “0.2.0”], corresponding to the version of mth5 to create
channel_nomenclature (Optional[str]) – Keyword corresponding to channel nomenclature mapping in CHANNEL_MAPS variable from channel_nomenclature.py module in mt_metadata. Supported values are [‘default’, ‘lemi12’, ‘lemi34’, ‘phoenix123’]. A full list is in mt_metadata/transfer_functions/processing/aurora/standards/channel_nomenclatures.json
target_folder (Optional[Union[str, pathlib.Path]]) – Where the mth5 file will be stored
source_folder (Optional[Union[str, pathlib.Path]]) – Where the ascii source data are stored
- Returns:
The path to the mth5 file
- Return type:
pathlib.Path
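The note on create_test4_h5 about valid frequencies can be checked with a little arithmetic. The sketch below computes the Nyquist frequencies before and after up-sampling, and uses the standard practical-units relation for apparent resistivity, rho_a = 0.2 * T * |E/H|^2 with E in mV/km, H in nT and T in seconds. The function names are illustrative and are not part of mth5.

```python
import math


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency representable at a given sample rate."""
    return sample_rate_hz / 2.0


def apparent_resistivity(e_mv_per_km: float, h_nt: float, period_s: float) -> float:
    """rho_a = 0.2 * T * |E/H|^2 in practical units (mV/km, nT, seconds)."""
    return 0.2 * period_s * abs(e_mv_per_km / h_nt) ** 2


original_rate, upsampled_rate = 1.0, 8.0
trusted_below_hz = nyquist_hz(original_rate)  # band with real spectral content
new_nyquist_hz = nyquist_hz(upsampled_rate)   # band edge after up-sampling

# Amplitude ratio |E/H| that a 100 Ohm-m half-space gives at a 10 s period.
period_s = 10.0
ratio = math.sqrt(100.0 / (0.2 * period_s))
rho = apparent_resistivity(ratio, 1.0, period_s)
```

Frequencies between trusted_below_hz and new_nyquist_hz exist in the 8Hz file only as a by-product of up-sampling, so transfer functions estimated there should not be expected to recover 100 Ohm-m.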
- mth5.data.make_mth5_from_asc.get_time_series_dataframe(run: SyntheticRun, source_folder: Path | str, add_nan_values: bool | None = False) DataFrame[source]
Returns time series data in a dataframe with columns named for EM field component.
Up-samples data to run.sample_rate, which is treated as an integer. Only tested for 8, to make 8Hz data for testing. If run.sample_rate is the default (1.0), no up-sampling takes place.
TODO: Move noise, and nan addition out of this method.
- Parameters:
run (mth5.data.station_config.SyntheticRun) – Information needed to define/create the run
source_folder (Optional[Union[pathlib.Path, str]]) – Where to load the ascii time series from. This overwrites any previous value that may have been stored in the SyntheticRun
add_nan_values (bool) – If True, add some NaN values; if False, do not add NaN.
- Return df:
The time series data for the synthetic run
- Rtype df:
pandas.DataFrame
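Up-sampling by an integer factor, as described for get_time_series_dataframe, can be sketched as follows. This is only an illustration: the source does not say which interpolation mth5 uses, so linear interpolation is an assumption here, and upsample_linear is a hypothetical name. A factor of 1 returns the data unchanged, matching the "no up-sampling" case for the default sample rate.

```python
from typing import List


def upsample_linear(samples: List[float], factor: int) -> List[float]:
    """Insert factor - 1 linearly interpolated points between neighbours."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    if factor == 1 or len(samples) < 2:
        # Default sample rate: nothing to do.
        return list(samples)
    out: List[float] = []
    for left, right in zip(samples, samples[1:]):
        for k in range(factor):
            out.append(left + (right - left) * k / factor)
    out.append(samples[-1])  # keep the final original sample
    return out


one_hz = [0.0, 8.0, 0.0]
eight_hz = upsample_linear(one_hz, 8)
```

Interpolation of this kind adds no new information above the original Nyquist frequency, which is why the up-sampled test data are only meaningful below 0.5Hz.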
mth5.data.paths module
Sets up paths for synthetic data testing.
- class mth5.data.paths.SyntheticTestPaths(sandbox_path: Path | None = None, ascii_data_path: Path | None = None)[source]
Bases: object
This class was created to work around installations with read-only access to the folder containing mth5. Normally, the mth5 data/ folder can be used to store mth5 test data generated when running tests or examples. If data/ is read-only, then this class allows setting “sandbox_path”, a writable folder for tests or examples.
- writability_check() None[source]
Checks whether the path is writable. This method is currently a placeholder.
Tried adding the second solution from here: https://stackoverflow.com/questions/2113427/determining-whether-a-directory-is-writeable
If the directories are not writable, consider the following workaround: HOME = pathlib.Path().home(); workaround_sandbox = HOME.joinpath(".cache", "aurora", "sandbox")
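One common way to test writability is to try creating a temporary file in the folder and fall back to a sandbox if that fails. The sketch below is a minimal illustration under that assumption; it is not the implementation of writability_check, and is_writable and choose_data_folder are hypothetical names. The default sandbox location mirrors the workaround path mentioned above.

```python
import tempfile
from pathlib import Path
from typing import Optional


def is_writable(folder: Path) -> bool:
    """Return True if a temporary file can actually be created in `folder`."""
    try:
        with tempfile.TemporaryFile(dir=folder):
            return True
    except OSError:
        # Covers missing folders and permission errors alike.
        return False


def choose_data_folder(preferred: Path, sandbox: Optional[Path] = None) -> Path:
    """Use `preferred` when writable, otherwise create and return a sandbox."""
    if is_writable(preferred):
        return preferred
    if sandbox is None:
        sandbox = Path.home().joinpath(".cache", "aurora", "sandbox")
    sandbox.mkdir(parents=True, exist_ok=True)
    return sandbox
```

Attempting a real write is more reliable than inspecting permission bits, because it also accounts for read-only file systems and missing directories.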
mth5.data.station_config module
This module contains tools for building MTH5 files from synthetic data.
- Development Notes:
These tools are a work in progress and ideally will
generalize to more than just the legacy EMTF ascii datasets that they initially served.
Definitions used in the creation of synthetic mth5 files.
Survey level: ‘mth5_path’, Path to output h5
- Station level: mt_metadata Station() object with station info.
the id field (name of the station) is required.
other station metadata can be added
channel_nomenclature - The channel_nomenclature was previously stored at the run level. It makes more sense to store
this info at the station level, as the only reason the nomenclature would change (that I can think of) would be if the acquisition system changed, in which case it would make the most sense to initialize a new station object.
Run level: ‘columns’, channel names as a list; [“hx”, “hy”, “hz”, “ex”, “ey”]
Run level: ‘raw_data_path’, Path to ascii data source
Run level: ‘noise_scalars’, dict keyed by channel, default is zero
Run level: ‘nan_indices’, iterable of integers, where to put nan
Run level: ‘filters’, dict of filters keyed by columns
Run level: ‘run_id’, name of the run
Run level: ‘sample_rate’, 1.0
- class mth5.data.station_config.LegacyEMTFAsciiFile(file_path: Path)[source]
Bases: object
This class can be used to interact with the legacy synthetic data files that were originally in EMTF.
- Development Notes:
As of 2025-02-03 the only LegacyEMTFAsciiFile data sources are sampled at 1Hz. One-off upsampling can be handled in this class if the requested sample rate differs.
- load_dataframe(channel_names: list, sample_rate: float) DataFrame[source]
Loads an EMTF legacy ASCII time series into a dataframe.
- These files have an awkward whitespace separator, and also need to have the
electric field channels inverted to fix a phase swap.
- Parameters:
channel_names (list) – The names of the channels in the legacy EMTF file, in order.
sample_rate (float) – The sample rate of the output time series in Hz.
- Return df:
The labelled time series from the legacy EMTF file.
- Rtype df:
pd.DataFrame
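The two quirks mentioned for load_dataframe, an irregular whitespace separator and inverted electric channels, can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the mth5 implementation: it returns a dict of lists rather than a DataFrame, it assumes electric channel names start with "e" (ex, ey), and parse_whitespace_table is a hypothetical name.

```python
from typing import Dict, List, Sequence

ELECTRIC_PREFIX = "e"  # assumption: electric channels are named ex, ey


def parse_whitespace_table(
    text: str, channel_names: Sequence[str]
) -> Dict[str, List[float]]:
    """Parse whitespace-delimited rows and flip the sign of electric channels."""
    columns: Dict[str, List[float]] = {name: [] for name in channel_names}
    for line in text.splitlines():
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # skip blank lines
        if len(fields) != len(channel_names):
            raise ValueError(f"expected {len(channel_names)} fields, got {len(fields)}")
        for name, field in zip(channel_names, fields):
            value = float(field)
            if name.startswith(ELECTRIC_PREFIX):
                value = -value  # invert polarity of electric channels
            columns[name].append(value)
    return columns


sample_text = "  1.0   2.0\t3.0  4.0   -5.0\n\n 0.5 0.5 0.5 0.5 0.5\n"
table = parse_whitespace_table(sample_text, ["hx", "hy", "hz", "ex", "ey"])
```

The channel order passed in must match the column order of the file, which is why the real method takes channel_names "in order".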
- class mth5.data.station_config.SyntheticRun(id: str, sample_rate: float, channels: List[str], raw_data_path: str | Path | None = None, noise_scalars: dict | None = None, nan_indices: dict | None = None, filters: dict | None = None, start: str | None = None, timeseries_dataframe: DataFrame | None = None, data_source: str = 'legacy emtf ascii')[source]
Bases: object
Place to store information that will be needed to initialize an MTH5 Run object.
Initially this class worked only with the synthetic ASCII data from legacy EMTF.
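The noise_scalars and nan_indices fields suggest how a clean synthetic channel might be perturbed for testing. The sketch below assumes additive Gaussian noise multiplied by the channel's scalar, and NaN written at the listed sample indices. Both the noise model and the function name apply_noise_and_nan are assumptions made for illustration, since the source does not describe how mth5 applies these fields.

```python
import math
import random
from typing import Iterable, List


def apply_noise_and_nan(
    samples: List[float],
    noise_scalar: float = 0.0,
    nan_indices: Iterable[int] = (),
    seed: int = 0,
) -> List[float]:
    """Return a copy of `samples` with scaled noise added and NaN inserted."""
    rng = random.Random(seed)  # seeded so test data are reproducible
    noisy = [x + noise_scalar * rng.gauss(0.0, 1.0) for x in samples]
    for index in nan_indices:
        noisy[index] = math.nan
    return noisy


clean = [1.0, 2.0, 3.0, 4.0]
with_gaps = apply_noise_and_nan(clean, noise_scalar=0.0, nan_indices=[2])
with_noise = apply_noise_and_nan(clean, noise_scalar=0.1, seed=42)
```

A scalar of zero, the documented default, leaves the data untouched apart from any NaN that were requested.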
- class mth5.data.station_config.SyntheticStation(station_metadata: Station, mth5_name: str | Path | None = None, channel_nomenclature_keyword: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default)[source]
Bases: object
Class used to contain information needed to generate MTH5 file from synthetic data.
- TODO: could add channel_nomenclature to this obj (instead of run) but would need to decide that
runs cannot change channel nomenclature first. If that were decided, the channel_map() could go here as well.
- mth5.data.station_config.make_filters(as_list: bool | None = False) dict | list[source]
Creates a collection of filters
- Because the synthetic data from EMTF are already in mV/km and nT, no calibration filters are required.
The filters here are placeholders to show where instrument response function information would get assigned.
- Parameters:
as_list (bool) – If True, return a list; if False, return a dict
- Return filters_list:
Filters for populating the filters lists of synthetic data
- Rtype filters_list:
Union[List, Dict]
- mth5.data.station_config.make_station_01(channel_nomenclature: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default) SyntheticStation[source]
This method prepares the metadata needed to generate an mth5 with synthetic data.
- Parameters:
channel_nomenclature (str) – Must be one of the nomenclatures defined in SupportedNomenclatureEnum
- Returns:
Object with all info needed to generate MTH5 file from synthetic data.
- Return type:
SyntheticStation
- mth5.data.station_config.make_station_02(channel_nomenclature: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default) SyntheticStation[source]
Just like station 1, but the data are different
- Parameters:
channel_nomenclature (SupportedNomenclatureEnum) – Must be one of the nomenclatures defined in SupportedNomenclatureEnum
- Returns:
Object with all info needed to generate MTH5 file from synthetic data.
- Return type:
SyntheticStation
- mth5.data.station_config.make_station_03(channel_nomenclature: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default) SyntheticStation[source]
Create a synthetic station with multiple runs. Rather than generate fresh synthetic data, we just reuse test1.asc for each run.
- Parameters:
channel_nomenclature (SupportedNomenclatureEnum) – Must be one of the nomenclatures defined in “channel_nomenclatures.json”
- Return type:
SyntheticStation
- Returns:
Object with all info needed to generate MTH5 file from synthetic data.
- mth5.data.station_config.make_station_04(channel_nomenclature: SupportedNomenclatureEnum = SupportedNomenclatureEnum.default) SyntheticStation[source]
Just like station 01, but data are resampled to 8Hz
- Parameters:
channel_nomenclature (SupportedNomenclatureEnum) – Must be one of the nomenclatures defined in “channel_nomenclatures.json”
- Return type:
SyntheticStation
- Returns:
Object with all info needed to generate MTH5 file from synthetic data.