mth5.io package

Subpackages

Submodules

mth5.io.collection module

Phoenix file collection

Created on Thu Aug 4 16:48:47 2022

@author: jpeacock

class mth5.io.collection.Collection(file_path=None, **kwargs)[source]

Bases: object

A general collection class to keep track of files with methods to create runs and run ids.

assign_run_names(df, zeros=4)[source]

Assign run names to a dataframe. This is a base method that should be overridden by subclasses.

Parameters:
  • df (pandas.DataFrame) – dataframe with file information

  • zeros (int, optional) – number of zeros in run name, defaults to 4

Returns:

dataframe with run names assigned

Return type:

pandas.DataFrame
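
As a rough illustration of what the zeros argument controls, the sketch below pads a run counter to a fixed width. The sr{rate}_{index} pattern and the helper name make_run_name are assumptions made for this example; each subclass defines its own naming.

```python
def make_run_name(sample_rate, index, zeros=4):
    """Build a run name with a zero-padded counter (hypothetical pattern)."""
    return f"sr{int(sample_rate)}_{index:0{zeros}d}"


print(make_run_name(150, 1))              # sr150_0001
print(make_run_name(24000, 12, zeros=2))  # sr24000_12
```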

property file_path[source]

Path object to file directory

get_empty_entry_dict()[source]

Returns:

an empty dictionary with the proper keys for an entry into a dataframe

Return type:

dict
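
A minimal sketch of such a dictionary, assuming its keys are the column names that to_dataframe lists and that every value starts out as None. ENTRY_KEYS and empty_entry are hypothetical names used only here.

```python
# Column names taken from the to_dataframe description.
ENTRY_KEYS = [
    "survey", "station", "run", "start", "end", "channel_id", "component",
    "fn", "sample_rate", "file_size", "n_samples", "sequence_number",
    "instrument_id", "calibration_fn",
]


def empty_entry():
    """Return a fresh dictionary with every expected key set to None."""
    return dict.fromkeys(ENTRY_KEYS)


entry = empty_entry()
entry["station"] = "mt001"  # fill in values as each file is inspected
```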

get_files(extension)[source]

Get files with the given extension(s). Uses pathlib.Path.rglob, so it finds all files within file_path, including those in sub-directories.

Parameters:

extension (string or list) – file extension(s)

Returns:

list of files in the file_path with the given extensions

Return type:

list of Path objects
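
The recursive search can be sketched with pathlib alone. This is an illustration of the rglob approach, not the library's code; find_files is a hypothetical helper and the temporary directory exists only for the demonstration.

```python
import tempfile
from pathlib import Path


def find_files(file_path, extension):
    """Return sorted paths under file_path matching one or more extensions."""
    extensions = [extension] if isinstance(extension, str) else list(extension)
    found = []
    for ext in extensions:
        # rglob descends into every sub-directory of file_path
        found.extend(Path(file_path).rglob(f"*.{ext.lstrip('.')}"))
    return sorted(found)


# Demonstration on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    for name in ("a.td_150", "sub/b.td_150", "sub/c.txt"):
        (root / name).touch()
    found_names = [p.name for p in find_files(root, "td_150")]
    found_count = len(find_files(root, ["td_150", "txt"]))
```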

get_remote_reference_list(df, max_hours=6, min_hours=1.5)[source]

Get remote reference pairs.

Parameters:
  • max_hours (TYPE, optional) – DESCRIPTION, defaults to 6

  • min_hours (TYPE, optional) – DESCRIPTION, defaults to 1.5

Returns:

DESCRIPTION

Return type:

TYPE

get_runs(sample_rates, run_name_zeros=4, calibration_path=None)[source]

Get a list of runs contained within the given folder. First the dataframe will be developed from which the runs are extracted.

For continuous data, all you need is the first file in the sequence. The reader will read in the entire sequence.

For segmented data it will only read in the given segment, which is slightly different from the original reader.

Parameters:
  • sample_rates (list) – list of sample rates to read, for example [150, 24000]; required, since the signature defines no default

  • run_name_zeros (integer, optional) – Number of zeros in the run name, defaults to 4

Returns:

Ordered dictionary of run dataframes, each containing only the first block of files

Return type:

collections.OrderedDict
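
The idea of keeping only the first block of files per run can be sketched in plain Python. This assumes that "first block" means the entries with the lowest sequence number in each run; first_blocks and the sample entries are hypothetical, and the real method works on a pandas.DataFrame.

```python
from collections import OrderedDict


def first_blocks(entries):
    """Group file entries by run and keep those with the lowest sequence number."""
    runs = OrderedDict()
    for entry in sorted(entries, key=lambda e: (e["run"], e["sequence_number"])):
        runs.setdefault(entry["run"], []).append(entry)
    return OrderedDict(
        (run, [e for e in rows if e["sequence_number"] == rows[0]["sequence_number"]])
        for run, rows in runs.items()
    )


entries = [
    {"run": "sr150_0001", "component": "ex", "sequence_number": 1},
    {"run": "sr150_0001", "component": "ex", "sequence_number": 2},
    {"run": "sr150_0001", "component": "hx", "sequence_number": 1},
    {"run": "sr150_0002", "component": "ex", "sequence_number": 7},
]
blocks = first_blocks(entries)
```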

Example:
>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])

to_dataframe(sample_rates=None, run_name_zeros=4, calibration_path=None)[source]

Get a data frame of the file summary with column names:

  • survey: survey id

  • station: station id

  • run: run id

  • start: start time UTC

  • end: end time UTC

  • channel_id: channel id or list of channel ids in file

  • component: channel component or list of components in file

  • fn: path to file

  • sample_rate: sample rate in samples per second

  • file_size: file size in bytes

  • n_samples: number of samples in file

  • sequence_number: sequence number of the file

  • instrument_id: instrument id

  • calibration_fn: calibration file

Parameters:
  • sample_rates (list, optional) – list of sample rates to process, defaults to None

  • run_name_zeros (int, optional) – number of zeros in run name, defaults to 4

  • calibration_path (str or Path, optional) – path to calibration files, defaults to None

Returns:

summary table of file names

Return type:

pandas.DataFrame

mth5.io.conversion module

Convert MTH5 to other formats

  • MTH5 -> miniSEED + StationXML

class mth5.io.conversion.MTH5ToMiniSEEDStationXML(mth5_path: str | Path | None = None, save_path: str | Path | None = None, network_code: str = 'ZU', use_runs_with_data_only: bool = True, **kwargs: Any)[source]

Bases: object

Convert MTH5 files to miniSEED and StationXML formats.

This class provides functionality to convert magnetotelluric data stored in MTH5 format to industry-standard miniSEED time series files and StationXML metadata files for data exchange and archival purposes.

Parameters:
  • mth5_path (str, Path, or None, default None) – Path to the input MTH5 file to be converted

  • save_path (str, Path, or None, default None) – Directory path where output files will be saved. If None, uses the parent directory of mth5_path

  • network_code (str, default "ZU") – Two-character FDSN network code for the output files

  • use_runs_with_data_only (bool, default True) – If True, only process runs that contain actual time series data

  • **kwargs (dict) – Additional keyword arguments to set as instance attributes

mth5_path[source]

Path to the MTH5 input file

Type:

Path or None

save_path[source]

Directory where output files are saved

Type:

Path

network_code[source]

FDSN network code for output files

Type:

str

use_runs_with_data_only[source]

Flag to process only runs with data

Type:

bool

encoding[source]

Encoding format for miniSEED files

Type:

str or None

Examples

>>> from mth5.io.conversion import MTH5ToMiniSEEDStationXML
>>> xml_file, mseed_files = MTH5ToMiniSEEDStationXML.convert_mth5_to_ms_stationxml(
...     mth5_path="/path/to/data.h5",
...     network_code="MT",
...     save_path="/path/to/output"
... )

classmethod convert_mth5_to_ms_stationxml(mth5_path: str | Path, save_path: str | Path | None = None, network_code: str = 'ZU', use_runs_with_data_only: bool = True, **kwargs: Any) → tuple[Path, list[Path]][source]

Convert an MTH5 file to miniSEED and StationXML formats.

Class method that provides a convenient interface to convert MTH5 data to standard seismological formats for data exchange and archival.

Parameters:
  • mth5_path (str or Path) – Path to the input MTH5 file to be converted

  • save_path (str, Path, or None, default None) – Directory where output files will be saved. If None, uses the parent directory of mth5_path

  • network_code (str, default "ZU") – Two-character FDSN network code for output files

  • use_runs_with_data_only (bool, default True) – If True, only process runs containing actual time series data

  • **kwargs (dict) – Additional keyword arguments passed to converter initialization

Returns:

Tuple containing:

  • Path to the generated StationXML file

  • List of paths to generated miniSEED files (one per day per channel)

Return type:

tuple[Path, list[Path]]

Examples

>>> xml_file, mseed_files = MTH5ToMiniSEEDStationXML.convert_mth5_to_ms_stationxml(
...     "/path/to/data.h5",
...     network_code="MT",
...     save_path="/output/directory"
... )
>>> print(f"Created {len(mseed_files)} miniSEED files and {xml_file}")

property mth5_path: Path | None[source]

Path to the MTH5 input file.

Returns:

Path to the MTH5 file to be converted, or None if not set.

Return type:

Path or None

property network_code: str[source]

Two-character FDSN network code.

Returns:

Alphanumeric string of exactly 2 characters as required by FDSN DMC.

Return type:

str
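
The two-character constraint can be checked with a short validator. This is a sketch of the rule as stated above, not the property's actual setter; validate_network_code is a hypothetical name.

```python
import re


def validate_network_code(value):
    """Return value if it is exactly two alphanumeric characters, else raise."""
    if not isinstance(value, str) or not re.fullmatch(r"[A-Za-z0-9]{2}", value):
        raise ValueError(
            f"network code must be 2 alphanumeric characters, got {value!r}"
        )
    return value


code = validate_network_code("ZU")
```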

property save_path: Path[source]

Directory path where output files will be saved.

Returns:

Directory path for saving miniSEED and StationXML files.

Return type:

Path

split_ms_to_days(streams, save_path: Path, encoding: str) → list[Path][source]

Split miniSEED traces into daily files.

Splits continuous time series traces into separate files for each day to conform with standard seismological data archiving practices.

Parameters:
  • streams (obspy.Stream) – Stream object containing traces to be split by day

  • save_path (Path) – Directory where daily miniSEED files will be saved

  • encoding (str) – Data encoding format for miniSEED files (e.g., ‘INT32’, ‘FLOAT64’)

Returns:

List of paths to the generated daily miniSEED files

Return type:

list[Path]

Notes

Files are named using the pattern: {network}_{station}_{location}_{channel}_{YYYY_MM_DDTHH_MM_SS}.mseed
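
The day-splitting logic and the naming pattern can be sketched with the standard library. This is an illustration under stated assumptions: day_windows and daily_name are hypothetical helpers, days are split at UTC midnight, and the station, location, and channel codes are made up. The real method operates on obspy traces.

```python
from datetime import datetime, timedelta, timezone


def day_windows(start, end):
    """Split [start, end) into consecutive windows that stop at midnight."""
    windows = []
    current = start
    while current < end:
        midnight = datetime(
            current.year, current.month, current.day, tzinfo=current.tzinfo
        ) + timedelta(days=1)
        stop = min(midnight, end)
        windows.append((current, stop))
        current = stop
    return windows


def daily_name(network, station, location, channel, start):
    """Format a file name following the pattern given in the notes above."""
    return f"{network}_{station}_{location}_{channel}_{start:%Y_%m_%dT%H_%M_%S}.mseed"


start = datetime(2020, 6, 1, 22, 0, 0, tzinfo=timezone.utc)
end = datetime(2020, 6, 3, 1, 30, 0, tzinfo=timezone.utc)
windows = day_windows(start, end)
file_names = [daily_name("ZU", "mt001", "00", "LFN", w[0]) for w in windows]
```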

mth5.io.conversion.get_encoding(run_ts) → str[source]

Determine consistent data encoding for miniSEED files across channels.

Analyzes data types across all channels in a run and selects a median encoding to ensure compatibility in miniSEED file generation.

Parameters:

run_ts (RunTS) – Run time series object containing multiple channels of data

Returns:

String identifier for miniSEED encoding format (e.g., ‘INT32’, ‘FLOAT64’)

Return type:

str

Notes

Uses median data type to handle mixed precision datasets. Automatically converts INT64 to INT32 for miniSEED compatibility since some readers don’t support 64-bit integers.
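
One way to picture the median selection is to rank the encodings and take the middle one. The ranking in ENCODING_ORDER and the helper pick_encoding are assumptions for this sketch; the real function inspects the data types of the channels in a RunTS object.

```python
import statistics

# Hypothetical ranking from lowest to highest precision.
ENCODING_ORDER = ["INT16", "INT32", "INT64", "FLOAT32", "FLOAT64"]


def pick_encoding(channel_encodings):
    """Pick the median encoding by rank and downgrade INT64 to INT32."""
    ranks = sorted(ENCODING_ORDER.index(name) for name in channel_encodings)
    median = ENCODING_ORDER[statistics.median_low(ranks)]
    # 64-bit integers are not supported by some miniSEED readers
    return "INT32" if median == "INT64" else median


mixed = pick_encoding(["INT32", "FLOAT32", "FLOAT32"])
all_int64 = pick_encoding(["INT64", "INT64"])
```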

Examples

>>> encoding = get_encoding(run_timeseries)
>>> print(f"Selected encoding: {encoding}")

mth5.io.conversion.split_miniseed_by_day(input_file: str | Path) → list[Path][source]

Split an existing miniSEED file into daily files.

Utility function to split a multi-day miniSEED file into separate files for each calendar day, following standard seismological archiving practices.

Parameters:

input_file (str or Path) – Path to the input miniSEED file to be split

Returns:

List of paths to the generated daily miniSEED files

Return type:

list[Path]

Notes

Output files are named using the pattern: {network}.{station}.{location}.{channel}.{YYYY-MM-DD}.mseed

Files are saved in the same directory as the input file.

Examples

>>> daily_files = split_miniseed_by_day("/path/to/continuous.mseed")
>>> print(f"Created {len(daily_files)} daily files")

mth5.io.reader module

Universal reader for magnetotelluric time series data files.

This module provides a plugin-like system for reading various MT data formats and returning appropriate mth5.timeseries objects. The reader automatically detects file types and dispatches to the correct parser.

Plugin Structure

If you are writing your own reader, implement the following structure:

  • Class object that reads the given file format

  • A reader function named read_{file_type} (e.g., read_nims)

  • The return value should be a mth5.timeseries.ChannelTS or mth5.timeseries.RunTS object, plus any extra metadata as a dictionary with keys formatted as {level.attribute}

Example Implementation

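from mth5.timeseries import RunTS

class NewFile:
    def __init__(self, fn):
        self.fn = fn

    def read_header(self):
        # placeholder: parse the header of self.fn and return it
        return header_information

    def read_newfile(self):
        # placeholder: read each channel into its own time series object
        ex, ey, hx, hy, hz = read_in_channels
        return RunTS([ex, ey, hx, hy, hz])

def read_newfile(fn):
    new_file_obj = NewFile(fn)
    run_obj = new_file_obj.read_newfile()
    # placeholder: metadata not stored in run_obj, keyed as {level.attribute}
    extra_metadata = {}
    return run_obj, extra_metadata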

Then add your reader to the readers dictionary for automatic detection.
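
A self-contained variant of this pattern is sketched below so that it runs without mth5 installed. FakeRun stands in for mth5.timeseries.RunTS, the header contents are invented, and the layout of the readers dictionary (reader name mapped to its extensions and function) is an assumption made for this example.

```python
class FakeRun:
    """Stand-in for mth5.timeseries.RunTS so the sketch runs without mth5."""

    def __init__(self, channels):
        self.channels = channels


class NewFile:
    def __init__(self, fn):
        self.fn = fn

    def read_header(self):
        # Invented header; keys follow the {level.attribute} convention.
        return {"station.id": "mt001"}

    def read_newfile(self):
        return FakeRun(["ex", "ey", "hx", "hy", "hz"])


def read_newfile(fn):
    new_file_obj = NewFile(fn)
    extra_metadata = new_file_obj.read_header()
    return new_file_obj.read_newfile(), extra_metadata


# Hypothetical registry layout: reader name -> extensions and reader function.
readers = {"newfile": {"file_types": ["new"], "reader": read_newfile}}

run_obj, extra_metadata = readers["newfile"]["reader"]("/path/to/file.new")
```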

See also

Existing readers in mth5.io for implementation guidance.

Created on Wed Aug 26 10:32:45 2020

author:

Jared Peacock

license:

MIT

mth5.io.reader.get_reader(extension: str) → tuple[str, Callable][source]

Get the appropriate reader function for a file extension.

Searches the reader registry to find the correct parser function for the given file extension. Handles ambiguous extensions by issuing warnings when multiple readers might apply.

Parameters:

extension (str) – File extension (without the dot) to find a reader for

Returns:

Tuple containing:

  • Reader name (str): Identifier for the reader type

  • Reader function (Callable): Function to parse files of this type

Return type:

tuple[str, Callable]

Raises:

ValueError – If no reader is found for the given file extension

Examples

>>> reader_name, reader_func = get_reader("z3d")
>>> print(reader_name)  # "zen"
>>> data = reader_func("/path/to/file.z3d")

Notes

Some extensions like “bin” are ambiguous and could match multiple readers (NIMS or Phoenix). A warning is issued in such cases.
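
The lookup and the ambiguity warning can be sketched as follows. The extension table is copied from the read_file notes; the READERS structure and the helper find_reader_names are assumptions, and the sketch returns reader names only rather than the reader functions.

```python
import warnings

# Reader name -> extensions, as listed in the read_file notes.
READERS = {
    "zen": ["z3d"],
    "nims": ["bin", "bnn"],
    "usgs_ascii": ["asc", "zip"],
    "miniseed": ["miniseed", "ms", "mseed"],
    "lemi424": ["txt"],
    "phoenix": ["bin", "td_30", "td_150", "td_24k"],
    "metronix": ["atss"],
}


def find_reader_names(extension):
    """Return every reader name registered for an extension."""
    matches = [name for name, exts in READERS.items() if extension.lower() in exts]
    if not matches:
        raise ValueError(f"Could not find a reader for file type {extension}")
    if len(matches) > 1:
        warnings.warn(f"Extension {extension} is ambiguous: {matches}")
    return matches


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    bin_matches = find_reader_names("bin")
```

When more than one name comes back, passing file_type to read_file removes the ambiguity.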

mth5.io.reader.read_file(fn: str | Path | list[str | Path], file_type: str | None = None, **kwargs: Any) → Any[source]

Universal reader for magnetotelluric time series data files.

Automatically detects the file type based on extension and dispatches to the appropriate reader function. Supports both single files and lists of files for multi-file formats.

Parameters:
  • fn (str, Path, or list of str/Path) – Full path(s) to data file(s) to be read. For multi-file formats, pass a list of file paths.

  • file_type (str, optional) – Specific reader type to use if file extension is ambiguous. Must be one of the keys in the readers registry, by default None

  • **kwargs (dict) – Additional keyword arguments passed to the specific reader function. Supported arguments depend on the file format and reader.

Returns:

Time series object containing the data:

  • mth5.timeseries.ChannelTS for single channel data

  • mth5.timeseries.RunTS for multi-channel run data

Return type:

ChannelTS or RunTS

Raises:
  • IOError – If any specified file does not exist

  • KeyError – If the specified file_type is not supported

  • ValueError – If no reader can be found for the file extension

Examples

Read a single Z3D file (auto-detected)

>>> data = read_file("/path/to/station_001.z3d")
>>> print(type(data))  # <class 'mth5.timeseries.ChannelTS'>

Read with explicit file type for ambiguous extensions

>>> data = read_file("/path/to/data.bin", file_type="nims")
>>> print(data.n_channels)

Read multiple files for a multi-file format

>>> files = ["/path/to/file1.asc", "/path/to/file2.asc"]
>>> run_data = read_file(files, sample_rate=1.0)

Notes

Supported file types and extensions:

  • zen: .z3d (Zonge Z3D files)

  • nims: .bin, .bnn (USGS NIMS files)

  • usgs_ascii: .asc, .zip (USGS ASCII format)

  • miniseed: .miniseed, .ms, .mseed (miniSEED format)

  • lemi424: .txt (LEMI-424 format)

  • phoenix: .bin, .td_30, .td_150, .td_24k (Phoenix formats)

  • metronix: .atss (Metronix ADU format)

For ambiguous extensions like .bin, specify file_type explicitly.
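
Before dispatching, a reader of this kind has to accept either one path or a list of paths and fail early on missing files. The sketch below shows one way to do that; check_paths is a hypothetical helper, not part of mth5.

```python
from pathlib import Path


def check_paths(fn):
    """Normalize one path or a list of paths and fail early on missing files."""
    paths = [Path(f) for f in fn] if isinstance(fn, (list, tuple)) else [Path(fn)]
    for path in paths:
        if not path.exists():
            raise IOError(f"Could not find file {path}")
    return paths


try:
    check_paths("/path/that/does/not/exist.z3d")
except IOError as error:
    message = str(error)
```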

Module contents

class mth5.io.Collection(file_path=None, **kwargs)[source]

Bases: object

A general collection class to keep track of files with methods to create runs and run ids.


mth5.io.read_file(fn: str | Path | list[str | Path], file_type: str | None = None, **kwargs: Any) → Any[source]

Universal reader for magnetotelluric time series data files.

Automatically detects the file type based on extension and dispatches to the appropriate reader function. Supports both single files and lists of files for multi-file formats.
