mth5.io package

Subpackages

Submodules

mth5.io.collection module

mth5.io.conversion module

Convert MTH5 to other formats

MTH5 -> miniSEED + StationXML

class mth5.io.conversion.MTH5ToMiniSEEDStationXML(mth5_path: str | Path | None = None, save_path: str | Path | None = None, network_code: str = 'ZU', use_runs_with_data_only: bool = True, **kwargs: Any)

Bases: object

Convert MTH5 files to miniSEED and StationXML formats.

This class provides functionality to convert magnetotelluric data stored in MTH5 format to industry-standard miniSEED time series files and StationXML metadata files for data exchange and archival purposes.

Parameters:

mth5_path (str, Path, or None, default None) – Path to the input MTH5 file to be converted
save_path (str, Path, or None, default None) – Directory path where output files will be saved. If None, uses the parent directory of mth5_path
network_code (str, default "ZU") – Two-character FDSN network code for the output files
use_runs_with_data_only (bool, default True) – If True, only process runs that contain actual time series data
**kwargs (dict) – Additional keyword arguments to set as instance attributes

mth5_path

Path to the MTH5 input file

Type: Path or None

save_path

Directory where output files are saved

Type: Path

network_code

FDSN network code for output files

Type: str

use_runs_with_data_only

Flag to process only runs with data

Type: bool

encoding

Encoding format for miniSEED files

Type: str or None

Examples

>>> converter = MTH5ToMiniSEEDStationXML(
...     mth5_path="/path/to/data.h5",
...     network_code="MT",
...     save_path="/path/to/output"
... )
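>>> xml_file, mseed_files = converter.convert_mth5_to_ms_stationxml(
...     converter.mth5_path,
...     save_path=converter.save_path,
...     network_code=converter.network_code
... )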

classmethod convert_mth5_to_ms_stationxml(mth5_path: str | Path, save_path: str | Path | None = None, network_code: str = 'ZU', use_runs_with_data_only: bool = True, **kwargs: Any) → tuple[Path, list[Path]]

Convert an MTH5 file to miniSEED and StationXML formats.

Class method that provides a convenient interface to convert MTH5 data to standard seismological formats for data exchange and archival.

Parameters:

mth5_path (str or Path) – Path to the input MTH5 file to be converted
save_path (str, Path, or None, default None) – Directory where output files will be saved. If None, uses the parent directory of mth5_path
network_code (str, default "ZU") – Two-character FDSN network code for output files
use_runs_with_data_only (bool, default True) – If True, only process runs containing actual time series data
**kwargs (dict) – Additional keyword arguments passed to converter initialization

Returns:

Tuple containing:

- Path to the generated StationXML file
- List of paths to generated miniSEED files (one per day per channel)

Return type:

tuple[Path, list[Path]]

Examples

>>> xml_file, mseed_files = MTH5ToMiniSEEDStationXML.convert_mth5_to_ms_stationxml(
...     "/path/to/data.h5",
...     network_code="MT",
...     save_path="/output/directory"
... )
>>> print(f"Created {len(mseed_files)} miniSEED files and {xml_file}")

property mth5_path: Path | None

Path to the MTH5 input file.

Returns: Path to the MTH5 file to be converted, or None if not set.
Return type: Path or None

property network_code: str

Two-character FDSN network code.

Returns: Alphanumeric string of exactly 2 characters as required by FDSN DMC.
Return type: str
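
The two-character constraint above can be illustrated with a small check. This is a minimal sketch only: the helper name `is_valid_network_code` is hypothetical and not part of mth5, and the actual property setter may validate or normalize the code differently.

```python
def is_valid_network_code(code: str) -> bool:
    """Return True if `code` is exactly two ASCII alphanumeric characters."""
    # Hypothetical helper for illustration; not an mth5 function.
    return len(code) == 2 and code.isascii() and code.isalnum()


print(is_valid_network_code("ZU"))   # True
print(is_valid_network_code("MT1"))  # False
```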

property save_path: Path

Directory path where output files will be saved.

Returns: Directory path for saving miniSEED and StationXML files.
Return type: Path

split_ms_to_days(streams, save_path: Path, encoding: str) → list[Path]

Split miniSEED traces into daily files.

Splits continuous time series traces into separate files for each day to conform with standard seismological data archiving practices.

Parameters:

streams (obspy.Stream) – Stream object containing traces to be split by day
save_path (Path) – Directory where daily miniSEED files will be saved
encoding (str) – Data encoding format for miniSEED files (e.g., ‘INT32’, ‘FLOAT64’)

Returns:

List of paths to the generated daily miniSEED files

Return type:

list[Path]

Notes

Files are named using the pattern: {network}_{station}_{location}_{channel}_{YYYY_MM_DDTHH_MM_SS}.mseed
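
The idea of cutting a continuous trace at day boundaries can be sketched with the standard library alone. The following is an illustration under stated assumptions, not the method's implementation: `daily_windows` and `daily_file_name` are hypothetical helpers, timestamps are treated as naive UTC, and the station, location, and channel codes are made up.

```python
from datetime import datetime, timedelta


def daily_windows(start: datetime, end: datetime) -> list[tuple[datetime, datetime]]:
    """Split [start, end) into windows that never cross a midnight boundary."""
    windows = []
    current = start
    while current < end:
        # Midnight at the start of the day after `current`.
        next_midnight = datetime(current.year, current.month, current.day) + timedelta(days=1)
        stop = min(next_midnight, end)
        windows.append((current, stop))
        current = stop
    return windows


def daily_file_name(network: str, station: str, location: str, channel: str, start: datetime) -> str:
    """Build a file name following the documented underscore pattern."""
    stamp = start.strftime("%Y_%m_%dT%H_%M_%S")
    return f"{network}_{station}_{location}_{channel}_{stamp}.mseed"


# Hypothetical trace spanning parts of three calendar days.
windows = daily_windows(datetime(2020, 1, 1, 18, 0, 0), datetime(2020, 1, 3, 6, 0, 0))
for window_start, _ in windows:
    print(daily_file_name("ZU", "MT001", "00", "LFN", window_start))
```

Each window starts either at the trace start or at midnight, so the time stamp in the file name marks where that day's segment begins.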

mth5.io.conversion.get_encoding(run_ts) → str

Determine consistent data encoding for miniSEED files across channels.

Analyzes data types across all channels in a run and selects a median encoding to ensure compatibility in miniSEED file generation.

Parameters: run_ts (RunTS) – Run time series object containing multiple channels of data
Returns: String identifier for miniSEED encoding format (e.g., ‘INT32’, ‘FLOAT64’)
Return type: str

Notes

Uses median data type to handle mixed precision datasets. Automatically converts INT64 to INT32 for miniSEED compatibility since some readers don’t support 64-bit integers.

Examples

>>> encoding = get_encoding(run_timeseries)
>>> print(f"Selected encoding: {encoding}")
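
One way to picture the median selection described in the notes is to rank encodings and take the middle rank. This is a sketch under assumptions, not the function's actual logic: `pick_encoding` is a hypothetical helper, and the ordering in `ENCODING_ORDER` is invented for illustration.

```python
from statistics import median_low

# Hypothetical ordering from narrowest to widest; an assumption for
# illustration, not a table taken from mth5.
ENCODING_ORDER = ["INT16", "INT32", "INT64", "FLOAT32", "FLOAT64"]


def pick_encoding(channel_encodings: list[str]) -> str:
    """Choose the median encoding across channels, downgrading INT64 to INT32."""
    ranks = [ENCODING_ORDER.index(name) for name in channel_encodings]
    # median_low always returns one of the input ranks, never an average.
    chosen = ENCODING_ORDER[median_low(ranks)]
    return "INT32" if chosen == "INT64" else chosen


print(pick_encoding(["INT32", "INT32", "FLOAT64", "FLOAT64", "FLOAT64"]))  # FLOAT64
print(pick_encoding(["INT64", "INT64", "INT32"]))  # INT32
```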

mth5.io.conversion.split_miniseed_by_day(input_file: str | Path) → list[Path]

Split an existing miniSEED file into daily files.

Utility function to split a multi-day miniSEED file into separate files for each calendar day, following standard seismological archiving practices.

Parameters: input_file (str or Path) – Path to the input miniSEED file to be split
Returns: List of paths to the generated daily miniSEED files
Return type: list[Path]

Notes

Output files are named using the pattern: {network}.{station}.{location}.{channel}.{YYYY-MM-DD}.mseed

Files are saved in the same directory as the input file.

Examples

>>> daily_files = split_miniseed_by_day("/path/to/continuous.mseed")
>>> print(f"Created {len(daily_files)} daily files")
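
To make the naming convention in the notes concrete, the sketch below builds the expected output paths for a range of calendar days. It is an illustration only: `daily_output_paths` is a hypothetical helper, the station, location, and channel codes are made up, and the real function derives these values from the traces in the input file.

```python
from datetime import date, timedelta
from pathlib import Path


def daily_output_paths(input_file, network, station, location, channel, first_day, last_day):
    """List one output path per calendar day, next to the input file."""
    parent = Path(input_file).parent
    paths = []
    day = first_day
    while day <= last_day:
        # date.isoformat() yields YYYY-MM-DD, matching the documented pattern.
        name = f"{network}.{station}.{location}.{channel}.{day.isoformat()}.mseed"
        paths.append(parent / name)
        day += timedelta(days=1)
    return paths


paths = daily_output_paths(
    "/path/to/continuous.mseed", "ZU", "MT001", "00", "LFN",
    date(2020, 1, 1), date(2020, 1, 3),
)
for path in paths:
    print(path.name)
```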

mth5.io.reader module

Module contents

class mth5.io.Collection(file_path=None, **kwargs)

Bases: object

A general collection class to keep track of files with methods to create runs and run ids.

assign_run_names(df, zeros=4)

Assign run names to a dataframe. This is a base method that should be overridden by subclasses.

Parameters:

df (pandas.DataFrame) – dataframe with file information
zeros (int, optional) – number of zeros in run name, defaults to 4

Returns:

dataframe with run names assigned

Return type:

pandas.DataFrame

property file_path: Path object to file directory

get_empty_entry_dict()

Returns: an empty dictionary with the proper keys for an entry into a dataframe
Return type: dict

get_files(extension)

Get files with the given extension. Uses pathlib.Path.rglob, so it finds all matching files under file_path, including those in sub-directories.

Parameters: extension (string or list) – file extension(s)
Returns: list of files in the file_path with the given extensions
Return type: list of Path objects
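
The recursive search can be sketched with `pathlib.Path.rglob` directly. This is a minimal illustration, not the method's code: `find_files` is a hypothetical helper, and it accepts extensions with or without a leading dot because the expected form is not stated here.

```python
import tempfile
from pathlib import Path


def find_files(file_path, extension):
    """Recursively collect files under `file_path` matching one or more extensions."""
    extensions = [extension] if isinstance(extension, str) else list(extension)
    found = []
    for ext in extensions:
        # rglob descends into every sub-directory of file_path.
        found.extend(Path(file_path).rglob(f"*.{ext.lstrip('.')}"))
    return sorted(found)


# Build a throwaway directory tree to search.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run_a").mkdir()
    (root / "run_a" / "one.z3d").touch()
    (root / "two.z3d").touch()
    (root / "notes.txt").touch()
    names = [p.name for p in find_files(root, "z3d")]

print(names)
```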

get_remote_reference_list(df, max_hours=6, min_hours=1.5)

Get remote reference pairs.

Parameters:

max_hours (TYPE, optional) – DESCRIPTION, defaults to 6
min_hours (TYPE, optional) – DESCRIPTION, defaults to 1.5

Returns:

DESCRIPTION

Return type:

TYPE

get_runs(sample_rates, run_name_zeros=4, calibration_path=None)

Get a list of runs contained within the given folder. First the dataframe will be developed from which the runs are extracted.

For continuous data, all you need is the first file in the sequence. The reader will read in the entire sequence.

For segmented data it will only read in the given segment, which is slightly different from the original reader.

Parameters:

sample_rates (list) – list of sample rates to read, defaults to [150, 24000]
run_name_zeros (integer, optional) – Number of zeros in the run name, defaults to 4
calibration_path (str or Path, optional) – path to calibration files, defaults to None

Returns:

List of run dataframes with only the first block of files

Return type:

collections.OrderedDict

Example:

>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])

to_dataframe(sample_rates=None, run_name_zeros=4, calibration_path=None)

Get a data frame of the file summary with column names:

survey: survey id

station: station id

run: run id

start: start time UTC

end: end time UTC

channel_id: channel id or list of channel ids in file

component: channel component or list of components in file

fn: path to file

sample_rate: sample rate in samples per second

file_size: file size in bytes

n_samples: number of samples in file

sequence_number: sequence number of the file

instrument_id: instrument id

calibration_fn: calibration file

Parameters:

sample_rates (list, optional) – list of sample rates to process, defaults to None
run_name_zeros (int, optional) – number of zeros in run name, defaults to 4
calibration_path (str or Path, optional) – path to calibration files, defaults to None

Returns:

summary table of file names

Return type:

pandas.DataFrame
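
A row of the summary table can be pictured as a dictionary keyed by the column names listed above. The sketch below is illustrative only: `empty_entry` is a hypothetical helper, the filled-in values are made up, and it is an assumption that a blank row is represented with None values.

```python
# Column names copied from the list above.
COLUMNS = [
    "survey", "station", "run", "start", "end", "channel_id", "component",
    "fn", "sample_rate", "file_size", "n_samples", "sequence_number",
    "instrument_id", "calibration_fn",
]


def empty_entry() -> dict:
    """Return a blank row with every documented column set to None."""
    return dict.fromkeys(COLUMNS)


# Fill in a few fields for one hypothetical file.
entry = empty_entry()
entry.update(station="mt001", sample_rate=150, fn="/path/to/station/file.bin")
print(entry["station"], entry["sample_rate"], entry["run"])
```

A list of such dictionaries can be passed to `pandas.DataFrame` to obtain a table with these columns.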

mth5.io.read_file(fn: str | Path | list[str | Path], file_type: str | None = None, **kwargs: Any) → Any

Universal reader for magnetotelluric time series data files.

Automatically detects the file type based on extension and dispatches to the appropriate reader function. Supports both single files and lists of files for multi-file formats.

Parameters:

fn (str, Path, or list of str/Path) – Full path(s) to data file(s) to be read. For multi-file formats, pass a list of file paths.
file_type (str, optional) – Specific reader type to use if file extension is ambiguous. Must be one of the keys in the readers registry, by default None
**kwargs (dict) – Additional keyword arguments passed to the specific reader function. Supported arguments depend on the file format and reader.

Returns:

Time series object containing the data:

- mt_timeseries.MTTS for single channel data
- mt_timeseries.RunTS for multi-channel run data

Return type:

MTTS or RunTS

Raises:

IOError – If any specified file does not exist
KeyError – If the specified file_type is not supported
ValueError – If no reader can be found for the file extension

Examples

Read a single Z3D file (auto-detected)

>>> data = read_file("/path/to/station_001.z3d")
>>> print(type(data))  # <class 'mt_timeseries.ChannelTS'>

Read with explicit file type for ambiguous extensions

>>> data = read_file("/path/to/data.bin", file_type="nims")
>>> print(data.n_channels)

Read multiple files for a multi-file format

>>> files = ["/path/to/file1.asc", "/path/to/file2.asc"]
>>> run_data = read_file(files, sample_rate=1.0)

Notes

Supported file types and extensions:

- zen: .z3d (Zonge Z3D files)
- nims: .bin, .bnn (USGS NIMS files)
- usgs_ascii: .asc, .zip (USGS ASCII format)
- miniseed: .miniseed, .ms, .mseed (miniSEED format)
- lemi424: .txt (LEMI-424 format)
- phoenix: .bin, .td_30, .td_150, .td_24k (Phoenix formats)
- metronix: .atss (Metronix ADU format)

For ambiguous extensions like .bin, specify file_type explicitly.
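
The extension-based dispatch can be sketched as a lookup against a registry. This is a simplified illustration, not the reader's implementation: `READERS` and `resolve_reader` are hypothetical names, the registry is copied from the list of supported file types, and raising ValueError for an ambiguous extension is a choice made for this sketch (the behavior of `read_file` in that case is not stated here).

```python
from pathlib import Path

# Extensions per reader key, copied from the list of supported file types.
READERS = {
    "zen": ["z3d"],
    "nims": ["bin", "bnn"],
    "usgs_ascii": ["asc", "zip"],
    "miniseed": ["miniseed", "ms", "mseed"],
    "lemi424": ["txt"],
    "phoenix": ["bin", "td_30", "td_150", "td_24k"],
    "metronix": ["atss"],
}


def resolve_reader(fn, file_type=None):
    """Return the reader key for `fn`, preferring an explicit `file_type`."""
    if file_type is not None:
        if file_type not in READERS:
            raise KeyError(f"Reader type {file_type!r} is not supported")
        return file_type
    extension = Path(fn).suffix.lstrip(".").lower()
    matches = [key for key, exts in READERS.items() if extension in exts]
    if not matches:
        raise ValueError(f"No reader found for extension {extension!r}")
    if len(matches) > 1:
        # Sketch-only behavior: refuse to guess between candidate readers.
        raise ValueError(f"Extension {extension!r} is ambiguous: {matches}; pass file_type")
    return matches[0]


print(resolve_reader("/path/to/station_001.z3d"))             # zen
print(resolve_reader("/path/to/data.bin", file_type="nims"))  # nims
```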