mth5.io

Submodules

Classes

Collection

A general collection class to keep track of files, with methods to create runs and run ids.

Functions

read_file(→ Any)

Universal reader for magnetotelluric time series data files.

Package Contents

class mth5.io.Collection(file_path=None, **kwargs)[source]

A general collection class to keep track of files with methods to create runs and run ids.

logger

property file_path

Path object to file directory

file_ext = '*'

get_empty_entry_dict()[source]

Returns:

an empty dictionary with the proper keys for an entry into a dataframe

Return type:

dict

get_files(extension)[source]

Get files with the given extension(s). Uses pathlib.Path.rglob, so it searches file_path and all of its sub-directories.

Parameters:

extension (string or list) – file extension(s)

Returns:

list of files in the file_path with the given extensions

Return type:

list of Path objects
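
As a rough illustration of the recursive search described above, the following sketch uses only pathlib and a temporary directory. The helper name find_files and the file names are hypothetical; this is not the Collection.get_files implementation, only an example of how Path.rglob walks sub-directories.

```python
import tempfile
from pathlib import Path


def find_files(file_path, extension):
    """Hypothetical helper: return sorted paths under file_path matching the extension(s)."""
    # Accept a single extension or a list of extensions.
    extensions = [extension] if isinstance(extension, str) else list(extension)
    found = []
    for ext in extensions:
        # rglob descends into every sub-directory of file_path.
        found.extend(Path(file_path).rglob(f"*.{ext.lstrip('.')}"))
    return sorted(found)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run_01").mkdir()
    (root / "a.z3d").touch()
    (root / "run_01" / "b.z3d").touch()
    (root / "run_01" / "notes.txt").touch()

    # Files in the top level and in run_01 are both found.
    z3d_names = [p.name for p in find_files(root, "z3d")]
    both_names = [p.name for p in find_files(root, ["z3d", "txt"])]
```

Because the search is recursive, pointing file_path at a survey folder rather than a station folder would likely pick up files from every station beneath it.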

to_dataframe(sample_rates=None, run_name_zeros=4, calibration_path=None)[source]

Get a data frame of the file summary with column names:

  • survey: survey id

  • station: station id

  • run: run id

  • start: start time UTC

  • end: end time UTC

  • channel_id: channel id or list of channel ids in file

  • component: channel component or list of components in file

  • fn: path to file

  • sample_rate: sample rate in samples per second

  • file_size: file size in bytes

  • n_samples: number of samples in file

  • sequence_number: sequence number of the file

  • instrument_id: instrument id

  • calibration_fn: calibration file

Parameters:
  • sample_rates (list, optional) – list of sample rates to process, defaults to None

  • run_name_zeros (int, optional) – number of zeros in run name, defaults to 4

  • calibration_path (str or Path, optional) – path to calibration files, defaults to None

Returns:

summary table of the files, with the columns listed above

Return type:

pandas.DataFrame
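
To make the column layout concrete, here is a minimal sketch of one table row built as a plain dictionary. The key names are copied from the column list above; the helper empty_entry, the None placeholders, and the sample values are assumptions for illustration and may not match what get_empty_entry_dict actually returns.

```python
# Column names copied from the to_dataframe description.
ENTRY_KEYS = [
    "survey", "station", "run", "start", "end", "channel_id", "component",
    "fn", "sample_rate", "file_size", "n_samples", "sequence_number",
    "instrument_id", "calibration_fn",
]


def empty_entry():
    """Hypothetical stand-in for an empty entry: every key present, no value yet."""
    return dict.fromkeys(ENTRY_KEYS)


# Fill in a few fields for one file; the values are made up.
entry = empty_entry()
entry.update(station="mt001", sample_rate=150, fn="/path/to/file")
rows = [entry]
```

A list of such dictionaries can be passed to pandas.DataFrame to get one row per file with these columns.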

assign_run_names(df, zeros=4)[source]

Assign run names to a dataframe. This is a base method that should be overridden by subclasses.

Parameters:
  • df (pandas.DataFrame) – dataframe with file information

  • zeros (int, optional) – number of zeros in run name, defaults to 4

Returns:

dataframe with run names assigned

Return type:

pandas.DataFrame
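
Since the base method is meant to be overridden, the sketch below shows one way a subclass might number runs, with the zeros argument controlling the zero-padded width of the counter. It works on a list of dictionaries instead of a dataframe to stay dependency-free. The function name name_runs and the sr{sample_rate}_{count} naming pattern are assumptions, not the library's scheme.

```python
def name_runs(entries, zeros=4):
    """Hypothetical run naming: count runs per station in start-time order."""
    counters = {}
    named = []
    for entry in sorted(entries, key=lambda e: (e["station"], e["start"])):
        # Increment the per-station counter.
        count = counters.get(entry["station"], 0) + 1
        counters[entry["station"]] = count
        # Pad the counter with leading zeros to `zeros` digits.
        run = f"sr{entry['sample_rate']}_{count:0{zeros}d}"
        named.append({**entry, "run": run})
    return named


entries = [
    {"station": "mt01", "start": "2020-01-02T00:00:00", "sample_rate": 150},
    {"station": "mt01", "start": "2020-01-01T00:00:00", "sample_rate": 150},
    {"station": "mt02", "start": "2020-01-01T00:00:00", "sample_rate": 24000},
]
run_names = [e["run"] for e in name_runs(entries)]
```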

get_runs(sample_rates, run_name_zeros=4, calibration_path=None)[source]

Get the runs contained within the given folder. The dataframe is built first, and the runs are then extracted from it.

For continuous data, only the first file in the sequence is needed; the reader will read in the entire sequence.

For segmented data, only the given segment is read, which is slightly different from the original reader.

Parameters:
  • sample_rates (list) – list of sample rates to read, for example [150, 24000]

  • run_name_zeros (int, optional) – number of zeros in the run name, defaults to 4

  • calibration_path (str or Path, optional) – path to calibration files, defaults to None

Returns:

Ordered dictionary of run dataframes, each with only the first block of files

Return type:

collections.OrderedDict

Example:

>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])

get_remote_reference_list(df, max_hours=6, min_hours=1.5)[source]

Get remote reference pairs.

Parameters:
  • max_hours (TYPE, optional) – DESCRIPTION, defaults to 6

  • min_hours (TYPE, optional) – DESCRIPTION, defaults to 1.5

Returns:

DESCRIPTION

Return type:

TYPE

mth5.io.read_file(fn: str | pathlib.Path | list[str | pathlib.Path], file_type: str | None = None, **kwargs: Any) → Any[source]

Universal reader for magnetotelluric time series data files.

Automatically detects the file type based on extension and dispatches to the appropriate reader function. Supports both single files and lists of files for multi-file formats.

Parameters:
  • fn (str, Path, or list of str/Path) – Full path(s) to data file(s) to be read. For multi-file formats, pass a list of file paths.

  • file_type (str, optional) – Specific reader type to use if the file extension is ambiguous. Must be one of the keys in the readers registry. Defaults to None.

  • **kwargs (dict) – Additional keyword arguments passed to the specific reader function. Supported arguments depend on the file format and reader.

Returns:

Time series object containing the data:

  • mth5.timeseries.ChannelTS for single channel data

  • mth5.timeseries.RunTS for multi-channel run data

Return type:

ChannelTS or RunTS

Raises:
  • IOError – If any specified file does not exist

  • KeyError – If the specified file_type is not supported

  • ValueError – If no reader can be found for the file extension

Examples

Read a single Z3D file (auto-detected)

>>> data = read_file("/path/to/station_001.z3d")
>>> print(type(data))  # <class 'mth5.timeseries.ChannelTS'>

Read with explicit file type for ambiguous extensions

>>> data = read_file("/path/to/data.bin", file_type="nims")
>>> print(data.n_channels)

Read multiple files for a multi-file format

>>> files = ["/path/to/file1.asc", "/path/to/file2.asc"]
>>> run_data = read_file(files, sample_rate=1.0)

Notes

Supported file types and extensions:

  • zen: .z3d (Zonge Z3D files)

  • nims: .bin, .bnn (USGS NIMS files)

  • usgs_ascii: .asc, .zip (USGS ASCII format)

  • miniseed: .miniseed, .ms, .mseed (miniSEED format)

  • lemi424: .txt (LEMI-424 format)

  • phoenix: .bin, .td_30, .td_150, .td_24k (Phoenix formats)

  • metronix: .atss (Metronix ADU format)

For ambiguous extensions like .bin, specify file_type explicitly.
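
The sketch below shows, under stated assumptions, how extension-based dispatch with an explicit file_type override could work. The extension table is copied from the list above; the names READERS and pick_reader are hypothetical, and raising an error on an ambiguous extension is a choice made for this example. It is not a description of how read_file resolves .bin internally.

```python
from pathlib import Path

# Reader keys and extensions copied from the Notes; this is a toy registry.
READERS = {
    "zen": ["z3d"],
    "nims": ["bin", "bnn"],
    "usgs_ascii": ["asc", "zip"],
    "miniseed": ["miniseed", "ms", "mseed"],
    "lemi424": ["txt"],
    "phoenix": ["bin", "td_30", "td_150", "td_24k"],
    "metronix": ["atss"],
}


def pick_reader(fn, file_type=None):
    """Hypothetical helper: return the reader key for a file name."""
    if file_type is not None:
        # An explicit file_type bypasses extension matching.
        if file_type not in READERS:
            raise KeyError(f"Unsupported file type: {file_type}")
        return file_type
    ext = Path(fn).suffix.lstrip(".").lower()
    matches = [name for name, exts in READERS.items() if ext in exts]
    if not matches:
        raise ValueError(f"No reader found for extension '.{ext}'")
    if len(matches) > 1:
        # Assumption of this sketch: refuse to guess between readers.
        raise ValueError(f"Extension '.{ext}' is ambiguous: {matches}")
    return matches[0]


z3d_reader = pick_reader("/path/to/station_001.z3d")
bin_reader = pick_reader("/path/to/data.bin", file_type="nims")
```

In this table .bin maps to both nims and phoenix, which is why passing file_type for .bin files is the safer habit.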