mth5.io
Submodules
Classes
Collection: A general collection class to keep track of files with methods to create runs and run ids.
Functions
read_file: Universal reader for magnetotelluric time series data files.
Package Contents
- class mth5.io.Collection(file_path=None, **kwargs)[source]
A general collection class to keep track of files with methods to create runs and run ids.
- logger
- property file_path
Path object to file directory
- file_ext = '*'
- get_empty_entry_dict()[source]
- Returns:
an empty dictionary with the proper keys for an entry into a dataframe
- Return type:
dict
- get_files(extension)[source]
Get files with the given extension(s). Uses pathlib.Path.rglob, so it searches file_path and all of its sub-directories.
- Parameters:
extension (string or list) – file extension(s)
- Returns:
list of files in the file_path with the given extensions
- Return type:
list of Path objects
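The recursive search described for get_files can be illustrated with the standard library alone. This is a minimal sketch, not the mth5 implementation: the helper name find_files and the way a list of extensions is handled are assumptions made for illustration.

```python
import tempfile
from pathlib import Path


def find_files(root, extension):
    """Recursively collect files under root matching one or more extensions.

    Hypothetical helper; it only mirrors the documented behaviour of
    Collection.get_files and is not taken from mth5.
    """
    # Accept a single extension or a list of extensions.
    extensions = [extension] if isinstance(extension, str) else list(extension)
    found = []
    for ext in extensions:
        # rglob descends into every sub-directory of root.
        found.extend(Path(root).rglob(f"*.{ext.lstrip('.')}"))
    return sorted(set(found))


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "run_a").mkdir()
    (root / "run_a" / "ex.z3d").touch()
    (root / "hy.z3d").touch()
    (root / "notes.txt").touch()
    # Files in nested folders are found as well as top-level ones.
    names = sorted(p.name for p in find_files(root, "z3d"))
    both = sorted(p.name for p in find_files(root, ["z3d", "txt"]))
```

Because rglob walks the whole tree, pointing file_path at a survey folder rather than a single station folder would pick up files from every station beneath it.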
- to_dataframe(sample_rates=None, run_name_zeros=4, calibration_path=None)[source]
Get a data frame of the file summary with column names:
survey: survey id
station: station id
run: run id
start: start time UTC
end: end time UTC
channel_id: channel id or list of channel ids in file
component: channel component or list of components in file
fn: path to file
sample_rate: sample rate in samples per second
file_size: file size in bytes
n_samples: number of samples in file
sequence_number: sequence number of the file
instrument_id: instrument id
calibration_fn: calibration file
- Parameters:
sample_rates (list, optional) – list of sample rates to process, defaults to None
run_name_zeros (int, optional) – number of zeros in run name, defaults to 4
calibration_path (str or Path, optional) – path to calibration files, defaults to None
- Returns:
summary table of file names
- Return type:
pandas.DataFrame
- assign_run_names(df, zeros=4)[source]
Assign run names to a dataframe. This is a base method that should be overridden by subclasses.
- Parameters:
df (pandas.DataFrame) – dataframe with file information
zeros (int, optional) – number of zeros in run name, defaults to 4
- Returns:
dataframe with run names assigned
- Return type:
pandas.DataFrame
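The zeros parameter can be read as the width of a zero-padded counter in the run name. The sketch below assumes that reading; the function make_run_name and the "sr<rate>_<counter>" pattern are hypothetical and only illustrate the padding, since the actual naming is left to each subclass.

```python
def make_run_name(sample_rate, index, zeros=4):
    """Build a run name with a zero-padded counter.

    The "sr<rate>_<counter>" pattern is an assumption for illustration,
    not the naming scheme guaranteed by any mth5 collection class.
    """
    # index:0{zeros}d pads the counter with leading zeros to the given width.
    return f"sr{int(sample_rate)}_{index:0{zeros}d}"


run_names = [make_run_name(150, i) for i in range(1, 4)]
short_name = make_run_name(24000, 7, zeros=2)
```

Zero padding keeps run names in chronological order when they are sorted as strings.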
- get_runs(sample_rates, run_name_zeros=4, calibration_path=None)[source]
Get a list of runs contained within the given folder. First the dataframe will be developed from which the runs are extracted.
For continuous data, all you need is the first file in the sequence. The reader will read in the entire sequence.
For segmented data, it will only read in the given segment, which is slightly different from the original reader.
- Parameters:
sample_rates – list of sample rates to read, defaults to [150, 24000]
run_name_zeros (integer, optional) – Number of zeros in the run name, defaults to 4
- Returns:
List of run dataframes with only the first block of files
- Return type:
collections.OrderedDict
- Example:
>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])
- mth5.io.read_file(fn: str | pathlib.Path | list[str | pathlib.Path], file_type: str | None = None, **kwargs: Any) → Any[source]
Universal reader for magnetotelluric time series data files.
Automatically detects the file type based on extension and dispatches to the appropriate reader function. Supports both single files and lists of files for multi-file formats.
- Parameters:
fn (str, Path, or list of str/Path) – Full path(s) to data file(s) to be read. For multi-file formats, pass a list of file paths.
file_type (str, optional) – Specific reader type to use if file extension is ambiguous. Must be one of the keys in the readers registry, by default None
**kwargs (dict) – Additional keyword arguments passed to the specific reader function. Supported arguments depend on the file format and reader.
- Returns:
Time series object containing the data:
- mth5.timeseries.MTTS for single channel data
- mth5.timeseries.RunTS for multi-channel run data
- Return type:
MTTS or RunTS
- Raises:
IOError – If any specified file does not exist
KeyError – If the specified file_type is not supported
ValueError – If no reader can be found for the file extension
Examples
Read a single Z3D file (auto-detected)
>>> from mth5.io import read_file
>>> data = read_file("/path/to/station_001.z3d")
>>> print(type(data))  # <class 'mth5.timeseries.ChannelTS'>
Read with explicit file type for ambiguous extensions
>>> data = read_file("/path/to/data.bin", file_type="nims")
>>> print(data.n_channels)
Read multiple files for a multi-file format
>>> files = ["/path/to/file1.asc", "/path/to/file2.asc"]
>>> run_data = read_file(files, sample_rate=1.0)
Notes
Supported file types and extensions:
- zen: .z3d (Zonge Z3D files)
- nims: .bin, .bnn (USGS NIMS files)
- usgs_ascii: .asc, .zip (USGS ASCII format)
- miniseed: .miniseed, .ms, .mseed (miniSEED format)
- lemi424: .txt (LEMI-424 format)
- phoenix: .bin, .td_30, .td_150, .td_24k (Phoenix formats)
- metronix: .atss (Metronix ADU format)
For ambiguous extensions like .bin, specify file_type explicitly.
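The extension-based dispatch, and the reason .bin needs an explicit file_type, can be sketched with a plain lookup table. The table below copies the extensions from the Notes; the names READER_EXTENSIONS, candidate_readers and resolve_reader are hypothetical, and raising on an ambiguous extension is a choice made for this sketch rather than confirmed behaviour of read_file.

```python
from pathlib import Path

# Extension table copied from the Notes; the dict name is hypothetical.
READER_EXTENSIONS = {
    "zen": [".z3d"],
    "nims": [".bin", ".bnn"],
    "usgs_ascii": [".asc", ".zip"],
    "miniseed": [".miniseed", ".ms", ".mseed"],
    "lemi424": [".txt"],
    "phoenix": [".bin", ".td_30", ".td_150", ".td_24k"],
    "metronix": [".atss"],
}


def candidate_readers(fn):
    """Return every reader key whose extension list contains fn's suffix."""
    suffix = Path(fn).suffix.lower()
    return [key for key, exts in READER_EXTENSIONS.items() if suffix in exts]


def resolve_reader(fn, file_type=None):
    """Pick a reader key, requiring file_type when the extension is ambiguous."""
    if file_type is not None:
        # An explicit file_type bypasses extension matching entirely.
        if file_type not in READER_EXTENSIONS:
            raise KeyError(f"Unsupported file_type: {file_type}")
        return file_type
    matches = candidate_readers(fn)
    if not matches:
        raise ValueError(f"No reader found for extension of {fn}")
    if len(matches) > 1:
        # Sketch-only policy: refuse to guess between several readers.
        raise ValueError(f"Ambiguous extension for {fn}: {matches}")
    return matches[0]


z3d_reader = resolve_reader("/path/to/station_001.z3d")
bin_candidates = candidate_readers("/path/to/data.bin")
bin_reader = resolve_reader("/path/to/data.bin", file_type="nims")
```

In this sketch, .bin matches both nims and phoenix, which is why passing file_type is the safe option for such files.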