mth5.io package

Subpackages

Submodules

mth5.io.collection module

Phoenix file collection

Created on Thu Aug 4 16:48:47 2022

@author: jpeacock

class mth5.io.collection.Collection(file_path=None, **kwargs)[source]

Bases: object

A general collection class to keep track of files with methods to create runs and run ids.

assign_run_names()[source]

Returns: DESCRIPTION
Return type: TYPE

property file_path: Path object to file directory

get_files(extension)[source]

Get files with the given extension(s). Uses pathlib.Path.rglob, so the search covers file_path and all of its sub-directories.

Parameters: extension (string or list) – file extension(s)
Returns: list of files in the file_path with the given extensions
Return type: list of Path objects
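The recursive search described above can be sketched with the standard library alone. This is a minimal illustration, not the mth5 implementation: the standalone function signature, the sorting of results, and the example extensions (td_150, td_24k) are assumptions made for the demonstration.

```python
from pathlib import Path
import tempfile


def get_files(file_path, extension):
    """Return a sorted list of files under file_path matching the extension(s)."""
    # Accept a single extension or a list of extensions
    if isinstance(extension, str):
        extension = [extension]

    files = []
    for ext in extension:
        # rglob descends into every sub-directory of file_path
        files.extend(Path(file_path).rglob(f"*.{ext.lstrip('.')}"))
    return sorted(files)


# Build a small throwaway directory tree to demonstrate the recursive search
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.td_150").touch()
    (root / "sub" / "b.td_150").touch()
    (root / "sub" / "c.td_24k").touch()
    (root / "notes.txt").touch()

    found_names = [f.name for f in get_files(root, ["td_150", "td_24k"])]
```

Files in nested folders are returned alongside those at the top level, and files with other extensions (notes.txt here) are ignored.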

get_runs(sample_rates, run_name_zeros=4, calibration_path=None)[source]

Get the runs contained within the given folder. A dataframe of the files is built first, and the runs are extracted from it.

For continuous data all you need is the first file in the sequence. The reader will read in the entire sequence.

For segmented data it will only read in the given segment, which is slightly different from the original reader.

Parameters

sample_rates – list of sample rates to read, defaults to [150, 24000]
run_name_zeros (integer, optional) – Number of zeros in the run name, defaults to 4

Returns

List of run dataframes with only the first block of files

Return type

collections.OrderedDict

Example

>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])
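The idea of building a file summary first and then extracting runs from it can be sketched without any dependencies. Everything below is a hypothetical illustration: the rule that a new run starts whenever a file does not begin where the previous one ended, the run id format sr{sample_rate}_{number}, and the reading of run_name_zeros as a zero-padding width are all assumptions, not the documented behaviour of the library.

```python
from collections import OrderedDict


def assign_run_names(rows, zeros=4):
    """Label each row with a run id such as 'sr150_0001' (hypothetical format)."""
    counters = {}
    previous = {}
    ordered = sorted(rows, key=lambda r: (r["sample_rate"], r["start"]))
    for row in ordered:
        sr = row["sample_rate"]
        last = previous.get(sr)
        # Start a new run when this file does not begin where the last one ended
        if last is None or row["start"] != last["end"]:
            counters[sr] = counters.get(sr, 0) + 1
        row["run"] = f"sr{sr}_{counters[sr]:0{zeros}}"
        previous[sr] = row
    return ordered


def get_runs(rows, sample_rates, zeros=4):
    """Return an OrderedDict mapping run id to the first file of that run."""
    runs = OrderedDict()
    for row in assign_run_names(rows, zeros):
        if row["sample_rate"] in sample_rates:
            # setdefault keeps only the first (earliest) file of each run
            runs.setdefault(row["run"], row)
    return runs


# Start and end times are plain integers (seconds) to keep the sketch short
rows = [
    {"fn": "a_1.td_150", "sample_rate": 150, "start": 0, "end": 360},
    {"fn": "a_2.td_150", "sample_rate": 150, "start": 360, "end": 720},
    {"fn": "b_1.td_150", "sample_rate": 150, "start": 1000, "end": 1360},
    {"fn": "c_1.td_24k", "sample_rate": 24000, "start": 0, "end": 2},
    {"fn": "c_2.td_24k", "sample_rate": 24000, "start": 360, "end": 362},
]
run_dict = get_runs(rows, sample_rates=[150, 24000])
```

In this sketch the two contiguous 150 samples-per-second files collapse into one run represented by its first file, while each isolated 24000 samples-per-second segment becomes its own run.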

to_dataframe()[source]

Get a data frame of the file summary with column names:

survey: survey id

station: station id

run: run id

start: start time UTC

end: end time UTC

channel_id: channel id or list of channel ids in file

component: channel component or list of components in file

fn: path to file

sample_rate: sample rate in samples per second

file_size: file size in bytes

n_samples: number of samples in file

sequence_number: sequence number of the file

instrument_id: instrument id

calibration_fn: calibration file

Returns: summary table of file names
Return type: pandas.DataFrame

mth5.io.reader module

This module provides a utility function that selects the appropriate reader for a given file type and returns the corresponding mth5.timeseries object.

This is set up to work like plugins, but it is a hack because I did not find the time to set it up properly as a true plugin system.

If you are writing your own reader you need the following structure:

A class object that will read the given file.

A reader function named read_{file_type}, for instance read_nims.

A return value that is a mth5.timeseries.MTTS or mth5.timeseries.RunTS object, plus any extra metadata as a dictionary with keys of the form {level.attribute}.

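from mth5.timeseries import RunTS


class NewFile:
    def __init__(self, fn):
        self.fn = fn

    def read_header(self):
        # header_information is a placeholder for your header-parsing result
        return header_information

    def read_newfile(self):
        # read_in_channels_as_MTTS is a placeholder for your channel-reading code
        ex, ey, hx, hy, hz = read_in_channels_as_MTTS
        return RunTS([ex, ey, hx, hy, hz])


def read_newfile(fn):
    new_file_obj = NewFile(fn)
    run_obj = new_file_obj.read_newfile()
    # placeholder: fill with {level.attribute: value} pairs as needed
    extra_metadata = {}

    return run_obj, extra_metadata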

Then add your reader to the reader dictionary so that those files can be read.
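A reader dictionary of this kind could look roughly like the sketch below. The dictionary layout, the names readers, get_reader and dispatch_read, and the extensions "new" and "nf" are all hypothetical and chosen for illustration; check the module source for the actual structure before registering a reader.

```python
from pathlib import Path


def read_newfile(fn):
    """Stand-in reader; a real one would return a RunTS and extra metadata."""
    return f"run from {Path(fn).name}", {}


# Hypothetical registry: file type -> extensions handled and the reader function
readers = {
    "newfile": {"file_types": ["new", "nf"], "reader": read_newfile},
}


def get_reader(fn, file_type=None):
    """Pick a reader by explicit file_type, else by the file extension."""
    if file_type is not None:
        return readers[file_type]["reader"]
    extension = Path(fn).suffix.lstrip(".").lower()
    for entry in readers.values():
        if extension in entry["file_types"]:
            return entry["reader"]
    raise KeyError(f"No reader registered for extension '{extension}'")


def dispatch_read(fn, file_type=None, **kwargs):
    """Look up the reader for fn and call it with any reader-specific kwargs."""
    return get_reader(fn, file_type)(fn, **kwargs)


# The extension selects the reader unless file_type is given explicitly
run_obj, extra_metadata = dispatch_read("station01.new")
forced_obj, _ = dispatch_read("station01.bin", file_type="newfile")
```

Keeping the lookup in a plain dictionary means a new reader only needs one extra entry, and an explicit file_type covers the case where the extension alone is ambiguous.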

Module contents

class mth5.io.Collection(file_path=None, **kwargs)[source]

Bases: object

A general collection class to keep track of files with methods to create runs and run ids.

assign_run_names()[source]

Returns: DESCRIPTION
Return type: TYPE

property file_path: Path object to file directory

get_files(extension)[source]

Get files with the given extension(s). Uses pathlib.Path.rglob, so the search covers file_path and all of its sub-directories.

Parameters: extension (string or list) – file extension(s)
Returns: list of files in the file_path with the given extensions
Return type: list of Path objects

get_runs(sample_rates, run_name_zeros=4, calibration_path=None)[source]

Get the runs contained within the given folder. A dataframe of the files is built first, and the runs are extracted from it.

For continuous data all you need is the first file in the sequence. The reader will read in the entire sequence.

For segmented data it will only read in the given segment, which is slightly different from the original reader.

Parameters

sample_rates – list of sample rates to read, defaults to [150, 24000]
run_name_zeros (integer, optional) – Number of zeros in the run name, defaults to 4

Returns

List of run dataframes with only the first block of files

Return type

collections.OrderedDict

Example

>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])

to_dataframe()[source]

Get a data frame of the file summary with column names:

survey: survey id

station: station id

run: run id

start: start time UTC

end: end time UTC

channel_id: channel id or list of channel ids in file

component: channel component or list of components in file

fn: path to file

sample_rate: sample rate in samples per second

file_size: file size in bytes

n_samples: number of samples in file

sequence_number: sequence number of the file

instrument_id: instrument id

calibration_fn: calibration file

Returns: summary table of file names
Return type: pandas.DataFrame

mth5.io.read_file(fn, file_type=None, **kwargs)[source]

This is the universal reader for MT time series. It picks the proper reader given the file type or extension. Keyword arguments depend on the reader and file type.

Parameters

fn (string or pathlib.Path) – full path to file
file_type (string) – a specific file type, for use when the extension is ambiguous

Returns

channel or run time series object

Return type

mth5.timeseries.MTTS or mth5.timeseries.RunTS