mth5.io.collection

Phoenix file collection

Created on Thu Aug 4 16:48:47 2022

@author: jpeacock

Classes

Collection

A general collection class to keep track of files, with methods to create runs and run ids.

Module Contents

class mth5.io.collection.Collection(file_path=None, **kwargs)

A general collection class to keep track of files with methods to create runs and run ids.

logger

property file_path: Path object pointing to the file directory

file_ext = '*'

get_empty_entry_dict()

Returns: an empty dictionary with the proper keys for an entry into a dataframe
Return type: dict

get_files(extension)

Get files with the given extension(s). Uses pathlib.Path.rglob, so it searches file_path and all of its sub-directories.

Parameters: extension (string or list) – file extension(s)
Returns: list of files in the file_path with the given extensions
Return type: list of Path objects
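
The recursive search described above can be illustrated with a minimal sketch. It is not the library's implementation: find_files is a hypothetical helper, and the file names and the "td_150" extension are made up for the demonstration. It only shows how pathlib.Path.rglob picks up files in sub-directories for one or several extensions.

```python
from pathlib import Path
import tempfile


def find_files(file_path, extension):
    """Hypothetical helper: recursively list files with the given extension(s)."""
    # Accept either a single extension or a list of extensions.
    if isinstance(extension, str):
        extension = [extension]

    found = []
    for ext in extension:
        # rglob descends into every sub-directory of file_path.
        found.extend(Path(file_path).rglob(f"*.{ext.lstrip('.')}"))
    return sorted(found)


# Build a small throwaway directory tree to search (made-up file names).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.td_150").touch()
    (root / "sub" / "b.td_150").touch()
    (root / "sub" / "notes.txt").touch()

    # A single extension finds files at every depth.
    matches = [p.name for p in find_files(root, "td_150")]
    # A list of extensions collects the matches for each one.
    both = [p.name for p in find_files(root, ["td_150", "txt"])]
```

Accepting both a string and a list mirrors the documented parameter type (string or list).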

to_dataframe(sample_rates=None, run_name_zeros=4, calibration_path=None)

Get a data frame of the file summary with column names:

survey: survey id

station: station id

run: run id

start: start time UTC

end: end time UTC

channel_id: channel id or list of channel ids in file

component: channel component or list of components in file

fn: path to file

sample_rate: sample rate in samples per second

file_size: file size in bytes

n_samples: number of samples in file

sequence_number: sequence number of the file

instrument_id: instrument id

calibration_fn: calibration file

Parameters:

sample_rates (list, optional) – list of sample rates to process, defaults to None
run_name_zeros (int, optional) – number of zeros in run name, defaults to 4
calibration_path (str or Path, optional) – path to calibration files, defaults to None

Returns:

summary table of file names

Return type:

pandas.DataFrame
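
As a rough illustration of the column layout listed above, the sketch below builds one empty entry per file using only the standard library. COLUMNS and empty_entry are hypothetical names, not part of mth5, and the filled-in values are invented. Passing a list of such dictionaries to pandas.DataFrame would give a table with one row per file.

```python
# Column names taken from the list in the to_dataframe description.
COLUMNS = [
    "survey",
    "station",
    "run",
    "start",
    "end",
    "channel_id",
    "component",
    "fn",
    "sample_rate",
    "file_size",
    "n_samples",
    "sequence_number",
    "instrument_id",
    "calibration_fn",
]


def empty_entry():
    """Hypothetical helper: a dictionary with every column set to None."""
    return dict.fromkeys(COLUMNS)


# Fill in a few fields for one file (values are made up for illustration).
entry = empty_entry()
entry.update(station="mt001", sample_rate=150, sequence_number=1)

# One dictionary per file; pandas.DataFrame(rows, columns=COLUMNS) would
# turn this list into the summary table.
rows = [entry]

# Optional filter in the spirit of the sample_rates parameter:
# None keeps everything, otherwise keep only the listed rates.
sample_rates = [150, 24000]
selected = [
    r for r in rows if sample_rates is None or r["sample_rate"] in sample_rates
]
```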

assign_run_names(df, zeros=4)

Assign run names to a dataframe. This is a base method that should be overridden by subclasses.

Parameters:

df (pandas.DataFrame) – dataframe with file information
zeros (int, optional) – number of zeros in run name, defaults to 4

Returns:

dataframe with run names assigned

Return type:

pandas.DataFrame
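
Since the base method is meant to be overridden, here is one way a subclass might use the zeros argument. This is a sketch under assumptions: the "sr<rate>_<index>" naming scheme, the make_run_name helper, and the use of plain dictionaries instead of a pandas.DataFrame are all illustrative choices, not the documented behavior.

```python
def make_run_name(sample_rate, index, zeros=4):
    """Hypothetical naming scheme: sample rate plus a zero-padded counter."""
    return f"sr{sample_rate}_{index:0{zeros}d}"


def assign_run_names(records, zeros=4):
    """Number the records per sample rate in order of start time.

    `records` is a list of dictionaries standing in for dataframe rows.
    """
    counters = {}
    # ISO 8601 strings sort chronologically, so a plain sort is enough here.
    for rec in sorted(records, key=lambda r: r["start"]):
        rate = rec["sample_rate"]
        counters[rate] = counters.get(rate, 0) + 1
        rec["run"] = make_run_name(rate, counters[rate], zeros)
    return records


# Made-up records, deliberately out of chronological order.
records = [
    {"start": "2022-08-04T12:00:00", "sample_rate": 150, "run": None},
    {"start": "2022-08-04T10:00:00", "sample_rate": 150, "run": None},
    {"start": "2022-08-04T11:00:00", "sample_rate": 24000, "run": None},
]
named = assign_run_names(records)
run_names = [r["run"] for r in named]
```

Padding the counter to a fixed width keeps run names in the same order whether they are sorted as text or by number.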

get_runs(sample_rates, run_name_zeros=4, calibration_path=None)

Get a list of runs contained within the given folder. The dataframe is built first, and the runs are then extracted from it.

For continuous data, all you need is the first file in the sequence. The reader will read in the entire sequence.

For segmented data, it will only read in the given segment, which is slightly different from the original reader.

Parameters:

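sample_rates (list) – list of sample rates to read; required in this signature (the example below uses [150, 24000])
run_name_zeros (int, optional) – number of zeros in the run name, defaults to 4
calibration_path (str or Path, optional) – path to calibration files, defaults to None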

Returns:

Ordered dictionary of run dataframes containing only the first block of files

Return type:

collections.OrderedDict

Example:

>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])

get_remote_reference_list(df, max_hours=6, min_hours=1.5)

Get remote reference pairs.

Parameters:

max_hours (TYPE, optional) – DESCRIPTION, defaults to 6
min_hours (TYPE, optional) – DESCRIPTION, defaults to 1.5

Returns:

DESCRIPTION

Return type:

TYPE