mth5.io.collection

Phoenix file collection

Created on Thu Aug 4 16:48:47 2022

@author: jpeacock

Classes

Collection

A general collection class to keep track of files with methods to create runs and run ids.

Module Contents

class mth5.io.collection.Collection(file_path=None, **kwargs)

A general collection class to keep track of files with methods to create runs and run ids.

logger
property file_path

pathlib.Path object pointing to the directory that holds the files

file_ext = '*'
get_empty_entry_dict()
Returns:

An empty dictionary with the keys required for one row of the summary dataframe

Return type:

dict

get_files(extension)

Get files with the given extension(s). Uses pathlib.Path.rglob, so it finds all matching files within file_path by searching all sub-directories.

Parameters:

extension (string or list) – file extension(s)

Returns:

list of files in the file_path with the given extensions

Return type:

list of Path objects
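
The recursive search can be sketched with the standard library alone. This is a minimal, hypothetical stand-in rather than the mth5 code: the function name find_files is illustrative, and it assumes extensions are passed without a leading dot and matched case-sensitively.

```python
from pathlib import Path


def find_files(file_path, extension):
    """Hypothetical stand-in for Collection.get_files.

    Recursively collects files under ``file_path`` whose suffix matches
    ``extension``, which may be a single string or a list of strings.
    Assumes extensions are given without a leading dot, e.g. "bin".
    """
    # Accept either a single extension or a list of them.
    if isinstance(extension, str):
        extension = [extension]

    files = []
    for ext in extension:
        # rglob searches file_path and every sub-directory below it;
        # is_file() skips directories whose names happen to match.
        files.extend(p for p in Path(file_path).rglob(f"*.{ext}") if p.is_file())
    # Sort for a deterministic order; rglob order depends on the filesystem.
    return sorted(files)
```

Passing a list returns the matches for every extension in a single sorted list.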

to_dataframe(sample_rates=None, run_name_zeros=4, calibration_path=None)

Get a dataframe summarizing the files, with the following columns:

  • survey: survey id

  • station: station id

  • run: run id

  • start: start time UTC

  • end: end time UTC

  • channel_id: channel id or list of channel ids in file

  • component: channel component or list of components in file

  • fn: path to file

  • sample_rate: sample rate in samples per second

  • file_size: file size in bytes

  • n_samples: number of samples in file

  • sequence_number: sequence number of the file

  • instrument_id: instrument id

  • calibration_fn: calibration file

Parameters:
  • sample_rates (list, optional) – list of sample rates to process, defaults to None

  • run_name_zeros (int, optional) – number of zeros in run name, defaults to 4

  • calibration_path (str or Path, optional) – path to calibration files, defaults to None

Returns:

summary table of the files

Return type:

pandas.DataFrame
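
How the columns above might be filled can be sketched without pandas. In this hypothetical sketch, each file is described by a plain dict of metadata (an assumption; the real readers extract it from the files themselves), every row starts from an all-None entry in the spirit of get_empty_entry_dict, and rows are filtered by sample_rates. The names COLUMNS, empty_entry and summarize are illustrative only.

```python
# Column names listed in the to_dataframe description above.
COLUMNS = [
    "survey", "station", "run", "start", "end", "channel_id",
    "component", "fn", "sample_rate", "file_size", "n_samples",
    "sequence_number", "instrument_id", "calibration_fn",
]


def empty_entry():
    """Hypothetical analogue of get_empty_entry_dict: every column set to None."""
    return dict.fromkeys(COLUMNS)


def summarize(file_records, sample_rates=None):
    """Hypothetical sketch of the summary step behind to_dataframe.

    ``file_records`` is an iterable of dicts with per-file metadata.
    Each record is copied into an empty entry so every row has the same
    keys; rows whose sample rate is not in ``sample_rates`` are dropped
    (no filtering when ``sample_rates`` is None).
    """
    rows = []
    for record in file_records:
        entry = empty_entry()
        # Keep only keys that belong to the summary table.
        entry.update({k: v for k, v in record.items() if k in entry})
        if sample_rates is not None and entry["sample_rate"] not in sample_rates:
            continue
        rows.append(entry)
    return rows
```

With pandas installed, pandas.DataFrame(rows, columns=COLUMNS) would turn these rows into a table with the column order listed above.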

assign_run_names(df, zeros=4)

Assign run names to a dataframe. This is a base method that should be overridden by subclasses.

Parameters:
  • df (pandas.DataFrame) – dataframe with file information

  • zeros (int, optional) – number of zeros in run name, defaults to 4

Returns:

dataframe with run names assigned

Return type:

pandas.DataFrame
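
A subclass override might look roughly like the following sketch. The naming pattern sr<sample rate>_<number> and the rule that each distinct start time within a station and sample rate begins a new run are assumptions made for illustration, not taken from this page; rows are plain dicts rather than a dataframe so the example runs with the standard library.

```python
from collections import defaultdict


def assign_run_names(rows, zeros=4):
    """Hypothetical run-naming scheme a Collection subclass might use.

    Rows are dicts with at least ``station``, ``sample_rate`` and ``start``.
    Within each (station, sample_rate) group, every distinct start time is
    treated as a new run and numbered in chronological order, zero-padded
    to ``zeros`` digits, e.g. "sr150_0001".
    """
    # Collect the distinct start times of each (station, sample_rate) group.
    starts = defaultdict(set)
    for row in rows:
        starts[(row["station"], row["sample_rate"])].add(row["start"])

    # Map each (station, sample_rate, start) to its 1-based run number.
    run_number = {}
    for (station, sr), group_starts in starts.items():
        for n, start in enumerate(sorted(group_starts), start=1):
            run_number[(station, sr, start)] = n

    # Write the zero-padded run name back onto every row.
    for row in rows:
        n = run_number[(row["station"], row["sample_rate"], row["start"])]
        row["run"] = f"sr{row['sample_rate']}_{n:0{zeros}d}"
    return rows
```

Zero-padding to zeros digits keeps the run names sorting in chronological order as plain strings, as long as the run count stays below 10**zeros.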

get_runs(sample_rates, run_name_zeros=4, calibration_path=None)

Get the runs contained within the collection's folder. The file summary dataframe is built first, and the runs are extracted from it.

For continuous data, only the first file in the sequence is needed; the reader reads in the entire sequence.

For segmented data, the reader reads only the given segment, which differs slightly from the original reader.

Parameters:
  • sample_rates (list) – list of sample rates to read, for example [150, 24000]

  • run_name_zeros (int, optional) – number of zeros in the run name, defaults to 4

  • calibration_path (str or Path, optional) – path to calibration files, defaults to None

Returns:

Ordered dictionary of run dataframes, each containing only the first block of files

Return type:

collections.OrderedDict

Example:
>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])
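
The step of keeping only the first block of each run can be sketched as follows. This is a hypothetical illustration: rows are plain dicts instead of dataframes, the first block is assumed to be the rows with the lowest sequence_number, and the station-then-run nesting of the returned OrderedDict is an assumption.

```python
from collections import OrderedDict


def group_runs(rows):
    """Hypothetical sketch of the grouping step behind get_runs.

    ``rows`` are summary-table dicts with ``station``, ``run`` and
    ``sequence_number`` keys. Returns an OrderedDict keyed by station,
    whose values are OrderedDicts keyed by run name. Each run keeps only
    the rows of its first block (lowest sequence number), because a reader
    given the first file of a continuous sequence reads the rest itself.
    """
    runs = OrderedDict()
    for station in sorted({r["station"] for r in rows}):
        runs[station] = OrderedDict()
        station_rows = [r for r in rows if r["station"] == station]
        for run in sorted({r["run"] for r in station_rows}):
            run_rows = [r for r in station_rows if r["run"] == run]
            # The first block is the lowest sequence number in this run.
            first = min(r["sequence_number"] for r in run_rows)
            runs[station][run] = [
                r for r in run_rows if r["sequence_number"] == first
            ]
    return runs
```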

get_remote_reference_list(df, max_hours=6, min_hours=1.5)

Get remote reference pairs.

Parameters:
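  • df (pandas.DataFrame) – dataframe with file information

  • max_hours (float, optional) – maximum number of hours, defaults to 6

  • min_hours (float, optional) – minimum number of hours, defaults to 1.5

Returns:

list of remote reference pairs

Return type:

list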