mth5.io.collection
Phoenix file collection
Created on Thu Aug 4 16:48:47 2022
@author: jpeacock
Classes
Collection: A general collection class to keep track of files with methods to create runs and run ids.
Module Contents
- class mth5.io.collection.Collection(file_path=None, **kwargs)
A general collection class to keep track of files with methods to create runs and run ids.
- get_empty_entry_dict()
- Returns:
an empty dictionary with the proper keys for an entry into a dataframe
- Return type:
dict
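A minimal sketch of what such an empty entry could look like. The key names are copied from the column list documented for to_dataframe; that get_empty_entry_dict uses exactly this set is an assumption, and `ENTRY_KEYS` and `make_empty_entry` are hypothetical names, not part of mth5.

```python
# Column names copied from the to_dataframe documentation; whether the real
# method uses exactly this set of keys is an assumption.
ENTRY_KEYS = (
    "survey", "station", "run", "start", "end", "channel_id", "component",
    "fn", "sample_rate", "file_size", "n_samples", "sequence_number",
    "instrument_id", "calibration_fn",
)


def make_empty_entry():
    """Return a fresh dict with every key present and set to None (hypothetical helper)."""
    return {key: None for key in ENTRY_KEYS}


# Fill in only what is known for one file; the remaining keys stay None.
entry = make_empty_entry()
entry["station"] = "mt001"
entry["sample_rate"] = 150
```

Starting every row from the same set of keys keeps the columns consistent when the entries are later collected into a dataframe.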
- get_files(extension)
Get files with the given extension(s). Uses pathlib.Path.rglob, so it searches file_path and all of its sub-directories.
- Parameters:
extension (string or list) – file extension(s)
- Returns:
list of files in the file_path with the given extensions
- Return type:
list of Path objects
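The recursive search described above can be sketched with the standard library alone. `find_files` is a hypothetical stand-in, not the mth5 implementation, and the file names and extensions used here are only illustrative.

```python
from pathlib import Path
import tempfile


def find_files(file_path, extension):
    """Recursively collect files under file_path with the given extension(s)."""
    # Accept either a single extension or a list of extensions.
    extensions = [extension] if isinstance(extension, str) else list(extension)
    found = []
    for ext in extensions:
        # rglob descends into every sub-directory of file_path.
        found.extend(Path(file_path).rglob(f"*.{ext.lstrip('.')}"))
    return sorted(found)


# Build a small throwaway directory tree to search.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.td_150").touch()
    (root / "sub" / "b.td_24k").touch()
    (root / "notes.txt").touch()
    names = [p.name for p in find_files(root, ["td_150", "td_24k"])]
```

Here `names` holds the two matching files, including the one in the sub-directory, while `notes.txt` is skipped.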
- to_dataframe(sample_rates=None, run_name_zeros=4, calibration_path=None)
Get a data frame of the file summary with column names:
survey: survey id
station: station id
run: run id
start: start time UTC
end: end time UTC
channel_id: channel id or list of channel ids in file
component: channel component or list of components in file
fn: path to file
sample_rate: sample rate in samples per second
file_size: file size in bytes
n_samples: number of samples in file
sequence_number: sequence number of the file
instrument_id: instrument id
calibration_fn: calibration file
- Parameters:
sample_rates (list, optional) – list of sample rates to process, defaults to None
run_name_zeros (int, optional) – number of zeros in run name, defaults to 4
calibration_path (str or Path, optional) – path to calibration files, defaults to None
- Returns:
summary table of file names
- Return type:
pandas.DataFrame
- assign_run_names(df, zeros=4)
Assign run names to a dataframe. This is a base method that should be overridden by subclasses.
- Parameters:
df (pandas.DataFrame) – dataframe with file information
zeros (int, optional) – number of zeros in run name, defaults to 4
- Returns:
dataframe with run names assigned
- Return type:
pandas.DataFrame
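To show what the zeros parameter controls, here is a minimal sketch of zero-padded run names. The "sr<sample rate>_<index>" pattern and the `make_run_name` helper are assumptions made for illustration; a subclass that overrides assign_run_names is free to use any scheme.

```python
def make_run_name(sample_rate, index, zeros=4):
    """Build a zero-padded run id (hypothetical naming scheme)."""
    # The nested format spec pads the index with leading zeros to `zeros` digits.
    return f"sr{int(sample_rate)}_{index:0{zeros}d}"


names = [make_run_name(150, i) for i in (1, 2, 12)]
short = make_run_name(24000, 3, zeros=2)
```

With the default of 4, `names` is `['sr150_0001', 'sr150_0002', 'sr150_0012']`; with `zeros=2`, `short` is `'sr24000_03'`. An override could apply a function like this to each group of files that belongs to one run.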
- get_runs(sample_rates, run_name_zeros=4, calibration_path=None)
Get a list of runs contained within the given folder. The dataframe is built first, and the runs are then extracted from it.
For continuous data, all you need is the first file in the sequence; the reader will read in the entire sequence.
For segmented data, it will only read in the given segment, which is slightly different from the original reader.
- Parameters:
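sample_rates (list) – list of sample rates to read, for example [150, 24000]; required, since the signature gives no default
run_name_zeros (int, optional) – number of zeros in the run name, defaults to 4
calibration_path (str or Path, optional) – path to calibration files, defaults to None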
- Returns:
List of run dataframes with only the first block of files
- Return type:
collections.OrderedDict
- Example:
>>> from mth5.io.phoenix import PhoenixCollection
>>> phx_collection = PhoenixCollection(r"/path/to/station")
>>> run_dict = phx_collection.get_runs(sample_rates=[150, 24000])
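The get_runs description notes that continuous data only needs the first file in each sequence. A minimal sketch of that selection step is shown below, using plain dictionaries in place of dataframe rows. `first_file_per_run` and the sample entries are hypothetical, and the real method may group and order its output differently.

```python
from collections import OrderedDict


def first_file_per_run(entries):
    """Keep, for each (station, run), the entry with the lowest sequence_number."""
    first = OrderedDict()
    ordered = sorted(
        entries, key=lambda e: (e["station"], e["run"], e["sequence_number"])
    )
    for entry in ordered:
        # setdefault stores only the first entry seen for each key.
        first.setdefault((entry["station"], entry["run"]), entry)
    return first


# Made-up file entries, deliberately out of order.
entries = [
    {"station": "mt001", "run": "sr150_0001", "sequence_number": 2, "fn": "b.td_150"},
    {"station": "mt001", "run": "sr150_0001", "sequence_number": 1, "fn": "a.td_150"},
    {"station": "mt001", "run": "sr150_0002", "sequence_number": 1, "fn": "c.td_150"},
]
runs = first_file_per_run(entries)
```

Each key of `runs` identifies one run, and its value is the entry a reader would open first for that run.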