mth5.processing.run_summary

This module contains the RunSummary class.

This is a helper class that summarizes the Runs in an mth5.

TODO: This class and methods could be replaced by methods in MTH5.

Functionality of RunSummary():

1. The user can get a list of local_station options, which correspond to unique pairs of values: (survey, station).
2. The user can see all possible ways of processing the data: one list per (survey, station) pair in the run_summary.

Some of the following functionalities may end up in KernelDataset:

3. The user can select a local_station. This can trigger a reduction of runs to only those from the local station and simultaneous runs at other stations.
4. Given a local station, a list of possible reference stations can be generated.
5. Given a remote reference station, a list of all relevant runs, truncated to maximize coverage of the local station runs, is generated.
6. Given such a “restricted run list”, runs can be dropped.
7. Time interval endpoints can be changed.
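The selection steps listed above can be sketched in plain Python. This is a minimal sketch, assuming each run-summary row is a dict with hypothetical keys survey, station, run, start and end. The real RunSummary keeps its rows in a pandas DataFrame, and its column names may differ. "Simultaneous" is taken to mean any time overlap. Item 5's "truncated to maximize coverage" is interpreted here as clipping remote runs to their overlap with local runs. Both of these readings are assumptions.

```python
from datetime import datetime

# Hypothetical run-summary rows; the real RunSummary stores rows in a pandas DataFrame.
ROWS = [
    {"survey": "S1", "station": "mt01", "run": "a",
     "start": datetime(2020, 1, 1, 0), "end": datetime(2020, 1, 1, 6)},
    {"survey": "S1", "station": "mt01", "run": "b",
     "start": datetime(2020, 1, 2, 0), "end": datetime(2020, 1, 2, 6)},
    {"survey": "S1", "station": "mt02", "run": "a",
     "start": datetime(2020, 1, 1, 3), "end": datetime(2020, 1, 1, 9)},
    {"survey": "S1", "station": "mt03", "run": "a",
     "start": datetime(2020, 1, 5, 0), "end": datetime(2020, 1, 5, 6)},
]


def station_key(row):
    """The (survey, station) pair that identifies a station."""
    return (row["survey"], row["station"])


def local_station_options(rows):
    """Item 1: unique (survey, station) pairs, i.e. candidate local stations."""
    return sorted({station_key(r) for r in rows})


def runs_by_station(rows):
    """Item 2: one list of run ids per (survey, station) pair."""
    out = {}
    for r in rows:
        out.setdefault(station_key(r), []).append(r["run"])
    return out


def overlaps(a, b):
    """True if two runs share a non-empty time interval."""
    return a["start"] < b["end"] and b["start"] < a["end"]


def simultaneous_runs(rows, local):
    """Item 3: local runs plus runs at other stations that overlap any local run."""
    local_rows = [r for r in rows if station_key(r) == local]
    others = [r for r in rows
              if station_key(r) != local and any(overlaps(r, lr) for lr in local_rows)]
    return local_rows + others


def candidate_reference_stations(rows, local):
    """Item 4: stations with at least one run simultaneous with the local station."""
    return sorted({station_key(r) for r in simultaneous_runs(rows, local)} - {local})


def clip_to_local(rows, local, remote):
    """Item 5 (one interpretation): remote runs clipped to their overlap with local runs."""
    local_rows = [r for r in rows if station_key(r) == local]
    clipped = []
    for rr in rows:
        if station_key(rr) != remote:
            continue
        for lr in local_rows:
            if overlaps(rr, lr):
                clipped.append({**rr,
                                "start": max(rr["start"], lr["start"]),
                                "end": min(rr["end"], lr["end"])})
    return clipped
```

With these rows, candidate_reference_stations(ROWS, ("S1", "mt01")) returns [("S1", "mt02")]. Station mt03 is excluded because none of its runs overlap an mt01 run.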

Development Notes:

TODO: consider adding methods:

“drop_runs_shorter_than”: removes short runs from the summary
“fill_gaps_by_time_interval”: allows runs to be merged if the gaps between them are short
“fill_gaps_by_run_names”: allows runs to be merged if the gaps between them are short
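The first two proposed methods could work along the lines shown below. This is a minimal sketch. The function names come from the TODO list, but their behaviour here is an assumption. The dict-based rows are also an assumption. So is the rule that only runs sharing survey, station and sample_rate are merged.

```python
from datetime import datetime, timedelta

# Hypothetical rows: a and b are 5 minutes apart, c starts an hour after b ends.
EXAMPLE_ROWS = [
    {"survey": "S1", "station": "mt01", "run": "a", "sample_rate": 1.0,
     "start": datetime(2020, 1, 1, 0, 0), "end": datetime(2020, 1, 1, 6, 0)},
    {"survey": "S1", "station": "mt01", "run": "b", "sample_rate": 1.0,
     "start": datetime(2020, 1, 1, 6, 5), "end": datetime(2020, 1, 1, 12, 0)},
    {"survey": "S1", "station": "mt01", "run": "c", "sample_rate": 1.0,
     "start": datetime(2020, 1, 1, 13, 0), "end": datetime(2020, 1, 1, 13, 10)},
]


def drop_runs_shorter_than(rows, min_duration):
    """Hypothetical: keep only runs lasting at least min_duration (a timedelta)."""
    return [r for r in rows if r["end"] - r["start"] >= min_duration]


def group_key(row):
    """Runs are only merged within the same survey, station and sample rate (assumption)."""
    return (row["survey"], row["station"], row["sample_rate"])


def fill_gaps_by_time_interval(rows, max_gap):
    """Hypothetical: merge consecutive runs whose gap is at most max_gap.

    Each output row lists the merged run ids under "runs"; input rows are not modified.
    """
    merged = []
    for r in sorted(rows, key=lambda r: (*group_key(r), r["start"])):
        prev = merged[-1] if merged else None
        if prev is not None and group_key(prev) == group_key(r) \
                and r["start"] - prev["end"] <= max_gap:
            # Extend the previous (copied) row so it also covers this run.
            prev["end"] = max(prev["end"], r["end"])
            prev["runs"].append(r["run"])
        else:
            merged.append({**r, "runs": [r["run"]]})
    return merged
```

With max_gap set to 10 minutes, runs a and b merge into one row ending at 12:00, while c stays separate. With min_duration set to one hour, c (10 minutes long) is dropped.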

TODO: Consider whether this should return a copy or modify in-place when querying the df.

Classes

RunSummary

Class to contain a run-summary table from one or more mth5s.

Functions

extract_run_summaries_from_mth5s(mth5_list[, ...])

Given a list of mth5s, iterate over them, extracting run_summaries and merging them into one big table.

Module Contents

class mth5.processing.run_summary.RunSummary(input_dict: dict | None = None, df: pandas.DataFrame | None = None)

Class to contain a run-summary table from one or more mth5s.

WIP: For the full MMT case, this may need modification to a channel-based summary.

column_dtypes

property df: pandas.DataFrame: The run-summary table as a pandas DataFrame.

clone(): 2022-10-20: Cloning may be causing issues with extra instances of open h5 files …

from_mth5s(mth5_list) → list: Iterates over the mth5s in the list and creates one big DataFrame summarizing the runs.

property mini_summary: pandas.DataFrame: Shows the DataFrame with only a few columns for readability.

property print_mini_summary: str: Calls mini_summary through the logger so that it is formatted.

drop_no_data_rows() → bool: Drops rows marked has_data = False and resets the index of self.df.
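The row filtering and column projection described above can be sketched without pandas. This is a minimal sketch, assuming dict rows with a boolean has_data key. The columns chosen for the mini summary are hypothetical, not the ones RunSummary actually uses. In a list, positions already form a fresh 0..n-1 index, which is what resetting the DataFrame index achieves. In pandas the equivalent is roughly df[df.has_data].reset_index(drop=True).

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical subset of columns shown by the mini summary.
MINI_COLUMNS = ("survey", "station", "run", "sample_rate")


def drop_no_data_rows(rows):
    """Keep rows whose has_data flag is True; list positions act as a reset index."""
    return [r for r in rows if r.get("has_data", False)]


def mini_summary(rows, columns=MINI_COLUMNS):
    """Project each row onto a few columns for readability."""
    return [{c: r.get(c) for c in columns} for r in rows]


def format_table(rows):
    """Render rows as a fixed-width text table (header line plus one line per row)."""
    if not rows:
        return ""
    cols = list(rows[0])
    widths = {c: max(len(c), *(len(str(r[c])) for r in rows)) for c in cols}
    lines = ["  ".join(c.ljust(widths[c]) for c in cols)]
    lines += ["  ".join(str(r[c]).ljust(widths[c]) for c in cols) for r in rows]
    return "\n".join(lines)


def print_mini_summary(rows):
    """Send the formatted mini summary through the logger and return the text."""
    text = format_table(mini_summary(rows))
    logger.info("\n%s", text)
    return text
```

Routing the table through the logger rather than print lets the caller's logging configuration decide where it appears.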

set_sample_rate(sample_rate: float, inplace: bool = False)

Set the sample rate so that the run summary represents all runs for a single sample rate.

Parameters:

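sample_rate (float) – The sample rate whose runs should be kept.
inplace (bool, optional) – If True, modify this RunSummary in place; otherwise leave it unchanged and return a modified copy. By default, False.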

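Returns:

A RunSummary restricted to sample_rate when inplace is False; when inplace is True, self.df is modified instead.

Return type:

RunSummary or None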
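The inplace convention can be illustrated with a small stand-in class. This is a minimal sketch. RunSummarySketch and its rows attribute are hypothetical. Raising ValueError for a sample rate that is not present is an assumption. So is returning a filtered copy when inplace is False, which follows the usual pandas inplace convention rather than a confirmed description of the real method. Sample rates are compared with exact equality, which is adequate only when they are stored consistently.

```python
import copy


class RunSummarySketch:
    """Hypothetical stand-in for RunSummary that stores rows as a list of dicts."""

    def __init__(self, rows):
        self.rows = rows

    def set_sample_rate(self, sample_rate, inplace=False):
        """Restrict the summary to runs recorded at a single sample rate."""
        available = sorted({r["sample_rate"] for r in self.rows})
        if sample_rate not in available:
            raise ValueError(
                f"Sample rate {sample_rate} not in run summary; available: {available}"
            )
        kept = [r for r in self.rows if r["sample_rate"] == sample_rate]
        if inplace:
            # Modify this object and return nothing, as pandas does for inplace=True.
            self.rows = kept
            return None
        # Leave this object untouched and hand back an independent, filtered copy.
        return RunSummarySketch(copy.deepcopy(kept))
```

Returning a deep copy keeps later edits to the filtered summary from leaking back into the original. This touches the copy-versus-in-place question raised in the development notes.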

mth5.processing.run_summary.extract_run_summaries_from_mth5s(mth5_list, summary_type='run', deduplicate=True)

Given a list of mth5s, iterate over them, extracting run_summaries and merging them into one big table.

Development Notes:

ToDo: Move this method into mth5? Or mth5_helpers?
ToDo: Make this a class so that the __repr__ is a nice visual representation of the df, like what channel summary does in mth5.
2022-05-28: Modified to allow this method to accept mth5 objects as well as the already supported types of pathlib.Path or str.

In order to drop duplicates I used the solution here: https://stackoverflow.com/questions/43855462/pandas-drop-duplicates-method-not-working-on-dataframe-containing-lists
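Rows that hold lists, such as a list of channel names, are unhashable. That is why a plain drop_duplicates can fail on such tables. One common pandas workaround is df.loc[df.astype(str).drop_duplicates().index]. The sketch below shows the same idea in plain Python, converting lists to tuples instead of strings. The dict-based rows, the channels key, and the helper names are all hypothetical.

```python
def to_hashable(value):
    """Recursively turn lists and dicts into tuples so they can be hashed and compared."""
    if isinstance(value, list):
        return tuple(to_hashable(v) for v in value)
    if isinstance(value, dict):
        # Keys are unique, so sorting never needs to compare the values.
        return tuple(sorted((k, to_hashable(v)) for k, v in value.items()))
    return value


def drop_duplicate_rows(rows):
    """Keep the first occurrence of each distinct row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        key = to_hashable(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def merge_summaries(summaries, deduplicate=True):
    """Concatenate per-file summaries into one table, optionally dropping duplicates."""
    merged = [row for summary in summaries for row in summary]
    return drop_duplicate_rows(merged) if deduplicate else merged
```

Converting to tuples keeps the original value types intact, unlike stringifying. Stringifying, however, also copes with unhashable types this sketch does not handle, such as sets.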

Parameters:

mth5_list (list) – Paths (pathlib.Path or str) to mth5 files, or mth5 objects, to summarize.
summary_type (str, optional) – One of [“channel”, “run”]. “channel” returns the concatenated channel summary; “run” returns the concatenated run summary. By default, “run”.
deduplicate (bool, optional) – If True, drop duplicate rows from the merged table. By default, True.

Returns:

super_summary – A DataFrame of all available runs (or channels, when summary_type is “channel”) across the given mth5s.

Return type:

pd.DataFrame