mth5.io.metronix.metronix_collection

Metronix collection utilities for managing ATSS files.

This module provides classes for collecting and managing Metronix ATSS (Audio Time Series System) files and creating pandas DataFrames with metadata for processing workflows.

Classes

MetronixCollection: Collection class for managing Metronix ATSS files

Created on Fri Nov 22 13:22:44 2024

@author: jpeacock

Classes

MetronixCollection

Collection class for managing Metronix ATSS files.

Module Contents

class mth5.io.metronix.metronix_collection.MetronixCollection(file_path: str | pathlib.Path | None = None, **kwargs: Any)[source]

Bases: mth5.io.collection.Collection

Collection class for managing Metronix ATSS files.

This class extends the base Collection class to handle Metronix ATSS (Audio Time Series System) files and their associated JSON metadata files. It provides functionality to create pandas DataFrames with comprehensive metadata for processing workflows.

Parameters:

file_path (Union[str, Path, None], optional) – Path to directory containing Metronix ATSS files, by default None
**kwargs – Additional keyword arguments passed to parent Collection class

file_ext[source]

List of file extensions to search for ([“atss”])

Type:: list[str]

Examples

>>> from mth5.io.metronix import MetronixCollection
>>> collection = MetronixCollection("/path/to/metronix/files")
>>> df = collection.to_dataframe(sample_rates=[128, 256])

file_ext: list[str] = ['atss'][source]

to_dataframe(sample_rates: list[int] = [128], run_name_zeros: int = 0, calibration_path: str | pathlib.Path | None = None) → pandas.DataFrame[source]

Create DataFrame for Metronix timeseries ATSS + JSON file sets.

Processes all ATSS files in the collection directory, extracts metadata, and creates a comprehensive pandas DataFrame with information about each channel including timing, location, and instrument details.

Parameters:

sample_rates (list[int], optional) – List of sample rates to include in Hz, by default [128]
run_name_zeros (int, optional) – Number of zeros for zero-padding run names. If 0, run names are unchanged. If > 0, run names are formatted as ‘sr{sample_rate}_{run_number:0{zeros}d}’, by default 0
calibration_path (Union[str, Path, None], optional) – Path to calibration files (currently unused), by default None

Returns:

DataFrame with columns: - survey: Survey ID - station: Station ID - run: Run ID - start: Start time (datetime) - end: End time (datetime) - channel_id: Channel number - component: Component name (ex, ey, hx, hy, hz) - fn: File path - sample_rate: Sample rate in Hz - file_size: File size in bytes - n_samples: Number of samples - sequence_number: Sequence number (always 0) - dipole: Dipole length (always 0) - coil_number: Coil serial number (magnetic channels only) - latitude: Latitude in decimal degrees - longitude: Longitude in decimal degrees - elevation: Elevation in meters - instrument_id: Instrument/system number - calibration_fn: Calibration file path (always None)

Return type:

pd.DataFrame

Examples

>>> collection = MetronixCollection("/path/to/files")
>>> df = collection.to_dataframe(sample_rates=[128, 256])
>>> df = collection.to_dataframe(run_name_zeros=4)  # Zero-pad run names

assign_run_names(df: pandas.DataFrame, zeros: int = 0) → pandas.DataFrame[source]

Assign formatted run names based on sample rate and run number.

If zeros is 0, run names are unchanged. Otherwise, run names are formatted as ‘sr{sample_rate}_{run_number:0{zeros}d}’ where the run number is extracted from the original run name after the first underscore.

Parameters:

df (pd.DataFrame) – DataFrame containing run information with ‘run’ and ‘sample_rate’ columns
zeros (int, optional) – Number of zeros for zero-padding run numbers. If 0, run names are unchanged, by default 0

Returns:

DataFrame with updated run names

Return type:

pd.DataFrame

Examples

>>> df = pd.DataFrame({
...     'run': ['run_1', 'run_2'],
...     'sample_rate': [128, 256]
... })
>>> collection = MetronixCollection()
>>> result = collection.assign_run_names(df, zeros=3)
>>> print(result['run'].tolist())
['sr128_001', 'sr256_002']

Notes

The method expects run names to be in format ‘prefix_number’ where ‘number’ can be extracted and converted to an integer for formatting.