mth5.io.lemi package

Submodules

mth5.io.lemi.lemi424 module

Created on Tue May 11 15:31:31 2021

copyright:

Jared Peacock (jpeacock@usgs.gov)

license:

MIT

class mth5.io.lemi.lemi424.LEMI424(fn: str | Path | None = None, **kwargs: Any)

Bases: object

Read and process LEMI424 magnetotelluric data files.

This is a placeholder until IRIS finalizes their reader.

Parameters:
  • fn (str or pathlib.Path, optional) – Full path to LEMI424 file, by default None.

  • **kwargs (dict) – Additional keyword arguments for configuration.

sample_rate

Sample rate of the file, default is 1.0.

Type:

float

chunk_size

Chunk size for pandas to use, default is 8640.

Type:

int

file_column_names

Column names of the LEMI424 file.

Type:

list of str

dtypes

Data types for each column.

Type:

dict

data_column_names

Same as file_column_names, with the six date-time columns (year, month, day, hour, minute, second) replaced by a single date column.

Type:

list of str

data

The loaded data.

Type:

pd.DataFrame or None

Notes

LEMI424 File Column Names:

year, month, day, hour, minute, second, bx, by, bz, temperature_e, temperature_h, e1, e2, e3, e4, battery, elevation, latitude, lat_hemisphere, longitude, lon_hemisphere, n_satellites, gps_fix, time_diff

Data Column Names:

date, bx, by, bz, temperature_e, temperature_h, e1, e2, e3, e4, battery, elevation, latitude, lat_hemisphere, longitude, lon_hemisphere, n_satellites, gps_fix, time_diff

property data: DataFrame | None

Data represented as a pandas DataFrame with data column names.

Returns:

The loaded data or None if no data is loaded.

Return type:

pd.DataFrame or None

property elevation: float | None

Median elevation where data have been collected.

Returns:

Median elevation in meters or None if no data is loaded.

Return type:

float or None

property end: MTime | None

End time of data collection in the LEMI424 file.

Returns:

End time or None if no data is loaded.

Return type:

MTime or None

property file_size: int | None

Size of file in bytes.

Returns:

File size in bytes or None if no file is set.

Return type:

int or None

property fn: Path | None

Full path to LEMI424 file.

Returns:

Path to the file or None if not set.

Return type:

pathlib.Path or None

property gps_lock: Any | None

GPS lock status array.

Returns:

GPS fix values or None if no data is loaded.

Return type:

numpy.ndarray or None

property latitude: float | None

Median latitude where data have been collected.

Returns:

Median latitude in degrees or None if no data is loaded.

Return type:

float or None

property longitude: float | None

Median longitude where data have been collected.

Returns:

Median longitude in degrees or None if no data is loaded.

Return type:

float or None

property n_samples: int | None

Number of samples in the file.

Returns:

Number of samples or None if no data/file available.

Return type:

int or None

read(fn: str | Path | None = None, fast: bool = True) -> None

Read a LEMI424 file using pandas.

The fast method reads the first and last lines to get the start and end times and builds a time index from them. It then reads the data without parsing the date-time columns and checks that the number of samples read matches the number expected. If it does not, the file is read again with the slower method, which uses the date-time parser so that any time gaps are respected.

Parameters:
  • fn (str, pathlib.Path, or None, optional) – Full path to file. Uses LEMI424.fn if not provided, by default None.

  • fast (bool, optional) – Read the fast way (True) or not (False), by default True.

Raises:

IOError – If file cannot be found.
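
The consistency check behind the fast path can be pictured as follows. This is a minimal sketch, not the LEMI424 implementation: the helper name expected_n_samples, the example times, and the convention of counting both end points are assumptions made for illustration.

```python
from datetime import datetime, timezone


def expected_n_samples(start, end, sample_rate=1.0):
    """Samples expected between start and end, counting both end points (assumed)."""
    return int(round((end - start).total_seconds() * sample_rate)) + 1


# Hypothetical first and last time stamps of a file sampled at 1 Hz.
start = datetime(2021, 5, 11, 0, 0, 0, tzinfo=timezone.utc)
end = datetime(2021, 5, 11, 0, 0, 9, tzinfo=timezone.utc)

# Hypothetical number of rows actually read; two seconds are missing.
n_rows_read = 8

# A mismatch means the evenly spaced index cannot be trusted.
has_gaps = n_rows_read != expected_n_samples(start, end)
print(has_gaps)
```

When the counts disagree, a reader following this logic would fall back to parsing every time stamp, which is what fast=False requests directly.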

read_calibration(fn: str | Path) -> FrequencyResponseTableFilter

Read a LEMI424 calibration file.

Calibration files are assumed to be JSON files with the following format:

    {
        "Calibration": {
            "gain": float,
            "Freq": [float],
            "Re": [float],
            "Im": [float]
        }
    }

Parameters:

fn (str or pathlib.Path) – Full path to calibration file.

Returns:

Calibration filter object.

Return type:

mt_metadata.timeseries.filters.FrequencyResponseTableFilter
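
To make the calibration layout concrete, the sketch below parses a made-up calibration document with the standard library and pairs each frequency with a complex response value. The numbers are invented, and how read_calibration uses the gain or builds the filter object is not shown here; only the JSON keys come from the description above.

```python
import json

# Hypothetical calibration content following the documented layout.
text = """
{
    "Calibration": {
        "gain": 1.0,
        "Freq": [0.001, 0.01, 0.1],
        "Re": [0.5, 0.9, 1.0],
        "Im": [0.5, 0.1, 0.0]
    }
}
"""

cal = json.loads(text)["Calibration"]

# Build the complex response Re + i*Im for each tabulated frequency.
response = [complex(re, im) for re, im in zip(cal["Re"], cal["Im"])]
table = list(zip(cal["Freq"], response))

for frequency, value in table:
    print(f"{frequency:g} Hz: amplitude={abs(value):.3f}")
```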

read_metadata() -> None

Read only first and last rows to get important metadata.

This method extracts essential metadata from the file without loading the entire dataset.

property run_metadata: Run

Run metadata as mt_metadata.timeseries.Run object.

Returns:

Run metadata object.

Return type:

mt_metadata.timeseries.Run

property start: MTime | None

Start time of data collection in the LEMI424 file.

Returns:

Start time or None if no data is loaded.

Return type:

MTime or None

property station_metadata: Station

Station metadata as mt_metadata.timeseries.Station object.

Returns:

Station metadata object.

Return type:

mt_metadata.timeseries.Station

to_run_ts(fn: str | Path | None = None, e_channels: list[str] = ['e1', 'e2'], calibration_dict: dict | None = None) -> RunTS

Create a RunTS object from the data.

Parameters:
  • fn (str, pathlib.Path, or None, optional) – Full path to file. Will use LEMI424.fn if None, by default None.

  • e_channels (list of str, optional) – Column names for the electric channels to use, by default ["e1", "e2"].

  • calibration_dict (dict, optional) – Calibration dictionary to apply to the data, by default None. Keys are the channel names and values are the calibration file paths. The file path is assumed to be in the format lemi-{component}.sr.json.

Returns:

RunTS object containing the data.

Return type:

mth5.timeseries.RunTS

mth5.io.lemi.lemi424.lemi_date_parser(year: int, month: int, day: int, hour: int, minute: int, second: int) -> Series

Combine the date-time columns that are output by LEMI into a single column.

Assumes UTC timezone.

Parameters:
  • year (int) – Year value.

  • month (int) – Month value (1-12).

  • day (int) – Day of the month (1-31).

  • hour (int) – Hour in 24-hour format (0-23).

  • minute (int) – Minutes in the hour (0-59).

  • second (int) – Seconds in the minute (0-59).

Returns:

Combined date-time values as a pandas Series, matching the return annotation in the signature.

Return type:

pd.Series
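
The idea of collapsing the six date-time columns into one can be sketched with the standard library alone. The helper combine_date_columns is hypothetical and returns a plain list of datetime objects; the module function itself works with pandas objects, which this sketch does not reproduce.

```python
from datetime import datetime, timezone


def combine_date_columns(year, month, day, hour, minute, second):
    """Combine parallel date-time columns into UTC datetimes (illustrative helper)."""
    return [
        datetime(y, mo, d, h, mi, s, tzinfo=timezone.utc)
        for y, mo, d, h, mi, s in zip(year, month, day, hour, minute, second)
    ]


# Two consecutive one-second samples, written column by column as in the file.
dates = combine_date_columns(
    [2021, 2021], [5, 5], [11, 11], [15, 15], [31, 31], [31, 32]
)
print(dates[0].isoformat())  # 2021-05-11T15:31:31+00:00
```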

mth5.io.lemi.lemi424.lemi_hemisphere_parser(hemisphere: str) -> int

Convert a hemisphere letter into a sign, either -1 or 1.

Assumes the prime meridian is 0.

Parameters:

hemisphere (str) – Hemisphere string. Valid values are ‘N’, ‘S’, ‘E’, ‘W’.

Returns:

Unity with a sign for the given hemisphere. Returns -1 for ‘S’ or ‘W’, 1 for ‘N’ or ‘E’.

Return type:

int

mth5.io.lemi.lemi424.lemi_position_parser(position: float) -> float

Parse LEMI location strings into decimal degrees.

Uses the hemisphere for the sign.

Notes

The LEMI stores a position as a single floating-point value that packs whole degrees and decimal minutes together, {degrees}{minutes}, that is, degrees multiplied by 100 plus the minutes. For example, 40.50166 degrees is stored as 4030.1 (40 degrees, 30.1 minutes).

Parameters:

position (float) – LEMI position value to parse.

Returns:

Decimal degrees position.

Return type:

float
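
The conversion described in the notes can be worked through in a few lines. This is a sketch under stated assumptions: ddmm_to_decimal_degrees is a hypothetical helper, and folding the hemisphere sign into the same function is a choice made here for brevity; in the module the position and hemisphere parsers are separate functions.

```python
def ddmm_to_decimal_degrees(position, hemisphere="N"):
    """Convert a packed degrees-and-minutes value to signed decimal degrees."""
    # Split 4030.1 into 40 whole degrees and 30.1 minutes.
    degrees, minutes = divmod(float(position), 100)
    # South and west are negative; north and east are positive.
    sign = -1 if hemisphere.upper() in ("S", "W") else 1
    return sign * (degrees + minutes / 60)


lat = ddmm_to_decimal_degrees(4030.1, "N")    # 40 degrees, 30.1 minutes north
lon = ddmm_to_decimal_degrees(11145.6, "W")   # 111 degrees, 45.6 minutes west
print(round(lat, 5), round(lon, 5))
```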

mth5.io.lemi.lemi424.read_lemi424(fn: str | Path | list[str | Path], e_channels: list[str] = ['e1', 'e2'], fast: bool = True, calibration_dict: dict | None = None) -> RunTS

Read a LEMI 424 TXT file.

Parameters:
  • fn (str, pathlib.Path, or list of str or pathlib.Path) – Input file name, or a list of input file names.

  • e_channels (list of str, optional) – A list of electric channels to read, by default ["e1", "e2"].

  • fast (bool, optional) – Use fast reading method, by default True.

  • calibration_dict (dict, optional) – Calibration dictionary to apply to the data, by default None. Keys are the channel names and values are the calibration file path.

Returns:

A RunTS object with appropriate metadata.

Return type:

mth5.timeseries.RunTS

mth5.io.lemi.lemi_collection module

LEMI 424 Collection

Collection of TXT files combined into runs

Created on Wed Aug 31 10:32:44 2022

@author: jpeacock

class mth5.io.lemi.lemi_collection.LEMICollection(file_path: str | Path | None = None, file_ext: List[str] | None = None, **kwargs)

Bases: Collection

Collection of LEMI 424 files into runs based on start and end times.

Run names are assigned as 'sr1_{index:0{zeros}}', for example 'sr1_0001' for zeros = 4.

Notes

This class assumes that the given file path contains a single LEMI station. To handle multiple stations, merge the data frames returned for each station.

LEMI data come with little metadata about the station or survey, so you should assign station_id and survey_id yourself.

Parameters:
  • file_path (str or pathlib.Path, optional) – Full path to single station LEMI424 directory, by default None

  • file_ext (list of str, optional) – Extension of LEMI424 files, by default ["txt", "TXT"]

  • **kwargs – Additional keyword arguments passed to parent Collection class

station_id

Station identification string, defaults to “mt001”

Type:

str

survey_id

Survey identification string, defaults to “mt”

Type:

str

Examples

>>> from mth5.io.lemi import LEMICollection
>>> lc = LEMICollection(r"/path/to/single/lemi/station")
>>> lc.station_id = "mt001"
>>> lc.survey_id = "test_survey"
>>> run_dict = lc.get_runs(1)

assign_run_names(df: DataFrame, zeros: int = 4) -> DataFrame

Assign run names based on start and end times.

Checks if a file has the same start time as the last end time. Run names are assigned as sr{sample_rate}_{run_number:0{zeros}}.

Parameters:
  • df (pd.DataFrame) – DataFrame with the appropriate columns

  • zeros (int, optional) – Number of zeros in run name, by default 4

Returns:

DataFrame with run names assigned

Return type:

pd.DataFrame
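
The naming rule can be illustrated without pandas. The sketch below is not the LEMICollection code: name_runs is a hypothetical helper that takes (start, end) pairs sorted by start time and assumes a file continues the current run only when its start equals the previous end exactly. The real method may tolerate a one-sample offset, which is not modelled here.

```python
from datetime import datetime, timedelta


def name_runs(files, sample_rate=1, zeros=4):
    """Give back-to-back files the same run name (illustrative sketch)."""
    names = []
    count = 0
    previous_end = None
    for start, end in files:
        # Start a new run unless this file begins where the last one ended.
        if previous_end is None or start != previous_end:
            count += 1
        names.append(f"sr{sample_rate}_{count:0{zeros}}")
        previous_end = end
    return names


t0 = datetime(2022, 8, 31, 0, 0, 0)
hour = timedelta(hours=1)
files = [
    (t0, t0 + hour),             # first file
    (t0 + hour, t0 + 2 * hour),  # continues the first file
    (t0 + 3 * hour, t0 + 4 * hour),  # starts after a one-hour gap
]
names = name_runs(files)
print(names)  # ['sr1_0001', 'sr1_0001', 'sr1_0002']
```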

get_calibrations(calibration_path: str | Path) -> dict

Get calibration dictionary for LEMI424 files. This assumes that the calibration files are in JSON format and named 'LEMI-424-<component>.json'.

Parameters:

calibration_path (str or pathlib.Path) – Path to calibration files

Returns:

Calibration dictionary for LEMI424 files

Return type:

dict
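
One way to picture the returned dictionary is a mapping from component name to calibration file, built by matching the naming pattern above. The helper find_calibrations is hypothetical, and lower-casing the component for the key is an assumption; the actual key format used by get_calibrations is not stated here.

```python
import tempfile
from pathlib import Path


def find_calibrations(calibration_path):
    """Map component name to calibration file path (illustrative sketch)."""
    prefix = "LEMI-424-"
    return {
        fn.stem[len(prefix):].lower(): fn
        for fn in sorted(Path(calibration_path).glob(f"{prefix}*.json"))
    }


# Build a throwaway directory with empty calibration files to search.
with tempfile.TemporaryDirectory() as tmp:
    for component in ("bx", "by", "bz"):
        Path(tmp, f"LEMI-424-{component}.json").write_text("{}")
    cal_dict = find_calibrations(tmp)
    components = sorted(cal_dict)

print(components)  # ['bx', 'by', 'bz']
```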

Examples

>>> from pathlib import Path
>>> from mth5.io.lemi import LEMICollection
>>> lc = LEMICollection("/path/to/single/lemi/station")
>>> cal_dict = lc.get_calibrations(Path("/path/to/calibrations"))

to_dataframe(sample_rates: int | List[int] | None = None, run_name_zeros: int = 4, calibration_path: str | Path | None = None) -> DataFrame

Create a data frame of each TXT file in a given directory.

Notes

This assumes the given directory contains a single station

Parameters:
  • sample_rates (int or list of int, optional) – Sample rate to get, will always be 1 for LEMI data, by default [1]

  • run_name_zeros (int, optional) – Number of zeros to assign to the run name, by default 4

  • calibration_path (str or pathlib.Path, optional) – Path to calibration files, by default None

Returns:

DataFrame with information of each TXT file in the given directory

Return type:

pd.DataFrame

Examples

>>> from mth5.io.lemi import LEMICollection
>>> lc = LEMICollection("/path/to/single/lemi/station")
>>> lemi_df = lc.to_dataframe()

Module contents

class mth5.io.lemi.LEMI424(fn: str | Path | None = None, **kwargs: Any)

Same class as mth5.io.lemi.lemi424.LEMI424, documented above.

class mth5.io.lemi.LEMICollection(file_path: str | Path | None = None, file_ext: List[str] | None = None, **kwargs)

Same class as mth5.io.lemi.lemi_collection.LEMICollection, documented above.

mth5.io.lemi.read_lemi424(fn: str | Path | list[str | Path], e_channels: list[str] = ['e1', 'e2'], fast: bool = True, calibration_dict: dict | None = None) -> RunTS

Same function as mth5.io.lemi.lemi424.read_lemi424, documented above.