mth5 package

Subpackages

Submodules

mth5.helpers module

Helper functions for HDF5

Created on Tue Jun 2 12:37:50 2020

copyright: Jared Peacock (jpeacock@usgs.gov)
license: MIT

mth5.helpers.close_open_files()[source]

mth5.helpers.from_numpy_type(value)[source]

Attribute values need to be made compatible with Numpy and HDF5.

For numbers and booleans this is straightforward: h5py automatically maps them to a numpy type.

But for strings this can be a challenge, especially a list of strings.

HDF5 should only deal with ASCII characters or Unicode. No binary data is allowed.

mth5.helpers.get_tree(parent)[source]

Simple function to recursively print the contents of an HDF5 group.

Parameters

parent (h5py.Group) – HDF5 (sub-)tree to print
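As a rough illustration of the kind of recursive walk described for get_tree, the sketch below builds an indented listing of a nested structure. Plain dictionaries stand in for h5py.Group objects and None stands in for a dataset; the function name tree_lines and this dictionary layout are assumptions made for the example, not part of mth5.

```python
def tree_lines(name, node, depth=0):
    """Return indented lines describing a nested group structure.

    A dict plays the role of an HDF5 group and None plays the role of a
    dataset. Both are stand-ins chosen for this sketch.
    """
    indent = "    " * depth
    if node is None:
        # Leaf: report it as a dataset and stop recursing.
        return [f"{indent}--> Dataset: {name}"]
    lines = [f"{indent}|- Group: {name}"]
    for child_name, child in node.items():
        # Recurse one level deeper for every member of the group.
        lines.extend(tree_lines(child_name, child, depth + 1))
    return lines


survey = {
    "Filters": {"summary": None},
    "Stations": {"MT001": {"summary": None}, "summary": None},
}
print("\n".join(tree_lines("Survey", survey)))
```

Returning a fresh list from each call, rather than appending to a shared default argument, keeps repeated calls independent of each other.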

mth5.helpers.inherit_doc_string(cls)[source]

mth5.helpers.recursive_hdf5_tree(group, lines=[])[source]

mth5.helpers.to_numpy_type(value)[source]

Attribute values need to be made compatible with Numpy and HDF5.

For numbers and booleans this is straightforward: h5py automatically maps them to a numpy type.

But for strings this can be a challenge, especially a list of strings.

HDF5 should only deal with ASCII characters or Unicode. No binary data is allowed.
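The list-of-strings case mentioned above can be sketched with numpy alone. The helpers to_hdf5_friendly and from_hdf5_friendly are hypothetical names for this example and are not the mth5 implementations; the sketch assumes ASCII-only strings and stores them as a fixed-width byte-string array, which is one representation HDF5 attributes can hold.

```python
import numpy as np


def to_hdf5_friendly(value):
    """Convert a list of str into a fixed-width ASCII byte-string array."""
    if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
        # dtype "S" encodes each string as ASCII bytes; non-ASCII input raises.
        return np.array(value, dtype="S")
    return value  # numbers and booleans are left for h5py to map


def from_hdf5_friendly(value):
    """Convert numpy values back into plain Python objects."""
    if isinstance(value, bytes):
        return value.decode("ascii")
    if isinstance(value, np.ndarray) and value.dtype.kind == "S":
        return [item.decode("ascii") for item in value.tolist()]
    if isinstance(value, np.generic):
        return value.item()  # numpy scalar -> Python scalar
    return value


components = to_hdf5_friendly(["ex", "ey", "hz"])
print(components.dtype, from_hdf5_friendly(components))
```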

mth5.helpers.validate_compression(compression, level)[source]

Validate that the input compression is supported.

Parameters

compression (string, [ 'lzf' | 'gzip' | 'szip' | None ]) – type of lossless compression
level (string for 'szip' or int for 'gzip') – compression level if supported

Returns

compression type

Return type

string

Returns

compression level

Return type

string for ‘szip’ or int for ‘gzip’

Raises

ValueError if compression or level are not supported

Raises

TypeError if compression level is not a string
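The checks listed above can be sketched as follows. This is an illustrative stand-in named check_compression, not the source of mth5.helpers.validate_compression. It assumes gzip levels are integers from 0 to 9, that 'lzf' takes no options, and that an 'szip' level only needs to be a string; which szip strings are actually accepted is not stated here and is left unchecked.

```python
SUPPORTED = ("lzf", "gzip", "szip", None)


def check_compression(compression, level):
    """Return (compression, level) or raise if the pair is not usable."""
    if compression not in SUPPORTED:
        raise ValueError(f"compression {compression!r} is not supported")
    if compression is None or compression == "lzf":
        # No compression, or a filter without options: ignore the level.
        return compression, None
    if compression == "gzip":
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(level, bool) or not isinstance(level, int):
            raise TypeError("gzip level must be an int")
        if not 0 <= level <= 9:
            raise ValueError("gzip level must be between 0 and 9")
        return compression, level
    # Remaining case is 'szip' (assumption: any string is passed through).
    if not isinstance(level, str):
        raise TypeError("szip level must be a string")
    return compression, level


print(check_compression("gzip", 4))
```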

mth5.helpers.validate_name(name, pattern=None)[source]

Validate name

Parameters

name (TYPE) – DESCRIPTION
pattern (TYPE, optional) – DESCRIPTION, defaults to None

Returns

DESCRIPTION

Return type

TYPE

mth5.mth5 module

MTH5

MTH5 deals with reading and writing MTH5 files, which are HDF5 files developed for magnetotelluric (MT) data. The code is based on h5py and therefore numpy. This is the simplest option, and the tables of data involved are not large enough to warrant using pytables.

Created on Sun Dec 9 20:50:41 2018

copyright: Jared Peacock (jpeacock@usgs.gov)
license: MIT

class mth5.mth5.MTH5(filename=None, compression='gzip', compression_opts=4, shuffle=True, fletcher32=True, data_level=1, file_version='0.2.0')[source]

Bases: object

MTH5 is the main container for the HDF5 file format developed for MT data

It uses the metadata standards developed by the IRIS PASSCAL software group and defined in the metadata documentation.

MTH5 is built with h5py and therefore numpy. The structure follows the different levels of MT data collection:

For version 0.1.0:

- Survey
    - Reports
    - Standards
    - Filters
    - Stations
        - Run
            - Channel

For version 0.2.0:

- Experiment
    - Reports
    - Standards
    - Surveys
        - Reports
        - Standards
        - Filters
        - Stations
            - Run
                - Channel

All timeseries data are stored as individual channels with the appropriate metadata defined for the given channel, i.e. electric, magnetic, auxiliary.

Each level is represented as an mth5 group class object which has methods to add, remove, and get a group from the level below. Each group has a metadata attribute that is the appropriate metadata class object. For instance, the SurveyGroup has an attribute metadata that is a mth5.metadata.Survey object. Metadata is stored in the HDF5 group attributes as (key, value) pairs.

All groups are represented by their structure tree and can be shown at any time from the command line.

Each level has a summary array of the contents of the levels below to hopefully make searching easier.

Parameters

filename (string or pathlib.Path) – name of the to be or existing file
compression –
compression type. Supported lossless compressions are
- 'lzf' - Available with every installation of h5py
  (C source code also available). Low to moderate compression, very fast. No options.
- 'gzip' - Available with every installation of HDF5,
  so it's best where portability is required. Good compression, moderate speed. compression_opts sets the compression level and may be an integer from 0 to 9; the constructor signature above defaults to 4.
- 'szip' - Patent-encumbered filter used in the NASA
  community. Not available with all installations of HDF5 due to legal reasons. Consult the HDF5 docs for filter options.
compression_opts (string or int depending on compression type) – compression options, see above
shuffle (boolean) – Block-oriented compressors like GZIP or LZF work better when presented with runs of similar values. Enabling the shuffle filter rearranges the bytes in the chunk and may improve compression ratio. No significant speed penalty, lossless.
fletcher32 (boolean) – Adds a checksum to each chunk to detect data corruption. Attempts to read corrupted chunks will fail with an error. No significant speed penalty. Obviously shouldn’t be used with lossy compression filters.
data_level (integer, defaults to 1) –
level at which the data are stored, following the levels defined by NASA ESDS
- 0 - Raw data
- 1 - Raw data with response information and full metadata
- 2 - Derived product, raw data has been manipulated
file_version (string, optional) – Version of the file [ ‘0.1.0’ | ‘0.2.0’ ], defaults to “0.2.0”
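To give a feel for why the shuffle filter helps block-oriented compressors, the sketch below reimplements the idea in plain Python using only the standard library. It is not the HDF5 filter itself: byte_shuffle and byte_unshuffle are hypothetical helpers, zlib stands in for the gzip filter, and the slowly increasing 32-bit integers are made-up sample data.

```python
import struct
import zlib


def byte_shuffle(raw, itemsize):
    """Group byte 0 of every item first, then byte 1, and so on."""
    return b"".join(raw[offset::itemsize] for offset in range(itemsize))


def byte_unshuffle(shuffled, itemsize):
    """Invert byte_shuffle so the original item layout is restored."""
    count = len(shuffled) // itemsize
    out = bytearray(len(shuffled))
    for offset in range(itemsize):
        out[offset::itemsize] = shuffled[offset * count:(offset + 1) * count]
    return bytes(out)


# Slowly varying samples packed as little-endian 32-bit integers.
values = list(range(10_000))
raw = struct.pack(f"<{len(values)}i", *values)

plain = zlib.compress(raw, 4)
shuffled = zlib.compress(byte_shuffle(raw, 4), 4)

# The shuffled stream contains long runs of similar bytes and packs smaller.
print(len(raw), len(plain), len(shuffled))

# The transform is lossless: unshuffling recovers the original bytes.
assert byte_unshuffle(zlib.decompress(shuffled), 4) == raw
```

For data like this, the high-order bytes of neighbouring samples are nearly identical, so placing them next to each other gives the compressor the runs of similar values it works best on.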

Usage

Open a new file and show initialized file

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5(file_version='0.1.0')
>>> # Have a look at the dataset options
>>> mth5_obj.dataset_options
{'compression': 'gzip',
 'compression_opts': 3,
 'shuffle': True,
 'fletcher32': True}
>>> mth5_obj.open_mth5(r"/home/mtdata/mt01.mth5", 'w')
>>> mth5_obj
/:
====================
    |- Group: Survey
    ----------------
        |- Group: Filters
        -----------------
            --> Dataset: summary
            ......................
        |- Group: Reports
        -----------------
            --> Dataset: summary
            ......................
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Stations
        ------------------
            --> Dataset: summary
            ......................

Add metadata for survey from a dictionary

>>> survey_dict = {'survey':{'acquired_by': 'me', 'archive_id': 'MTCND'}}
>>> survey = mth5_obj.survey_group
>>> survey.metadata.from_dict(survey_dict)
>>> survey.metadata
{
"survey": {
    "acquired_by.author": "me",
    "acquired_by.comments": null,
    "archive_id": "MTCND"
    ...}
}
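The output above shows nested metadata rendered with dotted keys such as "acquired_by.author". A minimal sketch of that kind of flattening, which yields flat (key, value) pairs of the sort that can be stored as HDF5 group attributes, is given here. The helper flatten is hypothetical and is not how the metadata classes are implemented.

```python
def flatten(nested, prefix=""):
    """Collapse a nested dictionary into a flat one with dotted keys."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend into the sub-dictionary, carrying the dotted prefix.
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


survey = {
    "acquired_by": {"author": "me", "comments": None},
    "archive_id": "MTCND",
}
flat = flatten(survey)
for key, value in flat.items():
    print(key, "=", value)
```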

Add a station from the convenience function

>>> station = mth5_obj.add_station('MT001')
>>> mth5_obj
/:
====================
    |- Group: Survey
    ----------------
        |- Group: Filters
        -----------------
            --> Dataset: summary
            ......................
        |- Group: Reports
        -----------------
            --> Dataset: summary
            ......................
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Stations
        ------------------
            |- Group: MT001
            ---------------
                --> Dataset: summary
                ......................
            --> Dataset: summary
            ......................
>>> station
/Survey/Stations/MT001:
====================
    --> Dataset: summary
    ......................

>>> data.schedule_01.ex[0:10] = np.nan
>>> data.calibration_hx[...] = np.logspace(-4, 4, 20)

Note

If replacing an entire array with a new one, you need to use [...]; otherwise the data will not be updated.

Warning

You can only replace entire arrays with arrays of the same size. Otherwise you need to delete the existing data and make a new dataset.
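The difference between assigning through [...] and rebinding a name can be seen with a plain numpy array, used here as a stand-in for a dataset. This is only an analogy under that assumption: the names stored and handle are made up for the example, and no HDF5 file is involved.

```python
import numpy as np

stored = np.zeros(5)          # stands in for data already held by a dataset
handle = stored               # a second reference to the same buffer

handle[...] = np.arange(5.0)  # in place: writes into the existing buffer
in_place_visible = stored.tolist()

handle = np.ones(5)           # rebinding: only the local name changes
after_rebind = stored.tolist()

size_error = None
try:
    stored[...] = np.arange(7.0)  # a different size cannot be written in place
except ValueError as error:
    size_error = str(error)

print(in_place_visible, after_rebind, size_error)
```

The in-place write is visible through every reference, while the rebinding leaves the stored values untouched, which is the behaviour the note above warns about.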

add_channel(station_name, run_name, channel_name, channel_type, data, channel_dtype='int32', max_shape=(None,), chunks=True, channel_metadata=None, survey=None)[source]

Convenience function to add a channel using mth5.stations_group.get_station().get_run().add_channel()

Add a channel to a given run for a given station.

Parameters

station_name (string) – existing station name
run_name (string) – existing run name
channel_name (string) – name of the channel
channel_type (string) – [ electric | magnetic | auxiliary ]
channel_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None
survey (string) – existing survey name, needed for file version >= 0.2.0

Raises

MTH5Error – If channel type is not correct

Returns

Channel container

Return type

[ mth5.mth5_groups.ElectricDataset | mth5.mth5_groups.MagneticDataset | mth5.mth5_groups.AuxiliaryDataset ]

Example

>>> new_channel = mth5_obj.add_channel('MT001', 'MT001a', 'Ex',
...                                    'electric', None)
>>> new_channel
Channel Electric:
-------------------
        component:        None
        data type:        electric
        data format:      float32
        data shape:       (1,)
        start:            1980-01-01T00:00:00+00:00
        end:              1980-01-01T00:00:00+00:00
        sample rate:      None

add_run(station_name, run_name, run_metadata=None, survey=None)[source]

Convenience function to add a run using

Add a run to a given station.

Parameters

station_name (string) – existing station name
run_name (string) – run name, should be archive_id{a-z}
survey (string) – existing survey name, needed for file version >= 0.2.0
run_metadata (mth5.metadata.Station, optional) – metadata container, defaults to None

Example

>>> new_run = mth5_obj.add_run('MT001', 'MT001a')
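The archive_id{a-z} convention for run names can be sketched with a small helper. run_names is a hypothetical function for illustration only; it assumes the suffix is a single lowercase letter, so at most 26 runs per station can be named this way.

```python
import string


def run_names(archive_id, count):
    """Return the first `count` run names of the form archive_id + letter."""
    letters = string.ascii_lowercase
    if not 0 <= count <= len(letters):
        raise ValueError("count must be between 0 and 26")
    return [f"{archive_id}{letter}" for letter in letters[:count]]


print(run_names("MT001", 3))
```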

add_station(station_name, station_metadata=None, survey=None)[source]

Convenience function to add a station using mth5.stations_group.add_station

Add a station with metadata if given. For file version 0.1.0 the path is /Survey/Stations/station_name
For file version 0.2.0 the path is /Experiment/Surveys/survey/Stations/station_name

If the station already exists, will return that station and nothing is added.

Parameters

station_name (string) – Name of the station, should be the same as metadata.archive_id
station_metadata (mth5.metadata.Station, optional) – Station metadata container, defaults to None
survey (string) – existing survey name, needed for file version >= 0.2.0

Returns

A convenience class for the added station

Return type

mth5_groups.StationGroup

Example

>>> new_station = mth5_obj.add_station('MT001')
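The two station paths described for add_station can be assembled as in the sketch below. station_path is a hypothetical helper, not part of mth5. It assumes both layouts are rooted at "/", by analogy with the /Experiment/Surveys/survey_name path given for add_survey, and the survey name CONUS is a made-up example.

```python
import posixpath


def station_path(station_name, file_version="0.2.0", survey=None):
    """Build the HDF5 path of a station group for a given file version."""
    if file_version == "0.1.0":
        # A version 0.1.0 file holds a single survey at the root.
        return posixpath.join("/Survey/Stations", station_name)
    if file_version == "0.2.0":
        if survey is None:
            raise ValueError("survey is required for file version 0.2.0")
        return posixpath.join(
            "/Experiment/Surveys", survey, "Stations", station_name
        )
    raise ValueError(f"unknown file version {file_version!r}")


print(station_path("MT001", "0.1.0"))
print(station_path("MT001", "0.2.0", survey="CONUS"))
```

This also shows why the survey argument only matters for version 0.2.0 files: the older layout has no survey level to choose between.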

add_survey(survey_name, survey_metadata=None)[source]

Add a survey with metadata if given. The path is /Experiment/Surveys/survey_name

If the survey already exists, will return that survey and nothing is added.

Parameters

survey_name (string) – Name of the survey, should be the same as metadata.id
survey_metadata (mth5.metadata.Survey, optional) – survey metadata container, defaults to None

Returns

A convenience class for the added survey

Return type

mth5_groups.SurveyGroup

Example

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> new_survey = mth5_obj.add_survey('MT001')
>>> # another option
>>> new_survey = mth5_obj.experiment_group.surveys_group.add_survey('MT001')

add_transfer_function(tf_object)[source]

Add a transfer function

Parameters: tf_object (TYPE) – DESCRIPTION
Returns: DESCRIPTION
Return type: TYPE

property channel_summary: return a dataframe of channels

close_mth5()[source]: close mth5 file to make sure everything is flushed to the file

property data_level: data level

property dataset_options: summary of dataset options

property experiment_group: Convenience property for /Experiment group

property file_attributes

property file_type: File Type should be MTH5

property file_version: mth5 file version

property filename: file name of the hdf5 file

property filters_group: Convenience property for /Survey/Filters group

from_experiment(experiment, survey_index=0, update=False)[source]

Fill out an MTH5 from a mt_metadata.timeseries.Experiment object, writing the survey selected by survey_index

Parameters

experiment (mt_metadata.timeseries.Experiment) – Experiment metadata
survey_index (int, defaults to 0) – Index of the survey to write

from_reference(h5_reference)[source]

Get an HDF5 group, dataset, etc from a reference

Parameters: h5_reference (TYPE) – DESCRIPTION
Returns: DESCRIPTION
Return type: TYPE

get_channel(station_name, run_name, channel_name, survey=None)[source]

Convenience function to get a channel using mth5.stations_group.get_station().get_run().get_channel()

Get a channel from an existing name. Returns the appropriate container.

Parameters

station_name (string) – existing station name
run_name (string) – existing run name
channel_name (string) – name of the channel
survey (string) – existing survey name, needed for file version >= 0.2.0

Returns

Channel container

Return type

[ mth5.mth5_groups.ElectricDataset | mth5.mth5_groups.MagneticDataset | mth5.mth5_groups.AuxiliaryDataset ]

Raises

MTH5Error – If no channel is found

Example

>>> existing_channel = mth5_obj.get_channel(station_name,
...                                         run_name,
...                                         channel_name)
>>> existing_channel
Channel Electric:
-------------------
        component:        Ex
        data type:        electric
        data format:      float32
        data shape:       (4096,)
        start:            1980-01-01T00:00:00+00:00
        end:              1980-01-01T00:00:01+00:00
        sample rate:      4096

get_run(station_name, run_name, survey=None)[source]

Convenience function to get a run using mth5.stations_group.get_station(station_name).get_run()

Get a run by name from a given station.

Parameters

station_name (string) – existing station name
run_name (string) – existing run name
survey (string) – existing survey name, needed for file version >= 0.2.0

Returns

Run object

Return type

mth5.mth5_groups.RunGroup

Example

>>> existing_run = mth5_obj.get_run('MT001', 'MT001a')

get_station(station_name, survey=None)[source]

Convenience function to get a station using

Get a station with the same name as station_name

Parameters

station_name (string) – existing station name
survey (string) – existing survey name, needed for file version >= 0.2.0

Returns

convenience station class

Return type

mth5.mth5_groups.StationGroup

Raises

MTH5Error – if the station name is not found.

Example

>>> existing_station = mth5_obj.get_station('MT001')
MTH5Error: MT001 does not exist, check station_list for existing names

get_survey(survey_name)[source]

Get a survey with the same name as survey_name

Parameters: survey_name (string) – existing survey name
Returns: convenience survey class
Return type: mth5.mth5_groups.SurveyGroup
Raises: MTH5Error – if the survey name is not found.
Example

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> existing_survey = mth5_obj.get_survey('MT001')
>>> # another option
>>> existing_survey = mth5_obj.experiment_group.surveys_group.get_survey('MT001')
MTH5Error: MT001 does not exist, check groups_list for existing names

get_transfer_function(station_id, tf_id, survey=None)[source]

Get a transfer function

Parameters

survey_id (TYPE) – DESCRIPTION
station_id (TYPE) – DESCRIPTION
tf_id (TYPE) – DESCRIPTION

Returns

DESCRIPTION

Return type

TYPE

h5_is_read()[source]

check to see if the hdf5 file is open and readable

Returns: True if readable, False if not
Return type: Boolean

h5_is_write()[source]: check to see if the hdf5 file is open and writeable

has_group(group_name)[source]: Check to see if the group name exists

open_mth5(filename=None, mode='a')[source]

Open an mth5 file

Returns: Survey Group
Return type: groups.SurveyGroup
Example

>>> from mth5 import mth5
>>> mth5_object = mth5.MTH5(file_version='0.1.0')
>>> survey_object = mth5_object.open_mth5('Test.mth5', 'w')

>>> from mth5 import mth5
>>> mth5_object = mth5.MTH5()
>>> survey_object = mth5_object.open_mth5('Test.mth5', 'w')
>>> mth5_object.file_version
'0.2.0'

remove_channel(station_name, run_name, channel_name, survey=None)[source]

Convenience function to remove a channel using mth5.stations_group.get_station().get_run().remove_channel()

Remove a channel from a given run and station.

Note

Deleting a channel is not as simple as del(channel). In HDF5 this does not free up memory, it simply removes the reference to that channel. The common way to get around this is to copy what you want into a new file, or overwrite the channel.

Parameters

station_name (string) – existing station name
run_name (string) – existing run name
channel_name (string) – existing channel name
survey (string) – existing survey name, needed for file version >= 0.2.0

Example

>>> mth5_obj.remove_channel('MT001', 'MT001a', 'Ex')

remove_run(station_name, run_name, survey=None)[source]

Remove a run from the station.

Note

Deleting a run is not as simple as del(run). In HDF5 this does not free up memory, it simply removes the reference to that run. The common way to get around this is to copy what you want into a new file, or overwrite the run.

Parameters

station_name (string) – existing station name
run_name (string) – existing run name
survey (string) – existing survey name, needed for file version >= 0.2.0

Example

>>> mth5_obj.remove_run('MT001', 'MT001a')

remove_station(station_name, survey=None)[source]

Convenience function to remove a station using

Remove a station from the file.

Note

Deleting a station is not as simple as del(station). In HDF5 this does not free up memory, it simply removes the reference to that station. The common way to get around this is to copy what you want into a new file, or overwrite the station.

Parameters

station_name (string) – existing station name
survey (string) – existing survey name, needed for file version >= 0.2.0

Example

>>> mth5_obj.remove_station('MT001')

remove_survey(survey_name)[source]

Remove a survey from the file.

Note

Deleting a survey is not as simple as del(survey). In HDF5 this does not free up memory, it simply removes the reference to that survey. The common way to get around this is to copy what you want into a new file, or overwrite the survey.

Parameters

survey_name (string) – existing survey name

Example

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> mth5_obj.open_mth5(r"/test.mth5", mode='a')
>>> # one option
>>> mth5_obj.remove_survey('MT001')
>>> # another option
>>> mth5_obj.experiment_group.surveys_group.remove_survey('MT001')

remove_transfer_function(station_id, tf_id, survey=None)[source]

Remove a transfer function

Parameters

survey_id (TYPE) – DESCRIPTION
station_id (TYPE) – DESCRIPTION
tf_id (TYPE) – DESCRIPTION

Returns

DESCRIPTION

Return type

TYPE

property reports_group: Convenience property for /Survey/Reports group

property software_name: software name that wrote the file

property standards_group: Convenience property for /Standards group

property station_list: list of existing stations names

property stations_group: Convenience property for /Survey/Stations group

property survey_group: Convenience property for /Survey group

property surveys_group: Convenience property for /Surveys group

property tf_summary: return a dataframe of transfer functions

to_experiment()[source]

Create an mt_metadata.timeseries.Experiment object from the metadata contained in the MTH5 file.

Returns: mt_metadata.timeseries.Experiment

validate_file()[source]

Validate an open mth5 file by testing the attribute values and group names.

Returns: Boolean [ True = valid, False = not valid]
Return type: Boolean

Module contents

Top-level package for MTH5.