mth5 package

Submodules

mth5.helpers module

Helper functions for HDF5

Created on Tue Jun 2 12:37:50 2020

copyright

Jared Peacock (jpeacock@usgs.gov)

license

MIT

mth5.helpers.close_open_files()[source]
mth5.helpers.get_tree(parent)[source]

Simple function to recursively print the contents of an HDF5 group.

Parameters

parent (h5py.Group) – HDF5 (sub-)tree to print
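As a rough illustration of what recursively walking an HDF5 group involves, the sketch below builds an indented listing with h5py. The name format_tree is hypothetical and this is not the mth5 implementation; the line prefixes are only borrowed from the tree output shown elsewhere in this documentation. The demo file is kept in memory, so nothing is written to disk.

```python
import h5py


def format_tree(parent, indent=0):
    """Return one line per member of ``parent``, indented by depth.

    Hypothetical helper for illustration only, not part of mth5.
    """
    lines = []
    for name, item in parent.items():
        if isinstance(item, h5py.Group):
            lines.append(" " * indent + f"|- Group: {name}")
            # Recurse into the sub-group with a deeper indent.
            lines.extend(format_tree(item, indent + 4))
        else:
            lines.append(" " * indent + f"--> Dataset: {name}")
    return lines


# In-memory file holding a small hierarchy to walk.
with h5py.File("tree_demo.h5", "w", driver="core", backing_store=False) as f:
    station = f.create_group("Survey/Stations/MT001")
    station.create_dataset("summary", data=[0])
    tree_lines = format_tree(f)

print("\n".join(tree_lines))
```

Returning the lines instead of printing them directly keeps the walk easy to test and to reuse, for example in a __str__ method.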

mth5.helpers.inherit_doc_string(cls)[source]
mth5.helpers.recursive_hdf5_tree(group, lines=[])[source]
mth5.helpers.to_numpy_type(value)[source]

Need to make the attributes friendly with Numpy and HDF5.

For numbers and booleans this is straightforward: h5py automatically maps them to a numpy type.

But for strings this can be a challenge, especially a list of strings.

HDF5 should only deal with ASCII characters or Unicode. No binary data is allowed.
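The string case can be sketched as follows, assuming NumPy is available. to_hdf5_friendly is a hypothetical stand-in and not the actual to_numpy_type code; it only shows one common approach, which is to encode a list of strings into a fixed-width byte-string array before handing it to HDF5.

```python
import numpy as np


def to_hdf5_friendly(value):
    """Hypothetical helper: coerce a value into something HDF5 can store."""
    is_sequence = isinstance(value, (list, tuple))
    if is_sequence and all(isinstance(v, str) for v in value):
        # NumPy unicode ("U") arrays are a known trouble spot for h5py,
        # so encode each string to bytes and build an "S" (byte-string) array.
        return np.array([v.encode("utf-8") for v in value], dtype="S")
    # Numbers and booleans are left alone; h5py maps them to numpy types.
    return value


channels = to_hdf5_friendly(["ex", "ey", "hx"])
print(channels.dtype.kind)  # "S": fixed-width byte strings
```

Reading the values back requires decoding each element, for example with [c.decode("utf-8") for c in channels].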

mth5.helpers.validate_compression(compression, level)[source]

Validate that the input compression is supported.

Parameters
  • compression (string, [ 'lzf' | 'gzip' | 'szip' | None ]) – type of lossless compression

  • level (string for 'szip' or int for 'gzip') – compression level if supported

Returns

compression type

Return type

string

Returns

compression level

Return type

string for ‘szip’ or int for ‘gzip’

Raises

ValueError if compression or level are not supported

Raises

TypeError if compression level is not a string
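The checks described above can be sketched roughly like this. check_compression is a hypothetical re-implementation written from the parameter description only; the real function may differ in its details, in particular in which szip strings it accepts and in how it treats a level passed together with 'lzf'.

```python
def check_compression(compression, level):
    """Hypothetical sketch of the validation described above."""
    if compression is None:
        return None, None
    if compression not in ("lzf", "gzip", "szip"):
        raise ValueError(f"Unsupported compression: {compression!r}")
    if compression == "lzf":
        # lzf takes no options, so any level is dropped (an assumption).
        return compression, None
    if compression == "gzip":
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(level, bool) or not isinstance(level, int):
            raise TypeError("gzip level must be an int")
        if not 0 <= level <= 9:
            raise ValueError("gzip level must be between 0 and 9")
        return compression, level
    # Remaining case is szip, whose option is documented as a string.
    if not isinstance(level, str):
        raise TypeError("szip level must be a string")
    return compression, level


print(check_compression("gzip", 3))  # ('gzip', 3)
```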

mth5.mth5 module

MTH5

MTH5 deals with reading and writing MTH5 files, which are HDF5 files developed for magnetotelluric (MT) data. The code is based on h5py and therefore numpy. This is the simplest option, and the data do not involve tables large enough to warrant using pytables.

Created on Sun Dec 9 20:50:41 2018

copyright

Jared Peacock (jpeacock@usgs.gov)

license

MIT

class mth5.mth5.MTH5(filename=None, compression='gzip', compression_opts=9, shuffle=True, fletcher32=True, data_level=1)[source]

Bases: object

MTH5 is the main container for the HDF5 file format developed for MT data

It uses the metadata standards developed by the IRIS PASSCAL software group and defined in the metadata documentation.

MTH5 is built with h5py and therefore numpy. The structure follows the different levels of MT data collection:

Survey
    |_Reports
    |_Standards
    |_Filters
    |_Station
        |_Run
            |_Channel

All timeseries data are stored as individual channels with the appropriate metadata defined for the given channel, i.e. electric, magnetic, auxiliary.

Each level is represented as an mth5 group class object which has methods to add, remove, and get a group from the level below. Each group has a metadata attribute that is the appropriate metadata class object. For instance, the SurveyGroup has an attribute metadata that is a mth5.metadata.Survey object. Metadata is stored in the HDF5 group attributes as (key, value) pairs.
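To make the (key, value) storage more concrete, the sketch below flattens a nested metadata dictionary into dotted keys such as acquired_by.author, the key style that appears in the survey metadata output in the Usage section. flatten_metadata is a hypothetical helper, and whether MTH5 flattens its metadata in exactly this way is an assumption.

```python
def flatten_metadata(nested, prefix=""):
    """Hypothetical helper: flatten nested dicts into dotted (key, value) pairs."""
    flat = {}
    for key, value in nested.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Descend one level and carry the parent key along as a prefix.
            flat.update(flatten_metadata(value, full_key))
        else:
            flat[full_key] = value
    return flat


survey = {
    "acquired_by": {"author": "me", "comments": None},
    "archive_id": "MTCND",
}
attrs = flatten_metadata(survey)
for key, value in attrs.items():
    print(key, "=", value)
```

Flat keys fit HDF5 attributes well because an attribute holds a single value and cannot itself contain nested attributes.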

All groups are represented by their structure tree and can be shown at any time from the command line.

Each level has a summary array of the contents of the levels below, which is intended to make searching easier.

Parameters
  • filename (string or pathlib.Path) – name of the file to create or of an existing file

  • compression

    compression type. Supported lossless compressions are

    • ‘lzf’ - Available with every installation of h5py (C source code also available). Low to moderate compression, very fast. No options.

    • ‘gzip’ - Available with every installation of HDF5, so it’s best where portability is required. Good compression, moderate speed. compression_opts sets the compression level and may be an integer from 0 to 9, default is 3.

    • ‘szip’ - Patent-encumbered filter used in the NASA community. Not available with all installations of HDF5 due to legal reasons. Consult the HDF5 docs for filter options.

  • compression_opts (string or int depending on compression type) – compression options, see above

  • shuffle (boolean) – Block-oriented compressors like GZIP or LZF work better when presented with runs of similar values. Enabling the shuffle filter rearranges the bytes in the chunk and may improve compression ratio. No significant speed penalty, lossless.

  • fletcher32 (boolean) – Adds a checksum to each chunk to detect data corruption. Attempts to read corrupted chunks will fail with an error. No significant speed penalty. Obviously shouldn’t be used with lossy compression filters.

  • data_level (integer, defaults to 1) –

    level at which the data are stored, following the data processing levels defined by NASA ESDS

    • 0 - Raw data

    • 1 - Raw data with response information and full metadata

    • 2 - Derived product, raw data has been manipulated
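The compression, compression_opts, shuffle and fletcher32 parameters carry the same names as the keyword arguments of h5py's create_dataset. The sketch below passes them straight to h5py, with values mirroring the dataset_options shown in the Usage section, and reads the applied filters back. How MTH5 forwards these options internally is an assumption; the file and dataset names are made up for the demo.

```python
import h5py
import numpy as np

# Illustrative values; the keys match h5py's create_dataset keywords.
dataset_options = {
    "compression": "gzip",
    "compression_opts": 3,
    "shuffle": True,
    "fletcher32": True,
}

data = np.arange(4096, dtype="float32")

# In-memory file, so nothing is written to disk.
with h5py.File("options_demo.h5", "w", driver="core", backing_store=False) as f:
    ex = f.create_dataset("ex", data=data, **dataset_options)
    # h5py exposes the filters applied to a dataset as properties.
    applied = {
        "compression": ex.compression,
        "compression_opts": ex.compression_opts,
        "shuffle": ex.shuffle,
        "fletcher32": ex.fletcher32,
    }
    # All of these filters are lossless, so the data read back unchanged.
    round_trip_ok = bool(np.array_equal(ex[...], data))

print(applied)
print(round_trip_ok)
```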

Usage

  • Open a new file and show initialized file

>>> from mth5 import mth5
>>> mth5_obj = mth5.MTH5()
>>> # Have a look at the dataset options
>>> mth5_obj.dataset_options
{'compression': 'gzip',
 'compression_opts': 3,
 'shuffle': True,
 'fletcher32': True}
>>> mth5_obj.open_mth5(r"/home/mtdata/mt01.mth5", 'w')
>>> mth5_obj
/:
====================
    |- Group: Survey
    ----------------
        |- Group: Filters
        -----------------
            --> Dataset: summary
            ......................
        |- Group: Reports
        -----------------
            --> Dataset: summary
            ......................
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Stations
        ------------------
            --> Dataset: summary
            ......................
  • Add metadata for survey from a dictionary

>>> survey_dict = {'survey':{'acquired_by': 'me', 'archive_id': 'MTCND'}}
>>> survey = mth5_obj.survey_group
>>> survey.metadata.from_dict(survey_dict)
>>> survey.metadata
{
"survey": {
    "acquired_by.author": "me",
    "acquired_by.comments": null,
    "archive_id": "MTCND"
    ...}
}
  • Add a station from the convenience function

>>> station = mth5_obj.add_station('MT001')
>>> mth5_obj
/:
====================
    |- Group: Survey
    ----------------
        |- Group: Filters
        -----------------
            --> Dataset: summary
            ......................
        |- Group: Reports
        -----------------
            --> Dataset: summary
            ......................
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Stations
        ------------------
            |- Group: MT001
            ---------------
                --> Dataset: summary
                ......................
            --> Dataset: summary
            ......................
>>> station
/Survey/Stations/MT001:
====================
    --> Dataset: summary
    ......................
>>> data.schedule_01.ex[0:10] = np.nan
>>> data.calibration_hx[...] = np.logspace(-4, 4, 20)

Note

If replacing an entire array with a new one, you need to use [...], otherwise the data will not be updated.

Warning

You can only replace entire arrays with arrays of the same size. Otherwise you need to delete the existing data and make a new dataset.
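The difference between writing through [...] and plain assignment is ordinary Python semantics, so it can be shown with a NumPy array standing in for the HDF5 dataset. This is only an analogy under that assumption: an HDF5 library may report a size mismatch with a different exception type than NumPy does.

```python
import numpy as np

stored = np.zeros(20)   # stand-in for the data held in the file
handle = stored         # second name pointing at the same array

# Writing through [...] replaces every element in place.
new_values = np.logspace(-4, 4, 20)
handle[...] = new_values
in_place_updated = bool(np.array_equal(stored, new_values))

# Plain assignment only rebinds the name; the stored data are untouched.
handle = np.ones(20)
rebinding_updated = bool(np.array_equal(stored, handle))

# A replacement of a different size cannot be written in place.
try:
    stored[...] = np.zeros(10)
    size_error = None
except ValueError as err:
    size_error = err

print(in_place_updated, rebinding_updated, size_error is not None)
```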

add_channel(station_name, run_name, channel_name, channel_type, data, channel_metadata=None)[source]

Convenience function to add a channel using mth5.stations_group.get_station().get_run().add_channel()

Add a channel to a given run for a given station.

Parameters
  • station_name (string) – existing station name

  • run_name (string) – existing run name

  • channel_name (string) – name of the channel

  • channel_type (string) – [ electric | magnetic | auxiliary ]

  • channel_metadata ([ mth5.metadata.Electric | mth5.metadata.Magnetic | mth5.metadata.Auxiliary ], optional) – metadata container, defaults to None

Raises

MTH5Error – If channel type is not correct

Returns

Channel container

Return type

[ mth5.mth5_groups.ElectricDataset | mth5.mth5_groups.MagneticDataset | mth5.mth5_groups.AuxiliaryDataset ]

Example

>>> new_channel = mth5_obj.add_channel('MT001', 'MT001a', 'Ex',
...                                    'electric', None)
>>> new_channel
Channel Electric:
-------------------
        component:        None
        data type:        electric
        data format:      float32
        data shape:       (1,)
        start:            1980-01-01T00:00:00+00:00
        end:              1980-01-01T00:00:00+00:00
        sample rate:      None
add_run(station_name, run_name, run_metadata=None)[source]

Convenience function to add a run using mth5.stations_group.get_station(station_name).add_run()

Add a run to a given station.

Parameters
  • station_name (string) – existing station name

  • run_name (string) – run name, should be archive_id{a-z}

  • run_metadata (mth5.metadata.Run, optional) – metadata container, defaults to None

Example

>>> new_run = mth5_obj.add_run('MT001', 'MT001a')
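The archive_id{a-z} convention can be checked with a few lines of standard-library code. Both helpers below are hypothetical and rest on one reading of the convention, namely the station archive_id followed by a single lowercase letter, as in 'MT001a' for station 'MT001'.

```python
import re
import string


def is_valid_run_name(run_name, archive_id):
    """Hypothetical check: archive_id followed by exactly one letter a-z."""
    pattern = re.escape(archive_id) + "[a-z]"
    return re.fullmatch(pattern, run_name) is not None


def next_run_name(archive_id, existing_runs):
    """Hypothetical helper: first unused name in the sequence a, b, c, ..."""
    for letter in string.ascii_lowercase:
        candidate = archive_id + letter
        if candidate not in existing_runs:
            return candidate
    raise ValueError(f"All run names for {archive_id} are taken")


print(is_valid_run_name("MT001a", "MT001"))  # True
print(next_run_name("MT001", ["MT001a"]))    # MT001b
```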
add_station(name, station_metadata=None)[source]

Convenience function to add a station using mth5.stations_group.add_station

Add a station with metadata if given with the path:

/Survey/Stations/station_name

If the station already exists, that station is returned and nothing is added.

Parameters
  • name (string) – Name of the station, should be the same as metadata.archive_id

  • station_metadata (mth5.metadata.Station, optional) – Station metadata container, defaults to None

Returns

A convenience class for the added station

Return type

mth5_groups.StationGroup

Example

>>> new_station = mth5_obj.add_station('MT001')
close_mth5()[source]

Close the mth5 file to make sure everything is flushed to the file.

property data_level

data level

property dataset_options

summary of dataset options

property file_type

File Type should be MTH5

property file_version

mth5 file version

property filename

file name of the hdf5 file

property filters_group

Convenience property for /Survey/Filters group

from_experiment(experiment, survey_index=0)[source]

Fill out an MTH5 file from an mt_metadata.timeseries.Experiment object, given a survey index.

Parameters
  • experiment (mt_metadata.timeseries.Experiment) – Experiment metadata

  • survey_index (int, defaults to 0) – Index of the survey to write

from_reference(h5_reference)[source]

Get an HDF5 group, dataset, etc from a reference

Parameters

h5_reference (TYPE) – DESCRIPTION

Returns

DESCRIPTION

Return type

TYPE

get_channel(station_name, run_name, channel_name)[source]

Convenience function to get a channel using mth5.stations_group.get_station().get_run().get_channel()

Get a channel from an existing name. Returns the appropriate container.

Parameters
  • station_name (string) – existing station name

  • run_name (string) – existing run name

  • channel_name (string) – name of the channel

Returns

Channel container

Return type

[ mth5.mth5_groups.ElectricDataset | mth5.mth5_groups.MagneticDataset | mth5.mth5_groups.AuxiliaryDataset ]

Raises

MTH5Error – If no channel is found

Example

>>> existing_channel = mth5_obj.get_channel(station_name,
...                                         run_name,
...                                         channel_name)
>>> existing_channel
Channel Electric:
-------------------
        component:        Ex
        data type:        electric
        data format:      float32
        data shape:       (4096,)
        start:            1980-01-01T00:00:00+00:00
        end:              1980-01-01T00:00:01+00:00
        sample rate:      4096
get_run(station_name, run_name)[source]

Convenience function to get a run using mth5.stations_group.get_station(station_name).get_run()

Get a run by name for a given station.

Parameters
  • station_name (string) – existing station name

  • run_name (string) – existing run name

Returns

Run object

Return type

mth5.mth5_groups.RunGroup

Example

>>> existing_run = mth5_obj.get_run('MT001', 'MT001a')
get_station(station_name)[source]

Convenience function to get a station using mth5.stations_group.get_station

Get a station with the same name as station_name

Parameters

station_name (string) – existing station name

Returns

convenience station class

Return type

mth5.mth5_groups.StationGroup

Raises

MTH5Error – if the station name is not found.

Example

>>> existing_station = mth5_obj.get_station('MT001')
MTH5Error: MT001 does not exist, check station_list for existing names
h5_is_write()[source]

Check to see if the HDF5 file is open and writeable.

has_group(group_name)[source]

Check to see if the group name exists

open_mth5(filename=None, mode='a')[source]

Open an mth5 file.

Returns

Survey Group

Return type

groups.SurveyGroup

Example

>>> from mth5 import mth5
>>> mth5_object = mth5.MTH5()
>>> survey_object = mth5_object.open_mth5('Test.mth5', 'w')
remove_channel(station_name, run_name, channel_name)[source]

Convenience function to remove a channel using mth5.stations_group.get_station().get_run().remove_channel()

Remove a channel from a given run and station.

Note

Deleting a channel is not as simple as del(channel). In HDF5 this does not free up memory, it simply removes the reference to that channel. The common way to get around this is to copy what you want into a new file, or overwrite the channel.

Parameters
  • station_name (string) – existing station name

  • run_name (string) – existing run name

  • channel_name (string) – existing channel name

Example

>>> mth5_obj.remove_channel('MT001', 'MT001a', 'Ex')
remove_run(station_name, run_name)[source]

Convenience function to remove a run using mth5.stations_group.get_station(station_name).remove_run()

Remove a run from the station.

Note

Deleting a run is not as simple as del(run). In HDF5 this does not free up memory, it simply removes the reference to that run. The common way to get around this is to copy what you want into a new file, or overwrite the run.

Parameters
  • station_name (string) – existing station name

  • run_name (string) – existing run name

Example

>>> mth5_obj.remove_run('MT001', 'MT001a')
remove_station(station_name)[source]

Convenience function to remove a station using mth5.stations_group.remove_station

Remove a station from the file.

Note

Deleting a station is not as simple as del(station). In HDF5 this does not free up memory, it simply removes the reference to that station. The common way to get around this is to copy what you want into a new file, or overwrite the station.

Parameters

station_name (string) – existing station name

Example

>>> mth5_obj.remove_station('MT001')
property reports_group

Convenience property for /Survey/Reports group

property software_name

software name that wrote the file

property standards_group

Convenience property for /Survey/Standards group

property station_list

list of existing station names

property stations_group

Convenience property for /Survey/Stations group

property survey_group

Convenience property for /Survey group

to_experiment()[source]

Create an mt_metadata.timeseries.Experiment object from the metadata contained in the MTH5 file.

Returns

mt_metadata.timeseries.Experiment

validate_file()[source]

Validate an open mth5 file.

Tests the attribute values and group names.

Returns

Boolean [ True = valid, False = not valid]

Return type

Boolean

Module contents

Top-level package for MTH5.