Example of Working with a Version 0.2.0 MTH5 File

[1]:

from mth5.mth5 import MTH5

2022-04-12T21:21:46 [line 157] numexpr.utils._init_num_threads - INFO: NumExpr defaulting to 8 threads.

2022-04-12 21:21:47,999 [line 135] mth5.setup_logger - INFO: Logging file can be found C:\Users\jpeacock\Documents\GitHub\mth5\logs\mth5_debug.log

Initialize an MTH5 object with file version 0.2.0

[2]:

m = MTH5(file_version="0.2.0")

Have a look at the attributes of the file

[3]:

m.file_attributes

[3]:

{'file.type': 'MTH5',
 'file.version': '0.2.0',
 'file.access.platform': 'Windows-10-10.0.19041-SP0',
 'file.access.time': '2022-04-13T04:21:53.788393+00:00',
 'mth5.software.version': '0.2.5',
 'mth5.software.name': 'mth5',
 'data_level': 1}

Here are the data set options

[4]:

m.dataset_options

[4]:

{'compression': 'gzip',
 'compression_opts': 9,
 'shuffle': True,
 'fletcher32': True}

The file is not open yet

[5]:

m

[5]:

HDF5 file is closed and cannot be accessed.

Open a new file

We will open the file in mode w here, which will overwrite the file if it already exists. If you don’t want to do that, or are unsure whether the file already exists, the safest option is mode a.
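The choice between the two modes can be made explicit with a small helper. This is only a sketch: pick_mode is a hypothetical function, not part of mth5, and it assumes the meaning of the "w" and "a" mode strings described above.

```python
import tempfile
from pathlib import Path


def pick_mode(filename, overwrite=False):
    """Return "w" for a new file or an explicit overwrite, otherwise "a".

    Hypothetical helper for illustration; it is not part of mth5.
    """
    path = Path(filename)
    if overwrite or not path.exists():
        return "w"
    return "a"


# Demonstrate the three cases inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    target = Path(tmp_dir) / "example.h5"
    mode_new = pick_mode(target)  # file does not exist yet
    target.touch()  # simulate an existing file
    mode_existing = pick_mode(target)
    mode_forced = pick_mode(target, overwrite=True)

print(mode_new, mode_existing, mode_forced)
```

The returned string could then be passed as the second argument of open_mth5, for example m.open_mth5("example.h5", pick_mode("example.h5")).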

Context Manager

It’s strongly encouraged that if you are making an MTH5 file, even if you want to open it up afterwards, you use a with statement.

with MTH5(file_version="0.2.0") as m:
    m.open_mth5("example.h5", "w")
    # pack the MTH5 file here

Using this pattern, your MTH5 file will be made in a safe way if anything goes wrong during packing. The with statement automatically flushes and closes the MTH5 file upon exit, including when an error is raised.
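The guarantee that cleanup runs even when an error occurs comes from Python’s context-manager protocol. The sketch below illustrates that protocol with a stand-in class; DemoFile is hypothetical and is not how MTH5 is implemented.

```python
class DemoFile:
    """Stand-in for a file-like object. This is NOT the MTH5 class."""

    def __init__(self):
        self.is_open = False

    def __enter__(self):
        # Called when the with block is entered.
        self.is_open = True
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit and also when an exception is raised.
        self.is_open = False
        return False  # do not suppress the exception


demo = DemoFile()
try:
    with demo:
        raise RuntimeError("packing failed")
except RuntimeError:
    pass

print(demo.is_open)
```

Even though the body raised an error, the object ends up closed, which is the behavior the with statement provides for an MTH5 file.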

Here we are just showing an example of how to interrogate an MTH5 file.

[6]:

m.open_mth5(r"example.h5", "w")

2022-04-12 21:22:02,707 [line 591] mth5.mth5.MTH5.open_mth5 - WARNING: example.h5 will be overwritten in 'w' mode
2022-04-12 21:22:03,000 [line 656] mth5.mth5.MTH5._initialize_file - INFO: Initialized MTH5 0.2.0 file example.h5 in mode w

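Now that we have initialized a file, let’s see what’s in it. Note that the listing below was captured from a later execution (cell [20]), so it already includes the survey, stations, runs, and channels added in the following sections.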

[20]:

m

[20]:

/:
====================
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
            |- Group: example
            -----------------
                |- Group: Filters
                -----------------
                    |- Group: coefficient
                    ---------------------
                    |- Group: fap
                    -------------
                    |- Group: fir
                    -------------
                    |- Group: time_delay
                    --------------------
                    |- Group: zpk
                    -------------
                |- Group: Reports
                -----------------
                |- Group: Standards
                -------------------
                    --> Dataset: summary
                    ......................
                |- Group: Stations
                ------------------
                    |- Group: mt001
                    ---------------
                        |- Group: Transfer_Functions
                        ----------------------------
                    |- Group: mt002
                    ---------------
                        |- Group: 001
                        -------------
                            --> Dataset: ex
                            .................
                            --> Dataset: hy
                            .................
                        |- Group: 002
                        -------------
                        |- Group: Transfer_Functions
                        ----------------------------
        --> Dataset: channel_summary
        ..............................
        --> Dataset: tf_summary
        .........................

We can see that several groups are created by default. Here are the methods an MTH5 object contains. You can open/close an MTH5 file; add/remove a survey, station, run, or channel; read from an mt_metadata.timeseries.Experiment object to fill the metadata and structure before adding data; and create an mt_metadata.timeseries.Experiment object for archiving.

[8]:

print("\n".join(sorted([func for func in dir(m) if callable(getattr(m, func)) and not func.startswith("_")])))

2022-04-12 21:22:11,930 [line 519] mth5.mth5.MTH5.filters_group - INFO: File version 0.2.0 does not have a FiltersGroup at the experiment level
2022-04-12 21:22:11,932 [line 541] mth5.mth5.MTH5.stations_group - INFO: File version 0.2.0 does not have a Stations. try surveys_group.
2022-04-12 21:22:11,934 [line 479] mth5.mth5.MTH5.survey_group - INFO: File version 0.2.0 does not have a survey_group, try surveys_group

add_channel
add_run
add_station
add_survey
add_transfer_function
close_mth5
from_experiment
from_reference
get_channel
get_run
get_station
get_survey
get_transfer_function
h5_is_read
h5_is_write
has_group
open_mth5
remove_channel
remove_run
remove_station
remove_survey
remove_transfer_function
to_experiment
validate_file

Add a Survey

The first step is to add a survey. Here we will add a survey named example. This returns a SurveyGroup object, which will commonly be the main group we work with.

[9]:

survey_group = m.add_survey("example")

Add a station

Here we will add a station called mt001. This will return a StationGroup object. We can add a station in 2 ways: directly from the MTH5 object m, or from the newly created survey_group. Note that if we add it from m, we need to include the survey name example.

[10]:

station_group = m.add_station("mt001", survey="example")
station_group = survey_group.stations_group.add_station("mt002")

Add some metadata to this station like location, who acquired it, and the reference frame in which the data were collected.

[11]:

station_group.metadata.location.latitude = "40:05:01"
station_group.metadata.location.longitude = -122.3432
station_group.metadata.location.elevation = 403.1
station_group.metadata.acquired_by.author = "me"
station_group.metadata.orientation.reference_frame = "geomagnetic"

# IMPORTANT: Must always use the write_metadata method when metadata is updated.
station_group.write_metadata()

[12]:

station_group.metadata

[12]:

{
    "station": {
        "acquired_by.name": "me",
        "channels_recorded": [],
        "data_type": "BBMT",
        "geographic_name": null,
        "hdf5_reference": "<HDF5 object reference>",
        "id": "mt002",
        "location.declination.model": "WMM",
        "location.declination.value": 0.0,
        "location.elevation": 403.1,
        "location.latitude": 40.08361111111111,
        "location.longitude": -122.3432,
        "mth5_type": "Station",
        "orientation.method": null,
        "orientation.reference_frame": "geomagnetic",
        "provenance.creation_time": "1980-01-01T00:00:00+00:00",
        "provenance.software.author": "none",
        "provenance.software.name": null,
        "provenance.software.version": null,
        "provenance.submitter.email": null,
        "provenance.submitter.organization": null,
        "run_list": [],
        "time_period.end": "1980-01-01T00:00:00+00:00",
        "time_period.start": "1980-01-01T00:00:00+00:00"
    }
}
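Note that the latitude was entered as the string "40:05:01" (degrees:minutes:seconds) and is reported above in decimal degrees. The arithmetic behind that conversion can be sketched as follows; dms_to_decimal is a hypothetical helper written for illustration, not the function the metadata object uses internally.

```python
def dms_to_decimal(dms):
    """Convert a "DD:MM:SS" string to decimal degrees (hypothetical helper)."""
    degrees, minutes, seconds = (float(part) for part in dms.split(":"))
    # Carry the sign of the degrees field through to the whole value.
    sign = -1.0 if dms.strip().startswith("-") else 1.0
    return sign * (abs(degrees) + minutes / 60.0 + seconds / 3600.0)


latitude = dms_to_decimal("40:05:01")
print(round(latitude, 6))  # 40.083611
```

Here 40 + 5/60 + 1/3600 gives approximately 40.083611, which agrees with the location.latitude value in the metadata output.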

Add a Run

We can now add a run to the new station. We can do this in 2 ways: directly from the MTH5 object m, or from the newly created station_group.

[13]:

run_01 = m.add_run("mt002", "001", survey="example")
run_02 = station_group.add_run("002")

[14]:

station_group

[14]:

/Experiment/Surveys/example/Stations/mt002:
====================
    |- Group: 001
    -------------
    |- Group: 002
    -------------
    |- Group: Transfer_Functions
    ----------------------------

Add a Channel

Again we can do this in 2 ways: directly from the MTH5 object m, or from the newly created run_01 or run_02 group. There are only 3 types of channels (electric, magnetic, and auxiliary), and the type needs to be specified when a channel is initiated. We will initiate the channel with data=None, which will create an empty data set.

[15]:

ex = m.add_channel("mt002", "001", "ex", "electric", None, survey="example")
hy = run_01.add_channel("hy", "magnetic", None)

[16]:

hy

[16]:

Channel Magnetic:
-------------------
        component:        hy
        data type:        magnetic
        data format:      int32
        data shape:       (1,)
        start:            1980-01-01T00:00:00+00:00
        end:              1980-01-01T00:00:00+00:00
        sample rate:      0.0

Now, let’s see what the contents of this file are.

[17]:

m

[17]:

/:
====================
    |- Group: Experiment
    --------------------
        |- Group: Reports
        -----------------
        |- Group: Standards
        -------------------
            --> Dataset: summary
            ......................
        |- Group: Surveys
        -----------------
            |- Group: example
            -----------------
                |- Group: Filters
                -----------------
                    |- Group: coefficient
                    ---------------------
                    |- Group: fap
                    -------------
                    |- Group: fir
                    -------------
                    |- Group: time_delay
                    --------------------
                    |- Group: zpk
                    -------------
                |- Group: Reports
                -----------------
                |- Group: Standards
                -------------------
                    --> Dataset: summary
                    ......................
                |- Group: Stations
                ------------------
                    |- Group: mt001
                    ---------------
                        |- Group: Transfer_Functions
                        ----------------------------
                    |- Group: mt002
                    ---------------
                        |- Group: 001
                        -------------
                            --> Dataset: ex
                            .................
                            --> Dataset: hy
                            .................
                        |- Group: 002
                        -------------
                        |- Group: Transfer_Functions
                        ----------------------------
        --> Dataset: channel_summary
        ..............................
        --> Dataset: tf_summary
        .........................

Channel Summary

We can have a look at which channels are in this file. This can take a long time if you have lots of data. The summary is converted to a pandas.DataFrame object and can therefore be queried with the standard pandas methods.

Note: the number of samples is 1 even though we did not add any data. This is because we initialize the dataset to be extendable and it needs at least 1 dimension to be initialized. We set the max shape to be (1, None) which means it can be extended to an arbitrary shape.
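To illustrate what an extendable dataset looks like at the HDF5 level, here is a minimal sketch that uses h5py directly rather than the mth5 API. It assumes a 1-D dataset with an unlimited maximum length, to match the (1,) data shape shown earlier, and reuses the compression settings listed in dataset_options; the file name and dataset name are placeholders.

```python
import h5py
import numpy as np

# Use an in-memory file so nothing is written to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset(
        "hy",
        shape=(1,),        # one placeholder sample, as in the summary
        maxshape=(None,),  # None means the axis can grow without limit
        dtype="int32",
        compression="gzip",
        compression_opts=9,
        shuffle=True,
        fletcher32=True,
    )
    initial_shape = dset.shape

    # Grow the dataset once real data are available, then fill it.
    dset.resize((4096,))
    dset[:] = np.arange(4096, dtype="int32")
    final_shape = dset.shape

print(initial_shape, final_shape)
```

The dataset starts with a single sample and is resized in place, which is why an empty channel still reports n_samples of 1.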

[18]:

%time

m.channel_summary.clear_table()
m.channel_summary.summarize()

ch_df = m.channel_summary.to_dataframe()
ch_df

Wall time: 0 ns

[18]:

	survey	station	run	latitude	longitude	elevation	component	start	end	n_samples	sample_rate	measurement_type	azimuth	tilt	units	hdf5_reference	run_hdf5_reference	station_hdf5_reference
0	example	mt002	001	40.083611	-122.3432	403.1	ex	1980-01-01 00:00:00+00:00	1980-01-01 00:00:00+00:00	1	0.0	electric	0.0	0.0	none	<HDF5 object reference>	<HDF5 object reference>	<HDF5 object reference>
1	example	mt002	001	40.083611	-122.3432	403.1	hy	1980-01-01 00:00:00+00:00	1980-01-01 00:00:00+00:00	1	0.0	magnetic	0.0	0.0	none	<HDF5 object reference>	<HDF5 object reference>	<HDF5 object reference>
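Because the summary is a DataFrame, the usual pandas selection tools apply. The sketch below builds a small stand-in table (demo_df, with only a few of the columns shown above) so that it runs without an MTH5 file; the same expressions should work on ch_df.

```python
import pandas as pd

# Stand-in for the channel summary, using values from the table above.
demo_df = pd.DataFrame(
    {
        "survey": ["example", "example"],
        "station": ["mt002", "mt002"],
        "run": ["001", "001"],
        "component": ["ex", "hy"],
        "measurement_type": ["electric", "magnetic"],
        "n_samples": [1, 1],
        "sample_rate": [0.0, 0.0],
    }
)

# Boolean mask: keep only the magnetic channels.
magnetic = demo_df[demo_df.measurement_type == "magnetic"]

# Group by station and list the components recorded at each one.
per_station = demo_df.groupby("station").component.apply(list)

# Query string: select a single channel by station and component.
ex_rows = demo_df.query("station == 'mt002' and component == 'ex'")

print(magnetic.component.tolist())
print(per_station["mt002"])
```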

Access channel through HDF5 Reference

The channel summary table contains a column labeled hdf5_reference. This is an internal HDF5 reference that can be used directly to access that specific group or dataset. A method is provided in MTH5 to use this reference and return the proper group object. Here we will request the first channel in the table.

[19]:

h5_reference = ch_df.iloc[0].hdf5_reference
ex = m.from_reference(h5_reference)
ex

[19]:

Channel Electric:
-------------------
        component:        ex
        data type:        electric
        data format:      int32
        data shape:       (1,)
        start:            1980-01-01T00:00:00+00:00
        end:              1980-01-01T00:00:00+00:00
        sample rate:      0.0
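For readers unfamiliar with HDF5 object references, the following sketch shows the underlying mechanism using h5py directly. It is not the mth5 implementation of from_reference; the file name and dataset path are placeholders chosen for illustration.

```python
import h5py

# Use an in-memory file so nothing is written to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    # Intermediate groups in the path are created automatically.
    dset = f.create_dataset("Stations/mt002/001/ex", shape=(1,), dtype="int32")

    # Every HDF5 object exposes a reference that points back to it.
    ref = dset.ref

    # Indexing the file with the reference returns the same object.
    resolved = f[ref]
    resolved_name = resolved.name

print(resolved_name)
```

A reference is resolved without walking the group hierarchy by name, which is what makes it convenient to store in a summary table.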

Close MTH5 file

This part is important: be sure to close the file in order to save any changes. This function flushes metadata and data to the HDF5 file and then closes it. Note that once a file is closed, all groups lose their link to the file and cannot retrieve any data.

[ ]:

m.close_mth5()
station_group
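The loss of the link after closing can be seen at the HDF5 level as well. This sketch uses h5py directly (not the mth5 API) with a placeholder file name, and relies on the h5py convention that an object evaluates as False once it is no longer accessible.

```python
import h5py

# Use an in-memory file so nothing is written to disk.
with h5py.File("sketch.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("ex", shape=(1,), dtype="int32")
    valid_while_open = bool(dset)  # the handle is usable here

# The with block has closed the file, so the handle is now invalid.
valid_after_close = bool(dset)

print(valid_while_open, valid_after_close)
```

Any object obtained before closing, such as station_group above, would need to be retrieved again after reopening the file.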