Make an MTH5 from Phoenix Data

This example demonstrates how to read Phoenix data into an MTH5 file. The data come from the example data in PhoenixGeoPy; I downloaded those data into a local folder on my computer by forking the main branch.

Imports

[1]:
from pathlib import Path

from mth5.mth5 import MTH5
from mth5 import read_file
from mth5.io.phoenix import ReceiverMetadataJSON, PhoenixCollection
2022-08-31 13:19:07,431 [line 135] mth5.setup_logger - INFO: Logging file can be found C:\Users\jpeacock\OneDrive - DOI\Documents\GitHub\mth5\logs\mth5_debug.log

Data Directory

Specify the station directory. Phoenix writes each channel to its own folder under the station directory, named by the channel number, and also writes a recmeta.json file of metadata output by the receiver that can be useful (the directory contents are listed in a sketch after the next cell). In PhoenixGeoPy/sample_data there are 2 folders: one for native data, which are .bin files of the raw data in counts sampled at 24k, and one for segmented files, which are calibrated to millivolts and decimated or segmented according to the recording configuration. Most of the time you would use the segmented files.

[2]:
station_dir = Path(r"c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630")
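
To verify the layout described above, you can list the station directory; a minimal sketch (your channel-folder numbers may differ):

# channel folders are named by channel number; recmeta.json sits alongside them
for p in sorted(station_dir.iterdir()):
    print(p.name)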

File Collection

We’ve developed a collection dataframe to help sort out which files are which and which files can be grouped together into runs. Continuous runs are given a single run name and segmented data have sequential run names. Both follow the pattern sr{sample_rate}_#### for run.id (e.g., sr150_0001 for a continuous 150 samples-per-second run, or sr24k_0001, sr24k_0002, ... for successive 24k bursts).

Here PhoenixCollection.get_runs returns a two-level ordered dictionary (OrderedDict). The first level is keyed by station ID; each entry is in turn an ordered dictionary keyed by run ID. Therefore you can loop over stations and runs (a small inspection sketch follows the next code cell).

Receiver Metadata

The data logger or receiver will output a JSON file, recmeta.json, that contains useful metadata missing from the data files. This file can be read into an object with methods that translate it to mt_metadata objects. PhoenixCollection reads it for you and stores the result in the attribute receiver_metadata.

[3]:
phx_collection = PhoenixCollection(file_path=station_dir)
run_dict = phx_collection.get_runs(sample_rates=[150, 24000])
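
With the collection built, you can inspect what get_runs returned; each second-level value is a dataframe of the files making up that run. A minimal sketch:

# loop over stations, then runs, printing how many files each run contains
for station_id, station_dict in run_dict.items():
    for run_id, run_df in station_dict.items():
        print(station_id, run_id, len(run_df))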

Initiate MTH5

First initiate an MTH5 file; we can use the receiver metadata to fill in some Survey metadata.

[4]:
m = MTH5()
m.open_mth5(station_dir.joinpath("mth5_from_phoenix.h5"), "w")
2022-08-31 13:19:32,755 [line 596] mth5.mth5.MTH5.open_mth5 - WARNING: mth5_from_phoenix.h5 will be overwritten in 'w' mode
2022-08-31 13:19:33,267 [line 663] mth5.mth5.MTH5._initialize_file - INFO: Initialized MTH5 0.2.0 file c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630\mth5_from_phoenix.h5 in mode w
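
Note the warning above: mode "w" overwrites an existing file. If you want to add to a file that already exists, open it in append mode instead; a minimal sketch, shown here only as an alternative, not part of this workflow:

# mode "a" appends to (or creates) the file rather than overwriting it
m = MTH5()
m.open_mth5(station_dir.joinpath("mth5_from_phoenix.h5"), "a")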
[5]:
survey_metadata = phx_collection.receiver_metadata.survey_metadata
survey_group = m.add_survey(survey_metadata.id)

Loop through Stations, Runs, and Channels

Using the run_dict output by PhoenixCollection.get_runs, we can simply loop through the runs without knowing whether the data are continuous or discontinuous; read_file will take care of that.

Users should note the Phoenix file structure: inside each channel folder are files with extensions .td_24k and .td_150. A single file can also be read on its own, as sketched after the list below.

  • .td_24k files are usually bursts of a few seconds of data sampled at 24k samples per second to capture high-frequency information. The returned object is an mth5.timeseries.ChannelTS.

  • .td_150 files contain data continuously sampled at 150 samples per second. These files usually have a set length, commonly an hour. The returned object is an mth5.timeseries.ChannelTS.
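
As a quick check before looping, a single file can be read on its own with read_file. A minimal sketch, assuming a channel folder named 0 exists; substitute a channel number from your station:

# grab the first 150 samples-per-second file from channel folder "0" (an assumption)
fn = sorted(station_dir.joinpath("0").glob("*.td_150"))[0]
ch_ts = read_file(fn, channel_map=phx_collection.receiver_metadata.channel_map)
print(ch_ts.sample_rate, ch_ts.start)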

[6]:
%%time

for station_id, station_dict in run_dict.items():
    station_metadata = phx_collection.receiver_metadata.station_metadata
    station_group = survey_group.stations_group.add_station(
        station_metadata.id,
        station_metadata=station_metadata
    )
    for run_id, run_df in station_dict.items():
        run_metadata = phx_collection.receiver_metadata.run_metadata
        run_metadata.id = run_id
        run_metadata.sample_rate = float(run_df.sample_rate.unique()[0])

        run_group = station_group.add_run(run_metadata.id, run_metadata=run_metadata)
        for row in run_df.itertuples():
            ch_ts = read_file(row.fn, channel_map=phx_collection.receiver_metadata.channel_map)
            ch_metadata = phx_collection.receiver_metadata.get_ch_metadata(
                ch_ts.channel_metadata.channel_number
            )
            # update the time period and sample rate with values estimated from the data, not the metadata
            ch_metadata.sample_rate = ch_ts.sample_rate
            ch_metadata.time_period.update(ch_ts.channel_metadata.time_period)
            ch_ts.channel_metadata.update(ch_metadata)

            # add channel to the run group
            ch_dataset = run_group.from_channel_ts(ch_ts)
Wall time: 3min 57s

Update metadata before closing

Need to update the metadata to account for added stations, runs, and channels.

[7]:
%%time
station_group.validate_station_metadata()
station_group.write_metadata()

survey_group.update_survey_metadata()
survey_group.write_metadata()
Wall time: 9min 19s
[8]:
m.channel_summary.summarize()
m.channel_summary.to_dataframe()
[8]:
survey station run latitude longitude elevation component start end n_samples sample_rate measurement_type azimuth tilt units hdf5_reference run_hdf5_reference station_hdf5_reference
0 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 e1 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 electric 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
1 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 e2 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 electric 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
2 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 h1 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
3 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 h2 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
4 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 h3 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h2 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
996 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h3 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
997 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h4 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
998 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h5 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
999 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h6 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>

1000 rows × 18 columns
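
The channel summary is a regular pandas DataFrame, so it can be filtered like any other; for example, to pull out just the continuous 150 samples-per-second rows:

df = m.channel_summary.to_dataframe()
print(df[df.sample_rate == 150].run.unique())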

[9]:
m.close_mth5()
2022-08-31 13:32:58,389 [line 744] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630\mth5_from_phoenix.h5