Make an MTH5 from Phoenix Data

This example demonstrates how to read Phoenix data into an MTH5 file. The data come from the example data in PhoenixGeoPy; I downloaded those data into a local folder on my computer by forking the main branch.

Imports

[1]:
from pathlib import Path

from mth5.mth5 import MTH5
from mth5 import read_file
from mth5.io.phoenix import ReceiverMetadataJSON, PhoenixCollection
2022-08-31 13:19:07,431 [line 135] mth5.setup_logger - INFO: Logging file can be found C:\Users\jpeacock\OneDrive - DOI\Documents\GitHub\mth5\logs\mth5_debug.log

Data Directory

Specify the station directory. Phoenix writes each channel to its own folder under the station directory, named by the channel number, and also writes a recmeta.json file of metadata output by the receiver that can be useful (the directory contents are listed in a sketch after the next cell). In PhoenixGeoPy/sample_data there are 2 folders: one for native data, which are .bin files of the raw data in counts sampled at 24k, and one for segmented files, which are calibrated to millivolts and decimated or segmented according to the recording configuration. Most of the time you would use the segmented files.

[2]:
station_dir = Path(r"c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630")
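
To verify the layout described above, you can list the station directory; a minimal sketch (your channel-folder numbers may differ):

# channel folders are named by channel number; recmeta.json sits alongside them
for p in sorted(station_dir.iterdir()):
    print(p.name)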

File Collection

We’ve developed a collection dataframe to help sort out which files are which and which files can be grouped together into runs. Continuous runs are given a single run name and segmented data have sequential run names. Both follow the pattern sr{sample_rate}_#### for run.id (e.g., sr150_0001 for a continuous 150 samples-per-second run, or sr24k_0001, sr24k_0002, ... for successive 24k bursts).

Here PhoenixCollection.get_runs returns a two-level ordered dictionary (OrderedDict). The first level is keyed by station ID; each entry is in turn an ordered dictionary keyed by run ID. Therefore you can loop over stations and runs (a small inspection sketch follows the next code cell).

Receiver Metadata

The data logger or receiver will output a JSON file, recmeta.json, that contains useful metadata missing from the data files. This file can be read into an object with methods that translate it to mt_metadata objects. PhoenixCollection reads it for you and stores the result in the attribute receiver_metadata.

[3]:
phx_collection = PhoenixCollection(file_path=station_dir)
run_dict = phx_collection.get_runs(sample_rates=[150, 24000])
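
With the collection built, you can inspect what get_runs returned; each second-level value is a dataframe of the files making up that run. A minimal sketch:

# loop over stations, then runs, printing how many files each run contains
for station_id, station_dict in run_dict.items():
    for run_id, run_df in station_dict.items():
        print(station_id, run_id, len(run_df))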

Initiate MTH5

First initiate an MTH5 file; we can use the receiver metadata to fill in some Survey metadata.

[4]:
m = MTH5()
m.open_mth5(station_dir.joinpath("mth5_from_phoenix.h5"), "w")
2022-08-31 13:19:32,755 [line 596] mth5.mth5.MTH5.open_mth5 - WARNING: mth5_from_phoenix.h5 will be overwritten in 'w' mode
2022-08-31 13:19:33,267 [line 663] mth5.mth5.MTH5._initialize_file - INFO: Initialized MTH5 0.2.0 file c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630\mth5_from_phoenix.h5 in mode w
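
Note the warning above: mode "w" overwrites an existing file. If you want to add to a file that already exists, open it in append mode instead; a minimal sketch, shown here only as an alternative, not part of this workflow:

# mode "a" appends to (or creates) the file rather than overwriting it
m = MTH5()
m.open_mth5(station_dir.joinpath("mth5_from_phoenix.h5"), "a")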
[5]:
survey_metadata = phx_collection.receiver_metadata.survey_metadata
survey_group = m.add_survey(survey_metadata.id)

Loop through Stations, Runs, and Channels

Using the run_dict output by PhoenixCollection.get_runs, we can simply loop through the runs without knowing whether the data are continuous or discontinuous; read_file will take care of that.

Users should note the Phoenix file structure: inside each channel folder are files with extensions .td_24k and .td_150. A single file can also be read on its own, as sketched after the list below.

  • .td_24k files are usually bursts of a few seconds of data sampled at 24k samples per second to capture high-frequency information. The returned object is an mth5.timeseries.ChannelTS.

  • .td_150 files contain data continuously sampled at 150 samples per second. These files usually have a set length, commonly an hour. The returned object is an mth5.timeseries.ChannelTS.
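
As a quick check before looping, a single file can be read on its own with read_file. A minimal sketch, assuming a channel folder named 0 exists; substitute a channel number from your station:

# grab the first 150 samples-per-second file from channel folder "0" (an assumption)
fn = sorted(station_dir.joinpath("0").glob("*.td_150"))[0]
ch_ts = read_file(fn, channel_map=phx_collection.receiver_metadata.channel_map)
print(ch_ts.sample_rate, ch_ts.start)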

[6]:
%%time

for station_id, station_dict in run_dict.items():
    station_metadata = phx_collection.receiver_metadata.station_metadata
    station_group = survey_group.stations_group.add_station(
        station_metadata.id,
        station_metadata=station_metadata
    )
    for run_id, run_df in station_dict.items():
        run_metadata = phx_collection.receiver_metadata.run_metadata
        run_metadata.id = run_id
        run_metadata.sample_rate = float(run_df.sample_rate.unique()[0])

        run_group = station_group.add_run(run_metadata.id, run_metadata=run_metadata)
        for row in run_df.itertuples():
            ch_ts = read_file(row.fn, channel_map=phx_collection.receiver_metadata.channel_map)
            ch_metadata = phx_collection.receiver_metadata.get_ch_metadata(
                ch_ts.channel_metadata.channel_number
            )
            # update the time period and sample rate with values estimated from the data, not the metadata
            ch_metadata.sample_rate = ch_ts.sample_rate
            ch_metadata.time_period.update(ch_ts.channel_metadata.time_period)
            ch_ts.channel_metadata.update(ch_metadata)

            # add channel to the run group
            ch_dataset = run_group.from_channel_ts(ch_ts)
Wall time: 3min 57s

Update metadata before closing

Need to update the metadata to account for added stations, runs, and channels.

[7]:
%%time
station_group.validate_station_metadata()
station_group.write_metadata()

survey_group.update_survey_metadata()
survey_group.write_metadata()
Wall time: 9min 19s
[8]:
m.channel_summary.summarize()
m.channel_summary.to_dataframe()
[8]:
survey station run latitude longitude elevation component start end n_samples sample_rate measurement_type azimuth tilt units hdf5_reference run_hdf5_reference station_hdf5_reference
0 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 e1 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 electric 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
1 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 e2 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 electric 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
2 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 h1 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
3 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 h2 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
4 UofS REMOTE sr150_0001 50.979666 -106.794989 559.472154 h3 2019-09-05 18:56:30+00:00 2019-09-06 07:15:38+00:00 6652200 150.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
995 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h2 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
996 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h3 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
997 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h4 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
998 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h5 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>
999 UofS REMOTE sr24k_0124 50.979666 -106.794989 559.472154 h6 2019-09-06 14:14:31+00:00 2019-09-06 14:14:33+00:00 48000 24000.0 magnetic 0.0 0.0 millivolts <HDF5 object reference> <HDF5 object reference> <HDF5 object reference>

1000 rows × 18 columns
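
The channel summary is a regular pandas DataFrame, so it can be filtered like any other; for example, to pull out just the continuous 150 samples-per-second rows:

df = m.channel_summary.to_dataframe()
print(df[df.sample_rate == 150].run.unique())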

[9]:
m.close_mth5()
2022-08-31 13:32:58,389 [line 744] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630\mth5_from_phoenix.h5