Make an MTH5 from Phoenix Data
This example demonstrates how to read Phoenix data into an MTH5 file. The data are the example data distributed with PhoenixGeoPy. Here I downloaded those data into a local folder on my computer by forking the main branch.
Imports
[1]:
from pathlib import Path
from mth5.mth5 import MTH5
from mth5 import read_file
from mth5.io.phoenix import ReceiverMetadataJSON, PhoenixCollection
2022-08-31 13:19:07,431 [line 135] mth5.setup_logger - INFO: Logging file can be found C:\Users\jpeacock\OneDrive - DOI\Documents\GitHub\mth5\logs\mth5_debug.log
Data Directory
Specify the station directory. Phoenix receivers place each channel in its own folder under the station directory, named by the channel number. There is also a recmeta.json file containing metadata output by the receiver, which can be useful. In PhoenixGeoPy/sample_data there are 2 folders: one for native data, which are .bin files of raw data in counts sampled at 24k samples per second, and one for segmented files, which are calibrated to millivolts and decimated or segmented according to the recording configuration. Most of the time you would use the segmented files.
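As a quick way to see what a station directory holds, the files can be tallied by extension with plain pathlib. This sketch builds a throwaway directory mimicking the layout described above; the folder and file names are invented for illustration, not real Phoenix output:

```python
import tempfile
from collections import Counter
from pathlib import Path

# Stand-in station directory: one folder per channel number, each
# holding a continuous (.td_150) and a burst (.td_24k) file.
# File names below are made up for the sketch.
demo_dir = Path(tempfile.mkdtemp())
for channel in ("0", "1", "2"):
    ch_dir = demo_dir / channel
    ch_dir.mkdir()
    (ch_dir / "example_00000001.td_150").touch()
    (ch_dir / "example_00000001.td_24k").touch()

# Tally files by extension to see which data types are present
suffix_counts = Counter(p.suffix for p in demo_dir.rglob("*") if p.is_file())
print(suffix_counts)  # Counter({'.td_150': 3, '.td_24k': 3})
```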
[2]:
station_dir = Path(r"c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630")
File Collection
We’ve developed a collection dataframe to help sort out which files are which and which files can be grouped together into runs. Continuous runs will be given a single run name and segmented data will have sequential run names. Both will follow the pattern sr{sample_rate}_#### for the run.id.

Here PhoenixCollection.get_runs returns a two-level ordered dictionary (OrderedDict). The first level is keyed by station ID; each value is in turn an ordered dictionary keyed by run ID. Therefore you can loop over stations and runs.
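The nested structure and the sr{sample_rate}_#### naming can be illustrated with a toy dictionary. The station ID and run values below are placeholders, not real PhoenixCollection output:

```python
from collections import OrderedDict

def make_run_id(sample_rate, sequence):
    # Continuous data gets one run name; burst data gets sequential names.
    # 24000 samples/s is abbreviated "24k" as in the examples below.
    label = "24k" if sample_rate == 24000 else str(sample_rate)
    return f"sr{label}_{sequence:04d}"

# Toy stand-in for PhoenixCollection.get_runs output:
# {station_id: {run_id: run_dataframe}} -- values here are None placeholders
run_dict = OrderedDict(
    {"10291": OrderedDict(
        (make_run_id(sr, n), None) for sr, n in [(150, 1), (24000, 1), (24000, 2)]
    )}
)

for station_id, station_dict in run_dict.items():
    for run_id in station_dict:
        print(station_id, run_id)
# 10291 sr150_0001
# 10291 sr24k_0001
# 10291 sr24k_0002
```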
Receiver Metadata
The data logger or receiver outputs a JSON file (recmeta.json) that contains useful metadata missing from the data files. It can be read into an object with methods that translate to mt_metadata objects. This is read by PhoenixCollection and stored in the attribute receiver_metadata.
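Since recmeta.json is plain JSON, it can also be inspected directly with the standard library before handing it to ReceiverMetadataJSON. The keys in this sketch are invented for illustration; the real schema is set by the receiver firmware:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical stand-in for a recmeta.json file -- key names are made up
example = {"instid": "10291", "survey": "UofS", "chconfig": {"chans": []}}

tmp = Path(tempfile.mkdtemp()) / "recmeta.json"
tmp.write_text(json.dumps(example))

# Inspect the top-level keys of the file
meta = json.loads(tmp.read_text())
print(sorted(meta))  # ['chconfig', 'instid', 'survey']
```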
[3]:
phx_collection = PhoenixCollection(file_path=station_dir)
run_dict = phx_collection.get_runs(sample_rates=[150, 24000])
Initiate MTH5
First initiate an MTH5 file; we can use the receiver metadata to fill in some Survey metadata.
[4]:
m = MTH5()
m.open_mth5(station_dir.joinpath("mth5_from_phoenix.h5"), "w")
2022-08-31 13:19:32,755 [line 596] mth5.mth5.MTH5.open_mth5 - WARNING: mth5_from_phoenix.h5 will be overwritten in 'w' mode
2022-08-31 13:19:33,267 [line 663] mth5.mth5.MTH5._initialize_file - INFO: Initialized MTH5 0.2.0 file c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630\mth5_from_phoenix.h5 in mode w
[5]:
survey_metadata = phx_collection.receiver_metadata.survey_metadata
survey_group = m.add_survey(survey_metadata.id)
Loop through Stations, Runs, and Channels
Using the run_dict output by PhoenixCollection.get_runs, we can simply loop through the runs without knowing whether the data are continuous or discontinuous; read_file will take care of that.
Users should note the Phoenix file structure. Inside each channel folder are files with extensions of .td_24k and .td_150.

.td_24k files are usually bursts of a few seconds of data sampled at 24k samples per second to capture high-frequency information. The returned object is an mth5.timeseries.ChannelTS.

.td_150 files are data continuously sampled at 150 samples per second. These files usually have a set length, commonly an hour. The returned object is an mth5.timeseries.ChannelTS.
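The sample counts in the channel summary at the end of this notebook are consistent with these descriptions: samples = duration (s) × sample rate. This sketch just restates that arithmetic:

```python
def n_samples(duration_seconds, sample_rate):
    # Number of samples expected for a recording of a given length
    return int(duration_seconds * sample_rate)

# a 2-second burst at 24000 samples/s (matches the .td_24k rows below)
print(n_samples(2, 24000))  # 48000

# the continuous run spans 18:56:30 to 07:15:38 the next day, i.e.
# 12 h 19 m 8 s, at 150 samples/s (matches the sr150_0001 rows below)
duration = 12 * 3600 + 19 * 60 + 8  # 44348 s
print(n_samples(duration, 150))  # 6652200
```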
[6]:
%%time
for station_id, station_dict in run_dict.items():
    station_metadata = phx_collection.receiver_metadata.station_metadata
    station_group = survey_group.stations_group.add_station(
        station_metadata.id,
        station_metadata=station_metadata,
    )
    for run_id, run_df in station_dict.items():
        run_metadata = phx_collection.receiver_metadata.run_metadata
        run_metadata.id = run_id
        run_metadata.sample_rate = float(run_df.sample_rate.unique()[0])
        run_group = station_group.add_run(run_metadata.id, run_metadata=run_metadata)
        for row in run_df.itertuples():
            ch_ts = read_file(
                row.fn,
                **{"channel_map": phx_collection.receiver_metadata.channel_map},
            )
            ch_metadata = phx_collection.receiver_metadata.get_ch_metadata(
                ch_ts.channel_metadata.channel_number
            )
            # update the time period and sample rate as estimated from
            # the data, not the metadata
            ch_metadata.sample_rate = ch_ts.sample_rate
            ch_metadata.time_period.update(ch_ts.channel_metadata.time_period)
            ch_ts.channel_metadata.update(ch_metadata)
            # add channel to the run group
            ch_dataset = run_group.from_channel_ts(ch_ts)
Wall time: 3min 57s
Update metadata before closing
We need to update the metadata to account for the added stations, runs, and channels.
[7]:
%%time
station_group.validate_station_metadata()
station_group.write_metadata()
survey_group.update_survey_metadata()
survey_group.write_metadata()
Wall time: 9min 19s
[8]:
m.channel_summary.summarize()
m.channel_summary.to_dataframe()
[8]:
| survey | station | run | latitude | longitude | elevation | component | start | end | n_samples | sample_rate | measurement_type | azimuth | tilt | units | hdf5_reference | run_hdf5_reference | station_hdf5_reference |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | UofS | REMOTE | sr150_0001 | 50.979666 | -106.794989 | 559.472154 | e1 | 2019-09-05 18:56:30+00:00 | 2019-09-06 07:15:38+00:00 | 6652200 | 150.0 | electric | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
1 | UofS | REMOTE | sr150_0001 | 50.979666 | -106.794989 | 559.472154 | e2 | 2019-09-05 18:56:30+00:00 | 2019-09-06 07:15:38+00:00 | 6652200 | 150.0 | electric | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
2 | UofS | REMOTE | sr150_0001 | 50.979666 | -106.794989 | 559.472154 | h1 | 2019-09-05 18:56:30+00:00 | 2019-09-06 07:15:38+00:00 | 6652200 | 150.0 | magnetic | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
3 | UofS | REMOTE | sr150_0001 | 50.979666 | -106.794989 | 559.472154 | h2 | 2019-09-05 18:56:30+00:00 | 2019-09-06 07:15:38+00:00 | 6652200 | 150.0 | magnetic | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
4 | UofS | REMOTE | sr150_0001 | 50.979666 | -106.794989 | 559.472154 | h3 | 2019-09-05 18:56:30+00:00 | 2019-09-06 07:15:38+00:00 | 6652200 | 150.0 | magnetic | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
995 | UofS | REMOTE | sr24k_0124 | 50.979666 | -106.794989 | 559.472154 | h2 | 2019-09-06 14:14:31+00:00 | 2019-09-06 14:14:33+00:00 | 48000 | 24000.0 | magnetic | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
996 | UofS | REMOTE | sr24k_0124 | 50.979666 | -106.794989 | 559.472154 | h3 | 2019-09-06 14:14:31+00:00 | 2019-09-06 14:14:33+00:00 | 48000 | 24000.0 | magnetic | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
997 | UofS | REMOTE | sr24k_0124 | 50.979666 | -106.794989 | 559.472154 | h4 | 2019-09-06 14:14:31+00:00 | 2019-09-06 14:14:33+00:00 | 48000 | 24000.0 | magnetic | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
998 | UofS | REMOTE | sr24k_0124 | 50.979666 | -106.794989 | 559.472154 | h5 | 2019-09-06 14:14:31+00:00 | 2019-09-06 14:14:33+00:00 | 48000 | 24000.0 | magnetic | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
999 | UofS | REMOTE | sr24k_0124 | 50.979666 | -106.794989 | 559.472154 | h6 | 2019-09-06 14:14:31+00:00 | 2019-09-06 14:14:33+00:00 | 48000 | 24000.0 | magnetic | 0.0 | 0.0 | millivolts | <HDF5 object reference> | <HDF5 object reference> | <HDF5 object reference> |
1000 rows × 18 columns
[9]:
m.close_mth5()
2022-08-31 13:32:58,389 [line 744] mth5.mth5.MTH5.close_mth5 - INFO: Flushing and closing c:\Users\jpeacock\OneDrive - DOI\mt\phoenix_example_data\10291_2019-09-06-015630\mth5_from_phoenix.h5