HDF5 is a good format for storing and analyzing large datasets, including earthquake data. We can use the h5py library in Python to read and write HDF5 files.
Here is an example of how to read earthquake data from a CSV file, store each column in an HDF5 file, and then read the columns back into NumPy arrays:

    import pandas as pd
    import numpy as np
    import h5py

    # Load the earthquake data into a pandas DataFrame.
    # The raw string (r'...') keeps the backslashes in the Windows path literal.
    df = pd.read_csv(r'C:\emg\myprojects\python\datacamp\earthquake_data.csv')

    # Store the earthquake data in an HDF5 file, one dataset per column
    with h5py.File('earthquake_data.h5', 'w') as f:
        # Assumes 'time' holds text timestamps: h5py cannot store object-dtype
        # arrays, so the column is converted to fixed-length bytes first.
        f.create_dataset('time', data=df['time'].to_numpy(dtype='S'))
        f.create_dataset('latitude', data=df['latitude'].to_numpy())
        f.create_dataset('longitude', data=df['longitude'].to_numpy())
        f.create_dataset('depth', data=df['depth'].to_numpy())
        f.create_dataset('magnitude', data=df['mag'].to_numpy())

    # Read the earthquake data from the HDF5 file into NumPy arrays
    with h5py.File('earthquake_data.h5', 'r') as f:
        time = f['time'][:]
        latitude = f['latitude'][:]
        longitude = f['longitude'][:]
        depth = f['depth'][:]
        magnitude = f['magnitude'][:]

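In this example, we load the earthquake data into a pandas DataFrame using pd.read_csv(), write each column to its own dataset with create_dataset() inside an h5py.File() opened in write mode, and then reopen the file in read mode and slice each dataset with [:] to copy it into a NumPy array. Note that we use the to_numpy() method of a pandas Series to convert it to a NumPy array before storing it in the HDF5 file. h5py stores arrays with numeric or string dtypes but not object dtype, so a text column such as the timestamps has to be converted to a bytes or h5py string dtype before it is written.
Once we have the earthquake data in NumPy arrays, we can use various tools and libraries to analyze the data. For example, we can use matplotlib to create a scatter plot of earthquake locations: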
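The write-then-read round trip described above can be checked on a small scale. The following is a minimal sketch, assuming h5py and NumPy are installed; the helper names write_columns and read_columns, the file name example_quakes.h5, and the three synthetic events are made up for illustration and do not come from the earthquake CSV.

```python
import os
import tempfile

import h5py
import numpy as np


def write_columns(path, columns):
    """Write each named 1-D array to its own dataset in an HDF5 file."""
    with h5py.File(path, "w") as f:
        for name, values in columns.items():
            f.create_dataset(name, data=values)


def read_columns(path):
    """Read every top-level dataset back into a dict of NumPy arrays."""
    with h5py.File(path, "r") as f:
        return {name: f[name][:] for name in f.keys()}


# Synthetic values standing in for the CSV columns (not real earthquake data).
columns = {
    "latitude": np.array([35.1, -12.4, 61.0]),
    "longitude": np.array([139.7, -77.0, -150.2]),
    "magnitude": np.array([4.8, 6.1, 5.3]),
    # Text timestamps are stored as fixed-length bytes, which h5py accepts.
    "time": np.array(
        ["2023-01-01T00:00:00", "2023-01-02T12:30:00", "2023-01-03T06:15:00"],
        dtype="S",
    ),
}

# Write to a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_quakes.h5")
    write_columns(path, columns)
    restored = read_columns(path)

# Report whether each column survived the round trip unchanged.
for name in columns:
    print(name, np.array_equal(columns[name], restored[name]))
```

Slicing with [:] inside the with block copies the data into memory, so the arrays remain usable after the file is closed.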

    import matplotlib.pyplot as plt

    # Create a scatter plot of earthquake locations
    plt.scatter(longitude, latitude, c=magnitude, cmap='magma')
    plt.xlabel('Longitude')
    plt.ylabel('Latitude')
    plt.colorbar(label='Magnitude')
    plt.show()

In this example, we use plt.scatter() to create a scatter plot of earthquake locations, with the magnitude of each earthquake represented by the color of the point. We use plt.colorbar() to add a colorbar to the plot.
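Plotting is only one option. Because the data are plain NumPy arrays, they can also be filtered and summarized directly. The following is a minimal sketch of selecting stronger events with a boolean mask; the four synthetic events and the threshold of 6.0 are hypothetical values chosen for illustration, not taken from the dataset.

```python
import numpy as np

# Synthetic stand-ins for the arrays read from the HDF5 file (not real data).
latitude = np.array([35.1, -12.4, 61.0, 38.3])
longitude = np.array([139.7, -77.0, -150.2, 142.4])
magnitude = np.array([4.8, 6.1, 5.3, 7.2])

# Hypothetical cutoff for a "strong" event.
threshold = 6.0

# Boolean mask: True where the magnitude meets the threshold.
strong = magnitude >= threshold

# The same mask selects the matching coordinates.
strong_lat = latitude[strong]
strong_lon = longitude[strong]

print(f"{strong.sum()} of {magnitude.size} events are at or above M{threshold}")
print("Mean magnitude:", magnitude.mean())
```

The filtered arrays strong_lon and strong_lat could be passed to plt.scatter() in place of the full arrays to plot only the larger events.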