HDF5 (Hierarchical Data Format version 5) is a file format and data model designed for efficient storage and retrieval of large and complex data. It is commonly used in scientific and engineering applications to store and share large datasets, such as climate data, satellite imagery, and simulation output.
HDF5 files store data in a hierarchical structure consisting of groups, datasets, and attributes. Groups are used to organize datasets and other groups in a tree-like structure, while datasets contain the actual data values. Attributes are used to store metadata or additional information about groups and datasets.
HDF5 files can store various types of data, including numerical data (integers, floating-point numbers), text data, and binary data. The format supports compression, chunking, and parallel I/O, which together enable efficient storage and retrieval of large datasets.
In Python, HDF5 files can be accessed using the h5py library, which provides a Pythonic interface for reading and writing HDF5 files. Here is an example of how to create an HDF5 file and write data to it using h5py:
    import h5py
    import numpy as np

    # Create an HDF5 file
    with h5py.File('data.h5', 'w') as f:
        # Create a group
        group = f.create_group('my_group')
        # Create a dataset and write data to it
        data = np.random.randn(100, 100)
        dset = group.create_dataset('my_dataset', data=data)
        # Add an attribute to the dataset
        dset.attrs['description'] = 'Random data'
In this example, h5py.File opens data.h5 in write mode ('w'), which creates the file and overwrites any existing file of that name. The create_group() method then creates a group within the file, and create_dataset() creates a dataset inside that group and writes a 100 x 100 array of random values to it. Finally, an attribute is attached to the dataset through its attrs property, which behaves like a dictionary.
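The compression and chunking support mentioned earlier can be requested when a dataset is created. The following is a minimal sketch, not a tuned configuration: the file name compressed.h5, the dataset name grid, the chunk shape, and the gzip level are all assumptions chosen for illustration, and the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Write to a temporary directory so the sketch leaves no files in the working directory.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "compressed.h5")  # hypothetical file name

data = np.zeros((1000, 1000), dtype="float64")

with h5py.File(path, "w") as f:
    # chunks sets the shape of each storage block; compression="gzip"
    # applies the gzip filter to each chunk as it is written.
    dset = f.create_dataset(
        "grid",                # hypothetical dataset name
        data=data,
        chunks=(100, 100),     # illustrative chunk shape
        compression="gzip",
        compression_opts=4,    # gzip level, 0 (fastest) to 9 (smallest)
    )
    chunk_shape = dset.chunks
    filter_name = dset.compression

with h5py.File(path, "r") as f:
    # Reading a slice only touches the chunks that overlap it.
    block = f["grid"][0:100, 0:100]

print(chunk_shape, filter_name, block.shape)
```

A chunk shape that matches the typical read pattern tends to work best, since each chunk is read and decompressed as a unit.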
HDF5 files can also be read and modified using h5py. Here is an example of how to read the data from the HDF5 file created in the previous example:
    import h5py

    # Open an existing HDF5 file
    with h5py.File('data.h5', 'r') as f:
        # Access the dataset and read the data
        data = f['my_group/my_dataset'][:]
        # Access the attribute of the dataset
        description = f['my_group/my_dataset'].attrs['description']
In this example, h5py.File opens the existing file in read-only mode ('r'). The dataset is addressed by its path within the file, my_group/my_dataset, and the [:] slice reads its entire contents into a NumPy array. The description attribute is then read from the dataset's attrs property.
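Modifying a file works along the same lines. The sketch below is one possible approach, assuming the same group and dataset names as above; it first recreates a small file in a temporary directory so that it runs on its own, then reopens it in 'r+' mode to change values in place, read a partial slice, and list the hierarchy.

```python
import os
import tempfile

import h5py
import numpy as np

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "data.h5")

# Recreate a small file with the same layout as the earlier example.
with h5py.File(path, "w") as f:
    dset = f.create_group("my_group").create_dataset(
        "my_dataset", data=np.random.randn(100, 100)
    )
    dset.attrs["description"] = "Random data"

# "r+" opens an existing file for reading and writing without truncating it.
with h5py.File(path, "r+") as f:
    dset = f["my_group/my_dataset"]
    dset[0, :] = 1.0                                  # overwrite the first row in place
    dset.attrs["description"] = "First row set to 1"  # update the attribute
    first_rows = dset[:2, :5]                         # partial read: only this slice is loaded

    # visititems calls the function once for every group and dataset below f.
    found = []
    f.visititems(lambda name, obj: found.append((name, type(obj).__name__)))

print(first_rows)
print(found)
```

Slicing the dataset object directly, rather than reading everything with [:], keeps memory use proportional to the slice, which matters for datasets larger than available RAM.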