Time series data is a sequence of data points that are ordered by time. In Python, you can work with time series data using a variety of libraries, including NumPy.
To create a NumPy array of time series data, you can use the numpy.datetime64 data type to represent the time index, and use a two-dimensional NumPy array to store the data. Here is an example of how to create a NumPy array of time series data:
    import numpy as np

    # Create a datetime64 index for the time series data (ten daily timestamps)
    index = np.arange('2022-01-01', '2022-01-11', dtype='datetime64[D]')

    # Create a two-dimensional NumPy array to store the data (one row per series)
    data = np.array([
        [1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.9, 9.0, 10.1],
        [11.2, 12.3, 13.4, 14.5, 15.6, 16.7, 17.8, 18.9, 19.0, 20.1],
    ])

    # Create a dictionary to store the column names (row of data -> column name)
    names = {0: 'column1', 1: 'column2'}

    # Create a structured array to store the time series data,
    # with one float64 field per entry in the names dictionary
    dtype = np.dtype(
        [('index', 'datetime64[D]')] + [(name, 'float64') for name in names.values()]
    )
    ts_data = np.empty(len(index), dtype=dtype)
    ts_data['index'] = index
    for row, name in names.items():
        ts_data[name] = data[row]

    # Print the time series data
    print(ts_data)
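In this example, we first create a numpy.datetime64 index for the time series data, with daily frequency ([D]). The end date passed to numpy.arange is exclusive, so the index holds ten dates, from 2022-01-01 through 2022-01-10. We then create a two-dimensional NumPy array to store the data, with two rows and ten columns, where each row holds the ten values of one series. We also create a dictionary that maps each row of that array to a column name, and define a structured data type with a datetime64 index field and a float64 field for each column. Finally, we create an empty structured array with the specified data type, and fill it field by field with the index and the time series data.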
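To make the layout of the structured array more concrete, the following minimal sketch rebuilds the same array and inspects it. The variable names field_names, first_row, and steps are chosen here for illustration and are not part of the original example.

```python
import numpy as np

# Rebuild the structured array from the example above
index = np.arange('2022-01-01', '2022-01-11', dtype='datetime64[D]')
dtype = np.dtype([
    ('index', 'datetime64[D]'),
    ('column1', 'float64'),
    ('column2', 'float64'),
])
ts_data = np.empty(len(index), dtype=dtype)
ts_data['index'] = index
ts_data['column1'] = [1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.9, 9.0, 10.1]
ts_data['column2'] = [11.2, 12.3, 13.4, 14.5, 15.6, 16.7, 17.8, 18.9, 19.0, 20.1]

# The array is one-dimensional: one record per date
print(ts_data.shape)

# Field names come from the structured dtype
field_names = ts_data.dtype.names
print(field_names)

# Indexing with an integer returns a single record; fields are read by name
first_row = ts_data[0]
print(first_row['index'], first_row['column1'], first_row['column2'])

# Differences between consecutive dates are timedelta64 values
steps = np.diff(ts_data['index'])
print(steps)
```

Each record bundles one date with its two values, so selecting a field by name returns an ordinary one-dimensional array that works with the usual NumPy functions.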
Once you have created a NumPy array of time series data, you can perform various operations on it, such as indexing, slicing, and aggregation. For example, you can use the numpy.where function to select rows that meet a certain condition based on one of the columns (indexing directly with the Boolean mask, ts_data[ts_data['column1'] > 5.0], gives the same result):
    # Select rows where column1 is greater than 5.0
    selected_data = ts_data[np.where(ts_data['column1'] > 5.0)]
    print(selected_data)
This will select all rows where column1 is greater than 5.0. With the values above, that is the six rows from 2022-01-05 through 2022-01-10. You can also use the numpy.mean function to compute the mean of one of the columns:
    # Compute the mean of column2
    mean_column2 = np.mean(ts_data['column2'])
    print(mean_column2)
This will compute the mean of column2 across all rows in the time series data, which is approximately 15.95 for the values above.
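Selection and aggregation can also be combined on the time index itself. The sketch below is one possible approach, not the only one: it builds a Boolean mask over a date range and averages column1 inside that window. The names start, stop, window, and window_mean are illustrative, and the choice of a half-open range (start included, stop excluded) is an assumption made for this example.

```python
import numpy as np

# Rebuild the structured array from the example above
index = np.arange('2022-01-01', '2022-01-11', dtype='datetime64[D]')
dtype = np.dtype([
    ('index', 'datetime64[D]'),
    ('column1', 'float64'),
    ('column2', 'float64'),
])
ts_data = np.empty(len(index), dtype=dtype)
ts_data['index'] = index
ts_data['column1'] = [1.2, 2.3, 3.4, 4.5, 5.6, 6.7, 7.8, 8.9, 9.0, 10.1]
ts_data['column2'] = [11.2, 12.3, 13.4, 14.5, 15.6, 16.7, 17.8, 18.9, 19.0, 20.1]

# Half-open date range: start is included, stop is excluded
start = np.datetime64('2022-01-03')
stop = np.datetime64('2022-01-07')

# Boolean mask over the index field, then select the matching records
mask = (ts_data['index'] >= start) & (ts_data['index'] < stop)
window = ts_data[mask]
print(window)

# Aggregate one column inside the selected window
window_mean = window['column1'].mean()
print(window_mean)
```

Here the window covers the four dates 2022-01-03 through 2022-01-06, so the mean is taken over 3.4, 4.5, 5.6, and 6.7.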