Dask is a library for parallel computing in Python, designed to work with large datasets that cannot fit in memory. Here are some basic operations you can perform with Dask arrays:
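```python
import dask.array as da
import numpy as np

# Create a 10000x10000 Dask array split into 1000x1000 chunks.
# Note: np.random.rand builds the full array (about 800 MB) in memory first.
x = da.from_array(np.random.rand(10000, 10000), chunks=(1000, 1000))

# Access elements of the array (values are random, so the output varies per run)
print(x[0, 0].compute())
print(x[1, 2].compute())

# Reshape the array, then rechunk into 10x10x100x100 blocks
# (1,000,000 elements, about 8 MB each, like the original 1000x1000 chunks)
y = x.reshape((100, 100, 100, 100)).rechunk((10, 10, 100, 100))

# Compute the sum and mean of the array
print(y.sum().compute())
print(y.mean().compute())

# Compute the matrix product of x with its transpose (a 10000x10000 result)
z = da.dot(x, x.T)
print(z.compute())
```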
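In this example, we first create a Dask array x from a NumPy array using the from_array function. The chunks parameter splits the 10000x10000 array into 1000x1000 blocks, a 10x10 grid of 100 chunks that Dask can process independently and in parallel. Because np.random.rand builds the whole array in memory first, from_array is best suited to wrapping data you already have; for data that truly does not fit in memory, you would create the array lazily (for example with da.random.random) or load it from disk.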
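To see how the chunks parameter carves up an array, here is a minimal sketch on a small stand-in array (the names data, even, uneven and big are illustrative, not part of the example above). It inspects the chunks, numblocks and npartitions attributes, and uses da.zeros to check the 10x10 grid of the full-size case without allocating any memory.

```python
import numpy as np
import dask.array as da

# Small stand-in array so the example runs instantly.
data = np.arange(100, dtype=np.float64).reshape(10, 10)

# Even split: each 10-long axis becomes two chunks of 5.
even = da.from_array(data, chunks=(5, 5))
print(even.chunks)     # ((5, 5), (5, 5))
print(even.numblocks)  # (2, 2)

# Uneven split: the last chunk along each axis is smaller.
uneven = da.from_array(data, chunks=(4, 4))
print(uneven.chunks)   # ((4, 4, 2), (4, 4, 2))

# Full-size layout from the example; da.zeros is lazy, so nothing is allocated.
big = da.zeros((10000, 10000), chunks=(1000, 1000))
print(big.numblocks, big.npartitions)  # (10, 10) 100
chunk_bytes = int(np.prod(big.chunksize)) * big.dtype.itemsize
print(chunk_bytes)  # 8000000 (8 MB per chunk)
```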
We can access elements of the array with NumPy-style indexing, x[i, j]. Indexing on its own is lazy: it returns a zero-dimensional Dask array that describes the work, and calling compute() executes that work and returns the actual value as a NumPy scalar.
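A short sketch of this lazy behavior, assuming a small 10x10 stand-in array (data, elem and block are illustrative names): indexing returns a Dask array rather than a number, and only compute() produces the value. Slicing works the same way, and the resulting chunks follow the boundaries of the original ones.

```python
import numpy as np
import dask.array as da

data = np.arange(100, dtype=np.float64).reshape(10, 10)
x = da.from_array(data, chunks=(5, 5))

# Indexing builds a lazy 0-d Dask array; no work is done yet.
elem = x[1, 2]
print(type(elem).__name__, elem.shape)  # Array ()

# compute() runs the task graph and returns a NumPy scalar.
value = elem.compute()
print(value)  # 12.0

# Slices are lazy too; their chunks follow the original 5x5 boundaries.
block = x[2:7, 3:6]
print(block.chunks)  # ((3, 2), (2, 1))
print(block.compute())
```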
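We can reshape a Dask array with the reshape method, just as with NumPy arrays, as long as the total number of elements stays the same (10000 x 10000 = 100 x 100 x 100 x 100). The rechunk method changes how the array is split into chunks without changing its values; choosing chunk sizes that suit the new shape keeps the number of tasks manageable.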
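The following sketch repeats the reshape and rechunk steps on a 100x100 stand-in array (sizes chosen only so it runs instantly) and checks that the values still match NumPy's own reshape.

```python
import numpy as np
import dask.array as da

# Small stand-in: 100x100 values in 50x50 chunks.
data = np.arange(10_000, dtype=np.float64).reshape(100, 100)
x = da.from_array(data, chunks=(50, 50))

# reshape keeps the element count: 100 * 100 == 10 * 10 * 10 * 10.
y = x.reshape((10, 10, 10, 10))
print(y.shape)   # (10, 10, 10, 10)

# rechunk changes only the block layout, never the values.
z = y.rechunk((5, 10, 5, 10))
print(z.chunks)  # ((5, 5), (10,), (5, 5), (10,))

# The result matches NumPy's C-order reshape.
print(np.array_equal(z.compute(), data.reshape(10, 10, 10, 10)))  # True
```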
The sum and mean methods behave like their NumPy counterparts but are lazy: Dask reduces each chunk separately and then combines the partial results. As with indexing, compute() triggers the actual calculation.
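When several results come from the same array, they can be evaluated together with dask.compute instead of calling compute() on each one separately. A minimal sketch, assuming a small array holding the numbers 1 to 100 (the variable names are illustrative):

```python
import numpy as np
import dask
import dask.array as da

data = np.arange(1, 101, dtype=np.float64).reshape(10, 10)
x = da.from_array(data, chunks=(5, 5))

# These only build task graphs; nothing is computed yet.
total = x.sum()
avg = x.mean()
col_sums = x.sum(axis=0)  # axis reductions work as in NumPy

# dask.compute evaluates several lazy results in one pass,
# so tasks shared between them (here, loading the chunks) run once.
total_v, avg_v, col_v = dask.compute(total, avg, col_sums)
print(total_v, avg_v)  # 5050.0 50.5
print(col_v[:3])       # [460. 470. 480.]
```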
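For two-dimensional arrays, the dot function computes the matrix product, just as in NumPy. Here da.dot(x, x.T) multiplies x by its transpose and produces a 10000x10000 result, so calling compute() on the whole product materializes about 800 MB in memory.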
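A quick way to check that the blocked product is correct is to compare it against NumPy on a small array. This sketch assumes a 200x300 random matrix with a fixed seed; the @ operator is shown as equivalent syntax, and np.allclose is used because blocked summation can differ from NumPy by rounding.

```python
import numpy as np
import dask.array as da

# Fixed seed so the check is reproducible; 200x300 keeps it fast.
rng = np.random.default_rng(0)
a = rng.random((200, 300))
x = da.from_array(a, chunks=(100, 100))

# x.T is a lazy transpose; da.dot builds a blocked matrix product.
z = da.dot(x, x.T)
print(z.shape)   # (200, 200)
print(z.chunks)  # ((100, 100), (100, 100))

# The @ operator builds an equivalent product.
w = x @ x.T

# Compare with NumPy, allowing for floating-point rounding differences.
result = z.compute()
print(np.allclose(result, a @ a.T))      # True
print(np.allclose(w.compute(), result))  # True
```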