Loading data in chunks is a common strategy when dealing with large datasets. This approach allows you to process the data in smaller pieces and avoid consuming too much memory at once.
In Python, you can use the pandas library to load data in chunks from a CSV file. Here's an example:
```python
import pandas as pd

# Define chunk size
chunk_size = 1000

# Create a CSV reader object
csv_reader = pd.read_csv('large_file.csv', chunksize=chunk_size)

# Process each chunk
for i, chunk in enumerate(csv_reader):
    # Do something with the chunk
    print(f'Processing chunk {i+1}')
```
In this example, we first define the chunk size as 1000 rows. We then create a CSV reader object with pd.read_csv(), passing the file name and the chunksize argument. Instead of returning a single DataFrame, this call returns a TextFileReader, an iterator that yields DataFrames of up to chunk_size rows each, which we can loop over to read the data in chunks.
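If you want finer control than a for loop, in recent pandas versions the reader also works as a context manager (so the file handle is closed promptly) and exposes get_chunk() for pulling one chunk on demand. A minimal sketch:

```python
import pandas as pd

# The object returned by read_csv(..., chunksize=...) is a TextFileReader.
# Using it as a context manager closes the underlying file when done,
# and get_chunk() pulls a single chunk on demand.
with pd.read_csv('large_file.csv', chunksize=1000) as reader:
    first_chunk = reader.get_chunk()  # a DataFrame of up to 1000 rows
    print(first_chunk.shape)
```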
Inside the loop, each chunk is an ordinary DataFrame, so you can apply any pandas operation to it. In this example, we simply print a message indicating the chunk number; replace that line with whatever processing or analysis you need. For instance, the sketch below accumulates a running aggregate across chunks.
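Here is a minimal sketch of chunk-wise aggregation: it computes a running row count and a running sum of a numeric column without ever holding the full file in memory. The column name 'value' is hypothetical; substitute a column from your own data.

```python
import pandas as pd

total_rows = 0
total_sum = 0.0

# Accumulate statistics one chunk at a time so peak memory stays
# proportional to chunk_size, not to the size of the whole file.
for chunk in pd.read_csv('large_file.csv', chunksize=1000):
    total_rows += len(chunk)
    total_sum += chunk['value'].sum()  # 'value' is a hypothetical column

print(f'{total_rows} rows, column sum = {total_sum}')
```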
Note that the pd.read_csv() function supports many arguments that allow you to customize how the data is loaded and processed. For example, you can specify the delimiter, encoding, and header row using the delimiter, encoding, and header arguments, respectively. You can also skip rows, select columns, and perform other data cleaning and transformation operations using the various options available.
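As a sketch of how these options combine with chunked reading, the example below sets the delimiter, encoding, and header row, selects specific columns, and skips some rows. The delimiter, encoding, and column names here are hypothetical; adjust them to match your file.

```python
import pandas as pd

reader = pd.read_csv(
    'large_file.csv',
    chunksize=1000,
    sep=';',                   # field delimiter (hypothetical; alias: delimiter)
    encoding='utf-8',          # file encoding
    header=0,                  # row to use as column names
    usecols=['id', 'value'],   # load only the columns you need (hypothetical names)
    skiprows=range(1, 11),     # skip the first 10 data rows after the header
)

for chunk in reader:
    print(chunk.head(1))
```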