pd.read_csv() is a powerful function for reading CSV files into a pandas DataFrame. However, loading a large file all at once can be memory-intensive. To handle large files, you can use the chunksize parameter of pd.read_csv() to read the file in smaller chunks and process each chunk individually.
Here's an example of using pd.read_csv() with chunksize:
import pandas as pd

# specify the file path
file_path = 'large_file.csv'

# specify the chunksize (number of rows to read at a time)
chunksize = 1000

# initialize an empty list to store the chunks
chunks = []

# loop over the file and read each chunk
for chunk in pd.read_csv(file_path, chunksize=chunksize):
    # process the chunk here (e.g. filter rows, compute statistics)
    chunks.append(chunk)

# concatenate the chunks into a single DataFrame
df = pd.concat(chunks, ignore_index=True)

# do further processing on the complete DataFrame
# ...
In this example, we first specify the file path and the chunksize (1000 rows per chunk). We then initialize an empty list to store the chunks and loop over the file with pd.read_csv(), which returns an iterator of DataFrames when chunksize is set.
Inside the loop, we can process each chunk as needed, for example by filtering rows or computing statistics on the chunk, and then append the result to the list of chunks.
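For instance, each chunk could be filtered down to the rows of interest before being appended, so that only the reduced data accumulates in memory. Here is a minimal sketch of that idea; the 'price' column and the threshold are hypothetical and not part of the original example:

import pandas as pd

file_path = 'large_file.csv'   # assumed example file
chunksize = 1000

filtered_chunks = []
for chunk in pd.read_csv(file_path, chunksize=chunksize):
    # keep only rows where the (hypothetical) 'price' column exceeds 100
    filtered = chunk[chunk['price'] > 100]
    filtered_chunks.append(filtered)

# only the filtered rows are concatenated, not the full file
df_filtered = pd.concat(filtered_chunks, ignore_index=True)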
After processing all the chunks, we concatenate them into a single DataFrame using pd.concat() and do further processing on the complete DataFrame.
By using pd.read_csv() with chunksize, we can work with large files without having to load the entire file into memory at once. Note that concatenating every unmodified chunk still rebuilds the full DataFrame in memory at the end, so the real savings come from filtering or aggregating each chunk before keeping it.
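If only summary statistics are needed, the chunks do not have to be kept at all; each one can be reduced to running totals as it is read. A sketch along those lines, again assuming a hypothetical numeric 'price' column:

import pandas as pd

total = 0.0
count = 0
for chunk in pd.read_csv('large_file.csv', chunksize=1000):
    # accumulate running totals instead of storing the chunk
    total += chunk['price'].sum()
    count += len(chunk)

# compute the mean from the accumulated totals
mean_price = total / count if count else float('nan')
print(f"mean price: {mean_price:.2f}")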