When using pd.read_csv() with the chunksize parameter, you can examine each chunk of data before processing it. This can be useful for debugging or understanding the structure of the data.
Here's an example of how to examine a chunk:
import pandas as pd

# specify the file path
file_path = 'large_file.csv'

# specify the chunksize (number of rows to read at a time)
chunksize = 1000

# initialize an empty list to store the chunks
chunks = []

# loop over the file and read each chunk
for i, chunk in enumerate(pd.read_csv(file_path, chunksize=chunksize)):
    # examine the first few rows of the chunk
    print(f"Chunk {i}:")
    print(chunk.head())

    # append the chunk to the list of chunks
    chunks.append(chunk)

# concatenate the chunks into a single DataFrame
df = pd.concat(chunks, ignore_index=True)

# do further processing on the complete DataFrame
# ...
In this example, we loop over the file using pd.read_csv() and use the enumerate() function to get the index of each chunk.
Inside the loop, we examine the first few rows of each chunk using the DataFrame's head() method, which returns the first 5 rows by default; printing the result displays them.
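As a quick illustration of this preview step, here is a minimal, self-contained sketch. The in-memory CSV and column names are hypothetical stand-ins for the real file; head() also accepts an argument to control how many rows are shown:

```python
import io
import pandas as pd

# hypothetical in-memory CSV standing in for 'large_file.csv'
csv_data = io.StringIO("a,b\n1,x\n2,y\n3,z\n4,w\n")

for i, chunk in enumerate(pd.read_csv(csv_data, chunksize=2)):
    # head(1) previews just the first row of each chunk (default is 5)
    print(f"Chunk {i}:")
    print(chunk.head(1))
```

With four data rows and chunksize=2, this loop yields two chunks and prints one preview row for each.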
We then append the chunk to the list of chunks.
After processing all the chunks, we concatenate them into a single DataFrame using pd.concat() and do further processing on the complete DataFrame.
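Note that concatenating every chunk rebuilds the full dataset in memory, which defeats the purpose if the file is truly too large to hold at once. A common alternative, sketched here with hypothetical sample data, is to reduce each chunk as you go and keep only the running result:

```python
import io
import pandas as pd

# hypothetical in-memory CSV standing in for 'large_file.csv'
csv_data = io.StringIO("category,value\na,1\nb,2\na,3\nb,4\na,5\n")

total = 0.0
# aggregate chunk by chunk instead of concatenating everything
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["value"].sum()

print(total)  # 15.0
```

Only one chunk is resident in memory at a time, so this scales to files far larger than RAM.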
By examining each chunk, we can get a better understanding of the structure of the data and how to process it.
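Beyond head(), standard DataFrame attributes such as columns, dtypes, and shape are useful for understanding structure, and inspecting just the first chunk is often enough. A small sketch, again with a hypothetical sample in place of the real file:

```python
import io
import pandas as pd

# hypothetical sample standing in for the real file
csv_data = io.StringIO("id,score\n1,9.5\n2,8.0\n3,7.2\n")

reader = pd.read_csv(csv_data, chunksize=2)
first_chunk = next(reader)  # pull just the first chunk

print(first_chunk.columns.tolist())  # ['id', 'score']
print(first_chunk.dtypes)            # inferred column types
print(first_chunk.shape)             # (2, 2)
```

Checking dtypes on an early chunk can also catch type-inference surprises before you process the whole file.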