A generator expression lets you compute an aggregate over a large dataset one value at a time, without first building a list of every value in memory. For example, you might use a generator expression to compute the sum of a column in a large CSV file.
Here's an example:
import pandas as pd

df = pd.read_csv('my_data.csv')

# compute the sum of the 'sales' column using a generator expression
sales_sum = sum(row['sales'] for index, row in df.iterrows())
print('Total sales:', sales_sum)
In this example, we first read a CSV file called my_data.csv with the pd.read_csv() function and assign the resulting DataFrame to a variable called df.
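We then use a generator expression to compute the sum of the 'sales' column. The expression iterates over the rows of the DataFrame with df.iterrows(), which yields (index, row) pairs, and extracts the 'sales' value from each row. The sum() function consumes these values one at a time to compute the total. This avoids building a separate list of sales values, but it does not reduce the memory used by the dataset itself, because pd.read_csv() has already loaded the entire file into df.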
Finally, we print the total sales to the console using the print() function.
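Note that iterating with df.iterrows() is much slower than computing the aggregate directly on the DataFrame, for example with df['sales'].sum(), and the gap grows with the number of rows. For a dataset that does not fit into memory, the file has to be read in pieces rather than all at once, for example by passing the chunksize parameter to pd.read_csv().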
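To get the memory benefit for a file that is too large to load at once, the generator expression needs a lazy source. Below is a minimal sketch, assuming the 'sales' column is numeric. The function name chunked_column_sum and the default chunk size are hypothetical choices for illustration, not part of the original example. It relies on pd.read_csv() returning an iterator of DataFrames when chunksize is set.

```python
import pandas as pd


def chunked_column_sum(path, column, chunksize=100_000):
    """Sum one numeric CSV column while reading the file in chunks.

    Hypothetical helper for illustration; `chunksize` is an arbitrary default.
    """
    # With chunksize set, read_csv returns an iterator of DataFrames
    # instead of one DataFrame; usecols skips columns we do not need.
    chunks = pd.read_csv(path, usecols=[column], chunksize=chunksize)
    # The generator expression pulls one chunk at a time, so at most
    # `chunksize` rows are held in memory while summing.
    return sum(chunk[column].sum() for chunk in chunks)
```

With the file from the example above, calling chunked_column_sum('my_data.csv', 'sales') should give the same total as the iterrows() version. Each chunk is summed with the vectorized Series.sum(), so the Python-level loop runs once per chunk rather than once per row.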