To compute several grouped summaries at once in Python with pandas, you can combine the groupby() method with the agg() method.
Assuming you have a pandas DataFrame with columns that you want to group by and summarize, you can use the following syntax:
    import pandas as pd

    # create a pandas DataFrame
    df = pd.DataFrame({
        'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
        'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
        'Value1': [1, 2, 3, 4, 5, 6],
        'Value2': [10, 20, 30, 40, 50, 60],
    })

    # group by the 'Group1' and 'Group2' columns and calculate
    # the mean and sum of the 'Value1' and 'Value2' columns
    grouped = df.groupby(['Group1', 'Group2']).agg({
        'Value1': ['mean', 'sum'],
        'Value2': ['mean', 'sum'],
    })

    print(grouped)
This will group the DataFrame by the 'Group1' and 'Group2' columns and calculate the mean and sum of the 'Value1' and 'Value2' columns for each group. The resulting output will be:
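                  Value1     Value2
                    mean sum   mean sum
    Group1 Group2
    A      X         1.0   1   10.0  10
           Y         4.0   4   40.0  40
    B      X         2.0   2   20.0  20
           Z         5.0   5   50.0  50
    C      Y         3.0   3   30.0  30
           Z         6.0   6   60.0  60

Note that groupby() sorts the group keys by default, so within group A the X row appears before the Y row.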
You can add as many columns and aggregation functions as you need to the dictionary passed to the agg() method. If you would rather give each result column its own flat name, you can use named aggregation instead: pass keyword arguments to agg(), each set to an (input column, aggregation function) tuple.
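As a minimal sketch of applying different aggregation functions to different columns with named output columns, the example below reuses the DataFrame from above. The result column names (value1_mean, value2_sum, value2_max) are illustrative choices, not part of the original example, and the keyword form of agg() assumes pandas 0.25 or later.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({
    'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
    'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
    'Value1': [1, 2, 3, 4, 5, 6],
    'Value2': [10, 20, 30, 40, 50, 60],
})

# named aggregation: each keyword is an output column name (hypothetical
# names chosen for this sketch), each value is (input column, function)
summary = df.groupby(['Group1', 'Group2']).agg(
    value1_mean=('Value1', 'mean'),
    value2_sum=('Value2', 'sum'),
    value2_max=('Value2', 'max'),
).reset_index()  # turn the group keys back into ordinary columns

print(summary)
```

Here the result has a single level of column names rather than the two-level header produced by the dictionary form, which can be easier to work with when the summary is exported or merged with other data.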