To group by multiple variables in Python using pandas, you can pass a list of column names to the groupby() method.
Assuming you have a pandas DataFrame with columns that you want to group by and summarize, you can use the following syntax:
import pandas as pd

# create a pandas DataFrame
df = pd.DataFrame({'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
                   'Value': [1, 2, 3, 4, 5, 6]})

# group by the 'Group1' and 'Group2' columns and calculate the mean of the 'Value' column
grouped = df.groupby(['Group1', 'Group2']).mean()
print(grouped)
This will group the DataFrame by the 'Group1' and 'Group2' columns and calculate the mean of the 'Value' column for each group. The resulting output will be:
               Value
Group1 Group2
A      X         1.0
       Y         4.0
B      X         2.0
       Z         5.0
C      Y         3.0
       Z         6.0
You can also use other aggregation functions, such as sum(), count(), min(), and max(), to summarize your data by group. Additionally, you can pass a dictionary that maps column names to aggregation functions to the agg() method to calculate multiple summaries for each group.
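As a minimal sketch of the agg() approach, the example below reuses the same sample DataFrame and computes several summaries of 'Value' at once. The variable names (summary, named, flat) and the output column names total and average are chosen here for illustration only. The second call uses pandas' named aggregation (available in pandas 0.25 and later), which is one way to get flat column names instead of a two-level column header.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({'Group1': ['A', 'B', 'C', 'A', 'B', 'C'],
                   'Group2': ['X', 'X', 'Y', 'Y', 'Z', 'Z'],
                   'Value': [1, 2, 3, 4, 5, 6]})

# dictionary form: map each column to the aggregations to compute for it;
# the result has two-level columns such as ('Value', 'sum')
summary = df.groupby(['Group1', 'Group2']).agg({'Value': ['sum', 'mean', 'count']})
print(summary)

# named aggregation: new_column_name=(source_column, function)
# 'total' and 'average' are illustrative names, not required by pandas
named = df.groupby(['Group1', 'Group2']).agg(
    total=('Value', 'sum'),
    average=('Value', 'mean'),
)

# turn the group keys back into ordinary columns
flat = named.reset_index()
print(flat)
```

If you would rather keep the group keys as regular columns from the start, passing as_index=False to groupby() should give a similar result to calling reset_index() afterwards.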