To summarize multiple columns of a pandas DataFrame, you can use the .agg() method with a dictionary. The keys of the dictionary are the column names, and the values are the summary statistics to calculate for each column: a function name such as 'mean', a callable, or a list of either.
Here's an example of how to use the .agg() method to summarize multiple columns:
    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
            'Age': [25, 32, 18, 47, 23],
            'Salary': [50000, 80000, 35000, 65000, 45000]}
    df = pd.DataFrame(data)

    # Use the .agg() method with a dictionary to calculate multiple summary statistics
    summary = df.agg({'Age': ['mean', 'median', 'min', 'max'],
                      'Salary': ['mean', 'median', 'min', 'max']})

    # Print the summary
    print(summary)
In this example, we create a DataFrame with columns for Name, Age, and Salary. We then use the .agg() method with a dictionary to calculate the mean, median, minimum, and maximum values for the Age and Salary columns. Name is not listed in the dictionary, so it is left out of the result. The resulting summary DataFrame contains four rows (one for each summary statistic) and two columns (one for Age and one for Salary).
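The dictionary does not have to request the same statistics for every column. The sketch below reuses the sample data from the example above and asks for different statistics per column. The variable name mixed is only illustrative. Where a statistic was not requested for a column, pandas fills the cell with NaN.

```python
import pandas as pd

# Same sample data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
    'Age': [25, 32, 18, 47, 23],
    'Salary': [50000, 80000, 35000, 65000, 45000],
})

# Request a different list of statistics for each column
mixed = df.agg({'Age': ['mean', 'max'], 'Salary': ['min']})

# Rows are the union of all requested statistics;
# cells for statistics that were not requested hold NaN
print(mixed)
```

If the NaN cells get in the way, one option is to aggregate each column separately, for example df['Age'].agg(['mean', 'max']).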
You can also use the .agg() method with custom aggregation functions. For example, if you have a custom function called my_func that you want to apply to the Age and Salary columns of the DataFrame, you can do it like this:
    def my_func(x):
        # Custom aggregation function
        return x.sum() / x.count()

    summary = df.agg({'Age': my_func, 'Salary': my_func})
In this example, we define a custom function my_func that calculates the average of a column by dividing the sum of the values by the count of non-missing values. We then use the .agg() method with a dictionary to apply this function to the Age and Salary columns of the DataFrame. Because each column is mapped to a single function rather than a list, the result is a Series rather than a DataFrame: it has one entry per column (Age and Salary), each holding the output of the custom function.
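To see what kind of object comes back, the sketch below runs the custom function both ways on the same sample data. The names as_series and as_frame are only illustrative. Wrapping the function in a list is one way to get a one-row DataFrame instead of a Series.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
    'Age': [25, 32, 18, 47, 23],
    'Salary': [50000, 80000, 35000, 65000, 45000],
})

def my_func(x):
    # Custom aggregation function: mean of the non-missing values
    return x.sum() / x.count()

# One function per column: the result is a Series indexed by column name
as_series = df.agg({'Age': my_func, 'Salary': my_func})
print(type(as_series).__name__)
print(as_series)

# A list per column: the result is a DataFrame with one row per function
as_frame = df.agg({'Age': [my_func], 'Salary': [my_func]})
print(type(as_frame).__name__)
print(as_frame)
```

The list form is convenient when the result needs to be concatenated with other summary tables, since it keeps the same rows-are-statistics layout as the first example.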