The .agg() method in pandas (a Python library) is used to aggregate or summarize data in a DataFrame. It accepts a single function, a list of functions, or a dictionary mapping columns to functions, so you can compute several summary statistics for one or more columns in a single call.
Here's an example of how to use the .agg() method to summarize data:
|
import pandas as pd
# Create a DataFrame
data = {'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
        'Age': [25, 32, 18, 47, 23],
        'Salary': [50000, 80000, 35000, 65000, 45000]}
df = pd.DataFrame(data)
# Use the .agg() method to calculate multiple summary statistics
summary = df[['Age', 'Salary']].agg(['mean', 'median', 'min', 'max'])
# Print the summary
print(summary)
|
In this example, we create a DataFrame with columns for Name, Age, and Salary. We then use the .agg() method to calculate the mean, median, minimum, and maximum values for the Age and Salary columns. The resulting summary DataFrame contains four rows (one for each summary statistic) and two columns (one for Age and one for Salary).
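If you want different statistics for different columns, .agg() also accepts a dictionary that maps column names to functions. The sketch below reuses the same illustrative data; the variable name per_column is just a placeholder chosen for this example.

```python
import pandas as pd

# Same illustrative data as in the example above
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
    'Age': [25, 32, 18, 47, 23],
    'Salary': [50000, 80000, 35000, 65000, 45000],
})

# Map each column to the list of statistics wanted for that column
per_column = df.agg({'Age': ['min', 'max'], 'Salary': ['mean', 'median']})

# Statistics that were not requested for a column show up as NaN
print(per_column)
```

The result has one row per requested statistic and one column per key in the dictionary, so it is easiest to read individual values with .loc, for example per_column.loc['max', 'Age'].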
You can also use the .agg() method to apply custom aggregation functions. For example, if you have a custom function called my_func that you want to apply to a column of a DataFrame, you can do it like this:
|
def my_func(x):
    # Custom aggregation function: mean computed as sum / count
    return x.sum() / x.count()

summary = df[['Age', 'Salary']].agg(my_func)
|
In this example, we define a custom function my_func that calculates the average of a column by dividing the sum of its values by the count of non-missing values. We then use the .agg() method to apply this function to the Age and Salary columns of the DataFrame. Because only a single function is passed, the result is a Series rather than a DataFrame, with one entry for Age and one for Salary.
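A custom function can also take extra parameters, since .agg() forwards additional keyword arguments to it. Here is a minimal sketch, assuming a hypothetical helper called count_above and an arbitrary threshold of 30; neither comes from the example above.

```python
import pandas as pd

# Same illustrative data as in the earlier examples
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
    'Age': [25, 32, 18, 47, 23],
    'Salary': [50000, 80000, 35000, 65000, 45000],
})

def count_above(x, threshold):
    # Hypothetical helper: number of values in the column greater than threshold
    return (x > threshold).sum()

# threshold is forwarded to count_above; pass it by keyword, since the
# second positional parameter of DataFrame.agg is axis
counts = df[['Age']].agg(count_above, threshold=30)
print(counts)
```

Passing the parameter this way keeps the function reusable: the same count_above can be called with a different threshold without writing a new function each time.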