In pandas, you can calculate summary statistics for each column of a DataFrame with methods such as mean(), median(), sum(), min(), max(), var(), std(), and describe().
Called on a DataFrame, the reduction methods return one value per column by default (describe() returns a table of several statistics per column), which makes it easy to calculate and compare statistics for multiple variables at once.
Here's an example. Let's say you have a DataFrame df with columns 'A', 'B', and 'C', and you want to calculate the mean, median, and standard deviation for each column:
    import pandas as pd

    # create a sample DataFrame
    df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15]})

    # calculate the mean, median, and standard deviation of each column
    mean = df.mean()
    median = df.median()
    std = df.std()

    # print the results
    print('Mean:\n', mean)
    print('Median:\n', median)
    print('Standard deviation:\n', std)
The output will be:
    Mean:
     A     3.0
    B     8.0
    C    13.0
    dtype: float64
    Median:
     A     3.0
    B     8.0
    C    13.0
    dtype: float64
    Standard deviation:
     A    1.581139
    B    1.581139
    C    1.581139
    dtype: float64
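Here, we use the mean(), median(), and std() methods on the DataFrame df to calculate the mean, median, and standard deviation of each column. Each call returns a Series indexed by column name. Note that std() reports the sample standard deviation (ddof=1 by default), which is why the result is 1.581139 rather than the population value of about 1.414214. We then print the results.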
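The phrase "across columns" can be read two ways, so it may help to see the axis argument spelled out. The following is a minimal sketch that reuses the sample data from the example above; the variable names col_means and row_means are only illustrative. By default these methods reduce along axis=0 (one value per column), while axis=1 gives one value per row.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15]})

# default (axis=0): one mean per column, indexed by column name
col_means = df.mean()

# axis=1: one mean per row, computed over the columns A, B, and C
row_means = df.mean(axis=1)

print('Per-column means:\n', col_means)
print('Per-row means:\n', row_means)
```

The same axis argument is accepted by median(), std(), sum(), min(), max(), and var().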
In addition to these methods, you can use sum(), min(), max(), and var() in the same way, and describe() to get the count, mean, standard deviation, minimum, 25%, 50%, and 75% quantiles, and maximum of every numeric column in a single call.
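As a rough illustration of these other methods, the sketch below applies them to the same sample data as before. The variable names are only illustrative, and the comments describe pandas' default behavior.

```python
import pandas as pd

# same sample data as in the example above
df = pd.DataFrame({'A': [1, 2, 3, 4, 5], 'B': [6, 7, 8, 9, 10], 'C': [11, 12, 13, 14, 15]})

totals = df.sum()      # sum of each column
minimums = df.min()    # smallest value in each column
maximums = df.max()    # largest value in each column
variances = df.var()   # sample variance of each column (ddof=1 by default)

# describe() returns a DataFrame with one column per numeric variable
# and one row per statistic (count, mean, std, min, quartiles, max)
summary = df.describe()

print('Sum:\n', totals)
print('Variance:\n', variances)
print(summary)
```

Because describe() bundles several statistics into one table, it is often a convenient first look at a dataset before computing individual statistics.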
These methods are especially useful when working with large datasets, where you need summary statistics for many variables at once to get a quick overview of the data before making decisions.