Cumulative statistics are summary statistics computed on a running basis over a data series: each value summarizes every observation up to and including that position. They show how the data accumulates across the series, for example over time when the rows are in chronological order.
In pandas, cumulative statistics are calculated with the .cumsum(), .cummax(), .cummin(), and .cumprod() methods, which compute the cumulative sum, maximum, minimum, and product of a data series, respectively.
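To make the four methods concrete, here is a minimal sketch that applies each of them to the same small Series. The values and the variable names are made up purely for illustration and do not come from the example that follows.

```python
import pandas as pd

# Hypothetical values, chosen only to make the running results easy to follow
s = pd.Series([3, 1, 4, 2])

running_sum = s.cumsum()    # 3, 4, 8, 10
running_max = s.cummax()    # 3, 3, 4, 4
running_min = s.cummin()    # 3, 1, 1, 1
running_prod = s.cumprod()  # 3, 3, 12, 24

# Put the results side by side for comparison
summary = pd.DataFrame({
    'value': s,
    'cumsum': running_sum,
    'cummax': running_max,
    'cummin': running_min,
    'cumprod': running_prod,
})
print(summary)
```

Each result has the same length as the input, because every row holds the statistic for all values up to and including that row.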
Here's an example of how to use the .cumsum() method to calculate the cumulative sum of a column in a DataFrame:
import pandas as pd

# Create a DataFrame
data = {'Year': [2010, 2011, 2012, 2013, 2014],
        'Sales': [10000, 12000, 15000, 18000, 20000]}
df = pd.DataFrame(data)

# Calculate the cumulative sum of the Sales column
df['Cumulative Sales'] = df['Sales'].cumsum()

# Print the DataFrame
print(df)
In this example, we create a DataFrame with two columns, Year and Sales. We then call .cumsum() on the Sales column and store the result in a new column named "Cumulative Sales". Each row of the new column holds the total of Sales from the first year through that row's year, so the last row holds the total for all five years.
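As a sanity check, the running total can also be built by hand and compared with the result of .cumsum(). The sketch below rebuilds the same DataFrame so that it runs on its own; the names manual and running_total are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013, 2014],
    'Sales': [10000, 12000, 15000, 18000, 20000],
})

# Accumulate the total one row at a time
running_total = 0
manual = []
for value in df['Sales'].tolist():
    running_total += value
    manual.append(running_total)

# The same running total computed by pandas
cumulative = df['Sales'].cumsum().tolist()

print(manual)      # [10000, 22000, 37000, 55000, 75000]
print(cumulative)  # [10000, 22000, 37000, 55000, 75000]
```

The explicit loop is shown only to illustrate what the method computes; in practice .cumsum() is the shorter way to write it.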
The .cummax(), .cummin(), and .cumprod() methods are used in the same way. Here's an example of how to use the .cummax() method to calculate the cumulative maximum of a column in a DataFrame:
# Calculate the cumulative maximum of the Sales column
df['Cumulative Max Sales'] = df['Sales'].cummax()

# Print the DataFrame
print(df)
In this example, we call .cummax() on the Sales column and store the result in a new column named "Cumulative Max Sales". The resulting DataFrame contains the original columns Year and Sales along with the two new columns "Cumulative Sales" and "Cumulative Max Sales". Because Sales increases every year in this data, the cumulative maximum in each row equals that row's Sales value.
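The remaining two methods follow the same pattern. The sketch below is one possible illustration, not part of the original example: the "Cumulative Min Sales", "Growth Factor", and "Cumulative Growth" columns are hypothetical names, and the growth factor is derived with .shift() only to give .cumprod() something meaningful to multiply.

```python
import pandas as pd

df = pd.DataFrame({
    'Year': [2010, 2011, 2012, 2013, 2014],
    'Sales': [10000, 12000, 15000, 18000, 20000],
})

# Cumulative minimum: the smallest Sales value seen so far
df['Cumulative Min Sales'] = df['Sales'].cummin()

# Year-over-year growth factor (assumed helper column);
# the first year has no previous year, so it is set to 1.0
df['Growth Factor'] = (df['Sales'] / df['Sales'].shift(1)).fillna(1.0)

# Cumulative product of the growth factors gives growth relative to 2010
df['Cumulative Growth'] = df['Growth Factor'].cumprod()

print(df)
```

Since Sales never falls below its first value, the cumulative minimum stays at 10000 in every row, and the final cumulative growth is approximately 2.0, which matches Sales doubling from 10000 to 20000.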