The cumulative sum of a column in a pandas DataFrame can be calculated using the .cumsum() method. Called on a Series, it returns a new Series; called on a DataFrame, it returns a new DataFrame. In both cases the result has the same shape as the input, and each element is the sum of all values up to and including that position.
Here's an example of how to use the .cumsum() method to calculate the cumulative sum of a column in a DataFrame:
    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['John', 'Mary', 'Peter', 'Anna', 'Mike'],
            'Age': [25, 32, 18, 47, 23],
            'Salary': [50000, 80000, 35000, 65000, 45000]}
    df = pd.DataFrame(data)

    # Calculate the cumulative sum of the Age column
    cumulative_sum = df['Age'].cumsum()

    # Print the cumulative sum
    print(cumulative_sum)
In this example, we create a DataFrame with columns for Name, Age, and Salary. We then use the .cumsum() method to calculate the cumulative sum of the Age column. The resulting Series contains one element for each row of the DataFrame, representing the cumulative sum of the Age column up to and including that row.
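To make the running total concrete, here is a minimal sketch that reuses the same ages, prints the values as a list, and stores the result alongside the original column. The variable name age_running_total and the column name Age_cumsum are hypothetical, chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mary", "Peter", "Anna", "Mike"],
    "Age": [25, 32, 18, 47, 23],
})

# Running total of Age: each entry is the sum of all ages up to that row.
age_running_total = df["Age"].cumsum()
print(age_running_total.tolist())  # [25, 57, 75, 122, 145]

# Keep the running total next to the original data.
# "Age_cumsum" is a hypothetical column name, not a pandas convention.
df["Age_cumsum"] = age_running_total

# Sanity check: the last running total equals the plain column sum.
print(age_running_total.iloc[-1] == df["Age"].sum())  # True
```

Because .cumsum() returns a new object rather than modifying the DataFrame, the assignment to a new column is what actually stores the result.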
You can also calculate the cumulative sum of multiple columns at once by selecting those columns and calling .cumsum() on the resulting DataFrame. For example, to calculate the cumulative sum of the Age and Salary columns, you can do it like this:
    # Calculate the cumulative sum of the Age and Salary columns
    cumulative_sum = df[['Age', 'Salary']].cumsum()

    # Print the cumulative sum
    print(cumulative_sum)
In this example, we use double square brackets to create a new DataFrame that contains only the Age and Salary columns. We then apply the .cumsum() method to this subset of the DataFrame to calculate the cumulative sum of each column. The resulting DataFrame contains one row for each row of the original DataFrame, and two columns (one for Age and one for Salary) representing the cumulative sum up to and including that row.
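As a rough sketch of how the multi-column result might be combined with the original data, the example below renames the cumulative columns and joins them back onto the DataFrame. The "_cumsum" suffix and the names running and result are assumptions made for this illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["John", "Mary", "Peter", "Anna", "Mike"],
    "Age": [25, 32, 18, 47, 23],
    "Salary": [50000, 80000, 35000, 65000, 45000],
})

# Column-wise running totals for the two numeric columns only.
running = df[["Age", "Salary"]].cumsum()

# Rename the columns so they do not clash with the originals.
# The "_cumsum" suffix is a hypothetical naming choice.
running = running.add_suffix("_cumsum")

# Join on the shared row index to get originals and running totals side by side.
result = df.join(running)
print(result)
```

Each column is accumulated independently, so Salary_cumsum is unaffected by the values in Age. The Name column is left out of the selection here because a running total is only of interest for the numeric columns.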