Pandas is a popular library for data manipulation in Python. Here are some common operations for manipulating a Pandas DataFrame:
- Creating a DataFrame: You can create a DataFrame from a list of dictionaries, a dictionary of lists, a list of lists, or from a CSV file using the `read_csv()` function.

      import pandas as pd

      # create a DataFrame from a list of dictionaries
      data = [
          {'name': 'John', 'age': 30, 'gender': 'M'},
          {'name': 'Jane', 'age': 25, 'gender': 'F'},
          {'name': 'Mike', 'age': 35, 'gender': 'M'},
          {'name': 'Susan', 'age': 40, 'gender': 'F'}
      ]
      df = pd.DataFrame(data)

      # create a DataFrame from a dictionary of lists
      data = {
          'name': ['John', 'Jane', 'Mike', 'Susan'],
          'age': [30, 25, 35, 40],
          'gender': ['M', 'F', 'M', 'F']
      }
      df = pd.DataFrame(data)

      # create a DataFrame from a list of lists
      data = [
          ['John', 30, 'M'],
          ['Jane', 25, 'F'],
          ['Mike', 35, 'M'],
          ['Susan', 40, 'F']
      ]
      df = pd.DataFrame(data, columns=['name', 'age', 'gender'])

      # create a DataFrame from a CSV file
      df = pd.read_csv('data.csv')
- Viewing a DataFrame: You can view the first few rows of a DataFrame using the `head()` method, or the last few rows using the `tail()` method.

      # view the first few rows of a DataFrame
      print(df.head())

      # view the last few rows of a DataFrame
      print(df.tail())
- Filtering rows: You can filter rows based on a condition by passing a boolean mask to the `loc[]` indexer. The `iloc[]` indexer selects rows by integer position rather than by condition.

      # filter rows based on a condition
      df_filtered = df.loc[df['age'] > 30]

      # filter rows based on a condition and select specific columns
      df_filtered = df.loc[df['age'] > 30, ['name', 'age']]
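To make the difference between label-based and position-based selection concrete, here is a minimal sketch that reuses the sample data from the creation example. The variable names `older`, `older_men`, and `first_two` are illustrative only.

```python
import pandas as pd

# Sample data mirroring the creation example
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# loc[] with a boolean mask: keep rows where age is greater than 30
older = df.loc[df['age'] > 30]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
older_men = df.loc[(df['age'] > 30) & (df['gender'] == 'M'), ['name', 'age']]

# iloc[] selects by integer position: first two rows, first two columns
first_two = df.iloc[:2, :2]

print(older)
print(older_men)
print(first_two)
```

Note that the parentheses around each condition are required, because `&` and `|` bind more tightly than comparison operators in Python.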
- Adding columns: You can add a new column to a DataFrame using the square bracket notation or the `assign()` method.

      # add a new column to a DataFrame
      df['salary'] = [50000, 60000, 70000, 80000]

      # add a new column to a DataFrame using the assign() method
      df = df.assign(salary=[50000, 60000, 70000, 80000])
- Removing columns: You can remove a column from a DataFrame using the `drop()` method.

      # remove a column from a DataFrame
      df = df.drop('salary', axis=1)
- Grouping data: You can group data by one or more columns and perform aggregate functions using the `groupby()` method.

      # group data by the 'gender' column and calculate the mean age
      df_grouped = df.groupby('gender')['age'].mean()
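When more than one aggregate is needed per group, named aggregation is one option. The following is a small sketch on the same sample data; the output column names `mean_age`, `max_age`, and `count` are chosen here purely for illustration.

```python
import pandas as pd

# Sample data mirroring the creation example
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# Compute several aggregates per gender in one call.
# Each keyword is an output column: (input column, aggregate function).
summary = df.groupby('gender').agg(
    mean_age=('age', 'mean'),
    max_age=('age', 'max'),
    count=('name', 'count'),
)

print(summary)
```

The result is a DataFrame indexed by `gender`, with one column per named aggregate.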
- Merging data: You can merge two or more DataFrames based on a common column using the `merge()` method.

      # create two DataFrames to merge
      df1 = pd.DataFrame({'name': ['John', 'Jane'], 'age': [30, 25]})
      df2 = pd.DataFrame({'name': ['John', 'Mike'], 'salary': [50000, 70000]})

      # merge the DataFrames based on the 'name'
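As a minimal sketch of the merge step, assuming the two frames are joined on their shared `name` column: `merge()` performs an inner join by default, and `how='outer'` keeps rows from both sides. The variable names `inner` and `outer` are illustrative and not taken from the original example.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['John', 'Jane'], 'age': [30, 25]})
df2 = pd.DataFrame({'name': ['John', 'Mike'], 'salary': [50000, 70000]})

# Inner join (the default): keeps only names present in both frames
inner = pd.merge(df1, df2, on='name')

# Outer join: keeps every name; values missing on one side become NaN
outer = df1.merge(df2, on='name', how='outer')

print(inner)
print(outer)
```

With this data, only John appears in both frames, so the inner join has a single row, while the outer join has one row for each of the three names.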