Pandas is a popular library for data manipulation in Python. Here are some common operations for manipulating a Pandas DataFrame:

  • Creating a DataFrame: You can create a DataFrame from a list of dictionaries, a dictionary of lists, a list of lists, or from a CSV file using the read_csv() function.
import pandas as pd
 
# create a DataFrame from a list of dictionaries
data = [
    {'name': 'John', 'age': 30, 'gender': 'M'},
    {'name': 'Jane', 'age': 25, 'gender': 'F'},
    {'name': 'Mike', 'age': 35, 'gender': 'M'},
    {'name': 'Susan', 'age': 40, 'gender': 'F'}
]
df = pd.DataFrame(data)
 
# create a DataFrame from a dictionary of lists
data = {
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F']
}
df = pd.DataFrame(data)
 
# create a DataFrame from a list of lists
data = [
    ['John', 30, 'M'],
    ['Jane', 25, 'F'],
    ['Mike', 35, 'M'],
    ['Susan', 40, 'F']
]
df = pd.DataFrame(data, columns=['name', 'age', 'gender'])
 
# create a DataFrame from a CSV file
df = pd.read_csv('data.csv')
  • Viewing a DataFrame: You can view the first rows of a DataFrame using the head() method or the last rows using the tail() method. Both return five rows by default and accept a row count, e.g. head(10).
# view the first few rows of a DataFrame
print(df.head())
 
# view the last few rows of a DataFrame
print(df.tail())
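
As a small sketch of the inspection calls above, the snippet below rebuilds the sample data from the earlier examples and passes an explicit row count to head() and tail(). The variable names first_two, last_one, and n_rows are only illustrative.

```python
import pandas as pd

# same sample data as in the earlier examples
df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# head(n) and tail(n) return the first or last n rows as a new DataFrame
first_two = df.head(2)
last_one = df.tail(1)

# shape is a (rows, columns) tuple; dtypes lists the type of each column
n_rows, n_cols = df.shape
print(first_two)
print(last_one)
print(df.dtypes)
```

Checking shape and dtypes right after loading data is a quick way to catch a column that was read as text instead of numbers.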
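  • Filtering rows: You can filter rows based on a condition by passing a boolean mask to the loc[] indexer, optionally followed by a list of columns to keep. The iloc[] indexer selects by integer position instead, so it is not used with a condition like the ones below.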
# filter rows based on a condition
df_filtered = df.loc[df['age'] > 30]
 
# filter rows based on a condition and select specific columns
df_filtered = df.loc[df['age'] > 30, ['name', 'age']]
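
Filters often need more than one condition. The sketch below, using the same sample data, combines two conditions with & and contrasts that with position-based selection through iloc[]. The names older_men and first_two are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# combine conditions with & (and) or | (or); each condition needs its own parentheses
older_men = df.loc[(df['age'] > 30) & (df['gender'] == 'M'), ['name', 'age']]

# iloc[] selects by integer position: here the first two rows and first two columns
first_two = df.iloc[:2, :2]

print(older_men)
print(first_two)
```

The parentheses matter because & binds more tightly than comparison operators in Python, so leaving them out usually raises an error.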
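  • Adding columns: You can add a new column to a DataFrame using the square bracket notation, which modifies the DataFrame in place, or the assign() method, which returns a new DataFrame. A list of values must have one entry per row.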
# add a new column to a DataFrame
df['salary'] = [50000, 60000, 70000, 80000]
 
# add a new column to a DataFrame using the assign() method
df = df.assign(salary=[50000, 60000, 70000, 80000])
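
New columns are frequently computed from existing ones rather than typed in as a list. The following is a minimal sketch of that pattern; the column names is_senior and age_next_year and the threshold of 35 are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
})

# square brackets: a comparison on a column yields one boolean per row
df['is_senior'] = df['age'] >= 35  # hypothetical threshold

# assign() with a callable: the lambda receives the DataFrame and returns the new column
df = df.assign(age_next_year=lambda d: d['age'] + 1)

print(df)
```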
 
  • Removing columns: You can remove a column from a DataFrame using the drop() method.
# remove a column from a DataFrame
df = df.drop('salary', axis=1)
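
drop() also accepts a columns= keyword, which some readers find clearer than axis=1, and it can remove several columns at once. A short sketch, assuming a hypothetical extra 'bonus' column that does not appear in the examples above:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane'],
    'age': [30, 25],
    'salary': [50000, 60000],
    'bonus': [5000, 6000],  # hypothetical column, for illustration only
})

# columns=[...] is equivalent to passing the labels with axis=1
df_small = df.drop(columns=['salary', 'bonus'])

# errors='ignore' skips labels that are not present instead of raising a KeyError
df_same = df_small.drop(columns=['salary'], errors='ignore')

print(df_small)
```

Note that drop() returns a new DataFrame and leaves the original untouched unless the result is assigned back.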
  • Grouping data: You can group data by one or more columns and perform aggregate functions using the groupby() method.
# group data by the 'gender' column and calculate the mean age
df_grouped = df.groupby('gender')['age'].mean()
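
When more than one statistic per group is needed, agg() with named aggregations is one option. The sketch below reuses the sample data; the output column names mean_age, oldest, and people are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# named aggregation: output_column=(input_column, aggregation_function)
summary = df.groupby('gender').agg(
    mean_age=('age', 'mean'),
    oldest=('age', 'max'),
    people=('name', 'count'),
)

print(summary)
```

The result is a DataFrame indexed by gender, so an individual value can be read with summary.loc['F', 'mean_age'].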
  • Merging data: You can merge two DataFrames on a common column using the merge() method. To combine more than two, chain the merge() calls.
# create two DataFrames to merge
df1 = pd.DataFrame({'name': ['John', 'Jane'], 'age': [30, 25]})
df2 = pd.DataFrame({'name': ['John', 'Mike'], 'salary': [50000, 70000]})
 
# merge the DataFrames based on the 'name'