Python | DataFrame manipulation techniques in Python

DataFrame manipulation

Pandas is a popular library for data manipulation in Python. Here are some common operations for manipulating a Pandas DataFrame:

Creating a DataFrame: You can create a DataFrame from a list of dictionaries, a dictionary of lists, a list of lists, or from a CSV file using the read_csv() function.

```python
import pandas as pd

# create a DataFrame from a list of dictionaries
data = [
    {'name': 'John', 'age': 30, 'gender': 'M'},
    {'name': 'Jane', 'age': 25, 'gender': 'F'},
    {'name': 'Mike', 'age': 35, 'gender': 'M'},
    {'name': 'Susan', 'age': 40, 'gender': 'F'}
]
df = pd.DataFrame(data)

# create a DataFrame from a dictionary of lists
data = {
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F']
}
df = pd.DataFrame(data)

# create a DataFrame from a list of lists (column names passed separately)
data = [
    ['John', 30, 'M'],
    ['Jane', 25, 'F'],
    ['Mike', 35, 'M'],
    ['Susan', 40, 'F']
]
df = pd.DataFrame(data, columns=['name', 'age', 'gender'])

# create a DataFrame from a CSV file
# (the examples below assume data.csv holds the same four rows and columns)
df = pd.read_csv('data.csv')
```
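Because read_csv() accepts any file-like object, not only a path, the CSV route can be tried without a file on disk. This is an illustrative sketch rather than part of the original example: the CSV text is made up to mirror the four rows above, and io.StringIO stands in for data.csv.

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for the contents of a file such as data.csv
csv_text = """name,age,gender
John,30,M
Jane,25,F
Mike,35,M
Susan,40,F
"""

# read_csv() reads from an in-memory buffer exactly as it would from a file
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # (4, 3)
print(df.dtypes)  # column types inferred from the text
```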

Viewing a DataFrame: The head() method returns the first n rows of a DataFrame and the tail() method returns the last n rows; n defaults to 5.

```python
# view the first few rows of a DataFrame
print(df.head())

# view the last few rows of a DataFrame
print(df.tail())
```
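Both methods take an optional n argument. A minimal sketch, rebuilding the four-row DataFrame from above so that it runs on its own:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

first_two = df.head(2)  # rows 0 and 1
last_one = df.tail(1)   # row 3 only

print(first_two)
print(last_one)
print(df.shape)         # (4, 3): 4 rows, 3 columns
```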

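Filtering rows: You can filter rows with a boolean condition using the loc[] indexer, which selects by label and can also take a list of columns to keep. The iloc[] indexer selects rows and columns by integer position instead.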

```python
# filter rows based on a condition
df_filtered = df.loc[df['age'] > 30]

# filter rows based on a condition and select specific columns
df_filtered = df.loc[df['age'] > 30, ['name', 'age']]
```
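Conditions can be combined, and the same rows can be reached in other ways. The sketch below assumes the four-row DataFrame from above; it joins two conditions with & (each wrapped in parentheses, since & binds more tightly than comparisons), shows positional selection with iloc[], and expresses the same filter with query(), a string-based alternative not covered above.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# Combine conditions with & (and) or | (or); parentheses are required
older_men = df.loc[(df['age'] > 30) & (df['gender'] == 'M'), ['name', 'age']]

# iloc[] selects by position: rows 0-1 and columns 0-1 (end positions excluded)
first_block = df.iloc[0:2, 0:2]

# query() evaluates the same filter written as a string expression
older_men_q = df.query("age > 30 and gender == 'M'")

print(older_men)
print(first_block)
```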

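Adding columns: You can add a new column with square-bracket assignment, which modifies the DataFrame in place, or with the assign() method, which returns a new DataFrame. A list of values must contain one entry per row.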

```python
# add a new column to a DataFrame (one value per row)
df['salary'] = [50000, 60000, 70000, 80000]

# alternatively, add the column with assign(), which returns a new DataFrame
df = df.assign(salary=[50000, 60000, 70000, 80000])
```
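New columns are often computed from existing ones rather than typed in. A sketch, assuming the four-row DataFrame with a salary column; the column names monthly_salary and senior are hypothetical, chosen only for illustration. Passing a function (here a lambda) to assign() lets it build the column from the DataFrame it is called on.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'salary': [50000, 60000, 70000, 80000],
})

# Derive a column from an existing one (vectorized, no explicit loop)
df['monthly_salary'] = df['salary'] / 12

# assign() returns a new DataFrame; df itself is left unchanged
df2 = df.assign(senior=lambda d: d['age'] >= 35)

print('senior' in df.columns)  # False
print(df2['senior'].tolist())  # [False, False, True, True]
```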

Removing columns: You can remove a column from a DataFrame using the drop() method.

```python
# remove a column from a DataFrame (axis=1 refers to columns)
df = df.drop('salary', axis=1)
```
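drop() can also remove several columns or whole rows. A small sketch on the same data; the columns= and index= keywords are equivalent to passing axis=1 or axis=0, and 'bonus' is a hypothetical column name used to show errors='ignore':

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# Drop several columns at once
df_small = df.drop(columns=['gender', 'age'])

# Drop a row by its index label
df_no_first = df.drop(index=0)

# errors='ignore' skips labels that do not exist instead of raising KeyError
df_same = df.drop(columns=['bonus'], errors='ignore')

print(df_small.columns.tolist())  # ['name']
print(len(df_no_first))           # 3
```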

Grouping data: You can group data by one or more columns and perform aggregate functions using the groupby() method.

```python
# group data by the 'gender' column and calculate the mean age
df_grouped = df.groupby('gender')['age'].mean()
```
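To compute several aggregates in one pass, groupby() can be followed by agg() with named aggregations, where each keyword maps an output column to a (source column, function) pair. A sketch, assuming the four-row DataFrame with the salary column added earlier; the output names mean_age, max_salary and n_people are illustrative choices:

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
    'salary': [50000, 60000, 70000, 80000],
})

# One row per gender, one column per named aggregation
summary = df.groupby('gender').agg(
    mean_age=('age', 'mean'),
    max_salary=('salary', 'max'),
    n_people=('name', 'count'),
)

print(summary)
```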

Merging data: You can combine two DataFrames on a common column using the merge() method. merge() joins two DataFrames at a time, so combining more than two means chaining several merge() calls.

```python
# create two DataFrames to merge
df1 = pd.DataFrame({'name': ['John', 'Jane'], 'age': [30, 25]})
df2 = pd.DataFrame({'name': ['John', 'Mike'], 'salary': [50000, 70000]})

# merge the DataFrames based on the 'name' column
df_merged = pd.merge(df1, df2, on='name')
```
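merge() performs an inner join by default, so only names present in both DataFrames survive. The how parameter selects other join types. A sketch using the same df1 and df2; indicator=True adds a _merge column recording whether each row came from the left, right, or both inputs:

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['John', 'Jane'], 'age': [30, 25]})
df2 = pd.DataFrame({'name': ['John', 'Mike'], 'salary': [50000, 70000]})

# Default how='inner': only 'John' appears in both
inner = pd.merge(df1, df2, on='name')

# how='left': keep every row of df1; Jane's salary becomes NaN
left = pd.merge(df1, df2, on='name', how='left')

# how='outer': keep rows from both sides, with a _merge column showing the source
outer = pd.merge(df1, df2, on='name', how='outer', indicator=True)

print(inner)
print(left)
print(outer)
```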