Pandas is a popular library for data manipulation in Python. Here are some common operations for manipulating a Pandas DataFrame:
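Create a DataFrame from a list of dictionaries, a dictionary of lists, or a list of lists, or load one from a CSV file with the read_csv() function.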
    import pandas as pd

    # create a DataFrame from a list of dictionaries
    data = [
        {'name': 'John', 'age': 30, 'gender': 'M'},
        {'name': 'Jane', 'age': 25, 'gender': 'F'},
        {'name': 'Mike', 'age': 35, 'gender': 'M'},
        {'name': 'Susan', 'age': 40, 'gender': 'F'}
    ]
    df = pd.DataFrame(data)

    # create a DataFrame from a dictionary of lists
    data = {
        'name': ['John', 'Jane', 'Mike', 'Susan'],
        'age': [30, 25, 35, 40],
        'gender': ['M', 'F', 'M', 'F']
    }
    df = pd.DataFrame(data)

    # create a DataFrame from a list of lists
    data = [
        ['John', 30, 'M'],
        ['Jane', 25, 'F'],
        ['Mike', 35, 'M'],
        ['Susan', 40, 'F']
    ]
    df = pd.DataFrame(data, columns=['name', 'age', 'gender'])

    # create a DataFrame from a CSV file
    df = pd.read_csv('data.csv')
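The CSV example above depends on a file named data.csv being present. As a minimal sketch for trying read_csv() without that file, the snippet below reads the same four records from an in-memory string. The variable `csv_text` and its contents are assumptions made for illustration, not part of the original data.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for 'data.csv' (illustrative only)
csv_text = """name,age,gender
John,30,M
Jane,25,F
Mike,35,M
Susan,40,F
"""

# read_csv() accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)   # number of rows and columns
print(df.dtypes)  # column types inferred by pandas
```

Reading from a file-like object keeps the example self-contained; with a real file, only the argument to read_csv() changes.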
View the first few rows of a DataFrame using the head() method or the last few rows using the tail() method.
    # view the first few rows of a DataFrame
    print(df.head())

    # view the last few rows of a DataFrame
    print(df.tail())
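Both methods return five rows by default and accept an optional row count. A short sketch, rebuilding the sample DataFrame so it runs on its own (the names `first_two` and `last_one` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# pass a row count to override the default of five rows
first_two = df.head(2)
last_one = df.tail(1)

print(first_two)
print(last_one)
```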
Filter rows, and optionally select columns, using the loc[] or iloc[] indexers.
    # filter rows based on a condition
    df_filtered = df.loc[df['age'] > 30]

    # filter rows based on a condition and select specific columns
    df_filtered = df.loc[df['age'] > 30, ['name', 'age']]
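The examples above use only loc[], which selects by label or boolean mask. As a complementary sketch, iloc[] selects by integer position, and several conditions can be combined into one mask with `&`. The variable names here are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# iloc[] is position-based: first two rows, first two columns
subset = df.iloc[0:2, [0, 1]]

# a single cell: third row, second column (the 'age' of Mike)
third_age = df.iloc[2, 1]

# combine conditions with & (and) or | (or); each needs its own parentheses
mask = (df['age'] > 30) & (df['gender'] == 'M')
older_men = df.loc[mask, ['name', 'age']]

print(subset)
print(third_age)
print(older_men)
```

Note that slices in iloc[] exclude the end position, as in ordinary Python slicing, while label slices in loc[] include it.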
Add a new column by assigning to it directly or by using the assign() method.
    # add a new column to a DataFrame
    df['salary'] = [50000, 60000, 70000, 80000]

    # add a new column to a DataFrame using the assign() method
    df = df.assign(salary=[50000, 60000, 70000, 80000])
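One difference between the two approaches is that direct assignment modifies the DataFrame in place, while assign() returns a new DataFrame and leaves the original untouched. The sketch below shows this with a derived column; the column name `age_in_months` is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
})

# assign() can compute a column from existing ones via a callable;
# 'age_in_months' is a hypothetical column used only for illustration
df_with_months = df.assign(age_in_months=lambda d: d['age'] * 12)

print(df.columns.tolist())              # original is unchanged
print(df_with_months.columns.tolist())  # new DataFrame has the extra column
```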
Remove a column using the drop() method.
    # remove a column from a DataFrame
    df = df.drop('salary', axis=1)
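drop() also accepts the `columns=` and `index=` keywords, which some readers find clearer than `axis=1`. A minimal sketch with the same sample data (the result names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'salary': [50000, 60000, 70000, 80000],
})

# equivalent to df.drop('salary', axis=1)
no_salary = df.drop(columns=['salary'])

# drop a row by its index label instead of a column
no_first_row = df.drop(index=0)

print(no_salary.columns.tolist())
print(no_first_row['name'].tolist())
```

In both cases drop() returns a new DataFrame, so the result has to be assigned to keep it.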
Group rows and compute aggregates using the groupby() method.
    # group data by the 'gender' column and calculate the mean age
    df_grouped = df.groupby('gender')['age'].mean()
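When more than one statistic is needed per group, agg() can compute several at once. The sketch below extends the mean-age example; the name `summary` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['John', 'Jane', 'Mike', 'Susan'],
    'age': [30, 25, 35, 40],
    'gender': ['M', 'F', 'M', 'F'],
})

# one row per gender, one column per aggregate
summary = df.groupby('gender')['age'].agg(['mean', 'min', 'max'])

print(summary)
```

The result is a DataFrame indexed by the group key, so a single value can be read with, for example, `summary.loc['F', 'mean']`.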
Combine two DataFrames on a shared column using the merge() method.
    # create two DataFrames to merge
    df1 = pd.DataFrame({'name': ['John', 'Jane'], 'age': [30, 25]})
    df2 = pd.DataFrame({'name': ['John', 'Mike'], 'salary': [50000, 70000]})

    # merge the DataFrames based on the 'name'
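The snippet above stops before the merge call itself. As a sketch of what such a merge could look like, assuming the join key is the 'name' column, the example below shows an inner join (the default) and an outer join. The names `inner` and `outer` are illustrative and do not come from the original.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['John', 'Jane'], 'age': [30, 25]})
df2 = pd.DataFrame({'name': ['John', 'Mike'], 'salary': [50000, 70000]})

# inner join (default): keeps only names present in both DataFrames
inner = pd.merge(df1, df2, on='name')

# outer join: keeps every name, filling missing values with NaN
outer = pd.merge(df1, df2, on='name', how='outer')

print(inner)
print(outer)
```

With this data only John appears in both inputs, so the inner join has one row, while the outer join has one row for each of the three distinct names.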