To drop duplicate pairs in a Pandas DataFrame, use the .drop_duplicates() method with the subset parameter set to a list of column names. Rows are then treated as duplicates when they hold the same values in those columns, regardless of what the other columns contain.
Here's an example of how to use the .drop_duplicates() method to remove duplicate pairs of names and ages in a DataFrame:
    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['John', 'Mary', 'Peter', 'Anna', 'John', 'Mike'],
            'Age': [25, 32, 18, 47, 25, 23],
            'Salary': [50000, 80000, 35000, 65000, 50000, 45000]}
    df = pd.DataFrame(data)

    # Drop duplicates based on the 'Name' and 'Age' columns
    df.drop_duplicates(subset=['Name', 'Age'], inplace=True)

    # Print the DataFrame
    print(df)
In this example, we create a DataFrame with columns for Name, Age, and Salary. We then call .drop_duplicates() to remove rows that repeat a pair of name and age. The subset parameter restricts the duplicate check to the 'Name' and 'Age' columns, and inplace=True tells Pandas to modify df directly rather than return a new DataFrame (the call itself then returns None). By default the first occurrence of each pair is kept, so the second John/25 row is removed and the resulting DataFrame contains one row for each unique pair of name and age.
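Which of the repeated rows survives is controlled by the keep parameter. The following is a minimal sketch on the same data; the variable names first_kept, last_kept, and none_kept are illustrative only. It omits inplace=True, so each call returns a new DataFrame and df is left unchanged.

```python
import pandas as pd

data = {
    'Name': ['John', 'Mary', 'Peter', 'Anna', 'John', 'Mike'],
    'Age': [25, 32, 18, 47, 25, 23],
    'Salary': [50000, 80000, 35000, 65000, 50000, 45000],
}
df = pd.DataFrame(data)

# keep='first' is the default: the first row of each duplicate group survives
first_kept = df.drop_duplicates(subset=['Name', 'Age'])

# keep='last': the last row of each duplicate group survives
last_kept = df.drop_duplicates(subset=['Name', 'Age'], keep='last')

# keep=False: every row that belongs to a duplicate group is dropped
none_kept = df.drop_duplicates(subset=['Name', 'Age'], keep=False)

print(first_kept.index.tolist())  # [0, 1, 2, 3, 5]
print(last_kept.index.tolist())   # [1, 2, 3, 4, 5]
print(none_kept.index.tolist())   # [1, 2, 3, 5]
```

The original row labels are preserved in each result. If a fresh 0..n-1 index is preferred, chaining .reset_index(drop=True) is one option.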
You can also drop duplicates based on more than two columns by passing a list of column names to the subset parameter. For example, if you want to drop duplicates based on the 'Name', 'Age', and 'Salary' columns, you can do it like this:
    # Drop duplicates based on the 'Name', 'Age', and 'Salary' columns
    df.drop_duplicates(subset=['Name', 'Age', 'Salary'], inplace=True)

    # Print the DataFrame
    print(df)
In this example, the list passed to the subset parameter names the 'Name', 'Age', and 'Salary' columns, so two rows count as duplicates only when they match in all three. Because these are all of the DataFrame's columns, the result is the same as calling .drop_duplicates() with no subset, which compares entire rows.
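Adding columns to subset makes the duplicate check stricter. The sketch below uses a small hypothetical DataFrame (not the data above) in which two rows share a name and age but differ in salary, and uses .duplicated() to preview which rows would be dropped before removing anything.

```python
import pandas as pd

# Hypothetical data: the two 'John' rows share Name and Age but differ in Salary
df = pd.DataFrame({
    'Name': ['John', 'Mary', 'John'],
    'Age': [25, 32, 25],
    'Salary': [50000, 80000, 55000],
})

# duplicated() marks rows that repeat an earlier row on the given columns
pair_mask = df.duplicated(subset=['Name', 'Age'])
triple_mask = df.duplicated(subset=['Name', 'Age', 'Salary'])

print(pair_mask.tolist())    # [False, False, True]
print(triple_mask.tolist())  # [False, False, False]

# The wider subset finds no duplicates, so no rows are removed
by_pair = df.drop_duplicates(subset=['Name', 'Age'])
by_triple = df.drop_duplicates(subset=['Name', 'Age', 'Salary'])

print(len(by_pair), len(by_triple))  # 2 3
```

Checking the mask first is a cheap way to confirm that the chosen columns really define a duplicate for the data at hand.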