To drop duplicate names in a Pandas DataFrame, you can use the .drop_duplicates() method. This method removes duplicate rows based on the values in one or more columns.
Here's an example of how to use the .drop_duplicates() method to remove duplicate rows based on the 'Name' column in a DataFrame:

    import pandas as pd

    # Create a DataFrame
    data = {'Name': ['John', 'Mary', 'Peter', 'Anna', 'John', 'Mike'],
            'Age': [25, 32, 18, 47, 25, 23],
            'Salary': [50000, 80000, 35000, 65000, 50000, 45000]}
    df = pd.DataFrame(data)

    # Drop duplicates based on the 'Name' column
    df.drop_duplicates(subset='Name', inplace=True)

    # Print the DataFrame
    print(df)

In this example, we create a DataFrame with columns for Name, Age, and Salary. We then use the .drop_duplicates() method to remove duplicate rows based on the 'Name' column. The subset parameter specifies that only the 'Name' column is checked for duplicates, and inplace=True tells Pandas to modify the DataFrame in place rather than returning a new one. By default the first occurrence of each name is kept, so the resulting DataFrame has one row per name: the second 'John' row is dropped and five rows remain.
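As a minimal sketch of the same step without inplace=True, the call below assigns the result to a new variable instead. The name `deduped` is chosen here for illustration only; the data is the same as in the example above.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Mary', 'Peter', 'Anna', 'John', 'Mike'],
    'Age': [25, 32, 18, 47, 25, 23],
    'Salary': [50000, 80000, 35000, 65000, 50000, 45000],
})

# Without inplace=True, drop_duplicates() returns a new DataFrame
# and leaves df untouched. `deduped` is an illustrative name.
deduped = df.drop_duplicates(subset='Name')

print(len(df))                 # 6: the original still has both 'John' rows
print(deduped.index.tolist())  # [0, 1, 2, 3, 5]: the second 'John' (index 4) is gone
```

The original row labels are preserved, which is why index 4 is simply missing from the result. Keeping the original DataFrame around can be handy when you want to compare the data before and after deduplication.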
You can also drop duplicates based on multiple columns by passing a list of column names to the subset parameter. For example, if you want to drop duplicates based on both the 'Name' and 'Age' columns, you can do it like this:
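
    # Drop duplicates based on both the 'Name' and 'Age' columns
    df.drop_duplicates(subset=['Name', 'Age'], inplace=True)

    # Print the DataFrame
    print(df)
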
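In this example, we pass a list of column names to the subset parameter to indicate that we want to check for duplicates based on both the 'Name' and 'Age' columns. A row is dropped only if both its name and its age match an earlier row, so the resulting DataFrame contains one row per unique combination of the two columns. With the sample data above, both 'John' rows have the same age, so the second one is still removed.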
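To make the difference between a single-column and a multi-column subset visible, here is a small sketch using hypothetical data (not the DataFrame from the earlier example) in which two people share a name but not an age. The variable names are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical data: two people named 'John' with different ages,
# plus one exact repeat of the first 'John'.
people = pd.DataFrame({
    'Name': ['John', 'John', 'John', 'Mary'],
    'Age': [25, 40, 25, 32],
})

# Checking 'Name' alone treats every 'John' as a duplicate of the first.
by_name = people.drop_duplicates(subset='Name')

# Checking 'Name' and 'Age' together keeps the 40-year-old 'John',
# because only rows matching on both columns count as duplicates.
by_name_and_age = people.drop_duplicates(subset=['Name', 'Age'])

print(len(by_name))          # 2
print(len(by_name_and_age))  # 3
```

Which subset to use depends on what counts as "the same record" in your data: a name alone is often too coarse, while adding more columns makes the check stricter.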