Remove duplicates
Detect and delete duplicate rows in your dataset.
Example Data
Follow along with ready-made example data. Copy the following data into the information request of the agent you are working in.
It’s common to run into duplicate rows — especially when combining data from different sources. For example, if the same transaction appears twice, it can affect our totals, averages, and any downstream analysis. In this section, we’ll learn how to identify and remove duplicated data.
Remove exact duplicate rows
Excel
In Excel, you use the "Remove Duplicates" feature under the "Data" tab to delete exact matches.
t0 Prompt
Remove duplicates
Drop exact duplicates in the table
Delete repeated rows
Code
The Python code uses the following function:
Function | Description |
---|---|
drop_duplicates() | Removes rows that are exact matches |
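
As a minimal sketch of how `drop_duplicates()` handles exact matches, assume a small made-up table. The column names and values below are illustrative only and are not the example data referenced above.

```python
import pandas as pd

# Hypothetical sample data: the second and third rows are exact copies.
df = pd.DataFrame(
    {
        "Date": ["2024-01-05", "2024-01-06", "2024-01-06", "2024-01-07"],
        "Seller": ["Acme", "Bolt", "Bolt", "Acme"],
        "Amount": [120.0, 75.5, 75.5, 120.0],
    }
)

# Keep the first occurrence of every fully identical row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)

# Count how many rows were dropped.
removed = len(df) - len(deduped)
print(f"Removed {removed} duplicate row(s)")
```

Rows one and four share a seller and an amount but differ in date, so they are not exact matches and both are kept.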
Remove duplicates based on certain columns
Excel
In Excel, you choose which columns to check when removing duplicates, for example only the transaction date and amount.
t0 Prompt
Remove duplicates based on "Date" and "Amount"
Drop rows with the same amount and seller
Keep only unique combinations of columns
Code
The Python code uses the following functions:
Function | Description |
---|---|
drop_duplicates(subset=[...]) | Removes duplicates based on selected columns |
pd.to_datetime() / pd.to_numeric() | Convert columns to date or numeric types first, so that equal values stored as different text (such as "100" and "100.00") compare as equal |
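
The functions above can be combined as in the following sketch. It assumes a hypothetical table in which "Date" and "Amount" arrive as text; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical raw data: amounts are text, so "100" and "100.00" look different.
raw = pd.DataFrame(
    {
        "Date": ["2024-01-05", "2024-01-05", "2024-01-06"],
        "Seller": ["Acme", "Acme", "Bolt"],
        "Amount": ["100", "100.00", "75.5"],
    }
)

# Convert to proper types so equal values compare as equal.
clean = raw.copy()
clean["Date"] = pd.to_datetime(clean["Date"])
clean["Amount"] = pd.to_numeric(clean["Amount"])

# Treat rows as duplicates when both "Date" and "Amount" match,
# keeping the first occurrence of each combination.
deduped = clean.drop_duplicates(subset=["Date", "Amount"]).reset_index(drop=True)
print(deduped)
```

Without the conversion step, the first two rows would count as different because the text values "100" and "100.00" are not identical strings.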