
Remove duplicates

Detect and delete duplicate rows in your dataset.

Example Data

Follow along with the ready-made example data. Copy the following data into the information request of the agent you are working in.



It’s common to run into duplicate rows, especially when combining data from different sources. For example, if the same transaction appears twice, it inflates totals and skews averages and any downstream analysis. In this section, we’ll learn how to identify and remove duplicate rows.
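Before deleting anything, it can help to see which rows are actually repeated. The sketch below is one way to do that with pandas; the sample rows and the "Seller" column are made up for illustration and are not the example data from this page.

```python
import pandas as pd

# Hypothetical sample data; values and the "Seller" column are assumptions for illustration.
transactions = pd.DataFrame(
    {
        "Date": ["2024-01-05", "2024-01-05", "2024-01-06"],
        "Seller": ["Acme", "Acme", "Birch"],
        "Amount": [120.0, 120.0, 75.5],
    }
)

# duplicated() returns True for every row that repeats an earlier row;
# the first occurrence of each row stays False.
is_duplicate = transactions.duplicated()
duplicate_count = int(is_duplicate.sum())

# Inspect the flagged rows before deciding whether to drop them.
print(transactions[is_duplicate])
print(f"Duplicate rows found: {duplicate_count}")
```

Checking the count first makes it easier to confirm afterwards that only the expected rows were removed.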

Remove exact duplicate rows

Excel

In Excel, you use the "Remove Duplicates" feature under the "Data" tab to delete exact matches.

t0 Prompt

Remove duplicates

Drop exact duplicates in the table

Delete repeated rows

Code

The Python code looks as follows:

transactions.drop_duplicates(inplace=True)
transactions

Function | Description
drop_duplicates() | Removes rows that are exact matches
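As a self-contained illustration, here is a minimal sketch of the same call on a small made-up table. The sample values are assumptions for illustration. It also shows an alternative to inplace=True: assigning the result to a new variable, which leaves the original table untouched.

```python
import pandas as pd

# Hypothetical sample data; rows 0 and 1 are exact copies of each other.
transactions = pd.DataFrame(
    {
        "Date": ["2024-01-05", "2024-01-05", "2024-01-06", "2024-01-06"],
        "Seller": ["Acme", "Acme", "Birch", "Cedar"],
        "Amount": [120.0, 120.0, 75.5, 75.5],
    }
)

# Without inplace=True, drop_duplicates() returns a new DataFrame.
# reset_index(drop=True) renumbers the remaining rows from 0.
deduplicated = transactions.drop_duplicates().reset_index(drop=True)

print(deduplicated)
```

Rows 2 and 3 share a date and an amount but have different sellers, so they are not exact matches and both are kept.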

Remove duplicates based on certain columns

Excel

In Excel, you choose which columns to check when removing duplicates — like only looking at transaction date and amount.

t0 Prompt

Remove duplicates based on "Date" and "Amount"

Drop rows with the same amount and seller

Keep only unique combinations of columns

Code

The Python code looks as follows:

transactions.drop_duplicates(subset=["Date", "Amount"], inplace=True)
transactions
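
Function | Description
drop_duplicates(subset=[...]) | Removes duplicates based on selected columns
pd.to_datetime() / pd.to_numeric() | Convert columns stored as text into real dates or numbers first, so that equal values written in different ways are compared correctly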
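To see why the conversion step matters, here is a small sketch under assumed data: the amounts are stored as text, so "120" and "120.00" look different until they are converted to numbers. The sample rows and the "Seller" column are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; "Date" and "Amount" are stored as text on purpose.
transactions = pd.DataFrame(
    {
        "Date": ["2024-01-05", "2024-01-05", "2024-01-06"],
        "Seller": ["Acme", "Acme Corp", "Birch"],
        "Amount": ["120", "120.00", "75.5"],
    }
)

# As text, "120" and "120.00" are different strings, so nothing is dropped yet.
rows_before_conversion = len(transactions.drop_duplicates(subset=["Date", "Amount"]))

# Convert the text columns to real dates and numbers.
transactions["Date"] = pd.to_datetime(transactions["Date"])
transactions["Amount"] = pd.to_numeric(transactions["Amount"])

# Now rows 0 and 1 share the same date and amount; the first one is kept by default.
deduplicated = transactions.drop_duplicates(subset=["Date", "Amount"]).reset_index(drop=True)

print(f"Rows kept before conversion: {rows_before_conversion}")
print(deduplicated)
```

By default the first row of each duplicate group is kept. Passing keep="last" keeps the last one instead, which may be preferable if later rows are more up to date.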
