IconResources
Conduct analysis

Extract text

Pull specific parts of a text column, such as words or character positions.

Example Data

Follow along with right out of the box example data. Copy following data in the information request of the agent you are working in.



Text columns often contain more structure than meets the eye. For example, a company name might include a legal suffix like "S.r.l." or "Inc.", and we might want to extract just that part. Or we might want the first word of a description, or the last few characters of a code. These operations help turn messy strings into useful features.

In this section, we’ll learn how to extract specific words or character positions from text using Python.

Extract characters from the start or end

Excel

In Excel, you use =LEFT() or =RIGHT() to extract characters from the beginning or end of a string.

t0 Prompt

Get the first 5 letters of the seller name

Extract the last 4 characters from the buyer column

Show me the beginning of each company name

Code

The python code looks as follows:

transactions["Seller Start"] = transactions["Seller"].str[:5]
transactions["Buyer End"] = transactions["Buyer"].str[-4:]
transactions

Extract words by position

Excel

In Excel, you use =MID() or =TEXTSPLIT() to get words by position, often with helper formulas like FIND().

t0 Prompt

Extract the first word of the seller column

Give me the second word in buyer

Get just the legal suffix like Inc., AB, or S.r.l.

Code

The python code looks as follows:

transactions["First Word"] = transactions["Seller"].str.split().str[0]
transactions["Suffix"] = transactions["Seller"].str.split().str[-1]
transactions

Extract text using custom rules

Excel

In Excel, complex extractions often require nested formulas, like =MID(A1,FIND(" ",A1)+1,...).

t0 Prompt

Get everything after "Solara Dynamics"

Remove the first word from the name

Extract custom text pattern X

Code

The python code looks as follows:

transactions["Legal Form"] = transactions["Seller"].str.replace("Solara Dynamics ", "", regex=False)
transactions
FunctionDescription
str[:N]Extract first N characters
str[-N:]Extract last N characters
str.split().str[i]Get the i-th word (0 = first)
str.replace("x", "")Replace or remove specific text

On this page