AI Agents - Fuzzy Matching Agent

Introduction

The Fuzzy Matching Agent can dynamically join two columns of data using "fuzzy" matching logic, where the data elements do not need to be exact for a match to occur. This is particularly useful in situations where you are combining data from multiple disparate sources that may not use the same names or terminology for the same thing. The agent will also provide a confidence score for each match so you can review or filter results below a certain threshold.

The agent supports fuzzy matching across all major languages and also takes emoticons into account when comparing values.
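To make the idea of a fuzzy match with a confidence score concrete, here is a minimal Python sketch. It is not the agent's actual implementation, which is not described here. It assumes a simple character-based similarity from Python's standard difflib module, and the names similarity, best_match, and the 0.6 threshold are hypothetical and chosen only for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(value: str, references: list[str]) -> tuple[str, float]:
    """Return the reference term most similar to `value`, with its score.

    Assumes `references` is non-empty.
    """
    scored = ((ref, similarity(value, ref)) for ref in references)
    return max(scored, key=lambda pair: pair[1])


# Illustrative data only: the "canonical" names and the messy values to match.
references = ["Microsoft Corporation", "Apple Inc.", "Alphabet Inc."]
values = ["microsoft corp", "Apple Inc", "Alphabet"]
threshold = 0.6  # hypothetical cutoff below which a match is flagged for review

for value in values:
    match, score = best_match(value, references)
    flag = "" if score >= threshold else "  <- review"
    print(f"{value!r} -> {match!r} ({score:.2f}){flag}")
```

In this sketch the score plays the role of the confidence value: rows whose best score falls below the chosen threshold can be filtered out or reviewed by hand.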

Agent Requirements

Below are all requirements (both mandatory and optional) the agent needs in order to run successfully. These will appear as input fields when using this agent through AI Agent Run.

Requirement

Description

Dataset to Match On

The source dataset containing the data to which fuzzy matching will be applied.

Column to be Matched

The column in the Dataset to Match On whose values will be fuzzy matched.

Keyword Pool

You must provide either a Keyword Pool or both a Matching Source Dataset and a Matching Source Column. The Keyword Pool is a comma-separated list of keywords used as the reference set for the fuzzy matching process. Note that if this option is used, the "Include Matching Dataset Columns" option has no effect.

Matching Source Dataset (Optional)

You must provide either a Keyword Pool or both a Matching Source Dataset and a Matching Source Column. The dataset provided here is the one the fuzzy matching agent joins against. It acts as the master reference dataset, containing the terms in their correct, standardized form.

Matching Source Column (Optional)

The column in the Matching Source Dataset containing the values used by the fuzzy matching agent. This represents the “correct” or canonical list of options. It is required whenever a Matching Source Dataset is provided.

Include Matching Dataset Columns (Optional)

If enabled, all columns from the Matching Source Dataset are included in the output dataset of the fuzzy matching join. This option has no effect when a Keyword Pool is used.

Match Uniquely? (Optional)

If enabled, the fuzzy matching agent assigns each reference term to a single target row only. The assignment is based on the most confident match, and reference terms are not reused, even if they would otherwise be the highest-confidence match for additional rows.
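The Keyword Pool and Match Uniquely? options described above can be illustrated with a short Python sketch. It is one plausible reading of the behavior, not the agent's documented algorithm. It assumes a greedy assignment that processes candidate pairs from highest to lowest score, and it uses difflib for scoring. The names parse_keyword_pool, similarity, and match_uniquely are hypothetical.

```python
from difflib import SequenceMatcher


def parse_keyword_pool(pool: str) -> list[str]:
    """Split a comma-separated keyword pool into trimmed, non-empty terms."""
    return [term.strip() for term in pool.split(",") if term.strip()]


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_uniquely(rows: list[str], references: list[str]) -> dict[int, tuple[str, float]]:
    """Greedily give each reference term to at most one row, best scores first.

    Returns a mapping from row index to (reference term, score). Rows that
    lose out on every reference term are left out of the result.
    """
    scored = sorted(
        (
            (similarity(row, ref), i, j)
            for i, row in enumerate(rows)
            for j, ref in enumerate(references)
        ),
        key=lambda item: item[0],
        reverse=True,
    )
    used_rows: set[int] = set()
    used_refs: set[int] = set()
    assigned: dict[int, tuple[str, float]] = {}
    for score, i, j in scored:
        if i in used_rows or j in used_refs:
            continue  # this row or reference term is already taken
        assigned[i] = (references[j], score)
        used_rows.add(i)
        used_refs.add(j)
    return assigned


# Illustrative data only.
rows = ["New York", "New York City", "Boston"]
references = parse_keyword_pool("New York, Boston")
assigned = match_uniquely(rows, references)

for i, row in enumerate(rows):
    print(row, "->", assigned.get(i, "no unique match"))
```

Here "New York City" would normally match "New York" best, but that reference term has already been claimed by the exact match in the first row, so the row ends up without a match. With unique matching disabled, each row would simply take its own best-scoring reference term.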