AI Agents - Fuzzy Matching Agent
Introduction
The Fuzzy Matching Agent can dynamically join two columns of data using "fuzzy" matching logic, where the data elements do not need to be exact for a match to occur. This is particularly useful in situations where you are combining data from multiple disparate sources that may not use the same names or terminology for the same thing. The agent will also provide a confidence score for each match so you can review or filter results below a certain threshold.
The agent supports fuzzy matching in all major languages and takes emoticons into account when comparing values.
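
To make the idea of a fuzzy match with a confidence score concrete, here is a minimal Python sketch. It is not the agent's implementation: the agent's scoring method is not documented here, so the example uses the standard library's `difflib.SequenceMatcher` as a stand-in metric. The function names (`similarity`, `best_match`, `fuzzy_join`) and the `threshold` parameter are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (illustrative metric only)."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def best_match(value: str, reference: list[str]) -> tuple[str, float]:
    """Return the reference term most similar to `value`, with its score."""
    if not reference:
        raise ValueError("reference list must not be empty")
    scored = [(term, similarity(value, term)) for term in reference]
    return max(scored, key=lambda pair: pair[1])


def fuzzy_join(values: list[str], reference: list[str], threshold: float = 0.0) -> list[dict]:
    """Pair each value with its best reference term; blank out weak matches."""
    rows = []
    for value in values:
        term, score = best_match(value, reference)
        rows.append({
            "value": value,
            # Matches scoring below the threshold are reported as None.
            "match": term if score >= threshold else None,
            "confidence": round(score, 3),
        })
    return rows


if __name__ == "__main__":
    reference = ["Microsoft Corporation", "Apple Inc."]
    for row in fuzzy_join(["Microsft Corp", "zzz"], reference, threshold=0.6):
        print(row)
```

Each row keeps its confidence score even when the match is dropped, which mirrors the review-or-filter workflow described above.
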
Agent Requirements
The table below lists every input, mandatory and optional, that the agent needs in order to run successfully. These appear as input fields when you use this agent through AI Agent Run.
| Requirement | Description |
|---|---|
| Dataset to Match On | The source dataset containing the data to which fuzzy matching will be applied. |
| Column to be Matched | The column to which fuzzy matching will be applied. |
| Keyword Pool | You must provide either Keyword Pool or both Matching Source Dataset and Matching Source Column. A comma-separated list of keywords to use as the reference set for the fuzzy matching process. If this option is used, the "Include Matching Dataset Columns" option has no effect. |
| Matching Source Dataset (Optional) | You must provide either Keyword Pool or both Matching Source Dataset and Matching Source Column. A dataset to join against using the fuzzy matching agent. This dataset acts as the master reference dataset, containing the terms in their correct, standardized form. |
| Matching Source Column (Optional) | The column containing the values used by the fuzzy matching agent. This represents the “correct” or canonical list of options. |
| Include Matching Dataset Columns (Optional) | If enabled, all columns from the Matching Source Dataset are included in the output dataset of the fuzzy matching join. |
| Match Uniquely? (Optional) | If enabled, the fuzzy matching agent assigns each reference term to a single target row only. The assignment is based on the most confident match, and reference terms are not reused, even if they would otherwise be the highest-confidence match for additional rows. |
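
The Keyword Pool and Match Uniquely? options can be illustrated with a short Python sketch. It rests on assumptions: the agent's internal algorithm is not documented here, so the example uses `difflib.SequenceMatcher` as a stand-in similarity metric and a greedy strategy (highest-confidence pairs claimed first) as one plausible reading of "not reused". The helper names `parse_keyword_pool` and `match_uniquely` are hypothetical.

```python
from difflib import SequenceMatcher


def parse_keyword_pool(text: str) -> list[str]:
    """Split a comma-separated keyword pool, trimming blanks and empty items."""
    return [item.strip() for item in text.split(",") if item.strip()]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio in [0, 1] (illustrative metric only)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_uniquely(values: list[str], reference: list[str]) -> list[tuple]:
    """Greedy one-to-one assignment: each reference term is used at most once."""
    # Score every (row, term) pair, then visit pairs from most to least confident.
    pairs = sorted(
        ((similarity(v, r), i, j)
         for i, v in enumerate(values)
         for j, r in enumerate(reference)),
        key=lambda pair: -pair[0],
    )
    assigned: dict[int, tuple[str, float]] = {}
    used_terms: set[int] = set()
    for score, i, j in pairs:
        if i in assigned or j in used_terms:
            continue  # row already matched, or term already claimed
        assigned[i] = (reference[j], score)
        used_terms.add(j)
    # Rows that could not claim any term are reported with no match.
    return [(v, *assigned.get(i, (None, 0.0))) for i, v in enumerate(values)]


if __name__ == "__main__":
    pool = parse_keyword_pool("Acme Inc., Globex")
    for row in match_uniquely(["Acme Inc", "Acme Incorporated", "Globex LLC"], pool):
        print(row)
```

In this sketch, "Acme Incorporated" ends up unmatched because "Acme Inc." has already been claimed by the closer row "Acme Inc". With unique matching disabled, both rows would map to the same reference term.
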
