Power BI

Needs Votes

Power Query: Use fuzzy logic to replace outliers with similar values

Matthew Parowski on 19 Aug 2020 18:57:59

It would be incredible if Power Query included an AI-driven feature that analyzes a column, recognizes values that are minor variants of a common core value, and replaces them with the most frequent variant. For example, if a column contains 50 instances of "William Smith", 2 of "Will Smith", 1 of "Wiliam Smith", and 1 of "William Smit", the cleanup would leave 54 instances of "William Smith". A parameter could control the sensitivity of the fuzzy matching.
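To make the request concrete, here is a minimal Python sketch of the idea. It is not how Power Query works internally. It assumes `difflib.SequenceMatcher`'s ratio as the similarity measure, and the `threshold` argument is a hypothetical stand-in for the suggested sensitivity parameter. Values are visited from most to least frequent, and each one is folded into the most frequent earlier value it resembles closely enough.

```python
from collections import Counter
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    # Case-insensitive character-level similarity in [0, 1] (assumed metric).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def normalize_to_prominent(values, threshold=0.8):
    """Replace each value with the most frequent value it resembles.

    `threshold` is a hypothetical sensitivity knob: higher means stricter.
    """
    counts = Counter(values)
    canonical = []  # surviving "core" values, most frequent first
    mapping = {}
    for value, _ in counts.most_common():
        # Pick the first (i.e. most frequent) canonical value that is similar enough.
        match = next((c for c in canonical if similarity(value, c) >= threshold), None)
        if match is None:
            canonical.append(value)  # no close match: this value becomes a new core
            mapping[value] = value
        else:
            mapping[value] = match   # close match: fold into the more frequent value
    return [mapping[v] for v in values]


names = (["William Smith"] * 50 + ["Will Smith"] * 2
         + ["Wiliam Smith", "William Smit"])
print(Counter(normalize_to_prominent(names)))
```

With the example data from the post, every variant scores above 0.8 against "William Smith", so all 54 rows collapse to that value. Raising `threshold` close to 1.0 leaves the variants untouched, which is the kind of sensitivity control the post asks for. The pairwise comparison is quadratic in the number of distinct values, so a real feature would need a smarter candidate search on large columns.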

Similarly, it would be cool to emulate SAS's "Unique Identifier" feature by using multiple columns to flag potential duplicate records: similar spellings of names, as described above, PLUS matching date of birth, phone number, email address, or any other columns that help identify unique records.
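A rough Python sketch of the multi-column check might look like the following. It is an illustration under stated assumptions, not a description of SAS's or Power Query's behavior. The field names (`name`, `dob`, `phone`, `email`) and the parameters `name_threshold` and `min_matching_fields` are hypothetical, and name similarity again uses `difflib.SequenceMatcher`. A pair of rows is flagged only when the names are similar and enough supporting fields match exactly.

```python
from difflib import SequenceMatcher
from itertools import combinations


def name_similarity(a: str, b: str) -> float:
    # Case-insensitive character-level similarity in [0, 1] (assumed metric).
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def flag_potential_duplicates(records, name_threshold=0.8, min_matching_fields=1,
                              fields=("dob", "phone", "email")):
    """Return (i, j) index pairs of records that may be the same person.

    Field names and thresholds are hypothetical, for illustration only.
    """
    flagged = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if name_similarity(a["name"], b["name"]) < name_threshold:
            continue  # names too different to be the same person
        # Count supporting fields that are present in both rows and equal.
        matches = sum(1 for f in fields if a.get(f) and b.get(f) and a[f] == b[f])
        if matches >= min_matching_fields:
            flagged.append((i, j))
    return flagged


records = [
    {"name": "William Smith", "dob": "1980-04-12", "phone": "555-0101", "email": "wsmith@example.com"},
    {"name": "Will Smith", "dob": "1980-04-12", "phone": None, "email": "wsmith@example.com"},
    {"name": "William Smit", "dob": "1975-09-30", "phone": "555-0199", "email": None},
    {"name": "Jane Doe", "dob": "1980-04-12", "phone": "555-0101", "email": None},
]
print(flag_potential_duplicates(records))
```

Here rows 0 and 1 are flagged because the names are similar and both the date of birth and email agree. Row 2 has a similar name but no matching supporting field, so it is not flagged, and row 3 shares a date of birth and phone number but has an unrelated name. Requiring more matching fields (for example `min_matching_fields=3`) makes the check stricter.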