Consolidation of Multiple Public Data Sources Through Advanced Soft-Matching Techniques

Soft matching algorithms within Excel, R, and Python are convenient but lack contextual relevance when matching business names and entities. We developed an Excel-based soft-matching algorithm optimized for combining multiple business databases into a “single source of truth.” The algorithm is being used to continuously update client CRM databases with the latest publicly available information about their customers.

Clearly Define Your Business Objectives

A major B2B SAAS provider was seeking additional externally sourced information about their existing clients and prospects. Combining this information with existing CRM data would create a broader view of their client and prospect base. This first required consolidating relevant data sources to create a “single source of truth.”

The primary challenge with consolidating multiple public data sources into one comprehensive database is the likelihood of a single entity occurring in multiple forms throughout the dataset. Accounting for this problem became the first priority.


Acquire & Synthesize Relevant Data

Most of the existing client data was stored in relational databases and Excel workbooks. Existing matching formulas such as Excel’s Vlookup or R’s adist were the most obvious solution, but are not optimized to match business names. Therefore we had to develop a proprietary algorithm to match entities across multiple sources.

These entities usually exist as text data entries. Excel’s Vlookup works by first cross-referencing data sources with each other. For example, using the VLookup function in Excel within existing data-sets returned the following for matching results:

Fig. 1 – Soft Matching Attempts Using Excel Lookup Formula

In the example given in Figure 1, the entities in source 2 are being searched in source 1. Vlookup’s exact match did not produce any matches for the entities in source 2 column, while only returning one correct result with approximate match. Relevant algorithms in R and Python showed similar limitations.

Figure 2 shows comparable results when using FischerJordan’s Algorithm (FJ Algorithm). While we can see 3 entities with similar names as “ABC & Company” in source 1, the nearest match is “ABC & Co.”. For “New man tech”, the nearest match is “Newman Technologies.” Both of these return as matches in the third column.

Fig. 2 – Soft Matching Attempts on Data-Set 1 Using FJ Algorithm

Different public databases have data aggregated at different levels. We needed an algorithm to reconcile different data sources in order to create a single source of truth. Figure 3 shows how the FJ algorithm can recognize and match entities that may exhibit sub-entity or sister-entity relationships (e.g. Kaiser Permanente is the parent organization of Kaiser Foundation Group).

Fig. 3 – Soft Matching Attempts on Data-Set 2 Using FJ Algorithm


Develop an Action Plan

We leveraged the algorithm to aggregate multiple public data sources, combine them with our client’s proprietary CRM data, and create a “database of databases” containing all relevant information about a given customer. This allowed our client to identify firmographic information such as size, years in business, and geography, and combine them with proprietary information to develop a more customized relationship management strategy for each customer.

Partner with You to See it Through


Partner with You to See it Through

We automated the algorithm and combined it with existing web scraping algorithms to continuously update data from internal and external data sources on a quarterly basis without requiring on-site management and reconciliation of incoming data (Fig. 4).

Fig. 4 – Process Flow to Final Analysis Output Leveraging FJ Soft-Matching Algorithm


Soft Matching, New York Consulting, Management Consulting, Microsoft Excel, VLOOKUP, SAAS, B2B, Algorithm, R, Python, Data Analytics, Quantitative Analysis, Analytics, Case Study



Discover how our 4-step process has helped clients avoid confusion and succeed in the marketplace.





Copyright © 2019 FischerJordan. All Rights Reserved.

DISCLAIMER - This content is for informational purposes only. You should not construe any such information or other material as legal, tax, investment, financial, or other advice. Nothing contained on our Site constitutes a solicitation, recommendation, endorsement, or offer by FischerJordan or any third party service provider to buy or sell any securities or other financial instruments in this or in in any other jurisdiction in which such solicitation or offer would be unlawful under the securities laws of such jurisdiction.

All Content on this site is information of a general nature and does not address the circumstances of any particular individual or entity. Nothing in the Site constitutes professional and/or financial advice, nor does any information on the Site constitute a comprehensive or complete statement of the matters discussed or the law relating thereto. FischerJordan is not a fiduciary by virtue of any person’s use of or access to the Site or Content. You alone assume the sole responsibility of evaluating the merits and risks associated with the use of any information or other Content on this Site before making any decisions based on such information or other Content. In exchange for using the Site, you agree not to hold FischerJordan, its affiliates or any third party service provider liable for any possible claim for damages arising from any decision you make based on information or other Content made available to you through the Site.