CASE STUDY –

Consolidation of Multiple Public Data Sources Through Advanced Soft-Matching Techniques

ABSTRACT
Soft matching algorithms within Excel, R, and Python are convenient but lack contextual relevance when matching business names and entities. We developed an Excel-based soft-matching algorithm optimized for combining multiple business databases into a “single source of truth.” The algorithm is being used to continuously update client CRM databases with the latest publicly available information about their customers.
 

Clearly Define Your Business Objectives

A major B2B SAAS provider was seeking additional externally sourced information about their existing clients and prospects. Combining this information with existing CRM data would create a broader view of their client and prospect base. This first required consolidating relevant data sources to create a “single source of truth.”

The primary challenge with consolidating multiple public data sources into one comprehensive database is the likelihood of a single entity occurring in multiple forms throughout the dataset. Accounting for this problem became the first priority.

 

Acquire & Synthesize Relevant Data

Most of the existing client data was stored in relational databases and Excel workbooks. Existing matching formulas such as Excel’s Vlookup or R’s adist were the most obvious solution, but are not optimized to match business names. Therefore we had to develop a proprietary algorithm to match entities across multiple sources.

These entities usually exist as text data entries. Excel’s Vlookup works by first cross-referencing data sources with each other. For example, using the VLookup function in Excel within existing data-sets returned the following for matching results:

Fig. 1 – Soft Matching Attempts Using Excel Lookup Formula

In the example given in Figure 1, the entities in source 2 are being searched in source 1. Vlookup’s exact match did not produce any matches for the entities in source 2 column, while only returning one correct result with approximate match. Relevant algorithms in R and Python showed similar limitations.

Figure 2 shows comparable results when using FischerJordan’s Algorithm (FJ Algorithm). While we can see 3 entities with similar names as “ABC & Company” in source 1, the nearest match is “ABC & Co.”. For “New man tech”, the nearest match is “Newman Technologies.” Both of these return as matches in the third column.

Fig. 2 – Soft Matching Attempts on Data-Set 1 Using FJ Algorithm

Different public databases have data aggregated at different levels. We needed an algorithm to reconcile different data sources in order to create a single source of truth. Figure 3 shows how the FJ algorithm can recognize and match entities that may exhibit sub-entity or sister-entity relationships (e.g. Kaiser Permanente is the parent organization of Kaiser Foundation Group).

Fig. 3 – Soft Matching Attempts on Data-Set 2 Using FJ Algorithm

 

Develop an Action Plan

We leveraged the algorithm to aggregate multiple public data sources, combine them with our client’s proprietary CRM data, and create a “database of databases” containing all relevant information about a given customer. This allowed our client to identify firmographic information such as size, years in business, and geography, and combine them with proprietary information to develop a more customized relationship management strategy for each customer.

Partner with You to See it Through

 

Partner with You to See it Through

We automated the algorithm and combined it with existing web scraping algorithms to continuously update data from internal and external data sources on a quarterly basis without requiring on-site management and reconciliation of incoming data (Fig. 4).

Fig. 4 – Process Flow to Final Analysis Output Leveraging FJ Soft-Matching Algorithm

Keywords 

Soft Matching, New York Consulting, Management Consulting, Microsoft Excel, VLOOKUP, SAAS, B2B, Algorithm, R, Python, Data Analytics, Quantitative Analysis, Analytics, Case Study

Back to ANALYTICS

OUR CASE STUDIES

Discover how our 4-step process has helped clients avoid confusion and succeed in the marketplace.

ABOUT

CAREERS

INSIGHTS

CONTACT

Copyright © 2019 FischerJordan. All Rights Reserved.