How can I find and clean up duplicate client records in my WebCATS database, and prevent future duplicates from occurring?
Duplicate records negatively affect the integrity of your database. For example, for DLA-funded centers, duplicates can cause you to overstate your total number of initial counseling sessions on the DLA 1806 report. For SBA-funded centers reporting to EDMIS, duplicate client records are particularly troublesome because not only will you be overstating your client count and understating your extended engagement client count, but cleaning up duplicates is impossible for data that was submitted to EDMIS in prior fiscal years.
There are other complications of duplicate records in addition to reporting complications. Duplicate records lead to confusion for your users (which client record should I be working with?). They also make it difficult to piece together a complete history of a client’s activity because information is spread across multiple client records instead of being available in one location.
How do duplicate records get introduced into your database in the first place? OutreachSystems has discovered that, most of the time, a duplicate client (or contact) record is introduced because the counselor didn't realize the client already existed in the database. Why didn’t the counselor know? Because they didn’t have view permissions to client records belonging to other centers in the network. Restrictive permissions are the leading cause of duplicates.
It's typical for a counselor to only be assigned permissions that let them view, edit and delete clients within their own center. But when the same client visits two centers that are part of the same network, these limited permissions force the counselor to enter the client into the database a second time, thus creating a duplicate client record. And since clients often visit more than one center in their pursuit of answers, multi-center networks are at risk of developing duplicate client issues in their WebCATS databases.
The process for cleaning up duplicates is involved and there are many considerations to be aware of. This FAQ covers common considerations, but each database may have its own unique set of circumstances.
The following four sections thoroughly address the issue of duplicates:
- Locating Duplicate Records Using Duplicate Analysis Reports
- Identifying Legacy Duplicate Client Records
- Duplicate Client & Contact Record Clean-up Process
- Avoiding Duplicate Records in the Future
To keep your database free of duplicate records, it's a good idea to regularly check your database for duplicates. OutreachSystems has developed several duplicate analysis reports in WebCATS to help networks determine whether duplicates exist in their databases and what the scope of the issue might be.
WebCATS has three built-in reports that exist solely to identify duplicate client records:
- Duplicates by Company Name. This report lists out all client records that share the exact same company name as another client record in the database.
- Duplicates by Company Phone.This report lists out all client records that share the exact same company phone number as another client record in the database.
- Duplicates by Company Name and Phone. This report is more restrictive than the previous two; it requires that both company name and phone be shared by two client records.
WebCATS also has two built-in reports dedicated to duplicate contact records:
- Duplicates by Name and Phone. This report lists out all contact records that share (1) the exact same last name, (2) the same first letter of the first name, and (3) the same phone number as another contact record in the database.
- Duplicates by E-mail Address.This report lists out all contact records that share the exact same e-mail address as another contact record in the database.
Note: All five of these duplicate reports are case insensitive, but not punctuation insensitive. For example, "OutreachSystems" and "OutReachSystems" would be treated as identical company names, but "OutreachSystems Inc." and "OutreachSystems, Inc." would not.
Before you begin cleaning up duplicate client records, it is imperative that you first identify which records cannot be altered because of the need to maintain audit integrity. For example, SBDCs, WBCs and PTACs are required to report their numbers to either the SBA or the DLA. Once these numbers have been reported, you are accountable for them and it is important that any clean up process does not affect your reported numbers.
In this article, we will refer to records that cannot be altered because of the need to maintain audit integrity as legacy records. Records that have not yet been reported are referred to as non-legacy records. OutreachSystems has different recommended procedures for cleaning up duplicate client records, depending on whether the client records are legacy or non-legacy records.
The determining factor as to whether a duplicate client record is legacy or non-legacy is the dates of its associated counseling, investment, or award sessions. If a client has any session activity that has been reported to your funding partner, it is a legacy record. Following are some parameter guidelines based on funding partner.
SBA-Funded Centers (SBDC/WBC)
The EDMIS system was established on October 1, 2005. Client activity is uploaded from WebCATS to EDMIS on a quarterly basis. Thus, any client that had a session between October 1, 2005 and the last quarterly upload is what we call a “legacy record” because it has been reported to EDMIS and should not be deleted. Nor should any of its activity sessions be consolidated to another client record.
Clients that have not had any session activity uploaded to EDMIS are what we call “non-legacy records” and these records can be safely consolidated with a keeper record, and then deleted.
Note: The duplicate client reports include the first EDMIS session date and the most recent session date to help you quickly determine the legacy status of potential EDMIS data.
Note: In some cases, SBA may allow you to clean up current-year legacy duplicates and resubmit to EDMIS. See SBA EDMIS Fiscal Year Rollback Exception for a discussion of possible clean-up provisions of current-year legacy duplicates in the EDMIS system.
DLA-Funded Centers (PTAC)
The DLA parameters are more straightforward. The key date here is when the last DLA 1806 was submitted. If a client has been reported on any 1806, then it is a legacy client. Clients with initial activity entered after the last 1806 submission, and thus never submitted to DLA, are non-legacy clients.
After identifying how many legacy and non-legacy duplicate client records exist in your database, you'll need to decide if you want to manually clean them up or have OutreachSystems do it on your behalf using automated scripts we have developed. As you read on in this section, you'll see that it is quite a time-consuming process to manually remove the duplicates—especially legacy duplicates. For this reason, we typically recommend that if you have a significant number of duplicate records, you should strongly consider contracting with OutreachSystems to do the job for you.
If you decide to clean up duplicates manually, remember that there are different procedures for legacy and non-legacy records, as discussed in the following two sections:
- Cleaning Up Non-Legacy Duplicate Client Records
- Cleaning Up Legacy Duplicate Client Records
- SBA EDMIS Fiscal Year Rollback Exception
Warning: Non-database administrators should not attempt to consolidate and delete duplicate records without first consulting their database administrator. Improper handling of duplicates can negatively impact the audit integrity of the WebCATS database.
Note: This article’s primary focus is the cleaning up of duplicate clients and their associated contacts. However, there may also be duplicate contact records in the database that are not associated with client records. WebCATS built-in duplicate contact reports, mentioned previously, will help identify these. Once duplicate clients and their associated contacts have been cleaned up, a similar process can be followed to clean up duplicate contacts.
Because non-legacy records do not require special audit integrity considerations, their clean-up process is relatively straightforward. Following is the recommended process for manually handling non-legacy duplicate client records.
- For each set of identified duplicate records, you will need to designate one of the duplicate client records as a "keeper" record to be retained in the database. It is helpful to mark the keeper record with an asterisk [*] so that you can quickly see where the activity records need to be moved to.
- Once you have identified your "keeper" record, you will now want to transfer all counseling, milestone, investment, or award sessions from the duplicate record(s) to the keeper record. This is done by opening the session record in edit mode and changing the client association to the keeper client record.
- With all the sessions now associated with the keeper record, it is now time to transfer all contacts (even duplicate contacts—you'll take care of that in step 4) from the non-keeper client record(s) to the keeper record. To accomplish this, repeat steps a-d for each non-keeper client record:
- Contacts must belong to the same center as the client they are associated with. For this reason, you must open the non-keeper client record in edit mode and modify its center association to match that of the keeper client’s center.
- For each secondary contact (i.e. non-primary contact), if any, that is associated with the non-keeper client record, change the contact record's company association to the keeper record.
- You're now left with one primary contact for the non-keeper client record. Because a contact can only belong to one company and because all companies must have at least one contact, you'll need to delete that client record in order to "free" the contact.
- Once you've freed the primary contact from the non-keeper client record, you can now associate it with the keeper client record. At this point, all contacts from all duplicate client records should be associated with the keeper client record, and the keeper record should be the only client record left in the database from the initial set of duplicates.
- In most cases, you'll now have duplicate contact records attached to the keeper client record. These also need to be cleaned up, following steps similar to those used in cleaning up the duplicate clients:
- For each set of identified duplicate contact records, identify the keeper contact record to be retained with the keeper client record. As with keeper client records, we recommend using an asterisk in the keeper contact record's name to help you easily identify it.
- For the non-keeper contact records, make a note of all conference attendance activity and replicate it for the keeper contact record. For example, if the non-keeper contact record for John Smith attended a conference that the keeper contact record for John Smith is not marked as having attended, you'll need to add that activity to the keeper contact record for John Smith.
- Transfer any inquiry records from the non-keeper contacts to the keeper contact.
- Delete the non-keeper duplicate contact records.
Determining the keeper record might be based on some pre-defined criteria such as which record was most recently added to the database, which had the most recent activity or which contains the highest number of completed data fields. Determining which duplicate has the most accurate information in it is not feasible programmatically and may require a phone call to the client itself. Unfortunately, this is a highly subjective and labor-intensive process.
Note: Using an asterisk in the keeper record's name, as suggested in step 1, will help you easily identify the keeper client record.
Note: Re-associating client activity to the keeper client record will not affect center or counselor ownership of the activity. Center and counselor reports should not be affected by consolidating client activity to a single client record.
The end result of this non-legacy clean-up process should be a single client record with all session records, contacts, and contact activity consolidated to it. Any duplicates of the client or contact records should have successfully been removed from the database.
When only one legacy client record exists for a set of duplicates, it must be designated as the keeper record and the non-legacy process detailed in the previous section can be followed for all non-keeper duplicate records. In other words, if Company X exists in the database three times, but only one of the three client records has previously-reported counseling sessions associated with it, then this is the client record that must be designated as the keeper record, even if the other two records potentially have more complete information.
If you have a set of duplicate client records where more than one of the duplicates has history with either SBA or DLA, then you will want to follow the next set of clean-up steps.
Note: Keep in mind that if you are funded by SBA and you have a significant number of duplicates whose activity was first reported to EDMIS during the current fiscal year (OutreachSystems refers to these as FY-legacy duplicates), you may want to consider asking SBA to allow you to roll back data submitted thus far for the current year before performing cleanup on those FY-legacy records. For more information on this option, see SBA EDMIS Fiscal Year Rollback Exception.
To clean up a set of duplicates where more than one of the duplicates is a legacy client record:
- Set up a “DUPLICATES Center” center record. This newly created center will be used to store duplicate legacy records that, although they were not designated as keeper records, cannot be deleted due to the negative impact deleting them would have on your audit integrity. Permissions to this center should be tightly restricted to prevent non-administrator users from accessing these legacy duplicate records.
- Designate one of the legacy duplicate records as the keeper record. If there are any non-legacy duplicates as part of this duplicate set, consolidate their activity and contacts to this keeper record and then delete them, using the same process described in the previous section.
- Once all non-legacy records are consolidated and deleted, it's now time to focus on the remaining legacy duplicate records. You can’t delete these legacy records, yet you want to clean them up to prevent future confusion. So the secondary, tertiary, and quaternary legacy client records for the same company must be edited as follows:
- Modify the client status to “inactive.”
- Remove the company phone number.
- Append the text “RESTRICTED DUPLICATE” into the Company Name field.
- Transfer the client to the "DUPLICATES Center" along with its contacts.
- Append the text "RESTRICTED DUPLICATE" into the Last Name field of all the contacts you just transferred to the "DUPLICATES Center" and remove their e-mail addresses.
- Retain the existing client, counselor, and center ownership of all associated client activity session records.
- Do not transfer the activity sessions of these "RESTRICTED DUPLICATE" client records to the legacy keeper client record.
- Do not transfer the activity sessions of these "RESTRICTED DUPLICATE" client records to the newly-created "DUPLICATES Center."
The end result of this process is that the activity sessions and contacts associated with all non-legacy client records in the set of duplicates should be consolidated to the legacy keeper record, and the non-legacy client records should be deleted. All "RESTRICTED DUPLICATE" legacy duplicate client records and their associated contact records should be retained in the locked-down "DUPLICATES Center" to prevent future use. All activity sessions associated with "RESTRICTED DUPLICATE" legacy duplicate client records should remain where they occurred for audit integrity.
Data that has been uploaded to EDMIS in the previous fiscal year is locked by the SBA and any duplicate legacy records that were uploaded from WebCATS prior to the current fiscal year should be handled as described in the previous section.
However, there may be client duplicates in WebCATS whose initial upload to EDMIS occurred during the current fiscal year. In these cases, there may be an opportunity to correct the WebCATS database for the current year by consolidating these duplicates using the non-legacy clean-up process and then updating EDMIS with the corrected data. We will refer to these potential clean-up clients as FY-legacy duplicates.
It is important to recognize that when duplicates are uploaded to EDMIS, the client numbers are being inflated while at the same time the extended engagement and long-term client numbers are being under counted.
For this reason, the SBA has tentatively agreed to make allowances for programs to take corrective action and consolidate these current fiscal year duplicates. On a case by case basis, they have agreed to consider wiping out the current year’s EDMIS data to allow a more accurate EDMIS export for the entire year to be resubmitted.
To do this current year database rollback, your network will need to request approval from Bruce Purdy at SBA. To assess whether or not to grant this approval, the SBA will need to know the extent of the duplicate problem within your database. In other words, do you have two current fiscal year duplicate client records or 200? The duplicate analysis reports referenced at the beginning of this FAQ will help you determine the extent of the problem. You can export these reports to an Excel spreadsheet and then sort on the “First EDMIS Session” column for an easy determination of the number of FY-legacy clients that might be cleaned up pending SBA approval.
As stated at the beginning of this FAQ, duplicate client record creation can be attributed to user permissions for the most part. However, it may also be due to lack of understanding by the users. If a user doesn’t understand the implications of creating a duplicate client then no matter what tools you make available to them, they will still create their “own” copy of the same company.
Therefore, in addition to some of the recommended settings we have listed below, we also recommend that you make sure that all users realize the need to check for duplicates across all centers every time a new client or contact is created in the database.
Here are the permissions settings as recommended by OutreachSystems to ensure that the user has the ability to query for duplicate records before adding new companies to the database, and to ensure that these queries are taking all centers into account:
- System administrators should set WebCATS' Quick Search feature to query all centers. This is done from NeoCATS (Manage|NeoCATS).
- System administrators should activate automated duplicate checks. This is also done from NeoCATS (Manage|NeoCATS).
- At the very least, system administrators should set up all user accounts with the permissions necessary to view client and contact records across all centers. But ideally, we recommend that edit permissions also be bumped up so that users can also edit client and contact records across all centers.
OutreachSystems is aware that many SBA-funded centers have been concerned about sharing data across centers due to statute language regarding unlawful client information disclosure. To clear up any confusion on this, we had Bruce Purdy, EDMIS Program Manager, weigh in on the topic. According to Bruce, SBA strongly recommends sharing client information across a program network:
"[...] by not using the same client record within the network, not only will the EEC count be incorrect, but so will things like job creation, capital infusion, etc. Cross center counseling is common especially when one center may have the expertise that another center does not. The only way these clients can obtain that counseling is to go to multiple centers/counselors. The language in the statute and the Notice of Award are basically to protect the client from unlawful disclosure. Sharing that information within the SBDC network or consortium would not in my opinion be a break from this disclosure."
Besides changing user permissions, what else can you do to avoid duplicates in the future? We suggest the following corrective actions:
- All users in a network should be reminded to allow pop-ups from WebCATS to ensure that the automatic duplicate check feature in WebCATS is functioning on their machine. WebCATS also performs an automatic pop-up check at the login page to warn users of the need to allow pop-ups from the WebCATS site. For more information on allowing pop-ups, see the FAQ titled How do I disable pop-up blockers in WebCATS?.
- WebCATS user training should emphasize the ability to add cross-center activity to clients at other centers and the importance of doing this rather than creating a duplicate client record at the user's own center. Users need to understand that duplicates are a serious issue.
- The database administrator should run the duplicate analysis reports in WebCATS before submitting data to the program's funding partner. For SBA-funded centers, this means running the reports on a quarterly basis prior to uploading the quarterly EDMIS export. This will identify potential duplicates for remedial action before contaminating the EDMIS database and should become part of the EDMIS processing routine.
Note: If a "DUPLICATES Center" exists in your WebCATS database, it should be excluded when running the duplicate analysis reports in order to ensure that "RESTRICTED DUPLICATE" legacy duplicates are filtered out of the report's output.
Want more? Browse our extensive list of WebCATS FAQs.