
Data Cleaning Before a CRM Migration: A Step-by-Step Guide

A CRM migration is one of the best opportunities a business gets to fix years of accumulated data problems. Most businesses waste it by migrating the mess wholesale — and then spend the next two years complaining about their new system.

Key Takeaways

  • Migrating dirty data into a new CRM does not fix the data — it embeds the problems in a new system with a higher price tag
  • A pre-migration audit should cover duplicates, incomplete records, outdated contacts, invalid formats, and field mapping mismatches
  • Data cleaning before a CRM migration typically takes two to four weeks, and six to eight for larger or messier databases. Build this into your project timeline
  • Not every record needs to migrate. Archiving stale records before migration reduces cost, complexity, and GDPR risk
  • A test import on a small subset before the full migration surfaces problems that are far easier to fix in the source data

Why Data Quality Determines Migration Success

Businesses switching CRM platforms often frame the project as a technology problem. What they discover, usually at the point of first data import, is that the limiting factor is not the technology — it is the condition of the data they are moving.

A CRM holds the operational memory of your business. If that memory is fragmented, duplicated, or out of date, the new system inherits those problems on day one. Sales teams lose confidence in the data within weeks. Reporting becomes unreliable. The ROI case for the new platform starts to look uncertain.

What to Audit Before You Start Cleaning

Duplicate Records

Duplicates are the most common and most damaging data quality problem in CRM systems. A contact who appears three times with slightly different names and email addresses means missed communications, inconsistent sales history, and errors in any segmentation or reporting. Duplicate rates of 10–30% are common in CRM databases that have been in use for more than three years.

Audit for: exact duplicates (same email address), near-duplicates (same name, different email), company-level duplicates (same organisation entered multiple times), and contact-company relationship duplicates.
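As a rough illustration of the first two checks, the sketch below groups records by a normalised email address (exact duplicates) and by a normalised name (near-duplicates). It uses only the Python standard library; the field names (id, name, email) and the sample contacts are assumptions made for the example, not a description of any particular CRM export.

```python
from collections import defaultdict


def normalise_email(email):
    """Lower-case and trim an email so trivially different spellings compare equal."""
    return (email or "").strip().lower()


def normalise_name(name):
    """Lower-case a name and collapse repeated whitespace."""
    return " ".join((name or "").lower().split())


def find_duplicate_groups(records, key_func):
    """Group record ids by a normalised key and keep groups with more than one record."""
    groups = defaultdict(list)
    for record in records:
        key = key_func(record)
        if key:  # records with an empty key field are not treated as duplicates
            groups[key].append(record["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical sample records, for illustration only.
contacts = [
    {"id": 1, "name": "Jane Smith", "email": "jane.smith@example.com"},
    {"id": 2, "name": "Jane  Smith", "email": "Jane.Smith@Example.com "},
    {"id": 3, "name": "jane smith", "email": "jsmith@example.org"},
    {"id": 4, "name": "Tom Jones", "email": "tom@example.com"},
]

exact = find_duplicate_groups(contacts, lambda r: normalise_email(r["email"]))
near = find_duplicate_groups(contacts, lambda r: normalise_name(r["name"]))
print("Exact (same email):", exact)
print("Near (same name):", near)
```

Name matches are only candidates: two different people can share a name, so groups found this way should go to human review rather than being merged automatically.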

Incomplete Records

Incomplete records are those missing fields that your business processes depend on. Run completeness checks against the fields that matter for your core workflows. Produce a percentage completion rate per field across your dataset. Fields below 60% completion warrant a decision: enrich them, flag them, or exclude them from the migration.
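A per-field completion rate is straightforward to compute. The following is a minimal sketch, assuming records have already been loaded as dictionaries; the field names and sample values are hypothetical, and the 60% cut-off is the one suggested above.

```python
def is_filled(value):
    """Treat None, empty strings, and whitespace-only strings as missing."""
    return value is not None and str(value).strip() != ""


def completion_rates(records, fields):
    """Return the percentage of records with a usable value for each field."""
    total = len(records)
    rates = {}
    for field in fields:
        filled = sum(1 for record in records if is_filled(record.get(field)))
        rates[field] = round(100 * filled / total, 1) if total else 0.0
    return rates


# Hypothetical sample data, for illustration only.
records = [
    {"email": "a@example.com", "phone": "", "job_title": "CFO"},
    {"email": "b@example.com", "phone": None, "job_title": ""},
    {"email": "", "phone": "+447911123456", "job_title": ""},
    {"email": "d@example.com", "phone": "", "job_title": "   "},
]

rates = completion_rates(records, ["email", "phone", "job_title"])
below_threshold = [field for field, pct in rates.items() if pct < 60]
print(rates)
print("Needs a decision:", below_threshold)
```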

Outdated and Stale Records

Most CRM databases contain a significant volume of records that are simply no longer relevant. Contacts who left their roles years ago. Companies that have since closed. Leads that went cold and were never updated. These records clutter the new system, inflate your CRM licence cost, and represent a GDPR liability if you are retaining personal data beyond its legitimate purpose.

Define a staleness threshold (commonly, no activity or update in the last 24 or 36 months) and identify what proportion of your database exceeds it.
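One way to apply a threshold like this is sketched below. It assumes each record carries a last_activity date (a hypothetical field name) and treats records with no recorded activity as stale, which is a choice you may want to make differently.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month has not fully elapsed yet
    return months


def split_stale(records, as_of, threshold_months=24):
    """Split records into (active, stale) using the last activity date."""
    active, stale = [], []
    for record in records:
        last = record["last_activity"]
        # Assumption: a record with no activity date at all counts as stale.
        if last is None or months_between(last, as_of) >= threshold_months:
            stale.append(record)
        else:
            active.append(record)
    return active, stale


# Hypothetical sample data, for illustration only.
records = [
    {"id": 1, "last_activity": date(2025, 1, 15)},
    {"id": 2, "last_activity": date(2023, 6, 30)},
    {"id": 3, "last_activity": date(2021, 3, 1)},
    {"id": 4, "last_activity": None},
    {"id": 5, "last_activity": date(2023, 7, 1)},
]

active, stale = split_stale(records, as_of=date(2025, 6, 30), threshold_months=24)
stale_share = round(100 * len(stale) / len(records), 1)
print("Stale ids:", [r["id"] for r in stale], "share:", stale_share, "%")
```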

Invalid and Malformed Data

Common examples:

  • Email addresses that fail basic format validation
  • Phone numbers stored in inconsistent formats (some with country code, some without)
  • UK postcodes that do not match the standard format
  • Dates stored as text strings in multiple different formats
  • Free-text fields used to store structured data (for example, job titles in the company name field)
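Several of these checks can be automated with simple pattern matching. The sketch below is illustrative rather than definitive: the email pattern is a basic shape check only, the postcode pattern is a simplified approximation of the UK format that does not confirm a postcode actually exists, and the list of date formats is an assumption that reads ambiguous dates day-first.

```python
import re
from datetime import datetime

# Basic shape check only: something@something.something, with no spaces.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# Simplified UK postcode shape: outward code, one space, inward code, upper case.
UK_POSTCODE_RE = re.compile(r"^[A-Z]{1,2}[0-9][A-Z0-9]? [0-9][A-Z]{2}$")

# Assumed formats, tried in order; ambiguous dates are read day-first.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")


def is_valid_email(value):
    return bool(EMAIL_RE.match(value.strip()))


def is_standard_uk_postcode(value):
    """True only if the value is already stored in the standard format."""
    return bool(UK_POSTCODE_RE.match(value.strip()))


def parse_date(value):
    """Return a date if the text matches a known format, otherwise None."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date()
        except ValueError:
            continue
    return None


print(is_valid_email("jane@example.com"), is_valid_email("jane@example"))
print(is_standard_uk_postcode("SW1A 1AA"), is_standard_uk_postcode("sw1a1aa"))
print(parse_date("05/03/2024"), parse_date("March 5th"))
```

Values that fail these checks are not necessarily wrong, only non-standard; count them in the audit and decide during cleaning whether they can be repaired.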

Field Mapping Mismatches

Your existing CRM and your new CRM will not have identical field structures. Produce a field mapping document before any cleaning work begins. For every field in the source system, identify where it maps in the target. Flag fields with no obvious mapping — these require a decision before migration, not during it.
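A field mapping document can also be kept in a machine-checkable form. The sketch below assumes a simple dictionary from source field to target field, with None marking fields that still need a decision. All field names are hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping: source field -> target field (None = no decision yet).
FIELD_MAP = {
    "First Name": "firstname",
    "Surname": "lastname",
    "Email Address": "email",
    "Tel": "phone",
    "Mobile": "phone",
    "Legacy Score": None,
}


def check_field_map(source_fields, field_map):
    """Report source fields that are missing from the map, undecided, or colliding."""
    unmapped = [f for f in source_fields if f not in field_map]
    undecided = [f for f in source_fields if f in field_map and field_map[f] is None]

    by_target = defaultdict(list)
    for field in source_fields:
        target = field_map.get(field)
        if target:
            by_target[target].append(field)
    # Two source fields feeding one target field need an explicit merge rule.
    collisions = {t: fs for t, fs in by_target.items() if len(fs) > 1}

    return {"unmapped": unmapped, "undecided": undecided, "collisions": collisions}


source_fields = ["First Name", "Surname", "Email Address", "Tel", "Mobile", "Legacy Score", "Fax"]
report = check_field_map(source_fields, FIELD_MAP)
print(report)
```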

The Step-by-Step Pre-Migration Cleaning Process

Step 1: Export and Baseline

Export your full CRM dataset to a structured format — CSV or Excel. Work on a copy, not the live system. Run your audit and produce a data quality report. This gives you a documented starting point and lets you prioritise the cleaning work by impact.
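A baseline can be as simple as a row count and a blank count per column, captured before any cleaning starts. The sketch below reads CSV text from a string so that it is self-contained; in practice you would read your copy of the export from a file instead. The column names and rows are made up for the example.

```python
import csv
import io


def baseline_report(csv_text):
    """Return the row count and the number of blank values per column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    fields = reader.fieldnames or []
    blank_counts = {
        field: sum(1 for row in rows if not (row.get(field) or "").strip())
        for field in fields
    }
    return {"row_count": len(rows), "blank_counts": blank_counts}


# Hypothetical export content, for illustration only.
export_copy = """id,name,email,phone
1,Jane Smith,jane@example.com,07911 123456
2,Tom Jones,,
3,Asha Patel,asha@example.com,
"""

report = baseline_report(export_copy)
print(report)
```

Saving this report alongside the untouched export gives you something to compare against after each cleaning step.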

Step 2: Deduplicate

Deduplication is typically the most time-consuming part and should be done first, because every subsequent cleaning step is more efficient on a deduplicated dataset. For large datasets, automated tools — OpenRefine, Excel's fuzzy matching add-ins, or Python libraries such as dedupe — can identify probable duplicate pairs for human review. When merging duplicates, define rules in advance for which record wins on each field. Sales history should be consolidated rather than discarded.
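Merge rules of this kind can be written down as code so they are applied consistently. The sketch below is one possible set of rules, not a recommendation: the longest name wins, the most recently updated non-blank value wins for email and phone, and order references are combined rather than dropped. The field names, the ISO-formatted updated dates, and the sample records are all assumptions.

```python
def most_recent_value(records, field):
    """Take the non-blank value from the most recently updated record."""
    for record in sorted(records, key=lambda r: r["updated"], reverse=True):
        if record.get(field):
            return record[field]
    return None


def longest_value(records, field):
    """Take the longest value, on the assumption that it is the most complete."""
    return max((record.get(field) or "" for record in records), key=len) or None


# Hypothetical rules: which record wins for each field.
MERGE_RULES = {
    "name": longest_value,
    "email": most_recent_value,
    "phone": most_recent_value,
}


def merge_group(records, rules=MERGE_RULES):
    """Merge one group of duplicate records into a single surviving record."""
    merged = {field: rule(records, field) for field, rule in rules.items()}
    # Consolidate sales history instead of keeping only one record's orders.
    merged["orders"] = sorted({o for r in records for o in r.get("orders", [])})
    merged["merged_from"] = sorted(r["id"] for r in records)
    return merged


# Hypothetical duplicate pair, for illustration only.
group = [
    {"id": 1, "name": "Jane Smith", "email": "jane@example.com", "phone": "",
     "updated": "2022-04-01", "orders": ["SO-100"]},
    {"id": 2, "name": "J. Smith", "email": "jane@example.com", "phone": "+447911123456",
     "updated": "2024-09-12", "orders": ["SO-100", "SO-245"]},
]

merged = merge_group(group)
print(merged)
```

Keeping the merged_from list makes each merge traceable if someone later questions where a value came from.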

Step 3: Standardise Formats

Standardise the formatting of key fields across all records. Phone numbers should follow E.164 international format (+447911123456). Postcodes should be upper case with correct spacing. Company names should be consistent in their use of Ltd, Limited, PLC. Standardisation makes the data more useful in the new system and prevents the same inconsistency problems recurring after migration.
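The three rules above might be sketched as follows. This is a simplified illustration with stated assumptions: numbers written in national format are assumed to be UK numbers, the postcode function fixes case and spacing only, and "Ltd" and "PLC" are an arbitrary house style. For production work a dedicated phone number library is likely a safer choice than hand-written rules.

```python
import re

E164_RE = re.compile(r"^\+[1-9]\d{6,14}$")


def to_e164_uk(raw):
    """Convert a phone number to E.164, assuming national numbers are UK numbers."""
    cleaned = raw.replace("(0)", "")  # handles forms like "+44 (0)7911 123456"
    digits = re.sub(r"\D", "", cleaned)
    if cleaned.strip().startswith("+"):
        candidate = "+" + digits
    elif digits.startswith("00"):
        candidate = "+" + digits[2:]  # international dialling prefix
    elif digits.startswith("0"):
        candidate = "+44" + digits[1:]  # assumption: UK national format
    else:
        return None
    return candidate if E164_RE.match(candidate) else None


def format_postcode(raw):
    """Upper-case a UK postcode and put one space before the last three characters."""
    compact = re.sub(r"\s+", "", raw).upper()
    if not 5 <= len(compact) <= 7:
        return None  # wrong length: leave for manual review
    return compact[:-3] + " " + compact[-3:]


def standardise_company(raw):
    """Collapse whitespace and apply one house style for company suffixes."""
    name = " ".join(raw.split())
    name = re.sub(r"\b(limited|ltd\.?)$", "Ltd", name, flags=re.IGNORECASE)
    name = re.sub(r"\b(plc|p\.l\.c\.?)$", "PLC", name, flags=re.IGNORECASE)
    return name


print(to_e164_uk("07911 123456"))
print(format_postcode("sw1a1aa"))
print(standardise_company("Acme  Limited"))
```

Functions that return None signal a value they could not standardise; keep the original value and route those records to manual review rather than overwriting them.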

Step 4: Enrich or Archive Incomplete Records

For each incomplete record, choose one of two routes: enrich it (source the missing data from Companies House, LinkedIn, or a third-party provider) or archive it (move it out of the migration dataset and retain it in cold storage). Do not migrate incomplete records without making this decision.
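Making the decision explicit for every record can be as simple as a triage function. The rule in the sketch below (enrich when only one required field is missing, archive otherwise) is purely hypothetical, as are the required field names; substitute whatever criteria your business agrees on.

```python
# Hypothetical list of fields your core workflows depend on.
REQUIRED_FIELDS = ("email", "company", "phone")


def triage(record, required=REQUIRED_FIELDS, max_missing_to_enrich=1):
    """Return (decision, missing_fields) for one record.

    The decision is "migrate", "enrich", or "archive". The cut-off is an
    illustrative assumption, not a standard.
    """
    missing = [f for f in required if not (record.get(f) or "").strip()]
    if not missing:
        return "migrate", missing
    if len(missing) <= max_missing_to_enrich:
        return "enrich", missing
    return "archive", missing


# Hypothetical sample records, for illustration only.
records = [
    {"id": 1, "email": "jane@example.com", "company": "Acme Ltd", "phone": "+447911123456"},
    {"id": 2, "email": "tom@example.com", "company": "", "phone": "+447911123457"},
    {"id": 3, "email": "", "company": "", "phone": "+447911123458"},
]

decisions = {record["id"]: triage(record) for record in records}
print(decisions)
```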

Step 5: Remove Stale Records (With Sign-Off)

Present your stale records list to relevant stakeholders before deleting anything. Get written sign-off on the deletion list, particularly for records containing personal data, so that the decision to delete and the reasoning behind it are documented for GDPR purposes.

Step 6: Validate Against the Target Field Map

Once clean and standardised, validate the data against your field mapping document. Run a test import on 100 to 500 records and verify the results in the new system before committing to a full migration. A test import will surface field length limits, data type mismatches, and required field errors that are far easier to fix in the source data.
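Some of the errors a test import would surface can be caught beforehand by checking records against the target system's constraints. The sketch below assumes you have written those constraints down as a simple schema; the field names and length limits shown are invented for the example and should be replaced with the limits documented for your target CRM. It also shows one way to draw a repeatable random test batch.

```python
import random

# Hypothetical target constraints; take the real ones from your new CRM's documentation.
TARGET_SCHEMA = {
    "firstname": {"required": True, "max_length": 40},
    "lastname": {"required": True, "max_length": 80},
    "email": {"required": True, "max_length": 255},
    "phone": {"required": False, "max_length": 20},
}


def validate_record(record, schema=TARGET_SCHEMA):
    """Return a list of human-readable problems for one record (empty if none)."""
    errors = []
    for field, rule in schema.items():
        value = (record.get(field) or "").strip()
        if not value:
            if rule["required"]:
                errors.append(f"{field}: required value missing")
            continue
        if len(value) > rule["max_length"]:
            errors.append(f"{field}: {len(value)} chars exceeds limit of {rule['max_length']}")
    return errors


def pick_test_batch(records, size, seed=0):
    """Pick a random but repeatable subset of records for a test import."""
    rng = random.Random(seed)
    return rng.sample(records, min(size, len(records)))


good = {"firstname": "Jane", "lastname": "Smith", "email": "jane@example.com"}
bad = {"firstname": "J" * 41, "lastname": "Smith", "email": ""}
print(validate_record(good))
print(validate_record(bad))
```

A random batch tends to expose more edge cases than the first few hundred rows of the file, which are often the oldest or the cleanest records.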

Tools for Pre-Migration Data Cleaning

  • Microsoft Excel / Google Sheets: Adequate for datasets up to approximately 50,000 records.
  • OpenRefine: Free, open-source, well-suited to clustering and deduplicating messy text data. Handles several hundred thousand rows.
  • Python (pandas, dedupe): Most flexible for large datasets or complex transformation requirements. Requires developer involvement.
  • Dedicated platforms: Talend, Informatica, WinPure — higher licence cost but faster to deploy for complex requirements.

Common Mistakes That Derail CRM Data Migrations

Starting the migration before the field mapping is agreed. Field mapping decisions made under time pressure lead to data being dropped, misplaced, or forced into the wrong fields. Do this work before cleaning begins.

Migrating everything by default. The default assumption should be that a record needs to earn its place in the new system — not that everything migrates unless there is a specific reason to exclude it.

No test import before the full migration. Every CRM platform has its own import quirks. A test import costs an hour and can prevent a failed full import that costs a day to diagnose and rerun.

Treating data cleaning as a one-time exercise. The migration is an opportunity to start with clean data. Whether it stays clean depends on the data entry standards and validation rules you put in place in the new system from day one.

How Long Does It Take?

For a CRM database of 10,000–50,000 records in reasonable condition, a thorough pre-migration clean typically takes two to four weeks. Larger databases, or those with significant quality issues, can take six to eight weeks. Build this into your project plan from the start — it is consistently underestimated.

Need Your Data Cleaned Before Migration?

UK Data Services provides professional data cleansing services ahead of CRM and system migrations. We audit, deduplicate, standardise, and prepare data for migration. Find out more about our data cleaning service.
