How to Standardise Customer Names in Your Database: Rules, Edge Cases and Tools

A practical guide to name standardisation in UK customer databases — handling titles, compound surnames, business names, Companies House formatting and web form errors.

Why Name Standardisation Is Harder Than It Looks

Names are deceptively simple. On the surface, a customer name is just a string of text. In practice, names are among the most inconsistent data in any CRM or database, and the UK's unusually diverse mix of English, Irish, Welsh, Scottish and international naming conventions makes the problem harder still.

Name standardisation matters because it is the foundation of reliable deduplication, accurate mail-merge personalisation, and effective matching across systems. If your database holds "Dr Sarah O'Brien-Walsh" in one record and "sarah obrien walsh" in another, a simple string match will not catch the duplicate. And if your mailing system addresses a letter to "MR JAMES SMITH" when the customer is Dr James Smith, the personalisation failure sends a message about how well you actually know your customers.
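To make the duplicate problem concrete, here is a minimal sketch of a comparison key that lets the two records above match. The function name `match_key` and the short title list are illustrative assumptions, not part of any particular CRM or library, and a production key would need far more care.

```python
import re
import unicodedata

# Illustrative title list (an assumption); a real lookup table would be longer.
TITLES = {"mr", "mrs", "miss", "ms", "mx", "dr", "prof"}

def match_key(name: str) -> str:
    """Reduce a name to a crude key for duplicate comparison (hypothetical helper)."""
    # Fold accents and case so "Zoë" and "zoe" compare equal.
    folded = unicodedata.normalize("NFKD", name)
    folded = "".join(c for c in folded if not unicodedata.combining(c)).lower()
    # Drop apostrophes so "o'brien" becomes "obrien".
    folded = re.sub(r"['\u2019`]", "", folded)
    # Treat every other non-alphanumeric character (hyphens, dots) as a space.
    tokens = re.sub(r"[^a-z0-9]+", " ", folded).split()
    # Ignore a leading title.
    if tokens and tokens[0] in TITLES:
        tokens = tokens[1:]
    return " ".join(tokens)

print(match_key("Dr Sarah O'Brien-Walsh"))  # sarah obrien walsh
print(match_key("sarah obrien walsh"))      # sarah obrien walsh
```

A key like this is only for matching. The stored, displayed name still needs the formatting rules covered in the rest of this guide.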

Title Handling: More Complex Than a Lookup Table

The obvious titles — Mr, Mrs, Miss, Ms, Dr — are well understood. But a real-world customer database will contain a much wider range, and each needs deliberate handling:

  • Prof / Professor: common in academic, medical and legal sectors. Decide whether to abbreviate, then store one form consistently.
  • Rev / Reverend / The Revd: used in religious contexts. The Church of England uses "The Revd" in formal correspondence; many databases simply use "Rev".
  • Sir / Dame / Lord / Lady / Baroness: honorifics for knights, dames and peers, which change the whole form of address rather than just the prefix. "Sir James" takes the first name, not the surname, so "Sir Smith" is wrong. Treating these as simple title prefixes will produce embarrassing personalisation errors.
  • Mx: a gender-neutral title increasingly used in UK databases. Include it in any title lookup table, and check tables built before around 2015, which may predate its wider adoption.
  • Cllr / Cllr. / Councillor: common in local government and community datasets.

A key decision is whether to store title as a controlled vocabulary (only values from an approved list are permitted) or as a free-text field. Controlled vocabulary is strongly preferable for data quality, but requires enforcing at the point of entry and cleansing historical free-text entries.

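One common error is storing punctuation inconsistently: "Mr." with a full stop versus "Mr" without. Standardise on one convention. UK usage generally omits the full stop after titles that are contractions, which end with the last letter of the full word (Mr, Mrs, Dr), whilst traditionally retaining it after abbreviations that cut the word short (Prof., Rev.). Many modern style guides drop the full stop in both cases, which is also the simplest rule to enforce in a database.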
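A controlled vocabulary can be sketched as a lookup from known variants to a single stored form. The mapping below is an illustrative assumption (including the choice to store "Prof" and "Rev" without full stops), and `normalise_title` is a hypothetical helper name.

```python
from typing import Optional

# Illustrative vocabulary (an assumption): variant in lowercase, without
# full stops, mapped to the one form you have decided to store.
TITLE_VOCAB = {
    "mr": "Mr", "mrs": "Mrs", "miss": "Miss", "ms": "Ms", "mx": "Mx",
    "dr": "Dr", "doctor": "Dr",
    "prof": "Prof", "professor": "Prof",
    "rev": "Rev", "revd": "Rev", "reverend": "Rev", "the revd": "Rev",
    "cllr": "Cllr", "councillor": "Cllr",
    "sir": "Sir", "dame": "Dame", "lord": "Lord", "lady": "Lady",
    "baroness": "Baroness",
}

def normalise_title(raw: Optional[str]) -> Optional[str]:
    """Return the stored form of a title, or None if it is not in the vocabulary."""
    if raw is None:
        return None
    # Remove full stops, lowercase, and collapse whitespace before the lookup.
    key = " ".join(raw.replace(".", " ").lower().split())
    return TITLE_VOCAB.get(key)

print(normalise_title("MR."))       # Mr
print(normalise_title(" Cllr. "))   # Cllr
print(normalise_title("The Revd"))  # Rev
print(normalise_title("Captain"))   # None
```

Returning `None` for unknown values, rather than guessing, keeps unrecognised free-text titles visible so they can be reviewed or added to the vocabulary deliberately.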

Capitalisation Rules and Common Failures

The simplest capitalisation rule — title case, first letter of each word uppercased — fails on a surprising number of real names:

  • Surnames with particles: "de", "van", "von", "le", "la", "du" are typically lowercased within a name (e.g. "Jean-Paul de Villiers", "Hans van der Berg"). Blindly applying title case produces "De Villiers" and "Van Der Berg".
  • Irish and Scottish prefixes: "O'" (O'Brien, O'Sullivan) and "Mc"/"Mac" (McDonald, MacPherson) require specific capitalisation. "Mc" names capitalise the letter after "Mc" — McDonald not Mcdonald.
  • Hyphenated surnames: Both halves should be capitalised (Smith-Jones, not Smith-jones). Automated title-case functions usually handle this correctly, but only if the hyphen is present. Data that has been entered as "Smith Jones" (space, no hyphen) is a different problem entirely.
  • ALL CAPS legacy data: Many older databases, particularly those migrated from mainframe systems or third-party list purchases, store names in ALL CAPITALS. Converting to title case is straightforward for most names but requires the exceptions above to be handled correctly.
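The exceptions above can be sketched as a small casing function. This is a simplified illustration, not a complete solution: the particle list is an assumption (it adds "der" to the particles named above), `smart_name_case` is a hypothetical name, and "Mac" is deliberately left alone because names such as "Macey" or "Mackie" make an automatic rule unsafe.

```python
import re

# Particles kept lowercase (assumed list; adjust to your own data).
PARTICLES = {"de", "van", "von", "der", "le", "la", "du"}

def _case_part(part: str) -> str:
    """Capitalise one piece of a name that contains no spaces or hyphens."""
    lower = part.lower()
    if lower in PARTICLES:
        return lower
    # O'Brien, D'Arcy: capitalise the letter after the apostrophe.
    m = re.fullmatch(r"([a-z])'([a-z].*)", lower)
    if m:
        return m.group(1).upper() + "'" + m.group(2).capitalize()
    # McDonald: capitalise the letter after "Mc".
    if lower.startswith("mc") and len(lower) > 2:
        return "Mc" + lower[2:].capitalize()
    return lower.capitalize()

def smart_name_case(name: str) -> str:
    """Title-case a name while respecting particles, O', Mc and hyphens."""
    words = []
    for word in name.split():
        # Case each half of a hyphenated name separately.
        words.append("-".join(_case_part(p) for p in word.split("-")))
    return " ".join(words)

print(smart_name_case("JEAN-PAUL DE VILLIERS"))  # Jean-Paul de Villiers
print(smart_name_case("hans van der berg"))      # Hans van der Berg
print(smart_name_case("SARAH O'BRIEN-WALSH"))    # Sarah O'Brien-Walsh
print(smart_name_case("MCDONALD"))               # McDonald
```

Some families do capitalise their particle ("De Villiers"), so where a record already has mixed case it is usually safer to leave it alone and apply a function like this only to all-caps or all-lowercase values.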

Parsing Full Name Strings into Components

A significant data quality problem arises when names have been captured in a single free-text field rather than structured title/first name/last name fields. This is common with older web forms, imported spreadsheets, and legacy CRM migrations.

Parsing "Dr Sarah Jane O'Brien-Walsh" into its components is non-trivial. The challenges include:

  • Distinguishing title from first name when titles are not in a known list
  • Identifying middle names — is "Sarah Jane" a double first name or a first name plus middle name?
  • Compound and hyphenated surnames — is the last word always the surname, or could "O'Brien" be a first name?
  • Generational suffixes like "Jr", "Sr", "III" that appear at the end of the string

Rule-based parsers handle the common cases well but will always produce errors on unusual names. For large datasets, a combination of automated parsing and a manual review queue for low-confidence results is the pragmatic approach.
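A minimal rule-based parser with a review flag might look like the sketch below. The title and suffix lists, the `ParsedName` structure and the "two or three tokens" confidence rule are all illustrative assumptions. Note that it treats "Sarah Jane" as first name plus middle name, which is exactly the kind of guess a review queue exists to catch.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed lookup lists; real ones would be longer.
KNOWN_TITLES = {"mr", "mrs", "miss", "ms", "mx", "dr", "prof", "rev", "cllr"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv"}

@dataclass
class ParsedName:
    title: Optional[str]
    first: Optional[str]
    middle: Optional[str]
    last: Optional[str]
    suffix: Optional[str]
    needs_review: bool

def parse_name(full: str) -> ParsedName:
    """Split a free-text name into components using simple positional rules."""
    tokens = full.split()
    title = suffix = None
    # A leading token is a title only if it is in the known list.
    if tokens and tokens[0].rstrip(".").lower() in KNOWN_TITLES:
        title = tokens.pop(0)
    # A trailing token is a generational suffix only if it is in the known list.
    if tokens and tokens[-1].rstrip(".").lower() in SUFFIXES:
        suffix = tokens.pop()
    # Assumption: two or three remaining tokens is the "confident" case.
    needs_review = len(tokens) not in (2, 3)
    first = tokens[0] if tokens else None
    last = tokens[-1] if len(tokens) > 1 else None
    middle = " ".join(tokens[1:-1]) or None
    return ParsedName(title, first, middle, last, suffix, needs_review)

parsed = parse_name("Dr Sarah Jane O'Brien-Walsh")
print(parsed.title, parsed.first, parsed.middle, parsed.last)
# Dr Sarah Jane O'Brien-Walsh
```

Single-token names and names with four or more tokens come back with `needs_review=True`, which is how a record would reach the manual queue.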

Business Name vs Personal Name Fields

Many B2B databases confuse the issue by storing the contact name and the organisation name inconsistently. Typical problems include:

  • The company name field contains the contact person's name ("John Smith" instead of "Acme Ltd")
  • The person name field contains the job title ("Managing Director")
  • A single name field is used for both individuals and organisations, making it impossible to apply person-specific formatting rules

The fix requires a classification step — determining whether a given name record refers to an individual or an organisation — before any formatting rules can be applied. This can be done with a combination of keyword matching (Ltd, PLC, LLP, & Sons, Group, etc.) and probabilistic name scoring.
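The keyword-matching half of that classification step can be sketched as follows. The keyword pattern and job-title list are assumptions chosen for illustration, `classify_name` is a hypothetical name, and the probabilistic name scoring mentioned above is not implemented here.

```python
import re

# Assumed organisation keywords; extend from your own data.
ORG_PATTERN = re.compile(
    r"\b(ltd|limited|plc|llp|cic|group|holdings)\b|& (sons|co)\b",
    re.IGNORECASE,
)
# Assumed job titles that sometimes end up in a person-name field.
JOB_TITLES = {"managing director", "director", "owner", "proprietor"}

def classify_name(value: str) -> str:
    """Label a name value as 'job_title', 'organisation' or 'individual'."""
    text = " ".join(value.split())
    if text.lower() in JOB_TITLES:
        return "job_title"
    if ORG_PATTERN.search(text):
        return "organisation"
    # Default only: no organisation keyword was found. This is not a
    # positive identification of a person.
    return "individual"

print(classify_name("Acme Ltd"))           # organisation
print(classify_name("Smith & Sons"))       # organisation
print(classify_name("Managing Director"))  # job_title
print(classify_name("John Smith"))         # individual
```

Keywords alone will miss organisations with no legal suffix, so in practice the "individual" bucket would still need a second pass, for example scoring tokens against a list of known first names.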

Company Name Formatting and Companies House Conventions

For B2B databases, company name standardisation brings its own rules. The authoritative reference for UK company names is Companies House, which stores names in a specific canonical format. Key considerations include:

  • "The" prefix: Companies House places "The" at the end in a comma-separated format — "Smith Group, The" rather than "The Smith Group". This is the legal registered format. However, for marketing databases, the natural language form ("The Smith Group") is usually more appropriate, so you need to decide which convention to follow and apply it consistently.
  • Legal suffixes: Ltd, Limited, PLC, LLP and CIC should each be standardised to one consistent form. "Limited" and "Ltd" in different records of the same company will prevent exact-match deduplication. The register holds whichever form the company filed, so both "Limited" and "Ltd" appear in Companies House data; most CRM systems use "Ltd". Pick one.
  • Punctuation in company names: Ampersands (&) versus "and", full stops in acronyms (B.B.C. vs BBC), brackets. Standardise across your database.
  • Trading names vs registered names: A company may trade as "Greggs" but be registered as "Greggs PLC". Your database should ideally hold both, flagged clearly, rather than mixing them across records.
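As a rough sketch of these rules, the function below converts the trailing-"The" form to natural language order, replaces "and" with an ampersand, and maps legal suffixes to one abbreviation. The choice of "Ltd" and "&" as the house style is an assumption, as are the names `SUFFIX_MAP` and `normalise_company`. It does not handle dotted acronyms or brackets.

```python
import re

# Assumed house style: abbreviated legal suffixes.
SUFFIX_MAP = {
    "limited": "Ltd", "ltd": "Ltd",
    "public limited company": "PLC", "plc": "PLC",
    "limited liability partnership": "LLP", "llp": "LLP",
    "community interest company": "CIC", "cic": "CIC",
}
# Matches a legal suffix (with optional full stop) at the end of the name.
SUFFIX_RE = re.compile(
    r"\s+(public limited company|limited liability partnership|"
    r"community interest company|limited|ltd|plc|llp|cic)\.?$",
    re.IGNORECASE,
)

def normalise_company(name: str) -> str:
    """Apply one consistent convention to a company name."""
    text = " ".join(name.split())
    # "Smith Group, The" -> "The Smith Group"
    m = re.fullmatch(r"(.+),\s*the", text, re.IGNORECASE)
    if m:
        text = "The " + m.group(1)
    # "and" between words -> "&"
    text = re.sub(r"\s+and\s+", " & ", text, flags=re.IGNORECASE)
    # Standardise the trailing legal suffix.
    m = SUFFIX_RE.search(text)
    if m:
        text = text[: m.start()] + " " + SUFFIX_MAP[m.group(1).lower()]
    return text

print(normalise_company("Smith  Group, The"))    # The Smith Group
print(normalise_company("Acme Limited"))         # Acme Ltd
print(normalise_company("Smith and Jones plc"))  # Smith & Jones PLC
```

If you need to match against the register, it is usually worth keeping the registered name untouched in its own field and storing the output of a function like this separately.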

Common Errors Introduced by Web Forms

A large proportion of name data quality problems originate at the point of capture — the web form. Common issues include:

  • All-caps entry: Some users, particularly older ones, type in capitals throughout. A simple case transformation catches this, but it must respect the capitalisation exceptions described earlier (name particles, O' and Mc/Mac prefixes, hyphenated surnames).
  • First name in last name field: Surprisingly common, particularly when the field order or labelling is unclear. Pattern-matching (is the "last name" a known UK first name?) can flag these for review.
  • Placeholder values: "Test", "N/A", "Anonymous", "asdf", "xxxxx". These should be caught by validation at point of entry, but if they have entered the database, a lookup against a list of known placeholder strings will surface them.
  • Emoji and special characters: Forms that do not restrict input may accept emoji or characters outside the expected character set. These need sanitising, particularly for databases feeding into print mailing workflows.
  • Double spaces and leading/trailing whitespace: Easy to overlook but will break exact-match deduplication. Always trim and collapse whitespace as a baseline transformation.
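Several of these checks can be combined into one baseline function, sketched below. The placeholder list, the flag names and `check_form_name` are illustrative assumptions. The character check accepts any Unicode letter, so a print workflow with a narrower character set would need a stricter rule, and the "first name in the last name field" check is not covered because it needs a reference list of first names.

```python
import re

# Assumed placeholder strings; extend from what you find in your own data.
PLACEHOLDERS = {"test", "n/a", "anonymous", "asdf"}
# Non-letter characters accepted in names (assumption).
EXTRA_ALLOWED = set(" '-.\u2019")

def check_form_name(raw: str):
    """Return (cleaned_value, flags) for a name captured from a web form."""
    # Trim and collapse whitespace as the baseline transformation.
    value = " ".join(raw.split())
    if not value:
        return value, ["empty"]
    lowered = value.lower()
    # Known placeholders, or one character repeated three or more times ("xxxxx").
    if lowered in PLACEHOLDERS or re.fullmatch(r"(.)\1{2,}", lowered):
        return value, ["placeholder"]
    flags = []
    # Emoji, digits and symbols are neither letters nor in the allowed set.
    if not all(ch.isalpha() or ch in EXTRA_ALLOWED for ch in value):
        flags.append("unexpected_characters")
    if value.isupper():
        flags.append("all_caps")
    return value, flags

print(check_form_name("  JAMES   SMITH "))  # ('JAMES SMITH', ['all_caps'])
print(check_form_name("xxxxx"))             # ('xxxxx', ['placeholder'])
```

The function only flags problems and never changes the case or removes characters, so the decision about how to repair each record stays with a later step.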

Building a Name Standardisation Pipeline

A robust name standardisation process typically runs in this order:

  • Trim and normalise whitespace
  • Remove or flag placeholder and invalid values
  • Classify record as individual or organisation
  • For individuals: extract title, parse into components, apply capitalisation rules with exceptions
  • For organisations: standardise legal suffix, handle "The" convention, normalise punctuation
  • Flag low-confidence results for manual review
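The ordering of these steps can be sketched as a single function. Every step here is deliberately simplified and every name (`Result`, `standardise`, the lookup tables) is a hypothetical stand-in: the casing step in particular is naive and ignores the exceptions covered earlier, so treat this as a skeleton showing the order of operations rather than a working cleanser.

```python
import re
from dataclasses import dataclass

# Assumed, deliberately short lookup tables.
PLACEHOLDERS = {"test", "n/a", "anonymous", "asdf"}
ORG_RE = re.compile(r"\b(ltd|limited|plc|llp|cic)\b", re.IGNORECASE)
TITLES = {"mr": "Mr", "mrs": "Mrs", "miss": "Miss", "ms": "Ms", "mx": "Mx", "dr": "Dr"}

@dataclass
class Result:
    kind: str                 # "individual", "organisation" or "invalid"
    value: str
    title: str = ""
    needs_review: bool = False

def standardise(raw: str) -> Result:
    # 1. Trim and normalise whitespace.
    text = " ".join(raw.split())
    # 2. Flag placeholder and empty values.
    if not text or text.lower() in PLACEHOLDERS:
        return Result("invalid", text, needs_review=True)
    # 3. Classify as individual or organisation.
    if ORG_RE.search(text):
        # 5. Organisations: standardise the trailing suffix (simplified).
        text = re.sub(r"\b(limited|ltd)\b\.?$", "Ltd", text, flags=re.IGNORECASE)
        return Result("organisation", text)
    # 4. Individuals: extract the title, then apply casing.
    tokens = text.split()
    title = TITLES.get(tokens[0].rstrip(".").lower(), "")
    if title:
        tokens = tokens[1:]
    # Naive casing only; a real step would handle particles, O', Mc and hyphens.
    value = " ".join(t.capitalize() for t in tokens)
    # 6. Flag low-confidence results for manual review.
    needs_review = len(tokens) not in (2, 3)
    return Result("individual", value, title, needs_review)

print(standardise("  mr   james SMITH "))
# Result(kind='individual', value='James Smith', title='Mr', needs_review=False)
```

Running classification before any casing matters: applying person-name rules to "Acme Limited" first would make the organisation keywords harder to detect and could mangle acronyms.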

Done well, this process transforms name data from a source of errors and duplicates into a reliable matching key — which is the foundation of all subsequent data quality work.

Need Help Cleaning Your Data?

UK Data Services handles data cleansing, deduplication and quality improvement projects for UK businesses. See our data cleaning services or get in touch for a no-obligation consultation.