
UK Business Data Standards: Companies House, VAT Numbers and SIC Codes

A practical guide to UK business identifiers — Companies Registration Numbers, VAT numbers, EORI, SIC codes and charity numbers — and how to use them for B2B data quality.

Why Business Identifiers Are the Backbone of B2B Data Quality

In consumer data, the gold standard matching key is typically a combination of name, address and date of birth. In B2B data, the equivalent is the suite of official business identifiers — registration numbers, VAT numbers, and the various other codes that uniquely identify a legal entity in UK public records. When these identifiers are captured, validated and stored correctly in your customer database, they give you a reliable, authoritative key for deduplication, credit checking, compliance screening, and market analysis that no amount of name-matching can replicate.

Yet many UK business databases are remarkably poor at capturing these identifiers. CRM systems that were designed with consumers in mind often lack dedicated fields for CRN or VAT number, leading companies to either ignore them entirely or store them inconsistently in notes fields. This article covers the key UK business identifiers you should be capturing and how to use them to improve your B2B data quality.

Companies Registration Number (CRN)

The Companies Registration Number — sometimes called the Company Number or CRN — is the unique identifier assigned to every company registered at Companies House. It is the single most reliable identifier for a UK limited company or LLP.

Format and Validation

The standard CRN format is 8 characters. For companies incorporated in England and Wales, this is 8 digits (e.g. 12345678); older companies have leading zeros that some systems inadvertently strip (00123456 becoming 123456, which is not a valid number). Scottish companies are prefixed with "SC" (e.g. SC123456), Northern Ireland companies with "NI" (e.g. NI123456), and Limited Liability Partnerships registered in England and Wales with "OC" (e.g. OC123456). Other two-letter prefixes exist for other entity types and jurisdictions, such as "SO" for Scottish LLPs.

A common data quality error is storing CRNs without leading zeros or prefix codes, making them unresolvable against the Companies House API. Validation rules to implement:

  • Total length must be exactly 8 characters (including any prefix)
  • England/Wales: 8 digits, zero-padded on the left if fewer than 8 numeric digits were entered
  • Scotland: "SC" followed by exactly 6 digits
  • Northern Ireland: "NI" followed by exactly 6 digits
  • LLPs (England and Wales): "OC" followed by exactly 6 digits
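The rules above can be sketched as a small normaliser. This is a minimal illustration, not a complete validator: the function name and the prefix set are assumptions, and the set only covers the prefixes mentioned in this article.

```python
import re
from typing import Optional

# Prefixes named in this article. Companies House uses others as well,
# so treat this set as an assumption to extend, not a complete list.
KNOWN_PREFIXES = {"SC", "NI", "OC"}


def normalise_crn(raw: str) -> Optional[str]:
    """Return a canonical 8-character CRN, or None if the input cannot be one."""
    value = re.sub(r"[\s\-]", "", raw).upper()

    # All-numeric input: restore any leading zeros that were stripped.
    if re.fullmatch(r"[0-9]{1,8}", value):
        return value.zfill(8)

    # Prefixed input: two letters followed by exactly six digits.
    match = re.fullmatch(r"([A-Z]{2})([0-9]{6})", value)
    if match and match.group(1) in KNOWN_PREFIXES:
        return value

    return None
```

For example, `normalise_crn("123456")` returns `"00123456"`, while `normalise_crn("SC12345")` returns `None` because the numeric part is too short. A well-formed result still needs checking against Companies House to confirm that the company exists.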

Companies House as the Authoritative Source

The Companies House API (free to use with a registered API key, documented at developer.company-information.service.gov.uk) allows you to look up any registered company by CRN and retrieve its canonical registered name, registered address, company status (for example active, dissolved or in liquidation), SIC codes, and officer information. For B2B data quality purposes, this is invaluable:

  • Verify that a CRN you hold actually exists and matches the company name in your database
  • Retrieve the canonical registered address and compare it to your records
  • Check whether a company has been dissolved — useful for suppressing communications to non-trading entities
  • Enrich records that have a CRN but are missing address or sector information

The API rate limit (600 requests per five-minute window by default at the time of writing) is generous for most enrichment use cases, and the data is updated daily. For large-scale enrichment projects, bulk data downloads are available from the Companies House data products page.
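As a rough sketch of the verification steps listed above, the following builds a company profile request and compares the response with a stored record. The record keys (`name`, `postcode`) and the issue labels are hypothetical. The endpoint path, the HTTP Basic authentication scheme (API key as username, empty password) and the response field names reflect the public API documentation as we understand it, so check them against the current reference before relying on them.

```python
import base64
import json
import urllib.request

API_BASE = "https://api.company-information.service.gov.uk"


def build_profile_request(crn: str, api_key: str) -> urllib.request.Request:
    """Build a GET request for a company profile (API key as Basic auth username)."""
    token = base64.b64encode(f"{api_key}:".encode()).decode()
    return urllib.request.Request(
        f"{API_BASE}/company/{crn}",
        headers={"Authorization": f"Basic {token}"},
    )


def fetch_profile(crn: str, api_key: str) -> dict:
    """Call the API and return the decoded JSON profile (needs network access)."""
    with urllib.request.urlopen(build_profile_request(crn, api_key), timeout=10) as resp:
        return json.load(resp)


def _squash(text: str) -> str:
    """Uppercase and remove spaces so 'ab1 2cd' and 'AB12CD' compare equal."""
    return text.replace(" ", "").upper()


def review_record(record: dict, profile: dict) -> list:
    """Compare a stored record (hypothetical keys) with a company profile."""
    issues = []
    if profile.get("company_status") == "dissolved":
        issues.append("dissolved")
    if record["name"].strip().casefold() != profile.get("company_name", "").strip().casefold():
        issues.append("name_mismatch")
    office = profile.get("registered_office_address", {})
    if _squash(record.get("postcode", "")) != _squash(office.get("postal_code", "")):
        issues.append("postcode_mismatch")
    return issues
```

In practice an exact name comparison is too strict ("Ltd" versus "Limited"), so a mismatch here is best treated as a prompt for review rather than as an error.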

VAT Registration Numbers

The UK VAT registration number format is "GB" followed by 9 digits — for example, GB123456789. Some businesses also have branch identifiers appended (GB123456789 001), though for most data quality purposes the 9-digit component is sufficient.

Post-Brexit, businesses trading with the EU may hold both a UK VAT number (GB prefix) and an EU VAT number from a member state. It is important to store these separately and not conflate them — they are issued by different authorities and follow different validation algorithms.

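UK VAT numbers carry a modulus-97 check: the last two of the nine digits are check digits computed from the first seven. Numbers issued under the newer "9755" variant of the scheme apply an offset of 55 before the same modulus-97 test. This means you can validate the mathematical integrity of a number without any API call, and libraries exist in most programming languages to perform the check. Passing the check only shows that a number is well formed, not that it has been issued. HMRC also provides a VAT number lookup API that confirms whether a number is currently registered, which is useful for credit and compliance checking on new customer accounts.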
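The check digit test can be sketched as follows. This is a minimal illustration that assumes the commonly documented scheme: weights 8 down to 2 on the first seven digits, with the original modulus-97 rule and the later "9755" variant both accepted. The function name is hypothetical, and the sketch does not cover special formats such as government department or health authority numbers.

```python
import re

# Weights applied to the first seven digits of the nine-digit number.
WEIGHTS = (8, 7, 6, 5, 4, 3, 2)


def vat_checksum_ok(raw: str) -> bool:
    """Return True if a UK VAT number passes the modulus-97 or 9755 check."""
    value = re.sub(r"[\s.\-]", "", raw).upper()
    if value.startswith("GB"):
        value = value[2:]

    # Nine digits, optionally followed by a three-digit branch identifier.
    if not re.fullmatch(r"[0-9]{9}([0-9]{3})?", value):
        return False

    core = value[:9]
    weighted = sum(int(digit) * weight for digit, weight in zip(core[:7], WEIGHTS))
    total = weighted + int(core[7:])

    # Original scheme: total divisible by 97. 9755 variant: add 55 first.
    return total % 97 == 0 or (total + 55) % 97 == 0
```

For the digits 1234567 the weighted sum is 112, so the check digits 82 (original scheme) and 27 (9755 variant) both pass, while the placeholder GB123456789 used in this article does not.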

Important caveat: not all UK businesses are VAT registered. Registration is compulsory only once taxable turnover exceeds the threshold, currently £90,000 over a rolling 12-month period, so many smaller businesses and sole traders below it will not have a VAT number (although some register voluntarily). Absence of a VAT number does not indicate an invalid business record.

EORI Numbers Post-Brexit

The Economic Operators Registration and Identification (EORI) number has become increasingly important for UK businesses engaged in import/export since Brexit. UK EORI numbers follow the format GB followed by the VAT number followed by 000 (e.g. GB123456789000), though standalone EORI numbers (not derived from a VAT number) also exist.
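A format check for the pattern described above might look like this. Both function names are hypothetical, and the "000" suffix rule is taken from the description in this article. A standalone EORI could in principle also end in 000, so the extracted VAT number is only a candidate to verify.

```python
import re
from typing import Optional


def is_gb_eori_format(raw: str) -> bool:
    """True if the value is 'GB' followed by exactly 12 digits."""
    value = re.sub(r"\s", "", raw).upper()
    return re.fullmatch(r"GB[0-9]{12}", value) is not None


def candidate_vat_from_eori(raw: str) -> Optional[str]:
    """Return the embedded VAT number if the EORI looks VAT-derived, else None."""
    value = re.sub(r"\s", "", raw).upper()
    if is_gb_eori_format(value) and value.endswith("000"):
        return value[:11]  # "GB" plus the nine VAT digits
    return None
```

Storing the EORI in its own field and cross-checking the candidate VAT number against the VAT field catches records where one identifier has been keyed into the other's place.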

For B2B databases serving the trade, logistics or manufacturing sectors, capturing EORI numbers enables more reliable matching against customs and freight documentation. Businesses that trade internationally but lack an EORI in your database may be worth flagging for data enrichment.

SIC Codes: Industry Classification

Standard Industrial Classification (SIC) codes are used in the UK to classify business activities. Companies House requires all registered companies to have at least one SIC code, making them a reliable field for industry segmentation when sourced from Companies House data.

The UK uses a 5-digit SIC 2007 code system. For example, 62012 is "Business and domestic software development" and 47190 is "Other retail sale in non-specialised stores". The full code list is published by the Office for National Statistics.

Data quality issues with SIC codes in commercial databases include:

  • Using the older 4-digit SIC 2003 codes mixed with 5-digit SIC 2007 codes
  • Incorrect codes entered at company formation that were never updated to reflect the actual business activity
  • Over-generic codes (e.g. 99999 "Dormant company") that provide no useful segmentation
  • Missing SIC codes on records that were not sourced from Companies House
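The issues above can be flagged with a simple classifier. This is a sketch only: the labels and function name are hypothetical, the set of uninformative codes contains just the example from this article, and a five-digit shape does not prove that the code appears in the published SIC 2007 list.

```python
import re

# Codes that are valid but useless for segmentation. Only the example
# from this article is listed; extend as needed.
UNINFORMATIVE = {"99999"}


def assess_sic(raw: str) -> str:
    """Classify a stored SIC value by shape, without checking the official list."""
    code = raw.strip()
    if not code:
        return "missing"
    if re.fullmatch(r"[0-9]{5}", code):
        return "uninformative" if code in UNINFORMATIVE else "sic2007_shape"
    if re.fullmatch(r"[0-9]{4}", code):
        # Either an older 4-digit code, or a 5-digit code that has lost a
        # leading zero (for example after being stored as a number).
        return "legacy_or_truncated"
    return "invalid"
```

Four-digit values need manual or source-based resolution, because the shape alone cannot tell an older code from a truncated SIC 2007 code. Storing SIC codes as text rather than numbers avoids the truncation case.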

When building B2B prospect lists or segmenting your customer base by industry, SIC codes sourced directly from Companies House are more reliable than self-reported industry fields from web forms or CRM dropdown selections.

Charity Registration Numbers

Charities operating in the UK are registered with different regulators depending on their jurisdiction:

  • England and Wales: The Charity Commission for England and Wales issues a numeric registration number, typically 6-7 digits (e.g. 1234567). Charities with annual income over £5,000 must register.
  • Scotland: The Office of the Scottish Charity Regulator (OSCR) issues numbers in the format SC followed by 6 digits (e.g. SC012345).
  • Northern Ireland: The Charity Commission for Northern Ireland issues numbers with the prefix NIC followed by 6 digits (e.g. NIC100000).

For databases serving the charity sector, validating and standardising charity numbers against the relevant regulator's public API enables reliable deduplication and enrichment. Both the Charity Commission and OSCR provide publicly accessible search APIs.
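Before calling any regulator's API, it helps to standardise the number and work out which register it belongs to. The sketch below uses only the formats listed above; the regulator labels and function name are hypothetical, and it does not handle any suffixes a regulator may append to a number.

```python
import re
from typing import Optional, Tuple

# Checked in order, using the formats described in this article.
PATTERNS = (
    ("CCNI", re.compile(r"NIC[0-9]{6}")),   # Northern Ireland
    ("OSCR", re.compile(r"SC[0-9]{6}")),    # Scotland
    ("CCEW", re.compile(r"[0-9]{6,7}")),    # England and Wales
)


def classify_charity_number(raw: str) -> Optional[Tuple[str, str]]:
    """Return (regulator, normalised number), or None if no format matches."""
    value = re.sub(r"\s", "", raw).upper()
    for regulator, pattern in PATTERNS:
        if pattern.fullmatch(value):
            return regulator, value
    return None
```

For example, `classify_charity_number("sc 012345")` returns `("OSCR", "SC012345")`. Note that "SC" plus six digits is also the shape of a Scottish company number, so the format alone cannot tell you which register a value came from.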

Note that a charity can also be a limited company and therefore have both a Companies House CRN and a charity registration number. These are different identifiers and should be stored in separate fields.

Using Business Identifiers for Deduplication

The practical value of capturing these identifiers becomes clear in deduplication exercises. Name-based matching for B2B records is notoriously unreliable — "Acme Ltd", "Acme Limited", "ACME LTD" and "Acme (UK) Ltd" may or may not be the same entity, and without a CRN to resolve the ambiguity, you cannot be certain. With a valid CRN on both records, you can match with certainty in a single comparison.

The recommended approach for B2B deduplication is to use business identifiers as the primary matching key where available, and fall back to fuzzy name and address matching only where identifiers are absent. This produces both more accurate matches and a clear priority list for data enrichment — records lacking CRNs become your highest-priority enrichment targets.
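A minimal sketch of this identifier-first strategy is shown below. The record keys, the suffix list, the 0.9 threshold and the use of postcode as a second signal are all illustrative assumptions. The fuzzy step uses `difflib` from the Python standard library, which is a simple stand-in for a dedicated matching tool.

```python
import re
from difflib import SequenceMatcher

# Legal-form suffixes to ignore when comparing names (illustrative list).
SUFFIXES = re.compile(r"\b(limited|ltd|llp|plc)\b\.?")


def name_key(name: str) -> str:
    """Lowercase, drop legal suffixes and punctuation, collapse whitespace."""
    lowered = SUFFIXES.sub(" ", name.casefold())
    lowered = re.sub(r"[^a-z0-9 ]", " ", lowered)
    return " ".join(lowered.split())


def match_records(a: dict, b: dict, threshold: float = 0.9) -> tuple:
    """Return (method, is_match): CRN comparison first, fuzzy fallback second."""
    crn_a, crn_b = a.get("crn"), b.get("crn")
    if crn_a and crn_b:
        # Both identifiers present: a single exact comparison decides.
        return ("crn", crn_a == crn_b)

    # Fallback: similar normalised names and an identical postcode.
    score = SequenceMatcher(None, name_key(a["name"]), name_key(b["name"])).ratio()
    same_postcode = bool(a.get("postcode")) and a.get("postcode") == b.get("postcode")
    return ("fuzzy", score >= threshold and same_postcode)
```

Returning the method alongside the decision makes it easy to report how many matches rested on fuzzy logic alone, which in turn gives the enrichment priority list described above. The CRNs passed in are assumed to be normalised already.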

Need Help Cleaning Your Data?

UK Data Services handles data cleansing, deduplication and quality improvement projects for UK businesses. See our data cleaning services or get in touch for a no-obligation consultation.
