
Contributor Identity Verification: Closing the CLA Loophole

Luca Bertrand | January 7, 2026 | 8 min read


A signed CLA is a legal agreement between your organization and a specific person. The legal weight of that agreement depends entirely on your ability to demonstrate that the person who signed it is the same person who made the contributions it's supposed to cover. Most CLA implementations handle the first half well — collecting a signature — and handle the second half loosely, if at all.

The identity chain problem is this: your commit history records a git author name and email. Your CLA system has a signature associated with a web form session and, often, a different email address. Your CI check maps the GitHub username from the PR to the CLA record. At each step you're making an identity assertion, and the chain of those assertions is only as strong as its weakest link.

The Three Identity Gaps

Gap 1: Git commit identity vs. platform account identity. A contributor's git commit is authored with whatever name and email are configured in their local git config. That email may differ from their platform account's primary email: it could be a work email, a personal email, or one of several aliases. GitHub, GitLab, and Bitbucket each have mechanisms to link multiple email addresses to a single account, but contributors don't always configure this, and the platforms' APIs differ in how much access they give to secondary email addresses.

The practical consequence: a contributor who commits with their work email but signs the CLA using their GitHub account (OAuth-authenticated) produces two distinct identifiers. A CLA system that matches by email will miss the connection unless it also checks the contributor's verified platform email list. A CLA system that matches by platform account username is more reliable but requires platform authentication during the signing step.

Gap 2: Signer identity vs. CLA legal subject. When a contributor signs a CLA using OAuth authentication (signing in with GitHub, GitLab, or Bitbucket), the system can establish that the person who signed had access to that platform account at signing time. What it cannot independently verify is that the person who created the platform account, and who has access to it now, is the same person who will be committing under that account going forward. Account sharing, account transfers after job changes, and inherited team bot accounts are all scenarios where platform identity continuity breaks down.

Gap 3: CCLA coverage vs. employment reality. A corporate CLA (CCLA) signed by a company covers employees whose employment relationship was current at the time of signing and whose names appear on the covered contributor list. Employment relationships change continuously. An engineer who was covered under a previous employer's CCLA and has since changed jobs remains covered by the old CCLA for the contributions they made while employed there. New contributions under the same GitHub account, however, are not covered by any CCLA unless the new employer has also signed one.

This is the identity gap that's hardest to detect automatically, because the contributor's platform account identity is unchanged across the employment transition.
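One way to make this gap checkable is to store an employment window with each covered-list entry and test commit dates against it. The sketch below is illustrative only: the `CoverageWindow` schema, the `covering_company` helper, and the sample IDs and dates are all assumptions, and the end date has to be supplied by the company's administrator because it cannot be inferred from the platform account.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class CoverageWindow:
    """One contributor's entry on a CCLA covered list (hypothetical schema)."""
    platform_user_id: int
    company: str
    start: date
    end: Optional[date] = None  # None means the entry is still active


def covering_company(
    windows: list[CoverageWindow], user_id: int, commit_date: date
) -> Optional[str]:
    """Return the company whose CCLA covers this commit date, or None."""
    for w in windows:
        if w.platform_user_id != user_id:
            continue
        if w.start <= commit_date and (w.end is None or commit_date <= w.end):
            return w.company
    return None


# Hypothetical data: one engineer who left Acme Corp on 2025-06-30.
windows = [CoverageWindow(4211, "Acme Corp", date(2023, 3, 1), date(2025, 6, 30))]

print(covering_company(windows, 4211, date(2024, 9, 12)))  # Acme Corp
print(covering_company(windows, 4211, date(2025, 8, 2)))   # None
```

The same platform user ID yields different answers depending on the commit date, which is exactly the information a lookup keyed only on the account would lose.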

Identity Verification Approaches at Different Assurance Levels

Level 1: Platform OAuth authentication (minimum baseline). The contributor signs the CLA by authenticating with their GitHub/GitLab/Bitbucket account. This establishes that the signer had control of the platform account at signing time and links the CLA to a specific platform user ID. The platform user ID (not the display name or email, which can change) is the stable identifier for the CLA record. This is the minimum acceptable standard for individual contributors.

Level 2: Email verification plus platform authentication. The contributor signs via OAuth and provides an email address that is then verified by sending a confirmation link. The email address is linked to the CLA record in addition to the platform user ID. This provides a fallback identity when commit author emails need to be matched to CLA records. For corporate contributors, the verified email should be a work email, which serves as evidence of the employment relationship.

Level 3: Employer-verified identity (CCLA level). For corporate CLAs, the CCLA is signed by an authorized representative who is verified against the company's formal authorization chain. Covered employees are listed by work email and platform account, and the list is maintained by a designated administrator at the company (not self-reported by individual contributors). Changes to the covered list are authorized through a documented process.

Level 4: Active employment monitoring. For high-assurance programs, the identity verification layer includes ongoing employment status checks — typically through email domain monitoring and periodic re-verification. A contributor whose email domain changes (suggesting a job change) triggers a re-verification workflow. This is operationally intensive and appropriate for projects where CCLA coverage accuracy is critical.

Not every project needs Level 4 verification. The appropriate assurance level depends on the project's IP sensitivity and the commercial stakes of a coverage gap. Most enterprise open source program office (OSPO) programs can operate effectively at Level 2 for individual contributors and Level 3 for corporate contributors.
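For programs that do adopt Level 4, the email domain trigger can be sketched roughly as follows. The function names and the `.example` addresses are hypothetical, and the rule used here (re-verify when no currently verified email is on the domain recorded at signing) is one possible reading of "email domain changes", not a prescribed policy.

```python
def email_domain(address: str) -> str:
    """Lower-cased domain part of an email address."""
    return address.rsplit("@", 1)[-1].strip().lower()


def needs_reverification(recorded_work_email: str, current_verified_emails: list[str]) -> bool:
    """True when no currently verified email is on the domain recorded at signing.

    An empty list also returns True: with no verified email there is no
    evidence that the recorded employment relationship still holds.
    """
    recorded = email_domain(recorded_work_email)
    return all(email_domain(e) != recorded for e in current_verified_emails)


# Still has a verified address on the recorded domain: no action.
print(needs_reverification("dana@acme.example", ["dana@acme.example", "dana@home.example"]))  # False

# Recorded domain has disappeared from the account: start re-verification.
print(needs_reverification("dana@acme.example", ["dana@newco.example"]))  # True
```

A domain change only suggests a job change, so a sketch like this should open a re-verification workflow rather than revoke coverage on its own.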

The Bot Account Problem

One identity challenge is purely operational. Bot accounts (automated commit authors used by CI systems, documentation generators, changelog tools, and the like) frequently appear as commit authors in pull requests. A CLA check that encounters a bot-authored commit and flags it as unsigned creates noise that erodes trust in the CLA enforcement system.

The standard handling: maintain an exempt list of known bot account identifiers (platform user IDs, or email address patterns matching your internal bot naming convention) that are automatically exempted from CLA checks. Exemption decisions should be explicit and auditable: each bypass should leave a record, so that an exempted bot never silently skips CLA checks for every commit authored under its identity.

The tricky case is when a human contributor uses an account that looks like a bot (a username ending in -bot that was created by a human for project-specific work) or when a bot account is used for commits that do contain human-authored code (because the bot is processing and reformatting contributed code before committing). The right policy: when in doubt, require manual review of the exemption rather than automatic bypass.
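A minimal sketch of that policy, assuming exemptions are keyed on platform user ID and that a `-bot` or `[bot]` suffix is the naming convention for bot-like accounts. The `BotPolicy` class, the pattern, and the sample IDs are hypothetical.

```python
import re
from dataclasses import dataclass, field
from enum import Enum

# Hypothetical naming convention for accounts that merely look like bots.
BOT_LOOKALIKE = re.compile(r"(-bot|\[bot\])$", re.IGNORECASE)


class BotDecision(Enum):
    EXEMPT = "exempt"
    MANUAL_REVIEW = "manual_review"
    NOT_A_BOT = "not_a_bot"


@dataclass
class BotPolicy:
    exempt_user_ids: set[int]
    # Every exemption or review request is recorded as (commit, user ID, decision).
    audit_log: list[tuple[str, int, str]] = field(default_factory=list)

    def decide(self, commit_sha: str, user_id: int, username: str) -> BotDecision:
        if user_id in self.exempt_user_ids:
            decision = BotDecision.EXEMPT          # explicitly listed bot
        elif BOT_LOOKALIKE.search(username):
            decision = BotDecision.MANUAL_REVIEW   # looks like a bot, not listed
        else:
            return BotDecision.NOT_A_BOT           # normal CLA check applies
        self.audit_log.append((commit_sha, user_id, decision.value))
        return decision


policy = BotPolicy(exempt_user_ids={9001})
print(policy.decide("abc123", 9001, "release-bot").value)  # exempt
print(policy.decide("def456", 5120, "docs-bot").value)     # manual_review
print(policy.decide("0a1b2c", 77, "lbertrand").value)      # not_a_bot
```

A bot-shaped name alone never earns an exemption here. The second tricky case, a listed bot committing human-authored code, cannot be detected from the account identity, so it still depends on reviewing the audit log.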

Identity Verification in Due Diligence

The point at which identity verification quality becomes visible is IP due diligence. Acquirers' counsel reviewing CLA records will typically ask: how do you know the person who signed the CLA is the same person who authored the commits?

The answer needs to be a documented identity chain, not an assertion. If your CLA system records: platform user ID at signing time, the email addresses associated with that account at signing time, and a mapping of that platform user ID to commit author identities — you have a chain you can trace. If your CLA system records an email address that a contributor typed into a form, with no verification that the email is under their control, the chain breaks.
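As a rough illustration, the three recorded elements can be held in one record and checked for traceability before anyone asks. The `ClaRecord` fields and the `chain_gaps` helper are assumptions made for this sketch, and emails are assumed to be stored lower-cased.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ClaRecord:
    """Hypothetical CLA record holding the three links of the identity chain."""
    platform: str
    platform_user_id: int                  # stable identifier captured at signing
    signed_at: datetime
    verified_emails: frozenset[str]        # account emails at signing time
    commit_author_emails: frozenset[str]   # author emails seen on this user's commits


def chain_gaps(record: ClaRecord) -> list[str]:
    """List the places where the chain cannot be traced from the record alone."""
    gaps = []
    if not record.verified_emails:
        gaps.append("no verified emails captured at signing time")
    unmapped = record.commit_author_emails - record.verified_emails
    if unmapped:
        gaps.append(f"commit author emails not tied to the account: {sorted(unmapped)}")
    return gaps


record = ClaRecord(
    platform="github",
    platform_user_id=4211,
    signed_at=datetime(2024, 1, 15, tzinfo=timezone.utc),
    verified_emails=frozenset({"dana@acme.example"}),
    commit_author_emails=frozenset({"dana@acme.example", "dana@home.example"}),
)
print(chain_gaps(record))
```

An empty result means every commit author email traces back to an address the account had verified at signing time. A non-empty result is a prompt for review, not proof of a problem, since the contributor may have linked the address to the account later.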

Consider an open-source data integration project that went through acquisition due diligence. The target company's CLA program had collected 230 signatures over three years, using a form-based system where contributors entered their email address and clicked a checkbox agreeing to the CLA. During diligence, the acquiring company's counsel reviewed a sample of 30 contributor records and found that 11 of them had email addresses in the CLA system that didn't match any commit author in the git history for that contributor. Some had signed with a work email but committed from a personal email; others had signed once and then contributed under a different account. The CLA records existed, but the identity chain couldn't be traced for 37% of the sampled contributors. The deal closed but with an IP representation holdback that cost the target company several weeks of negotiation.

Implementation Principles

Three implementation principles for building identity-sound CLA systems:

1. Use platform user IDs as primary identifiers, not email addresses. Email addresses change and can be reused. A platform user ID (GitHub's integer user ID, GitLab's integer user ID) is stable for the life of the account. Record it at signing time and use it as the canonical key for CLA lookups during PR checks.

2. Record all email addresses associated with the platform account at signing time. The contributor's verified email list at signing time, retrieved from the platform API, gives you the best available mapping between their platform identity and their git commit identities. Store this list as part of the CLA record.

3. Make identity mismatches visible, not silent failures. When a commit author can't be mapped to a CLA record through any of the available identity signals, surface this explicitly as an unresolvable identity, not as "unsigned contributor." The distinction matters for audit purposes: an unsigned contributor is a compliance gap; an unresolvable identity is an identity resolution failure that needs different remediation.
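Taken together, the three principles suggest a PR check along these lines. This is a sketch under stated assumptions: the `Commit` shape, the `classify` function, and the two lookup structures are hypothetical, and `email_to_user_id` is assumed to be built from the verified email lists stored with signed CLA records, with lower-cased keys.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ClaStatus(Enum):
    COVERED = "covered"
    UNSIGNED = "unsigned_contributor"
    UNRESOLVABLE = "unresolvable_identity"


@dataclass(frozen=True)
class Commit:
    sha: str
    author_email: str
    # None when the platform could not link the commit to any account.
    platform_user_id: Optional[int]


def classify(
    commit: Commit,
    signed_user_ids: set[int],
    email_to_user_id: dict[str, int],
) -> ClaStatus:
    # Principle 1: the platform user ID is the canonical key.
    user_id = commit.platform_user_id
    if user_id is None:
        # Principle 2: fall back to emails recorded at signing time.
        user_id = email_to_user_id.get(commit.author_email.strip().lower())
    if user_id is None:
        # Principle 3: report the resolution failure as its own state.
        return ClaStatus.UNRESOLVABLE
    return ClaStatus.COVERED if user_id in signed_user_ids else ClaStatus.UNSIGNED


signed = {4211}
emails = {"dana@acme.example": 4211}

print(classify(Commit("a1", "dana@acme.example", 4211), signed, emails).value)  # covered
print(classify(Commit("b2", "Dana@Acme.example", None), signed, emails).value)  # covered
print(classify(Commit("c3", "sam@corp.example", 5120), signed, emails).value)   # unsigned_contributor
print(classify(Commit("d4", "who@mail.example", None), signed, emails).value)   # unresolvable_identity
```

Keeping the two failure states separate lets the check post different guidance: a request to sign the CLA in one case, a request to link the commit email to a platform account in the other.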

The identity verification layer is the part of CLA infrastructure that gets the least attention during initial implementation and causes the most problems during later review. Building it correctly requires thinking about the audit endpoint before the signing flow — knowing what you'll need to prove in a due diligence review tells you what identity data you need to capture at signing time, before the question is ever asked.
