
Change Control for AI Models: Managing Updates in GxP Systems

Every pharma quality professional knows the golden rule: nothing changes without change control. Whether it's a tablet press setting, a cleaning procedure, or a LIMS configuration, any modification must be assessed, approved, documented, and verified before it goes live. But what happens when the "system" in question is an AI model that evolves with new document ingestions, updated embeddings, and shifting retrieval logic? Welcome to one of the most nuanced challenges in modern GxP compliance.

Why AI Models Demand a New Change Control Mindset

Traditional change control in validated environments follows a well-understood lifecycle. A change request is raised, an impact assessment is performed, testing is executed, and stakeholders sign off. This works beautifully for deterministic systems—update a field in your ERP, and you can predict exactly what the output will be.

AI systems like ComplianceRAG introduce a fundamentally different dynamic. Consider the components that can change:

  • The document corpus: New SOPs, updated regulatory guidance, or revised validation protocols are ingested into the retrieval index.
  • The embedding model: The model that converts text into vector representations may be updated to improve semantic understanding.
  • The retrieval logic: Parameters like chunk size, overlap, similarity thresholds, and re-ranking algorithms may be tuned.
  • The language model (LLM): The underlying generative model may receive patches, version upgrades, or be swapped entirely.
  • Prompt templates: System prompts and instructions that shape how the AI formulates answers can be modified.

Each of these changes can alter the system's outputs in ways that range from subtle to dramatic. A QA manager who asks "What is the maximum hold time for Buffer X?" today might get a slightly different—or significantly different—answer after any one of these modifications. In a GxP environment, that variability must be governed.
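Tracking these components as a single explicit record makes it obvious when the system state has changed at all. A minimal sketch in Python (field names and version strings are illustrative, not ComplianceRAG's actual schema):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SystemState:
    """One reproducible configuration of a RAG compliance system."""
    corpus_version: str      # document corpus snapshot
    embedding_model: str     # text -> vector model
    retrieval_config: str    # chunk size, overlap, thresholds, re-ranking
    llm_version: str         # underlying generative model
    prompt_template: str     # system prompt revision

baseline = SystemState("docs-2024.06", "embed-v2",
                       "chunk512/ov64/sim0.75", "llm-4", "prompt-r3")

# Ingesting new SOPs changes the system state even though no model moved:
after_ingest = replace(baseline, corpus_version="docs-2024.07")
```

The point of the frozen dataclass is that no component can drift silently: any modification produces a new, distinct state object.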

Categorizing Changes: Not All Updates Are Created Equal

A risk-based approach is essential. Borrowing from GAMP 5 principles and the ICH Q9 risk management framework, changes to an AI compliance system can be categorized into tiers:

  • Category 1 – Routine Document Updates: Adding a newly approved SOP or retiring an obsolete one. These are the most frequent changes and represent updates to the knowledge base, not the model itself. Impact is localized to queries that reference the affected documents.
  • Category 2 – Configuration Changes: Adjusting retrieval parameters, modifying prompt templates, or updating chunking strategies. These affect how the system processes and presents information, even if the underlying data and models remain the same.
  • Category 3 – Model Changes: Upgrading the embedding model, updating the LLM version, or switching providers entirely. These represent the highest-risk modifications because they can alter behavior across the entire system in ways that are difficult to predict exhaustively.

A useful mental model: Category 1 is like updating a reference binder on the shop floor. Category 2 is like recalibrating an instrument. Category 3 is like replacing the instrument entirely.
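The tiering rule is mechanical enough to sketch in code. One hedged example, assuming a change is described by the components it touches (the component names and mapping are illustrative):

```python
# Hypothetical component -> risk tier mapping, per the categories above
RISK_TIER = {
    "document_corpus": 1,    # Category 1: routine document updates
    "retrieval_config": 2,   # Category 2: configuration changes
    "prompt_template": 2,
    "embedding_model": 3,    # Category 3: model changes
    "llm": 3,
}

def classify_change(changed_components):
    """A change request inherits the highest risk tier among the
    components it touches."""
    return max(RISK_TIER[c] for c in changed_components)
```

So a change that both ingests documents and swaps the LLM is governed as a Category 3 change, not two separate smaller ones.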

Building a Change Control Procedure for ComplianceRAG

Here's a practical framework that pharma QA teams can adapt for managing AI system changes within their existing quality management systems:

Step 1: Change Request and Classification

Every change begins with a formal change request (CR) in your QMS—no exceptions. The CR should document what is being changed, why, and classify the change according to the categories above. For Category 1 changes, a streamlined workflow with predefined acceptance criteria may be appropriate. For Category 3 changes, a full impact assessment involving QA, IT, regulatory affairs, and subject matter experts is warranted.
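In a real QMS this routing lives in workflow configuration rather than code, but the logic can be sketched as follows (the role names are examples, not a prescribed approval matrix):

```python
def approval_route(category: int) -> list[str]:
    """Illustrative approval routing: streamlined for Category 1,
    full cross-functional review for Category 3."""
    routes = {
        1: ["QA"],                                    # predefined acceptance criteria
        2: ["QA", "IT"],
        3: ["QA", "IT", "Regulatory Affairs", "SME"], # full impact assessment
    }
    return routes[category]
```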

Step 2: Impact Assessment

This is where AI-specific considerations come into play. The impact assessment should address:

  • Which query types or compliance domains could be affected?
  • Could the change introduce contradictory or outdated information?
  • Does the change affect audit trail integrity or traceability?
  • Is there a risk of answer regression—correct answers that become incorrect after the update?

For example, if your organization updates its deviation management SOP and ingests the new version into ComplianceRAG, the impact assessment should verify that the old version is properly archived, that the new version is correctly chunked and indexed, and that queries referencing deviation procedures now return information from the updated document.
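Those verification points for a document update can be expressed as automated checks. A hedged sketch, using a toy dictionary as a stand-in for the retrieval index (document IDs and the index shape are invented for illustration):

```python
def verify_supersession(index, doc_id, old_ver, new_ver):
    """Post-ingestion checks for a routine (Category 1) document update.
    `index` is a toy stand-in for the retrieval index, keyed by
    (document ID, version) -> list of indexed chunks."""
    return {
        "old_version_retired": (doc_id, old_ver) not in index,
        "new_version_indexed": len(index.get((doc_id, new_ver), [])) > 0,
    }

# After ingesting v4 of the deviation SOP and archiving v3:
index = {("SOP-DEV-001", "v4"): ["chunk a", "chunk b"]}
checks = verify_supersession(index, "SOP-DEV-001", "v3", "v4")
```

Any check that comes back `False` is evidence for the impact assessment, not a silent failure.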

Step 3: Regression Testing with a Golden Query Set

This is arguably the most critical step for AI systems. Every validated deployment of ComplianceRAG should maintain a golden query set—a curated collection of representative compliance questions with pre-approved expected answers and source citations.

Before any change is promoted to production, the golden query set is executed against the updated system. Results are compared against baseline outputs, and deviations are flagged for review. This is not a pass/fail test in the binary sense; some variation in phrasing is expected and acceptable. The review criteria should focus on:

  • Factual accuracy: Are the key facts, numbers, and requirements correct?
  • Source fidelity: Are the cited documents appropriate and current?
  • Completeness: Does the response address the question fully, without omitting critical requirements?
  • No hallucination: Does the system fabricate information not present in the source documents?

In practice, a team at a mid-sized CDMO we've worked with maintains a golden set of 150 queries spanning GMP operations, cleaning validation, equipment qualification, and deviation management. They run this battery after every Category 2 or Category 3 change, with results reviewed and signed off by a QA lead before deployment.
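The review criteria above can be partially automated as a pre-screen that flags deviations for human review. A minimal sketch, assuming golden entries and answers are simple dictionaries (the data shapes and field names are assumptions, not ComplianceRAG's format):

```python
def prescreen_answer(answer, golden):
    """Automated pre-screen against one golden query entry: phrasing may
    vary, but key facts and approved sources may not.
    Returns a list of findings; an empty list means nothing was flagged."""
    findings = []
    for fact in golden["key_facts"]:
        if fact not in answer["text"]:
            findings.append(f"missing fact: {fact}")
    for cite in answer["citations"]:
        if cite not in golden["approved_sources"]:
            findings.append(f"unapproved source: {cite}")
    return findings

golden = {"key_facts": ["72 hours", "2-8 °C"],
          "approved_sources": ["SOP-BUF-014 v5"]}
answer = {"text": "Buffer X may be held for 72 hours at 2-8 °C.",
          "citations": ["SOP-BUF-014 v5"]}
```

A screen like this catches factual and citation regressions cheaply; the QA lead still reviews flagged results and signs off on the run.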

Step 4: Controlled Deployment and Rollback Planning

AI changes should follow the same deployment rigor as any validated system update. This means deploying to a staging environment first, executing the regression suite, and only promoting to production after documented approval. Equally important is maintaining a rollback capability—the ability to revert to the previous version of the model, embeddings, or configuration within a defined timeframe if post-deployment monitoring reveals issues.

ComplianceRAG supports versioned deployments, meaning every combination of document corpus, embedding model, retrieval configuration, and prompt template is tracked as a discrete, reproducible system state. This makes rollback straightforward and auditable.
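The mechanics of versioned deployment can be sketched with a content-addressed registry: each deployed state gets a deterministic fingerprint, and rollback is a lookup rather than a rebuild. (This illustrates the concept; it is not ComplianceRAG's internal implementation.)

```python
import hashlib
import json

REGISTRY = {}  # fingerprint -> full system state, retained for rollback

def deploy(state: dict) -> str:
    """Register a discrete, reproducible system state and return its ID.
    Identical states always hash to the same fingerprint."""
    blob = json.dumps(state, sort_keys=True).encode()
    fp = hashlib.sha256(blob).hexdigest()[:12]
    REGISTRY[fp] = state
    return fp

v1 = deploy({"corpus": "docs-2024.06", "llm": "llm-4", "prompt": "r3"})
v2 = deploy({"corpus": "docs-2024.07", "llm": "llm-4", "prompt": "r3"})
rollback_target = REGISTRY[v1]  # reverting is a lookup, not a rebuild
```

Because the fingerprint is derived from the state itself, the audit trail can prove exactly which combination of corpus, model, and configuration answered any given query.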

Step 5: Post-Implementation Review

After a defined stabilization period (typically 2–4 weeks for Category 2 and 3 changes), a post-implementation review should evaluate:

  • User-reported anomalies or unexpected answers
  • Confidence score distributions compared to baseline
  • Human escalation rates—a spike may indicate degraded retrieval quality
  • Feedback from QA reviewers on answer quality
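Escalation-rate monitoring, for instance, reduces to a simple comparison against the pre-change baseline. A hedged sketch (the 1.5x threshold is an arbitrary example, not a regulatory figure):

```python
def escalation_spike(baseline_rates, post_change_rates, factor=1.5):
    """Flag if the mean human-escalation rate after the change exceeds
    `factor` times the pre-change mean; a sustained spike suggests
    degraded retrieval quality."""
    base = sum(baseline_rates) / len(baseline_rates)
    post = sum(post_change_rates) / len(post_change_rates)
    return post > factor * base

# Weekly escalation rates (fraction of queries escalated to a human)
baseline = [0.04, 0.05, 0.04, 0.05]
post = [0.09, 0.11, 0.10]
```

A triggered flag is an input to the post-implementation review, and potentially a reason to invoke the rollback plan from Step 4.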

Addressing the Elephant in the Room: Continuous Learning

Some organizations ask whether ComplianceRAG "learns" from user interactions in ways that could introduce uncontrolled changes. The answer is deliberately no. ComplianceRAG's retrieval-augmented architecture means the system generates answers from a controlled, curated document corpus—not from user conversations or interaction history. There is no implicit model drift. Every change to the system's knowledge or behavior is explicit, traceable, and subject to change control.

In GxP terms: the system's validated state is deterministic and reproducible at any point in time. That's not an accident—it's a design decision rooted in regulatory reality.

Making It Work Within Your Existing QMS

The good news is that you don't need to reinvent your quality system to manage AI changes. The principles of change control—assessment, testing, approval, documentation, review—are the same. What changes is the specificity of the procedures: what to test, how to evaluate non-deterministic outputs, and how to maintain traceability across model components rather than monolithic software versions.

Start by adding an AI-specific appendix to your existing change control SOP. Define the change categories, establish your golden query set, and integrate regression testing into your deployment workflow. Over time, as your organization gains experience, these procedures will become as routine as any other change control activity on your site.

Change control for AI isn't a problem to be feared—it's a discipline to be mastered. And for pharma organizations that get it right, it becomes a competitive advantage: the ability to keep compliance tools current, accurate, and inspection-ready while maintaining the rigor that regulators expect.
