
Computer System Validation in the Age of AI: Rethinking CSV for LLMs

Computer System Validation (CSV) has been the backbone of pharmaceutical quality assurance for decades. The process is well-understood: define requirements, test against specifications, document everything, and maintain a validated state through controlled change management. But when the "system" in question is a Large Language Model that generates unique responses to natural language queries, the traditional CSV playbook starts to show its age.

The challenge isn't whether AI tools need validation—they clearly do under GAMP 5 and FDA guidance. The question is how to validate a system that doesn't behave like traditional software, and how to do it without creating validation documentation that's obsolete before the ink dries.

Why Traditional CSV Approaches Fall Short for LLMs

Traditional CSV relies on deterministic behavior. If you input X, the system outputs Y, every single time. Test cases are written to verify this predictability. But LLMs, by their nature, are probabilistic. Ask the same compliance question twice and you may get two different answers: both potentially correct, but worded differently, with different examples, or emphasizing different aspects of the regulation.

This creates immediate friction with conventional validation practices:

  • Test script specificity: A traditional test case might verify that clicking "Calculate Batch Yield" returns a specific value. How do you write a test case for "What are the temperature requirements for our lyophilization process?" when the answer structure will vary each time?
  • Regression testing: Every software update triggers regression testing to ensure existing functionality wasn't broken. But when the model is updated or retrained, virtually every output changes—does that mean everything "broke"?
  • Traceability matrices: Tracing URS to FRS to test cases works beautifully for defined features. But how do you trace a requirement like "shall provide accurate answers to GMP questions" to specific test cases when the universe of possible questions is effectively infinite?

A Risk-Based Framework for AI System Validation

The solution isn't to abandon validation principles—it's to adapt them. GAMP 5 Second Edition already points the way with its emphasis on risk-based approaches and critical thinking over rigid templates. For AI systems in pharma, this means shifting focus from output validation to process validation.

Consider how you might validate ComplianceRAG or a similar AI compliance assistant:

1. Define the System's Intended Use and Boundaries

Be specific about what the AI will and won't do. For example: "Provide guidance on existing SOPs and regulatory requirements" is validatable. "Make compliance decisions" is not—and shouldn't be the intended use. Your URS should explicitly state that the system is a decision-support tool, with human review required for critical actions.

2. Validate the Retrieval Mechanism, Not Just the Generation

For RAG-based systems like ComplianceRAG, the retrieval component is actually more critical than the language model itself. You can validate that:

  • The system correctly identifies and retrieves relevant documents for a given query
  • Source citations are accurate and traceable to the original document
  • The document corpus is complete, version-controlled, and regularly updated
  • Access controls prevent retrieval of documents outside the user's authorization scope

These are testable, repeatable criteria that align well with traditional CSV methods.
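These retrieval checks lend themselves to automation. As a minimal sketch (the corpus, the query set, and the `retrieve` function below are illustrative stand-ins, not a real RAG API), an OQ-style script might assert that the expected source documents come back for a defined query and that every citation traces to the version-controlled corpus:

```python
# Sketch of an automated retrieval check. The corpus, queries, and the
# `retrieve` function are illustrative stand-ins, not a real RAG API.

# Hypothetical version-controlled corpus: doc ID -> (version, keywords).
CORPUS = {
    "SOP-021": ("v3.2", {"lyophilization", "temperature"}),
    "SOP-045": ("v1.0", {"batch", "yield", "calculation"}),
}

def retrieve(query: str, corpus: dict) -> list[str]:
    """Toy retriever: return doc IDs whose keywords appear in the query."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, (_, kw) in corpus.items() if kw & terms]

def check_retrieval(query: str, expected_docs: set[str]) -> bool:
    """OQ-style check: the expected source documents must be retrieved,
    and every citation must trace back to a document in the corpus."""
    hits = retrieve(query, CORPUS)
    citations_traceable = all(doc in CORPUS for doc in hits)
    return expected_docs.issubset(hits) and citations_traceable

# A defined query set with expected sources: repeatable, pass/fail.
assert check_retrieval("temperature limits for lyophilization", {"SOP-021"})
assert check_retrieval("batch yield calculation", {"SOP-045"})
```

Because the expected documents for each query are fixed in advance, this kind of check stays pass/fail even though the generated answer text varies from run to run.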

3. Establish Qualitative Acceptance Criteria

Instead of testing for exact output matches, define qualitative criteria that answers must meet:

  • Factual accuracy (answer must reflect the source documents)
  • Completeness (answer must address all aspects of the question)
  • Appropriate citations (answer must reference specific SOPs or regulations)
  • Appropriate uncertainty handling (system must acknowledge when it lacks information)

Your test cases then become: "Does this answer meet the qualitative criteria?" This is assessed by qualified SMEs, not by string matching in a test script.
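One lightweight way to make that SME assessment repeatable is to capture it in a structured rubric rather than free-text notes. The field names and the all-criteria-must-pass rule below are assumptions for illustration, not a prescribed standard:

```python
from dataclasses import dataclass

# Illustrative SME review rubric mirroring the four qualitative criteria.
# Field names and the pass rule are assumptions, not a prescribed standard.

@dataclass
class SmeAssessment:
    query: str
    factually_accurate: bool    # answer reflects the source documents
    complete: bool              # addresses all aspects of the question
    cited: bool                 # references specific SOPs or regulations
    uncertainty_handled: bool   # acknowledges when information is missing
    reviewer: str

    def passes(self) -> bool:
        """A test case passes only if every qualitative criterion is met."""
        return all((self.factually_accurate, self.complete,
                    self.cited, self.uncertainty_handled))

review = SmeAssessment(
    query="What are the temperature requirements for lyophilization?",
    factually_accurate=True, complete=True, cited=True,
    uncertainty_handled=True, reviewer="QA SME #1",
)
assert review.passes()
```

Structuring the review this way also leaves an audit trail: each recorded assessment documents who reviewed which query against which criteria.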

Practical Testing Strategies for LLM Validation

A pharmaceutical quality team validating an AI compliance assistant might structure their Installation Qualification (IQ), Operational Qualification (OQ), and Performance Qualification (PQ) like this:

  • IQ: Verify the system is installed per specifications: model version, document corpus loaded, access controls configured, audit trail functional, integration points established.
  • OQ: Test the retrieval mechanism with a defined set of queries. Verify that relevant documents are retrieved, citations are accurate, and the system handles edge cases (ambiguous queries, documents with conflicting information, questions outside the knowledge base).
  • PQ: Subject matter experts from QA, manufacturing, and validation use the system with real-world compliance questions. They assess whether answers are accurate, appropriately sourced, and useful. This isn't pass/fail on exact wording; it's an assessment of fitness for intended use.
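To make the OQ edge-case requirement concrete, here is a toy sketch of a check that out-of-scope queries produce an explicit decline rather than a fabricated answer. The `answer` function and the topic list are stand-ins for the real assistant, not its actual behavior:

```python
# Sketch of an OQ edge-case check: the assistant should decline rather
# than answer when a query falls outside the loaded document corpus.
# The `answer` function is a toy stand-in for the real system.

KNOWN_TOPICS = {"lyophilization", "batch records", "cleaning validation"}

def answer(query: str) -> str:
    """Toy assistant: answers only on known topics, otherwise declines."""
    if any(topic.split()[0] in query.lower() for topic in KNOWN_TOPICS):
        return "ANSWER: see cited SOP sections."
    return "DECLINE: no relevant documents in the knowledge base."

# OQ test: out-of-scope queries must produce an explicit decline,
# never a confident-sounding fabrication.
assert answer("lyophilization temperature limits").startswith("ANSWER")
assert answer("tax treatment of R&D credits").startswith("DECLINE")
```

The point of the sketch is the shape of the test: decline behavior is deterministic enough to verify with conventional pass/fail scripts even when answer wording is not.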

Ongoing Validation and Change Control

Perhaps the biggest shift from traditional CSV is recognizing that AI systems require continuous validation, not just periodic revalidation. The model may not change, but the knowledge base certainly will as SOPs are updated, new regulations are issued, and validation protocols are revised.

Your change control process should address:

  • Document updates: When an SOP is revised, how quickly is it ingested into the system? What testing confirms it's properly indexed and retrievable?
  • Model updates: If the underlying LLM is upgraded, what regression testing is required? (Hint: focus on the qualitative criteria, not output matching)
  • Performance monitoring: Are you tracking user feedback, citation accuracy, and instances where the system declined to answer? These metrics indicate validated state drift.
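These drift indicators can be computed directly from routine interaction logs. The log schema and the review threshold below are illustrative assumptions for a monitoring procedure, not a mandated metric set:

```python
# Sketch of ongoing performance monitoring. The log schema and the
# alert threshold are illustrative assumptions, not a mandated standard.

interaction_log = [
    {"citation_accurate": True,  "declined": False, "user_rating": 5},
    {"citation_accurate": True,  "declined": True,  "user_rating": 4},
    {"citation_accurate": False, "declined": False, "user_rating": 2},
]

def drift_metrics(log: list[dict]) -> dict:
    """Aggregate the metrics that indicate validated-state drift."""
    n = len(log)
    return {
        "citation_accuracy": sum(e["citation_accurate"] for e in log) / n,
        "decline_rate": sum(e["declined"] for e in log) / n,
        "mean_rating": sum(e["user_rating"] for e in log) / n,
    }

metrics = drift_metrics(interaction_log)
# Trigger a periodic review if citation accuracy drops below a
# predefined acceptance threshold from the validation plan.
needs_review = metrics["citation_accuracy"] < 0.95
```

Trending these numbers over time, rather than inspecting individual answers, is what turns "continuous validation" from a slogan into a procedure with defined triggers.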

Documentation That Adds Value

Finally, validation documentation for AI systems should be leaner and more focused than traditional CSV packages. A 500-page validation report that no one reads doesn't demonstrate compliance—it demonstrates checkbox mentality.

Focus your documentation on:

  • Clear risk assessment and rationale for your validation approach
  • Evidence that the system performs as intended for its specific use case
  • Procedures for maintaining validated state as the knowledge base evolves
  • Training records showing users understand the system's capabilities and limitations

The goal isn't to prove the AI is perfect—it's to demonstrate you have appropriate controls for an AI-assisted process, with human oversight where it matters.

Computer System Validation isn't obsolete in the age of AI. But it does require quality professionals to apply critical thinking rather than templates, and to focus validation efforts where they'll actually reduce risk. Done right, CSV for AI tools like ComplianceRAG can be more rigorous and more efficient than traditional approaches—because it's validating what actually matters.

Running compliance on manual search? See how ComplianceRAG handles this.
