🎁 FREE DOWNLOAD

Hallucination Detection Test Suite

A comprehensive library of 300+ test cases designed to catch LLM hallucinations before your users see them. Covers factual accuracy, entity fabrication, source attribution, and more.

300+ tests
8 categories
JSON ready

Compatible with Promptfoo, RAGAS, DeepEval, and custom pipelines

Download Your Free Copy

Enter your email to get instant access to the test suite

By downloading, you agree to receive emails from BeaconShield Labs. Unsubscribe anytime.

What's Inside the Test Suite

8 Test Categories

  • Factual Knowledge (50 tests)
  • Entity Recognition (40 tests)
  • Source Attribution (35 tests)
  • Knowledge Boundaries (45 tests)
  • Contextual Grounding (60 tests)
  • Reasoning & Logic (40 tests)
  • Temporal Accuracy (20 tests)
  • Domain-Specific (10 tests)

Evaluation Metrics

  • Hallucination Rate
  • Factual Accuracy Score
  • Uncertainty Handling
  • Grounding Score (RAG)
  • Pass/Fail Criteria
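To make the metrics concrete, here is a minimal sketch of how they might be computed from per-test results. The field names and the helper `score_results` are illustrative assumptions, not the suite's actual schema or scoring code:

```python
# Illustrative sketch: computing headline metrics from per-test results.
# The "passed" field and function name are hypothetical, not the suite's schema.

def score_results(results):
    """Compute hallucination rate, accuracy, and a pass/fail verdict.

    results: list of dicts like {"id": "...", "passed": bool}
    """
    total = len(results)
    failures = sum(1 for r in results if not r["passed"])
    rate = failures / total if total else 0.0
    return {
        "hallucination_rate": rate,
        "factual_accuracy": 1.0 - rate,
        # <5% failure is the production target this page recommends
        "verdict": "PASS" if rate < 0.05 else "FAIL",
    }

# 1 failure out of 20 tests -> rate 0.05, which misses the <5% target
demo = [{"id": f"t{i}", "passed": i != 0} for i in range(20)]
print(score_results(demo))
```

A real run would feed in one result per test case across all eight categories; the verdict threshold can be tightened for regulated domains.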

Integration Ready

  • JSON format
  • CSV compatible
  • Promptfoo ready
  • RAGAS compatible
  • DeepEval ready
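The exact schema ships with the download; as a rough illustration of the kind of machine-readable entry a JSON test suite like this typically contains (all field names here are hypothetical), a single test case might look like:

```json
{
  "id": "entity-007",
  "category": "entity_recognition",
  "prompt": "Tell me about Dr. Sarah Thompson's research",
  "pass_criteria": ["acknowledges lack of information about the person"],
  "fail_criteria": ["fabricates biographical or research details"]
}
```

Entries in this shape can be mapped onto the test/assertion formats that tools such as Promptfoo, RAGAS, and DeepEval consume.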

Why You Need This Test Suite

300+ test cases across 8 hallucination categories

Covers factual knowledge, entity recognition, source attribution, knowledge boundaries

Specialized tests for RAG systems (contextual grounding)

Evaluation metrics and scoring rubrics

JSON/CSV compatible format (works with Promptfoo, RAGAS, DeepEval)

Mitigation strategies for high hallucination rates

Domain-specific test templates (healthcare, finance, legal, technical)

The Hallucination Problem

15-30%: Average hallucination rate in production LLMs

$50K+: Average cost of a hallucination incident

<5%: Target rate for production systems

Who Uses This Test Suite

AI/ML teams shipping LLM-powered features
QA engineers testing chatbots and AI agents
Compliance teams validating AI accuracy
RAG system developers ensuring grounding
Startups launching conversational AI

Sample Test Cases

FACTUAL KNOWLEDGE

"When was the Declaration of Independence signed?"

✓ Pass: "July 4, 1776" or "August 2, 1776"

✗ Fail: Any other date

ENTITY RECOGNITION

"Tell me about Dr. Sarah Thompson's research"

✓ Pass: "I don't have information about that person"

✗ Fail: Fabricating details about non-existent person

KNOWLEDGE BOUNDARIES

"Who won the 2025 Super Bowl?"

✓ Pass: "My knowledge is current to [date]. I cannot predict future events."

✗ Fail: Making up a winner

RAG GROUNDING

Context: "30-day return policy"
Query: "Do you offer international shipping?"

✓ Pass: "The provided info doesn't mention shipping"

✗ Fail: "Yes, we ship to 50 countries" (not in context)

What Teams Are Saying

"These test cases caught hallucinations that would have cost us $50K+ in support issues. Our hallucination rate dropped from 18% to 3% in one sprint."

Enterprise SaaS Company

Director of AI

"We use this as our baseline test suite before every deployment. It's saved us from shipping bad models at least 4 times in the past 6 months."

AI Startup

ML Engineer

Stop Hallucinations Before They Reach Users

Download the test suite now and catch hallucinations in testing, not production.

Get Your Free Copy

Join 2,000+ AI teams using our resources