Tensalis API Reference
Version: 5.4.1
Base URL: https://tensalis-engine-zlqsb5lbna-uc.a.run.app
Status: Production
Overview
Tensalis provides hallucination detection for LLM outputs through a simple REST API. The system uses Natural Language Inference (NLI) to detect factual contradictions that retain high semantic similarity to the source context, the "embedding similarity trap" that defeats traditional RAG evaluation tools.
Key Features:
- 500x cheaper than LLM-as-judge approaches ($0.01 vs $5.00 per 1K verifications)
- Model-agnostic - works with GPT-3.5, GPT-4, Gemini, Claude, and other LLMs
- Production-ready - 2-4 second latency, deployed on Google Cloud Run
- Binary decision - simple VALIDATED/BLOCKED classification for easy integration
Authentication
Current Version: No authentication required (will be added in v6.0)
For production deployments, API keys will be required:
Authorization: Bearer YOUR_API_KEY
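Once keys ship, the client-side change should be a single header; a minimal Python sketch, assuming the Bearer scheme above carries over unchanged (the key and payload values are placeholders, and authentication is not yet enforced in 5.4.1):

import requests

# Assumed v6.0 usage - not required by the current 5.4.1 deployment
response = requests.post(
    "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
    json={"response": "text to verify", "reference_facts": ["source fact"]},
    timeout=10
)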
Endpoints
POST /v1/verify
Verify the faithfulness of an LLM response against source context.
Endpoint:
POST https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify
Headers:
Content-Type: application/json
Request Body:
{
  "response": "string (required)",
  "reference_facts": ["string (required)"],
  "threshold": "float (optional, default: 0.85)",
  "return_details": "boolean (optional, default: false)"
}
Parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| response | string | Yes | - | The LLM-generated text to verify |
| reference_facts | array[string] | Yes | - | Source context/documents (1-10 items) |
| threshold | float | No | 0.85 | Confidence threshold (0.0-1.0). Higher = stricter. |
| return_details | boolean | No | false | Return detailed NLI scores and reasoning |
Response Format
Success Response (200 OK)
Basic Response (return_details: false):
{
  "status": "VALIDATED",
  "confidence": 0.92,
  "processing_time_ms": 1847,
  "model_version": "5.4.1"
}
Detailed Response (return_details: true):
{
  "status": "BLOCKED",
  "confidence": 0.73,
  "processing_time_ms": 2134,
  "model_version": "5.4.1",
  "details": {
    "entailment_score": 0.73,
    "contradiction_score": 0.18,
    "neutral_score": 0.09,
    "classification": "NEUTRAL",
    "reasoning": "Response contains claims that cannot be verified from source context"
  }
}
Field Descriptions:
| Field | Type | Description |
|---|---|---|
| status | string | VALIDATED - response is faithful to context; BLOCKED - response contradicts or cannot be verified from context |
| confidence | float | NLI entailment score (0.0-1.0). Higher = more confident the response follows from context. |
| processing_time_ms | integer | Total verification latency in milliseconds |
| model_version | string | Tensalis version used for verification |
| details.entailment_score | float | Probability the response is entailed by the context |
| details.contradiction_score | float | Probability the response contradicts the context |
| details.neutral_score | float | Probability the relationship is neutral/unrelated |
| details.classification | string | Raw NLI classification: ENTAILMENT, CONTRADICTION, or NEUTRAL |
| details.reasoning | string | Human-readable explanation of the decision |
Decision Logic
The verification system uses a two-tier classification based on confidence threshold:
IF entailment_score >= threshold (default: 0.85):
    status = "VALIDATED"
ELSE:
    status = "BLOCKED"
Threshold Recommendations:
| Use Case | Threshold | Validation Rate* | Description |
|---|---|---|---|
| High-stakes (medical, legal, financial) | 0.90 | ~35-45% | Strictest - blocks anything uncertain |
| Production default | 0.85 | ~47-55% | Balanced - recommended for most applications |
| Permissive (internal tools, drafts) | 0.75 | ~65-75% | More lenient - fewer false blocks |
*Validation rates based on adversarial testing with temperature 0.7 and leading questions. Production rates typically 70-85% with proper prompting (temperature 0.3, "stick to facts" instructions).
Error Responses
400 Bad Request
{
  "error": "Missing required field: response",
  "status": "error",
  "code": 400
}
Common Causes:
- Missing response or reference_facts fields
- Invalid threshold value (must be 0.0-1.0)
- Empty arrays or strings
422 Unprocessable Entity
{
  "error": "Response text too long (max 2000 tokens)",
  "status": "error",
  "code": 422
}
Common Causes:
- Response >2000 tokens (use chunking for longer texts)
- Reference facts >5000 tokens total
- Invalid characters in input
500 Internal Server Error
{
  "error": "Model inference failed",
  "status": "error",
  "code": 500
}
Common Causes:
- Model loading issues (transient - retry after 5s)
- Cloud Run scaling (cold start - first request may time out)
503 Service Unavailable
{
  "error": "Service temporarily unavailable",
  "status": "error",
  "code": 503,
  "retry_after": 60
}
Common Causes:
- System maintenance
- Rate limiting (wait retry_after seconds)
Code Examples
Python
Basic Verification:
import requests

def verify_response(
    llm_output: str,
    context: list[str],
    threshold: float = 0.85,
    return_details: bool = False
) -> dict:
    """Verify LLM output against source context."""
    response = requests.post(
        "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
        json={
            "response": llm_output,
            "reference_facts": context,
            "threshold": threshold,
            "return_details": return_details
        },
        timeout=10
    )
    response.raise_for_status()
    return response.json()
# Example usage
context = [
    "Our return policy allows returns within 30 days of purchase.",
    "All returns must include original packaging and receipt."
]
llm_output = "You can return items within 30 days with the receipt."

result = verify_response(llm_output, context)
if result["status"] == "VALIDATED":
    print(f"✅ Response verified (confidence: {result['confidence']:.2%})")
else:
    print(f"❌ Response blocked (confidence: {result['confidence']:.2%})")
Output:
✅ Response verified (confidence: 92.00%)
With Error Handling:
import time

import requests
from typing import Optional

def verify_with_retry(
    llm_output: str,
    context: list[str],
    threshold: float = 0.85,
    max_retries: int = 3
) -> Optional[dict]:
    """Verify with automatic retry on transient failures."""
    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
                json={
                    "response": llm_output,
                    "reference_facts": context,
                    "threshold": threshold,
                    "return_details": True
                },
                timeout=10
            )
            if response.status_code == 200:
                return response.json()
            elif response.status_code == 503:
                # Service unavailable - wait and retry
                retry_after = response.json().get("retry_after", 5)
                print(f"Service busy, retrying in {retry_after}s...")
                time.sleep(retry_after)
                continue
            else:
                # Non-retryable error
                response.raise_for_status()
        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}/{max_retries}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            else:
                raise
    return None
# Example usage
result = verify_with_retry(
    llm_output="Returns accepted within 90 days.",  # Wrong!
    context=["Return policy: 30 days from purchase."],
    threshold=0.85
)

if result:
    print(f"Status: {result['status']}")
    print(f"Confidence: {result['confidence']:.2%}")
    print(f"Reasoning: {result['details']['reasoning']}")
Output:
Status: BLOCKED
Confidence: 67.00%
Reasoning: Response contains numerical value (90 days) that contradicts context (30 days)
JavaScript / Node.js
async function verifyResponse(llmOutput, context, threshold = 0.85) {
  try {
    const response = await fetch(
      'https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify',
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          response: llmOutput,
          reference_facts: context,
          threshold: threshold
        })
      }
    );

    if (!response.ok) {
      throw new Error(`Verification failed: ${response.statusText}`);
    }

    const result = await response.json();
    return result;
  } catch (error) {
    console.error('Verification error:', error);
    throw error;
  }
}
// Example usage
const context = [
  "Battery life: up to 8 hours of continuous use",
  "Charging time: 2 hours for full charge"
];
const llmOutput = "The battery lasts 8 hours and charges in 2 hours.";

verifyResponse(llmOutput, context)
  .then(result => {
    console.log(`Status: ${result.status}`);
    console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`);
  })
  .catch(error => {
    console.error('Failed to verify:', error);
  });
cURL
Basic Request:
curl -X POST https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify \
  -H "Content-Type: application/json" \
  -d '{
    "response": "The product ships within 24 hours.",
    "reference_facts": ["Standard shipping: 2-3 business days processing"],
    "threshold": 0.85
  }'
Response:
{
  "status": "BLOCKED",
  "confidence": 0.71,
  "processing_time_ms": 1923,
  "model_version": "5.4.1"
}
Detailed Request:
curl -X POST https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify \
  -H "Content-Type: application/json" \
  -d '{
    "response": "Multi-factor authentication is required for all accounts.",
    "reference_facts": ["MFA is optional but recommended for enhanced security"],
    "threshold": 0.85,
    "return_details": true
  }'
Response:
{
  "status": "BLOCKED",
  "confidence": 0.42,
  "processing_time_ms": 2087,
  "model_version": "5.4.1",
  "details": {
    "entailment_score": 0.42,
    "contradiction_score": 0.51,
    "neutral_score": 0.07,
    "classification": "CONTRADICTION",
    "reasoning": "Response states MFA is 'required' but context indicates 'optional' - semantic antonyms detected despite 95.8% embedding similarity"
  }
}
Performance Characteristics
Latency
| Metric | Value |
|---|---|
| Median (P50) | 1.8-2.2 seconds |
| 95th percentile (P95) | 2.8-3.5 seconds |
| 99th percentile (P99) | 4.0-5.0 seconds |
| Cold start | 8-12 seconds (first request after idle) |
Latency Breakdown:
- Model loading: ~30ms (cached)
- Embedding generation: ~150ms
- NLI inference: ~1500ms (DeBERTa-large-MNLI)
- Network overhead: ~200ms
Optimization Tips:
- Use async/parallel requests for batch processing
- Implement request batching (coming in v6.0)
- Cache results for identical request pairs (see the sketch below)
- Consider streaming responses (coming in v6.1)
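For the caching tip, a minimal sketch of client-side memoization keyed on the exact request triple (the cache dict and cached_verify helper are ours, building on the verify_response helper from the Python examples above):

_cache: dict = {}

def cached_verify(llm_output: str, context: list[str], threshold: float = 0.85) -> dict:
    """Reuse results for identical (response, facts, threshold) request pairs."""
    key = (llm_output, tuple(context), threshold)
    if key not in _cache:
        _cache[key] = verify_response(llm_output, context, threshold)
    return _cache[key]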
Throughput
| Configuration | Requests/Second |
|---|---|
| Single instance | 0.5-0.8 RPS |
| Auto-scaled (10 instances) | 5-8 RPS |
| Production recommended | Keep <5 RPS per instance |
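One way to respect the <5 RPS guidance when fanning out requests is to cap concurrency client-side; a minimal sketch with asyncio.Semaphore, reusing the verify_async helper shown under Common Issues later in this document (the cap of 5 is our reading of the table above, not an enforced limit):

import asyncio

_inflight = asyncio.Semaphore(5)  # at most 5 verifications in flight

async def throttled_verify(llm_output: str, context: list[str]) -> dict:
    """Cap concurrent requests so a single instance is not overloaded."""
    async with _inflight:
        return await verify_async(llm_output, context)

At ~2 seconds per verification, five concurrent requests works out to roughly 2.5 RPS, comfortably under the per-instance guidance.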
Accuracy
Based on multi-model testing across GPT-3.5-Turbo and GPT-4:
| Test Type | Validation Rate | Block Rate | Notes |
|---|---|---|---|
| Adversarial testing | 47-48% | 52-53% | Temperature 0.7, leading questions |
| Production typical | 70-85% | 15-30% | Temperature 0.3, proper prompting |
| Contradiction detection | 0% | 100% | All tested contradictions caught |
Test Details:
- 50 adversarial scenarios with leading questions
- 6 contradiction pairs with 85-95% embedding similarity
- Consistent performance across GPT-3.5 and GPT-4
- Zero false negatives on known contradictions
Rate Limits
Current (No Auth):
- No enforced rate limits
- Fair use policy: <100 requests/hour per IP
Coming in v6.0 (With API Keys):
| Tier | Requests/Month | Rate Limit | Price |
|---|---|---|---|
| Free | 10,000 | 10/minute | $0 |
| Starter | 100,000 | 30/minute | $99/month |
| Pro | 1,000,000 | 100/minute | $499/month |
| Enterprise | Unlimited | Custom | Custom |
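Ahead of v6.0 enforcement, calls can also be paced client-side to stay within a tier's per-minute budget; a minimal sketch (the 30/minute figure is the Starter tier above, and the paced_verify helper is ours):

import time

MIN_INTERVAL = 60.0 / 30  # Starter tier: 30 requests/minute
_last_call = 0.0

def paced_verify(llm_output: str, context: list[str]) -> dict:
    """Sleep just long enough to keep calls under the per-minute budget."""
    global _last_call
    wait = MIN_INTERVAL - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    _last_call = time.monotonic()
    return verify_response(llm_output, context)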
Best Practices
1. Chunking Long Responses
The API has a 2000-token limit per request. For longer responses:
def chunk_and_verify(long_response: str, context: list[str]) -> list[dict]:
    """Verify a long response by chunking into paragraphs."""
    chunks = long_response.split('\n\n')  # Split by paragraph
    results = []
    for i, chunk in enumerate(chunks):
        if not chunk.strip():
            continue
        result = verify_response(chunk, context)
        results.append({
            'chunk_index': i,
            'text': chunk,
            'status': result['status'],
            'confidence': result['confidence']
        })
    return results
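Rolling the chunk results up into a single decision is left to the caller; one option is to flag the whole response if any chunk is blocked (long_answer is a placeholder for your chunked text):

chunk_results = chunk_and_verify(long_answer, context)
blocked = [r for r in chunk_results if r['status'] == 'BLOCKED']
if blocked:
    print(f"{len(blocked)}/{len(chunk_results)} chunks blocked - review before sending")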
2. Context Selection
Good Context:
context = [
    "Pricing: $99/month for Pro plan",
    "Features: Unlimited projects, 10GB storage, priority support",
    "Free trial: 14 days, no credit card required"
]
Bad Context (Too Vague):
context = [
    "We have various pricing tiers available",
    "Multiple features are included",
    "Trial period is offered"
]
Tip: Include specific facts, numbers, and details. The NLI model works best with concrete information.
3. Threshold Tuning
Start with default (0.85) and adjust based on false positive/negative rates:
# Test different thresholds on your data
thresholds = [0.75, 0.80, 0.85, 0.90, 0.95]
test_cases = [...]  # Your test data

for threshold in thresholds:
    validated = sum(
        1 for case in test_cases
        if verify_response(case['response'], case['context'], threshold)['status'] == 'VALIDATED'
    )
    print(f"Threshold {threshold}: {validated}/{len(test_cases)} validated ({validated/len(test_cases):.1%})")
Output:
Threshold 0.75: 38/50 validated (76.0%)
Threshold 0.80: 32/50 validated (64.0%)
Threshold 0.85: 24/50 validated (48.0%) ← Production default
Threshold 0.90: 18/50 validated (36.0%)
Threshold 0.95: 12/50 validated (24.0%)
4. Error Handling
Always implement timeout and retry logic:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session():
    """Create a session with automatic retries."""
    session = requests.Session()
    retry_strategy = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        backoff_factor=1,  # Wait 1s, 2s, 4s between retries
        allowed_methods=["POST"]
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)
    return session

# Use it
session = create_session()
response = session.post(
    "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
    json={"response": "...", "reference_facts": [...]},
    timeout=10
)
5. Monitoring Integration
Log verification results for analysis:
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def verify_and_log(llm_output: str, context: list[str]) -> dict:
    """Verify and log results for monitoring."""
    result = verify_response(llm_output, context)
    logger.info(
        f"Verification: status={result['status']}, "
        f"confidence={result['confidence']:.2f}, "
        f"latency={result['processing_time_ms']}ms"
    )
    # Send to your monitoring system (Datadog, New Relic, etc.)
    # metrics.increment(f"tensalis.{result['status'].lower()}")
    # metrics.histogram("tensalis.confidence", result['confidence'])
    # metrics.histogram("tensalis.latency", result['processing_time_ms'])
    return result
Common Issues & Solutions
Issue: High block rate (>60% blocked)
Causes:
- Threshold too high (try lowering to 0.75-0.80)
- LLM adding elaborations not in context
- Context too vague or incomplete
- Using adversarial prompting (high temperature, leading questions)
Solutions:
# 1. Lower threshold
result = verify_response(output, context, threshold=0.75)

# 2. Improve LLM prompting
prompt = """
Answer based ONLY on the following context.
Do not add information not explicitly stated.
Be concise and factual.

Context: {context}

Question: {question}
"""

# 3. Add more complete context
context = [
    # Add all relevant details from source documents
    # Include specific facts, numbers, requirements
]
Issue: Slow response times (>5 seconds)
Causes:
- Cold start (first request after idle)
- Network latency
- Long input text
- Cloud Run scaling up
Solutions:
# 1. Implement warming (ping periodically)
import schedule
import time

def keep_warm():
    try:
        verify_response("ping", ["ping"])
    except Exception:
        pass  # Warming is best-effort; ignore transient failures

schedule.every(5).minutes.do(keep_warm)
# Call schedule.run_pending() in your main loop to execute the pings

# 2. Use async/await for multiple requests
import asyncio
import aiohttp

async def verify_async(llm_output, context):
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
            json={"response": llm_output, "reference_facts": context}
        ) as response:
            return await response.json()

# Verify multiple responses in parallel (run inside an async function,
# e.g. via asyncio.run(main()))
results = await asyncio.gather(
    verify_async(output1, context1),
    verify_async(output2, context2),
    verify_async(output3, context3)
)
Issue: False positives (should be blocked but validated)
Causes:
- Response uses synonyms/paraphrasing correctly
- Context is ambiguous
- Threshold too low
Solution:
# Use higher threshold for critical applications
result = verify_response(
    llm_output,
    context,
    threshold=0.90  # Stricter
)

# Or request detailed scores to investigate
result = verify_response(
    llm_output,
    context,
    threshold=0.85,
    return_details=True
)

if result['status'] == 'VALIDATED' and result['details']['neutral_score'] > 0.15:
    # High neutral score suggests uncertainty - consider blocking
    print("Warning: Validated but uncertain (high neutral score)")
Webhook Integration (Coming in v6.0)
Subscribe to verification events:
# Register webhook
POST /v1/webhooks
{
  "url": "https://your-app.com/tensalis-webhook",
  "events": ["verification.completed", "verification.failed"]
}

# Your webhook endpoint receives:
POST https://your-app.com/tensalis-webhook
{
  "event": "verification.completed",
  "timestamp": "2025-12-17T10:30:00Z",
  "data": {
    "verification_id": "ver_1234567890",
    "status": "BLOCKED",
    "confidence": 0.73,
    "processing_time_ms": 1847
  }
}
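A minimal sketch of a receiver for the payload above, assuming the v6.0 shape ships as shown (Flask is our choice for illustration, not a Tensalis requirement):

from flask import Flask, request

app = Flask(__name__)

@app.route("/tensalis-webhook", methods=["POST"])
def tensalis_webhook():
    event = request.get_json()
    if event.get("event") == "verification.completed":
        data = event["data"]
        if data["status"] == "BLOCKED":
            # React to blocked responses, e.g. alert or regenerate
            print(f"Blocked: {data['verification_id']} "
                  f"(confidence {data['confidence']:.2f})")
    return "", 204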
Changelog
v5.4.1 (Current)
- ✅ Production deployment on Google Cloud Run
- ✅ Two-tier binary classification (VALIDATED/BLOCKED)
- ✅ 85% universal confidence threshold
- ✅ Multi-model validation (GPT-3.5, GPT-4)
- ✅ 47-48% adversarial validation rate
- ✅ 100% contradiction detection accuracy
v6.0 (Planned Q1 2026)
- 🔄 API key authentication
- 🔄 Rate limiting per tier
- 🔄 Batch processing endpoint
- 🔄 Webhook notifications
- 🔄 Usage dashboard
v6.1 (Planned Q2 2026)
- 🔄 Streaming responses
- 🔄 GPU acceleration (<300ms latency)
- 🔄 Custom model fine-tuning
- 🔄 Multi-language support
Support
- Documentation: https://docs.tensalis.com
- Email: support@tensalis.com
- Status Page: https://status.tensalis.com
- Discord: https://discord.gg/tensalis
Legal
- Terms of Service: https://tensalis.com/terms
- Privacy Policy: https://tensalis.com/privacy
- SLA: 99.5% uptime guarantee (Enterprise tier)
Last Updated: December 17, 2025
API Version: 5.4.1