Tensalis API Reference

Version: 5.4.1
Base URL: https://tensalis-engine-zlqsb5lbna-uc.a.run.app
Status: Production


Overview

Tensalis provides hallucination detection for LLM outputs through a simple REST API. The system uses Natural Language Inference (NLI) to detect factual contradictions that maintain high semantic similarity—the "embedding similarity trap" that defeats traditional RAG evaluation tools.

Key Features:

- 500x cheaper than LLM-as-judge approaches ($0.01 vs $5.00 per 1K verifications)
- Model-agnostic - works with GPT-3.5, GPT-4, Gemini, Claude, and all LLMs
- Production-ready - 2-4 second latency, deployed on Google Cloud Run
- Binary decision - simple VALIDATED/BLOCKED classification for easy integration


Authentication

Current Version: No authentication required (will be added in v6.0)

For production deployments, API keys will be required:

Authorization: Bearer YOUR_API_KEY
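
Until then, no header is needed. A minimal sketch of what an authenticated call might look like once v6.0 ships, assuming the key is supplied via an environment variable (the TENSALIS_API_KEY name is illustrative, not part of the current API):

import os
import requests

# Hypothetical v6.0 usage: the API key is passed as a Bearer token.
api_key = os.environ["TENSALIS_API_KEY"]  # illustrative variable name

response = requests.post(
    "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
    headers={"Authorization": f"Bearer {api_key}"},
    json={"response": "...", "reference_facts": ["..."]},
    timeout=10,
)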

Endpoints

POST /v1/verify

Verify the faithfulness of an LLM response against source context.

Endpoint:

POST https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify

Headers:

Content-Type: application/json

Request Body:

{
  "response": "string (required)",
  "reference_facts": ["string"] (required),
  "threshold": float (optional, default: 0.85),
  "return_details": boolean (optional, default: false)
}

Parameters:

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| response | string | Yes | - | The LLM-generated text to verify |
| reference_facts | array[string] | Yes | - | Source context/documents (1-10 items) |
| threshold | float | No | 0.85 | Confidence threshold (0.0-1.0). Higher = stricter. |
| return_details | boolean | No | false | Return detailed NLI scores and reasoning |
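
For example, an illustrative request body exercising every field (the text reuses the return-policy example from the Python section below):

{
  "response": "You can return items within 30 days with the receipt.",
  "reference_facts": [
    "Our return policy allows returns within 30 days of purchase.",
    "All returns must include original packaging and receipt."
  ],
  "threshold": 0.85,
  "return_details": true
}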

Response Format

Success Response (200 OK)

Basic Response (return_details: false):

{
  "status": "VALIDATED",
  "confidence": 0.92,
  "processing_time_ms": 1847,
  "model_version": "5.4.1"
}

Detailed Response (return_details: true):

{
  "status": "BLOCKED",
  "confidence": 0.73,
  "processing_time_ms": 2134,
  "model_version": "5.4.1",
  "details": {
    "entailment_score": 0.73,
    "contradiction_score": 0.18,
    "neutral_score": 0.09,
    "classification": "NEUTRAL",
    "reasoning": "Response contains claims that cannot be verified from source context"
  }
}

Field Descriptions:

| Field | Type | Description |
|---|---|---|
| status | string | VALIDATED - Response is faithful to context; BLOCKED - Response contradicts or cannot be verified from context |
| confidence | float | NLI entailment score (0.0-1.0). Higher = more confident the response follows from context. |
| processing_time_ms | integer | Total verification latency in milliseconds |
| model_version | string | Tensalis version used for verification |
| details.entailment_score | float | Probability response is entailed by context |
| details.contradiction_score | float | Probability response contradicts context |
| details.neutral_score | float | Probability relationship is neutral/unrelated |
| details.classification | string | Raw NLI classification: ENTAILMENT, CONTRADICTION, or NEUTRAL |
| details.reasoning | string | Human-readable explanation of the decision |
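
When return_details is enabled, the classification field can drive different handling for the two kinds of blocks. A minimal sketch, assuming a detailed response; the handling branches are illustrative choices, not behavior prescribed by the API:

def handle_blocked(result: dict) -> str:
    """Illustrative routing based on the detailed NLI classification."""
    details = result.get("details", {})
    classification = details.get("classification")

    if classification == "CONTRADICTION":
        # The response actively conflicts with the context.
        return "regenerate_with_stricter_prompt"
    if classification == "NEUTRAL":
        # The response adds claims the context neither supports nor refutes.
        return "escalate_for_human_review"
    return "no_action"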

Decision Logic

The verification system uses a two-tier classification based on confidence threshold:

IF entailment_score >= threshold (default: 0.85):
    status = "VALIDATED"

ELSE:
    status = "BLOCKED"

Threshold Recommendations:

| Use Case | Threshold | Validation Rate* | Description |
|---|---|---|---|
| High-stakes (medical, legal, financial) | 0.90 | ~35-45% | Strictest - blocks anything uncertain |
| Production default | 0.85 | ~47-55% | Balanced - recommended for most applications |
| Permissive (internal tools, drafts) | 0.75 | ~65-75% | More lenient - fewer false blocks |

*Validation rates based on adversarial testing with temperature 0.7 and leading questions. Production rates typically 70-85% with proper prompting (temperature 0.3, "stick to facts" instructions).
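
Because the status is derived directly from the entailment score, the same rule can be re-applied client-side from a detailed response, which is handy when comparing thresholds without re-calling the API. A minimal sketch using the detailed example above:

def reclassify(details: dict, threshold: float = 0.85) -> str:
    """Re-apply the VALIDATED/BLOCKED rule locally from NLI scores."""
    return "VALIDATED" if details["entailment_score"] >= threshold else "BLOCKED"

# The detailed response above (entailment_score 0.73) at two thresholds:
details = {"entailment_score": 0.73, "contradiction_score": 0.18, "neutral_score": 0.09}
print(reclassify(details, threshold=0.85))  # BLOCKED
print(reclassify(details, threshold=0.70))  # VALIDATED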


Error Responses

400 Bad Request

{
  "error": "Missing required field: response",
  "status": "error",
  "code": 400
}

Common Causes:

- Missing response or reference_facts fields
- Invalid threshold value (must be 0.0-1.0)
- Empty arrays or strings
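
A lightweight client-side check can catch these before the request leaves your service. A minimal sketch based on the parameter constraints above:

def validate_payload(response_text: str, reference_facts: list[str], threshold: float = 0.85) -> None:
    """Raise ValueError for payloads the API would reject with a 400."""
    if not response_text or not response_text.strip():
        raise ValueError("response must be a non-empty string")
    if not reference_facts or not all(f.strip() for f in reference_facts):
        raise ValueError("reference_facts must be a non-empty list of non-empty strings")
    if len(reference_facts) > 10:
        raise ValueError("reference_facts accepts at most 10 items")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")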


422 Unprocessable Entity

{
  "error": "Response text too long (max 2000 tokens)",
  "status": "error",
  "code": 422
}

Common Causes:

- Response >2000 tokens (use chunking for longer texts)
- Reference facts >5000 tokens total
- Invalid characters in input
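
Token limits are enforced server-side, so a rough client-side length check can decide when to fall back to chunking (see Best Practices below). The 4-characters-per-token heuristic is an approximation, not the API's own tokenizer:

def estimate_tokens(text: str) -> int:
    """Rough token estimate (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

def needs_chunking(response_text: str, reference_facts: list[str]) -> bool:
    """Return True if the payload likely exceeds the 2000/5000 token limits."""
    return (
        estimate_tokens(response_text) > 2000
        or sum(estimate_tokens(f) for f in reference_facts) > 5000
    )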


500 Internal Server Error

{
  "error": "Model inference failed",
  "status": "error",
  "code": 500
}

Common Causes:

- Model loading issues (transient - retry after 5s)
- Cloud Run scaling (cold start - first request may time out)


503 Service Unavailable

{
  "error": "Service temporarily unavailable",
  "status": "error",
  "code": 503,
  "retry_after": 60
}

Common Causes:

- System maintenance
- Rate limiting (wait retry_after seconds)


Code Examples

Python

Basic Verification:

import requests

def verify_response(
    llm_output: str,
    context: list[str],
    threshold: float = 0.85,
    return_details: bool = False,
    timeout: int = 10,
) -> dict:
    """Verify LLM output against source context."""
    response = requests.post(
        "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
        json={
            "response": llm_output,
            "reference_facts": context,
            "threshold": threshold,
            "return_details": return_details,
        },
        timeout=timeout,
    )
    response.raise_for_status()
    return response.json()

# Example usage
context = [
    "Our return policy allows returns within 30 days of purchase.",
    "All returns must include original packaging and receipt."
]

llm_output = "You can return items within 30 days with the receipt."

result = verify_response(llm_output, context)

if result["status"] == "VALIDATED":
    print(f"✅ Response verified (confidence: {result['confidence']:.2%})")
else:
    print(f"❌ Response blocked (confidence: {result['confidence']:.2%})")

Output:

✅ Response verified (confidence: 92.00%)

With Error Handling:

import time
from typing import Optional

import requests

def verify_with_retry(
    llm_output: str, 
    context: list[str],
    threshold: float = 0.85,
    max_retries: int = 3
) -> Optional[dict]:
    """Verify with automatic retry on transient failures."""

    for attempt in range(max_retries):
        try:
            response = requests.post(
                "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
                json={
                    "response": llm_output,
                    "reference_facts": context,
                    "threshold": threshold,
                    "return_details": True
                },
                timeout=10
            )

            if response.status_code == 200:
                return response.json()
            elif response.status_code == 503:
                # Service unavailable - wait and retry
                retry_after = response.json().get("retry_after", 5)
                print(f"Service busy, retrying in {retry_after}s...")
                time.sleep(retry_after)
                continue
            else:
                # Non-retryable error
                response.raise_for_status()

        except requests.exceptions.Timeout:
            print(f"Timeout on attempt {attempt + 1}/{max_retries}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
            else:
                raise

    return None

# Example usage
result = verify_with_retry(
    llm_output="Returns accepted within 90 days.",  # Wrong!
    context=["Return policy: 30 days from purchase."],
    threshold=0.85
)

if result:
    print(f"Status: {result['status']}")
    print(f"Confidence: {result['confidence']:.2%}")
    print(f"Reasoning: {result['details']['reasoning']}")

Output:

Status: BLOCKED
Confidence: 67.00%
Reasoning: Response contains numerical value (90 days) that contradicts context (30 days)

JavaScript / Node.js

async function verifyResponse(llmOutput, context, threshold = 0.85) {
  try {
    const response = await fetch(
      'https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify',
      {
        method: 'POST',
        headers: {
          'Content-Type': 'application/json',
        },
        body: JSON.stringify({
          response: llmOutput,
          reference_facts: context,
          threshold: threshold
        })
      }
    );

    if (!response.ok) {
      throw new Error(`Verification failed: ${response.statusText}`);
    }

    const result = await response.json();
    return result;

  } catch (error) {
    console.error('Verification error:', error);
    throw error;
  }
}

// Example usage
const context = [
  "Battery life: up to 8 hours of continuous use",
  "Charging time: 2 hours for full charge"
];

const llmOutput = "The battery lasts 8 hours and charges in 2 hours.";

verifyResponse(llmOutput, context)
  .then(result => {
    console.log(`Status: ${result.status}`);
    console.log(`Confidence: ${(result.confidence * 100).toFixed(1)}%`);
  })
  .catch(error => {
    console.error('Failed to verify:', error);
  });

cURL

Basic Request:

curl -X POST https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify \
  -H "Content-Type: application/json" \
  -d '{
    "response": "The product ships within 24 hours.",
    "reference_facts": ["Standard shipping: 2-3 business days processing"],
    "threshold": 0.85
  }'

Response:

{
  "status": "BLOCKED",
  "confidence": 0.71,
  "processing_time_ms": 1923,
  "model_version": "5.4.1"
}

Detailed Request:

curl -X POST https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify \
  -H "Content-Type: application/json" \
  -d '{
    "response": "Multi-factor authentication is required for all accounts.",
    "reference_facts": ["MFA is optional but recommended for enhanced security"],
    "threshold": 0.85,
    "return_details": true
  }'

Response:

{
  "status": "BLOCKED",
  "confidence": 0.42,
  "processing_time_ms": 2087,
  "model_version": "5.4.1",
  "details": {
    "entailment_score": 0.42,
    "contradiction_score": 0.51,
    "neutral_score": 0.07,
    "classification": "CONTRADICTION",
    "reasoning": "Response states MFA is 'required' but context indicates 'optional' - semantic antonyms detected despite 95.8% embedding similarity"
  }
}

Performance Characteristics

Latency

| Metric | Value |
|---|---|
| Median (P50) | 1.8-2.2 seconds |
| 95th percentile (P95) | 2.8-3.5 seconds |
| 99th percentile (P99) | 4.0-5.0 seconds |
| Cold start | 8-12 seconds (first request after idle) |

Latency Breakdown:

- Model loading: ~30ms (cached)
- Embedding generation: ~150ms
- NLI inference: ~1500ms (DeBERTa-large-MNLI)
- Network overhead: ~200ms

Optimization Tips:

- Use async/parallel requests for batch processing
- Implement request batching (coming in v6.0)
- Cache results for identical request pairs (see the sketch below)
- Consider streaming responses (coming in v6.1)
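
The caching tip can be a few lines on the client. A minimal in-process sketch; an external cache such as Redis would use the same keying scheme, and verify_response is the helper from the Python example above:

import hashlib
import json

_cache: dict[str, dict] = {}

def verify_cached(llm_output: str, context: list[str]) -> dict:
    """Cache verification results keyed on the exact (response, context) pair."""
    key = hashlib.sha256(
        json.dumps({"response": llm_output, "reference_facts": context}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        _cache[key] = verify_response(llm_output, context)
    return _cache[key]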


Throughput

| Configuration | Requests/Second |
|---|---|
| Single instance | 0.5-0.8 RPS |
| Auto-scaled (10 instances) | 5-8 RPS |
| Production recommended | Keep <5 RPS per instance |
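
To stay under the recommended per-instance rate, a simple client-side throttle helps when fanning out many verifications. A minimal sketch combining a concurrency cap with a short delay; the cap of 4 and the 0.25 s pacing are illustrative choices, and verify_async is the aiohttp helper shown under Common Issues below:

import asyncio

semaphore = asyncio.Semaphore(4)  # illustrative concurrency cap

async def verify_throttled(llm_output: str, context: list[str]) -> dict:
    """Limit concurrent verifications to stay well under ~5 RPS per instance."""
    async with semaphore:
        result = await verify_async(llm_output, context)
        await asyncio.sleep(0.25)  # spread requests out before releasing the slot
        return result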

Accuracy

Based on multi-model testing across GPT-3.5-Turbo and GPT-4:

| Test Type | Validation Rate | Block Rate | Notes |
|---|---|---|---|
| Adversarial testing | 47-48% | 52-53% | Temperature 0.7, leading questions |
| Production typical | 70-85% | 15-30% | Temperature 0.3, proper prompting |
| Contradiction detection | 0% | 100% | All tested contradictions caught |

Test Details:

- 50 adversarial scenarios with leading questions
- 6 contradiction pairs with 85-95% embedding similarity
- Consistent performance across GPT-3.5 and GPT-4
- Zero false negatives on known contradictions


Rate Limits

Current (No Auth):

- No enforced rate limits
- Fair use policy: <100 requests/hour per IP

Coming in v6.0 (With API Keys):

| Tier | Requests/Month | Rate Limit | Price |
|---|---|---|---|
| Free | 10,000 | 10/minute | $0 |
| Starter | 100,000 | 30/minute | $99/month |
| Pro | 1,000,000 | 100/minute | $499/month |
| Enterprise | Unlimited | Custom | Custom |

Best Practices

1. Chunking Long Responses

The API has a 2000-token limit per request. For longer responses:

def chunk_and_verify(long_response: str, context: list[str]) -> list[dict]:
    """Verify a long response by chunking into paragraphs."""
    chunks = long_response.split('\n\n')  # Split by paragraph
    results = []

    for i, chunk in enumerate(chunks):
        if not chunk.strip():
            continue

        result = verify_response(chunk, context)
        results.append({
            'chunk_index': i,
            'text': chunk,
            'status': result['status'],
            'confidence': result['confidence']
        })

    return results
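
How the per-chunk results are combined is up to you; one reasonable policy (an illustrative choice, not something the API mandates) is to block the whole response if any chunk is blocked:

chunk_results = chunk_and_verify(long_response, context)

blocked = [r for r in chunk_results if r["status"] == "BLOCKED"]
overall_status = "BLOCKED" if blocked else "VALIDATED"

print(f"Overall: {overall_status} ({len(blocked)} of {len(chunk_results)} chunks blocked)")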

2. Context Selection

Good Context:

context = [
    "Pricing: $99/month for Pro plan",
    "Features: Unlimited projects, 10GB storage, priority support",
    "Free trial: 14 days, no credit card required"
]

Bad Context (Too Vague):

context = [
    "We have various pricing tiers available",
    "Multiple features are included",
    "Trial period is offered"
]

Tip: Include specific facts, numbers, and details. The NLI model works best with concrete information.


3. Threshold Tuning

Start with default (0.85) and adjust based on false positive/negative rates:

# Test different thresholds on your data
thresholds = [0.75, 0.80, 0.85, 0.90, 0.95]
test_cases = [...]  # Your test data

for threshold in thresholds:
    validated = sum(
        1 for case in test_cases 
        if verify_response(case['response'], case['context'], threshold)['status'] == 'VALIDATED'
    )
    print(f"Threshold {threshold}: {validated}/{len(test_cases)} validated ({validated/len(test_cases):.1%})")

Output:

Threshold 0.75: 38/50 validated (76.0%)
Threshold 0.80: 32/50 validated (64.0%)
Threshold 0.85: 24/50 validated (48.0%)  ← Production default
Threshold 0.90: 18/50 validated (36.0%)
Threshold 0.95: 12/50 validated (24.0%)

4. Error Handling

Always implement timeout and retry logic:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def create_session():
    """Create session with automatic retries."""
    session = requests.Session()

    retry_strategy = Retry(
        total=3,
        status_forcelist=[429, 500, 502, 503, 504],
        backoff_factor=1,  # Wait 1s, 2s, 4s between retries
        allowed_methods=["POST"]
    )

    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)

    return session

# Use it
session = create_session()
response = session.post(
    "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
    json={"response": "...", "reference_facts": [...]},
    timeout=10
)

5. Monitoring Integration

Log verification results for analysis:

import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def verify_and_log(llm_output: str, context: list[str]) -> dict:
    """Verify and log results for monitoring."""
    result = verify_response(llm_output, context)

    logger.info(
        f"Verification: status={result['status']}, "
        f"confidence={result['confidence']:.2f}, "
        f"latency={result['processing_time_ms']}ms"
    )

    # Send to your monitoring system (Datadog, New Relic, etc)
    # metrics.increment(f"tensalis.{result['status'].lower()}")
    # metrics.histogram("tensalis.confidence", result['confidence'])
    # metrics.histogram("tensalis.latency", result['processing_time_ms'])

    return result

Common Issues & Solutions

Issue: High block rate (>60% blocked)

Causes:

- Threshold too high (try lowering to 0.75-0.80)
- LLM adding elaborations not in context
- Context too vague or incomplete
- Using adversarial prompting (high temperature, leading questions)

Solutions:

# 1. Lower threshold
result = verify_response(output, context, threshold=0.75)

# 2. Improve LLM prompting
prompt = """
Answer based ONLY on the following context. 
Do not add information not explicitly stated.
Be concise and factual.

Context: {context}

Question: {question}
"""

# 3. Add more complete context
context = [
    # Add all relevant details from source documents
    # Include specific facts, numbers, requirements
]

Issue: Slow response times (>5 seconds)

Causes:

- Cold start (first request after idle)
- Network latency
- Long input text
- Cloud Run scaling up

Solutions:

# 1. Implement warming (ping periodically)
import schedule
import time

def keep_warm():
    try:
        verify_response("test", ["test"], timeout=5)
    except Exception:
        pass

schedule.every(5).minutes.do(keep_warm)

# Keep the scheduler running (use a background thread in a real service)
while True:
    schedule.run_pending()
    time.sleep(60)

# 2. Use async/await for multiple requests
import asyncio
import aiohttp

async def verify_async(llm_output, context):
    async with aiohttp.ClientSession() as session:
        async with session.post(
            "https://tensalis-engine-zlqsb5lbna-uc.a.run.app/v1/verify",
            json={"response": llm_output, "reference_facts": context}
        ) as response:
            return await response.json()

# Verify multiple responses in parallel (gather must run inside an event loop)
async def verify_all():
    return await asyncio.gather(
        verify_async(output1, context1),
        verify_async(output2, context2),
        verify_async(output3, context3),
    )

results = asyncio.run(verify_all())

Issue: False positives (should be blocked but validated)

Causes:

- Response uses synonyms/paraphrasing correctly
- Context is ambiguous
- Threshold too low

Solution:

# Use higher threshold for critical applications
result = verify_response(
    llm_output, 
    context, 
    threshold=0.90  # Stricter
)

# Or request detailed scores to investigate
result = verify_response(
    llm_output, 
    context,
    threshold=0.85,
    return_details=True
)

if result['status'] == 'VALIDATED' and result['details']['neutral_score'] > 0.15:
    # High neutral score suggests uncertainty - consider blocking
    print("Warning: Validated but uncertain (high neutral score)")

Webhook Integration (Coming in v6.0)

Subscribe to verification events:

# Register webhook
POST /v1/webhooks
{
  "url": "https://your-app.com/tensalis-webhook",
  "events": ["verification.completed", "verification.failed"]
}

# Your webhook endpoint receives:
POST https://your-app.com/tensalis-webhook
{
  "event": "verification.completed",
  "timestamp": "2025-12-17T10:30:00Z",
  "data": {
    "verification_id": "ver_1234567890",
    "status": "BLOCKED",
    "confidence": 0.73,
    "processing_time_ms": 1847
  }
}
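
Your endpoint only needs to accept the JSON payload above and return a 2xx quickly. A minimal receiver sketch using Flask; the route path and the handling of blocked events are assumptions until v6.0 documents the webhook contract (including any signature verification):

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/tensalis-webhook", methods=["POST"])
def tensalis_webhook():
    event = request.get_json(force=True)
    if event.get("event") == "verification.completed":
        data = event.get("data", {})
        if data.get("status") == "BLOCKED":
            # e.g. record blocked verifications for review
            print(f"Blocked verification {data.get('verification_id')} "
                  f"(confidence {data.get('confidence')})")
    # Acknowledge quickly so the delivery is not retried
    return jsonify({"received": True}), 200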

Changelog

v5.4.1 (Current)

- Current production release (documented in this reference)

v6.0 (Planned Q1 2026)

- API key authentication and tiered rate limits
- Request batching
- Webhook integration

v6.1 (Planned Q2 2026)

- Streaming responses


Support


Legal


Last Updated: December 17, 2025
API Version: 5.4.1