Vaidya.ai

Chat Completions API

OpenAI-compatible chat endpoint with Vaidya-specific case selector for healthcare workflows.

Endpoint

POST https://api.vaidya-dev.fractal.ai/chat/completions

OpenAI-compatible chat completions on the development API host. Use optional case to select credit-based healthcare workflows (see Case Values).

Try it (cURL)

curl --request POST \
  --url https://api.vaidya-dev.fractal.ai/chat/completions \
  --header "authorization: Bearer $VAIDYA_API_KEY" \
  --header "content-type: application/json" \
  --data '{
  "model": "Vaidya-v2",
  "messages": [
    {
      "role": "user",
      "content": "Explain the pathogenesis of rheumatoid arthritis."
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.7
}'

Authentication

Pass your API key in the request headers:

Authorization: Bearer <VAIDYA_API_KEY>
Content-Type: application/json

Request Body

Core Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier. Example: Vaidya-v2 |
| case | string | No | Workflow selector for healthcare modes and credit pricing (see Case Values); omit for general chat when your deployment allows it |
| messages | array | Yes | Chat history in OpenAI-compatible role/content format |

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_tokens | integer | model default | Max tokens in generated response |
| temperature | number | 0.7 | Controls randomness |
| top_p | number | 1.0 | Nucleus sampling |
| presence_penalty | number | 0 | Encourages novel tokens |
| frequency_penalty | number | 0 | Penalizes repetition |
| stream | boolean | false | If true, stream partial responses |
| stream_options | object | null | Streaming controls. Example: {"include_usage": true} |
| user | string | null | End-user identifier for tracing/abuse monitoring |
| metadata | object | null | Client-defined metadata |

Case Values

| Case | Credits | Complexity | File Upload |
| --- | --- | --- | --- |
| symptom_qa | 1 | Low | No |
| drug_lookup | 2 | Low | No |
| facility_search | 2 | Low | No |
| health_score | 5 | Medium | No |
| lab_report_text | 10 | Medium | No |
| lab_report_pdf | 15 | Medium | Yes |
| lab_report_image | 20 | High | Yes |
| health_plan | 25 | High | No |
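
These case values can be mirrored client-side to validate requests and estimate credit spend before calling the API. A minimal sketch (the table is copied from this page; the helper and constant names are illustrative, not part of the API):

```python
# Client-side mirror of the Case Values table, for validation and
# credit estimation. Keep in sync with this documentation page.
CASE_CREDITS = {
    "symptom_qa": 1,
    "drug_lookup": 2,
    "facility_search": 2,
    "health_score": 5,
    "lab_report_text": 10,
    "lab_report_pdf": 15,
    "lab_report_image": 20,
    "health_plan": 25,
}

# Cases that require a file_url/image_url content item in messages.
FILE_UPLOAD_CASES = {"lab_report_pdf", "lab_report_image"}

def validate_case(case: str) -> int:
    """Return the credit cost for a case, or raise before hitting the API."""
    if case not in CASE_CREDITS:
        raise ValueError(f"Invalid case value: {case}")
    return CASE_CREDITS[case]
```

Failing fast client-side avoids spending a round trip on a 400 invalid_case response.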

Message Schema

Each message follows OpenAI chat format:

{
  "role": "user",
  "content": "Your prompt text"
}

Supported role values: system, user, assistant.
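
Multi-turn context is supplied by repeating role/content entries in messages. A sketch of assembling a history with all three roles (build_messages is an illustrative helper, not part of the API):

```python
# Build an OpenAI-compatible message history with system, user,
# and assistant roles.
def build_messages(system_prompt, turns):
    """turns: list of (user_text, assistant_text_or_None) tuples."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages

history = build_messages(
    "You are a careful clinical assistant.",
    [("What is paracetamol used for?", "It relieves pain and fever."),
     ("Is it safe with ibuprofen?", None)],  # latest turn awaits a reply
)
```

The resulting list drops straight into the messages field of a request body.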

File Inputs (PDF and Image Cases)

For lab_report_pdf and lab_report_image, pass multimodal content in messages[].content.

PDF content item:

{
  "type": "file_url",
  "file_url": {
    "url": "https://example.com/report.pdf"
  }
}

Image content item:

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/report-image.jpg"
  }
}

A full message combining a text instruction with a file item:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Extract biomarkers and summarize abnormalities."
    },
    {
      "type": "file_url",
      "file_url": {
        "url": "https://example.com/lab-report.pdf"
      }
    }
  ]
}

Canonical Example (Drug Lookup)

{
  "model": "Vaidya-v2",
  "case": "drug_lookup",
  "messages": [
    {
      "role": "user",
      "content": "Can I take ibuprofen with paracetamol? Include side effects and cautions."
    }
  ],
  "max_tokens": 300,
  "temperature": 0.2
}

Python Example

import requests

url = "https://api.vaidya-dev.fractal.ai/chat/completions"

payload = {
    "model": "Vaidya-v2",
    "case": "drug_lookup",
    "messages": [
        {
            "role": "user",
            "content": "Can I take ibuprofen with paracetamol?"
        }
    ],
    "max_tokens": 300,
    "temperature": 0.2
}

headers = {
    "Authorization": "Bearer <VAIDYA_API_KEY>",
    "Content-Type": "application/json"
}

# json= serializes the payload and sets the request body in one step
response = requests.post(url, headers=headers, json=payload, timeout=60)
print(response.status_code)
print(response.text)

Streaming Requests

Set stream: true to receive Server-Sent Events (SSE) chunks instead of one final JSON response.

Streaming Request Example

{
  "model": "Vaidya-v2",
  "case": "symptom_qa",
  "messages": [
    {
      "role": "user",
      "content": "I have mild fever and sore throat. Share quick triage and red flags."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 300,
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

Python Streaming Example

import requests
import json

url = "https://api.vaidya-dev.fractal.ai/chat/completions"
headers = {
    "Authorization": "Bearer <VAIDYA_API_KEY>",
    "Content-Type": "application/json"
}
payload = {
    "model": "Vaidya-v2",
    "case": "drug_lookup",
    "messages": [
        {"role": "user", "content": "Can I combine ibuprofen and paracetamol?"}
    ],
    "stream": True,
    "stream_options": {"include_usage": True},
    "temperature": 0.2,
    "max_tokens": 300
}

with requests.post(url, headers=headers, json=payload, stream=True, timeout=120) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data: "):
            data = line[len("data: "):]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0].get("delta", {})
            text = delta.get("content")
            if text:
                print(text, end="", flush=True)

Compatibility Note for case

The preferred format is a top-level case field. Legacy clients that send case inside a message object should migrate to the top-level field for forward compatibility.

{
  "model": "Vaidya-v2",
  "case": "drug_lookup",
  "messages": [
    { "role": "user", "content": "..." }
  ]
}

Successful Response

Non-streaming Response Shape

{
  "id": "chatcmpl_abc123",
  "object": "chat.completion",
  "created": 1761710000,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "You can usually take them together in standard doses..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 145,
    "completion_tokens": 120,
    "total_tokens": 265
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique response ID |
| object | string | Response object type |
| created | integer | Unix timestamp |
| model | string | Model used for generation |
| choices | array | One or more generated outputs |
| choices[].message.role | string | Typically assistant |
| choices[].message.content | string | Final generated text |
| choices[].finish_reason | string | stop, length, etc. |
| usage | object | Token usage stats |
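
Given the shape above, extracting the generated text and token usage from a parsed response body takes one lookup per field. A sketch against the documented fields (extract_reply is an illustrative helper):

```python
def extract_reply(body: dict):
    """Return (assistant text, total tokens) from a non-streaming response."""
    choice = body["choices"][0]
    text = choice["message"]["content"]
    total = body.get("usage", {}).get("total_tokens", 0)
    return text, total

# Trimmed-down body in the documented shape.
sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}
```

Checking choices[].finish_reason for "length" before trusting the text helps catch truncated answers.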

Streaming Response Structure (SSE)

Each SSE event is sent as a data: line containing a JSON chunk. Stream ends with data: [DONE].

The first chunk announces the assistant role:

{
  "id": "chatcmpl_stream_123",
  "object": "chat.completion.chunk",
  "created": 1761710100,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}

Subsequent chunks carry content deltas:

{
  "id": "chatcmpl_stream_123",
  "object": "chat.completion.chunk",
  "created": 1761710101,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "You can usually combine them in standard doses, but..."
      },
      "finish_reason": null
    }
  ]
}

The final chunk sets finish_reason; usage appears when stream_options.include_usage is true:

{
  "id": "chatcmpl_stream_123",
  "object": "chat.completion.chunk",
  "created": 1761710102,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 145,
    "completion_tokens": 118,
    "total_tokens": 263
  }
}

vLLM-Compatible Extra Parameters

For vLLM-backed deployments, extra sampling and structured generation controls are commonly passed via extra_body (OpenAI client style).

from openai import OpenAI

client = OpenAI(
    api_key="<VAIDYA_API_KEY>",
    base_url="https://api.vaidya-dev.fractal.ai"
)

resp = client.chat.completions.create(
    model="Vaidya-v2",
    messages=[{"role": "user", "content": "Summarize this in 5 bullets."}],
    extra_body={
        "case": "lab_report_text",
        "top_k": 40,
        "min_p": 0.05,
        "repetition_penalty": 1.05,
        "structured_outputs": {
            "choice": ["normal", "abnormal"]
        }
    }
)
print(resp.choices[0].message.content)

A raw request body, for deployments that accept these parameters at the top level:

{
  "model": "Vaidya-v2",
  "case": "health_plan",
  "messages": [
    {
      "role": "user",
      "content": "Create a 2-week actionable plan for prediabetes."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 800,
  "top_k": 40,
  "min_p": 0.05,
  "repetition_penalty": 1.05
}

Common vLLM extra parameters: top_k, min_p, repetition_penalty, seed, stop_token_ids, structured_outputs.

Error Responses

Error Shape

{
  "error": {
    "message": "Invalid case value: med_lookup",
    "type": "invalid_request_error",
    "code": "invalid_case"
  }
}

Common Errors

| HTTP Status | Code | Meaning | Fix |
| --- | --- | --- | --- |
| 400 | invalid_case | Unsupported case value | Use supported case enum |
| 400 | invalid_messages | Malformed messages payload | Validate role/content schema |
| 400 | missing_file_input | File required for selected case | Attach PDF/image URL |
| 401 | invalid_api_key | Missing/invalid credentials | Set valid Bearer key |
| 413 | payload_too_large | Request too large | Reduce content/file size |
| 422 | unprocessable_entity | Validation failure | Fix field types and enums |
| 429 | rate_limit_exceeded | Too many requests | Retry with exponential backoff |
| 500 | internal_server_error | Server issue | Retry with request ID logging |
| 503 | service_unavailable | Temporary downtime | Retry after delay |
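
Clients can branch on the HTTP status and the error.code from the Error Shape section. A sketch of a classifier that also marks which statuses are worth retrying (the retry set follows the table above; classify_error is an illustrative helper, not part of the API):

```python
# Statuses the Common Errors table recommends retrying.
RETRYABLE_STATUSES = {429, 500, 503}

def classify_error(status: int, body: dict) -> tuple:
    """Return (error code, retryable?) from a failed response's JSON body."""
    code = body.get("error", {}).get("code", "unknown")
    return code, status in RETRYABLE_STATUSES
```

Using .get with defaults keeps the classifier safe against non-JSON or unexpected error bodies.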

LiteLLM clients

If you use LiteLLM against this API, HTTP errors are mapped to OpenAI-style Python exceptions (see LiteLLM Exception Mapping). The JSON body from Vaidya (including error.code values in the table above) usually appears in the exception message; use status_code for branching and retries.

| HTTP status | LiteLLM exception (import from litellm or catch the openai equivalent) |
| --- | --- |
| 400 | BadRequestError (subclasses may include ContextWindowExceededError, ContentPolicyViolationError, UnsupportedParamsError, ImageFetchError when applicable) |
| 401 | AuthenticationError |
| 403 | PermissionDeniedError |
| 404 | NotFoundError |
| 408 | APITimeoutError |
| 413 | Usually BadRequestError; always check e.status_code |
| 422 | UnprocessableEntityError |
| 429 | RateLimitError |
| 500 | APIError, APIConnectionError, or InternalServerError |
| 503 | ServiceUnavailableError |

Exceptions also expose llm_provider and provider_specific_fields when the upstream response includes extra detail. On LiteLLM Proxy, team budgets may raise BudgetExceededError before the request reaches Vaidya.

Rate Limiting and Retries

Recommended retry strategy:

  • Retry on 429, 500, 503.
  • Use exponential backoff with jitter.
  • Honor Retry-After header when present.
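
The strategy above can be sketched as a small wrapper. The attempt count, base delay, and cap are illustrative defaults, and send stands in for any callable that performs the POST and returns a response with status_code and headers (e.g. requests.post bound to your URL and payload):

```python
import random
import time

# Statuses worth retrying, per the recommendations above.
RETRYABLE = {429, 500, 503}

def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Exponential backoff with jitter; a Retry-After header wins outright."""
    if retry_after is not None:
        return float(retry_after)
    delay = min(cap, base * 2 ** attempt)
    return delay * random.uniform(0.5, 1.5)  # jitter avoids thundering herds

def post_with_retries(send, max_attempts=5):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE:
            return resp
        time.sleep(backoff_delay(attempt, resp.headers.get("Retry-After")))
    return resp
```

This assumes Retry-After carries a delay in seconds; the HTTP-date form of the header would need separate parsing.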

Best Practices

For end-to-end guidance (system prompts, temperature by use case, multi-turn context, tokens, retries, security, and UI disclaimers), see Best Practices.

API-specific reminders:

  • Keep temperature low for medical/factual use cases.
  • Keep prompts structured and explicit.
  • Use top-level case consistently when you rely on workflow routing.
  • Include enough prior conversation context in messages.
  • For facility_search, always include location.
  • For file-based cases, include clear task instruction plus file item.

Use-Case Example Requests

symptom_qa:

{
  "model": "Vaidya-v2",
  "case": "symptom_qa",
  "messages": [
    {
      "role": "user",
      "content": "Fever and sore throat for 2 days, no breathing issue. Please share likely causes, home care, and red flags."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 400
}

lab_report_pdf (text instruction plus file_url item):

{
  "model": "Vaidya-v2",
  "case": "lab_report_pdf",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Extract biomarkers, flag abnormal ranges, and summarize key implications in simple language."
        },
        {
          "type": "file_url",
          "file_url": {
            "url": "https://example.com/reports/lipid-cbc-report.pdf"
          }
        }
      ]
    }
  ],
  "temperature": 0.2,
  "max_tokens": 1100
}

lab_report_image (text instruction plus image_url item):

{
  "model": "Vaidya-v2",
  "case": "lab_report_image",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Read this report image, extract values, and explain abnormalities. Mention if OCR confidence is low."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/reports/report-photo.jpg"
          }
        }
      ]
    }
  ],
  "temperature": 0.2,
  "max_tokens": 1200
}
