Vaidya.ai

Chat Completions API

OpenAI-compatible chat endpoint with Vaidya-specific case selector for healthcare workflows.

Endpoint

POST https://api.vaidya-dev.fractal.ai/chat/completions

OpenAI-compatible chat completions on the development API host. Use optional case to select credit-based healthcare workflows (see Case Values).

Try it (cURL)

curl --request POST \
  --url https://api.vaidya-dev.fractal.ai/chat/completions \
  --header "authorization: Bearer $VAIDYA_API_KEY" \
  --header "content-type: application/json" \
  --data '{
  "model": "Vaidya-v2",
  "messages": [
    {
      "role": "user",
      "content": "Explain the pathogenesis of rheumatoid arthritis."
    }
  ],
  "max_tokens": 1000,
  "temperature": 0.7
}'

Authentication

Pass your API key in the request headers:

Authorization: Bearer <VAIDYA_API_KEY>
Content-Type: application/json

Request Body

Core Fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | Model identifier. Example: Vaidya-v2 |
| case | string | No | Workflow selector for healthcare modes and credit pricing (see Case Values); omit for general chat when your deployment allows it |
| messages | array | Yes | Chat history in OpenAI-compatible role/content format |

Optional Fields

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_tokens | integer | model default | Max tokens in generated response |
| temperature | number | 0.7 | Controls randomness |
| top_p | number | 1.0 | Nucleus sampling |
| presence_penalty | number | 0 | Encourages novel tokens |
| frequency_penalty | number | 0 | Penalizes repetition |
| stream | boolean | false | If true, stream partial responses |
| stream_options | object | null | Streaming controls. Example: {"include_usage": true} |
| user | string | null | End-user identifier for tracing/abuse monitoring |
| metadata | object | null | Client-defined metadata |

Case Values

| Case | Credits | Complexity | File Upload |
| --- | --- | --- | --- |
| symptom_qa | 1 | Low | No |
| drug_lookup | 2 | Low | No |
| facility_search | 2 | Low | No |
| health_score | 5 | Medium | No |
| lab_report_text | 10 | Medium | No |
| lab_report_pdf | 15 | Medium | Yes |
| lab_report_image | 20 | High | Yes |
| health_plan | 25 | High | No |
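
These case values can be mirrored client-side to validate requests and estimate credit spend before calling the API. A minimal sketch (the table is copied from this page; the helper and constant names are illustrative, not part of the API):

```python
# Client-side mirror of the Case Values table, for validation and
# credit estimation. Keep in sync with this documentation page.
CASE_CREDITS = {
    "symptom_qa": 1,
    "drug_lookup": 2,
    "facility_search": 2,
    "health_score": 5,
    "lab_report_text": 10,
    "lab_report_pdf": 15,
    "lab_report_image": 20,
    "health_plan": 25,
}

# Cases that require a file_url/image_url content item in messages.
FILE_UPLOAD_CASES = {"lab_report_pdf", "lab_report_image"}

def validate_case(case: str) -> int:
    """Return the credit cost for a case, or raise before hitting the API."""
    if case not in CASE_CREDITS:
        raise ValueError(f"Invalid case value: {case}")
    return CASE_CREDITS[case]
```

Failing fast client-side avoids spending a round trip on a 400 invalid_case response.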

Message Schema

Each message follows OpenAI chat format:

{
  "role": "user",
  "content": "Your prompt text"
}

Supported role values: system, user, assistant.
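
Multi-turn context is supplied by repeating role/content entries in messages. A sketch of assembling a history with all three roles (build_messages is an illustrative helper, not part of the API):

```python
# Build an OpenAI-compatible message history with system, user,
# and assistant roles.
def build_messages(system_prompt, turns):
    """turns: list of (user_text, assistant_text_or_None) tuples."""
    messages = [{"role": "system", "content": system_prompt}]
    for user_text, assistant_text in turns:
        messages.append({"role": "user", "content": user_text})
        if assistant_text is not None:
            messages.append({"role": "assistant", "content": assistant_text})
    return messages

history = build_messages(
    "You are a careful clinical assistant.",
    [("What is paracetamol used for?", "It relieves pain and fever."),
     ("Is it safe with ibuprofen?", None)],  # latest turn awaits a reply
)
```

The resulting list drops straight into the messages field of a request body.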

File Inputs (PDF and Image Cases)

For lab_report_pdf and lab_report_image, pass multimodal content in messages[].content.

PDF content item:

{
  "type": "file_url",
  "file_url": {
    "url": "https://example.com/report.pdf"
  }
}

Image content item:

{
  "type": "image_url",
  "image_url": {
    "url": "https://example.com/report-image.jpg"
  }
}

A full message combining a text instruction with a file item:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "Extract biomarkers and summarize abnormalities."
    },
    {
      "type": "file_url",
      "file_url": {
        "url": "https://example.com/lab-report.pdf"
      }
    }
  ]
}

Canonical Example (Drug Lookup)

{
  "model": "Vaidya-v2",
  "case": "drug_lookup",
  "messages": [
    {
      "role": "user",
      "content": "Can I take ibuprofen with paracetamol? Include side effects and cautions."
    }
  ],
  "max_tokens": 300,
  "temperature": 0.2
}

Python Example

import requests

url = "https://api.vaidya-dev.fractal.ai/chat/completions"

payload = {
    "model": "Vaidya-v2",
    "case": "drug_lookup",
    "messages": [
        {
            "role": "user",
            "content": "Can I take ibuprofen with paracetamol?"
        }
    ],
    "max_tokens": 300,
    "temperature": 0.2
}

headers = {
    "Authorization": "Bearer <VAIDYA_API_KEY>",
    "Content-Type": "application/json"
}

# json= serializes the payload and sets the request body in one step
response = requests.post(url, headers=headers, json=payload, timeout=60)
print(response.status_code)
print(response.text)

Streaming Requests

Set stream: true to receive Server-Sent Events (SSE) chunks instead of one final JSON response.

Streaming Request Example

{
  "model": "Vaidya-v2",
  "case": "symptom_qa",
  "messages": [
    {
      "role": "user",
      "content": "I have mild fever and sore throat. Share quick triage and red flags."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 300,
  "stream": true,
  "stream_options": {
    "include_usage": true
  }
}

Python Streaming Example

import requests
import json

url = "https://api.vaidya-dev.fractal.ai/chat/completions"
headers = {
    "Authorization": "Bearer <VAIDYA_API_KEY>",
    "Content-Type": "application/json"
}
payload = {
    "model": "Vaidya-v2",
    "case": "drug_lookup",
    "messages": [
        {"role": "user", "content": "Can I combine ibuprofen and paracetamol?"}
    ],
    "stream": True,
    "stream_options": {"include_usage": True},
    "temperature": 0.2,
    "max_tokens": 300
}

with requests.post(url, headers=headers, json=payload, stream=True, timeout=120) as r:
    r.raise_for_status()
    for line in r.iter_lines(decode_unicode=True):
        if not line:
            continue
        if line.startswith("data: "):
            data = line[len("data: "):]
            if data == "[DONE]":
                break
            chunk = json.loads(data)
            delta = chunk["choices"][0].get("delta", {})
            text = delta.get("content")
            if text:
                print(text, end="", flush=True)

Compatibility Note for case

The preferred format is a top-level case field. Legacy clients that send case inside a message object should migrate to the top-level field for forward compatibility.

{
  "model": "Vaidya-v2",
  "case": "drug_lookup",
  "messages": [
    { "role": "user", "content": "..." }
  ]
}

Successful Response

Non-streaming Response Shape

{
  "id": "chatcmpl_abc123",
  "object": "chat.completion",
  "created": 1761710000,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "You can usually take them together in standard doses..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 145,
    "completion_tokens": 120,
    "total_tokens": 265
  }
}

Response Fields

| Field | Type | Description |
| --- | --- | --- |
| id | string | Unique response ID |
| object | string | Response object type |
| created | integer | Unix timestamp |
| model | string | Model used for generation |
| choices | array | One or more generated outputs |
| choices[].message.role | string | Typically assistant |
| choices[].message.content | string | Final generated text |
| choices[].finish_reason | string | stop, length, etc. |
| usage | object | Token usage stats |
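
Given the shape above, extracting the generated text and token usage from a parsed response body takes one lookup per field. A sketch against the documented fields (extract_reply is an illustrative helper):

```python
def extract_reply(body: dict):
    """Return (assistant text, total tokens) from a non-streaming response."""
    choice = body["choices"][0]
    text = choice["message"]["content"]
    total = body.get("usage", {}).get("total_tokens", 0)
    return text, total

# Trimmed-down body in the documented shape.
sample = {
    "choices": [{
        "index": 0,
        "message": {"role": "assistant", "content": "Hello."},
        "finish_reason": "stop",
    }],
    "usage": {"prompt_tokens": 5, "completion_tokens": 2, "total_tokens": 7},
}
```

Checking choices[].finish_reason for "length" before trusting the text helps catch truncated answers.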

Streaming Response Structure (SSE)

Each SSE event is sent as a data: line containing a JSON chunk. Stream ends with data: [DONE].

The first chunk announces the assistant role:

{
  "id": "chatcmpl_stream_123",
  "object": "chat.completion.chunk",
  "created": 1761710100,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "delta": {
        "role": "assistant"
      },
      "finish_reason": null
    }
  ]
}

Subsequent chunks carry content deltas:

{
  "id": "chatcmpl_stream_123",
  "object": "chat.completion.chunk",
  "created": 1761710101,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "delta": {
        "content": "You can usually combine them in standard doses, but..."
      },
      "finish_reason": null
    }
  ]
}

The final chunk sets finish_reason; usage appears when stream_options.include_usage is true:

{
  "id": "chatcmpl_stream_123",
  "object": "chat.completion.chunk",
  "created": 1761710102,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "delta": {},
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 145,
    "completion_tokens": 118,
    "total_tokens": 263
  }
}

vLLM-Compatible Extra Parameters

For vLLM-backed deployments, extra sampling and structured generation controls are commonly passed via extra_body (OpenAI client style).

from openai import OpenAI

client = OpenAI(
    api_key="<VAIDYA_API_KEY>",
    base_url="https://api.vaidya-dev.fractal.ai"
)

resp = client.chat.completions.create(
    model="Vaidya-v2",
    messages=[{"role": "user", "content": "Summarize this in 5 bullets."}],
    extra_body={
        "case": "lab_report_text",
        "top_k": 40,
        "min_p": 0.05,
        "repetition_penalty": 1.05,
        "structured_outputs": {
            "choice": ["normal", "abnormal"]
        }
    }
)
print(resp.choices[0].message.content)

A raw request body, for deployments that accept these parameters at the top level:

{
  "model": "Vaidya-v2",
  "case": "health_plan",
  "messages": [
    {
      "role": "user",
      "content": "Create a 2-week actionable plan for prediabetes."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 800,
  "top_k": 40,
  "min_p": 0.05,
  "repetition_penalty": 1.05
}

Common vLLM extra parameters: top_k, min_p, repetition_penalty, seed, stop_token_ids, structured_outputs.

Error Responses

Error Shape

{
  "error": {
    "message": "Invalid case value: med_lookup",
    "type": "invalid_request_error",
    "code": "invalid_case"
  }
}

Common Errors

| HTTP Status | Code | Meaning | Fix |
| --- | --- | --- | --- |
| 400 | invalid_case | Unsupported case value | Use supported case enum |
| 400 | invalid_messages | Malformed messages payload | Validate role/content schema |
| 400 | missing_file_input | File required for selected case | Attach PDF/image URL |
| 401 | invalid_api_key | Missing/invalid credentials | Set valid Bearer key |
| 413 | payload_too_large | Request too large | Reduce content/file size |
| 422 | unprocessable_entity | Validation failure | Fix field types and enums |
| 429 | rate_limit_exceeded | Too many requests | Retry with exponential backoff |
| 500 | internal_server_error | Server issue | Retry with request ID logging |
| 503 | service_unavailable | Temporary downtime | Retry after delay |
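
Clients can branch on the HTTP status and the error.code from the Error Shape section. A sketch of a classifier that also marks which statuses are worth retrying (the retry set follows the table above; classify_error is an illustrative helper, not part of the API):

```python
# Statuses the Common Errors table recommends retrying.
RETRYABLE_STATUSES = {429, 500, 503}

def classify_error(status: int, body: dict) -> tuple:
    """Return (error code, retryable?) from a failed response's JSON body."""
    code = body.get("error", {}).get("code", "unknown")
    return code, status in RETRYABLE_STATUSES
```

Using .get with defaults keeps the classifier safe against non-JSON or unexpected error bodies.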

LiteLLM clients

If you use LiteLLM against this API, HTTP errors are mapped to OpenAI-style Python exceptions (see LiteLLM Exception Mapping). The JSON body from Vaidya (including error.code values in the table above) usually appears in the exception message; use status_code for branching and retries.

| HTTP status | LiteLLM exception (import from litellm or catch the openai equivalent) |
| --- | --- |
| 400 | BadRequestError (subclasses may include ContextWindowExceededError, ContentPolicyViolationError, UnsupportedParamsError, ImageFetchError when applicable) |
| 401 | AuthenticationError |
| 403 | PermissionDeniedError |
| 404 | NotFoundError |
| 408 | APITimeoutError |
| 413 | Usually BadRequestError; always check e.status_code |
| 422 | UnprocessableEntityError |
| 429 | RateLimitError |
| 500 | APIError, APIConnectionError, or InternalServerError |
| 503 | ServiceUnavailableError |

Exceptions also expose llm_provider and provider_specific_fields when the upstream response includes extra detail. On LiteLLM Proxy, team budgets may raise BudgetExceededError before the request reaches Vaidya.

Rate Limiting and Retries

Recommended retry strategy:

  • Retry on 429, 500, 503.
  • Use exponential backoff with jitter.
  • Honor Retry-After header when present.
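
The strategy above can be sketched as a small wrapper. The attempt count, base delay, and cap are illustrative defaults, and send stands in for any callable that performs the POST and returns a response with status_code and headers (e.g. requests.post bound to your URL and payload):

```python
import random
import time

# Statuses worth retrying, per the recommendations above.
RETRYABLE = {429, 500, 503}

def backoff_delay(attempt, retry_after=None, base=1.0, cap=30.0):
    """Exponential backoff with jitter; a Retry-After header wins outright."""
    if retry_after is not None:
        return float(retry_after)
    delay = min(cap, base * 2 ** attempt)
    return delay * random.uniform(0.5, 1.5)  # jitter avoids thundering herds

def post_with_retries(send, max_attempts=5):
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE:
            return resp
        time.sleep(backoff_delay(attempt, resp.headers.get("Retry-After")))
    return resp
```

This assumes Retry-After carries a delay in seconds; the HTTP-date form of the header would need separate parsing.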

Best Practices

For end-to-end guidance (system prompts, temperature by use case, multi-turn context, tokens, retries, security, and UI disclaimers), see Best Practices.

API-specific reminders:

  • Keep temperature low for medical/factual use cases.
  • Keep prompts structured and explicit.
  • Use top-level case consistently when you rely on workflow routing.
  • Include enough prior conversation context in messages.
  • For facility_search, always include location.
  • For file-based cases, include clear task instruction plus file item.

Use-Case Example Requests

symptom_qa:

{
  "model": "Vaidya-v2",
  "case": "symptom_qa",
  "messages": [
    {
      "role": "user",
      "content": "Fever and sore throat for 2 days, no breathing issue. Please share likely causes, home care, and red flags."
    }
  ],
  "temperature": 0.3,
  "max_tokens": 400
}

lab_report_pdf (text instruction plus file_url item):

{
  "model": "Vaidya-v2",
  "case": "lab_report_pdf",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Extract biomarkers, flag abnormal ranges, and summarize key implications in simple language."
        },
        {
          "type": "file_url",
          "file_url": {
            "url": "https://example.com/reports/lipid-cbc-report.pdf"
          }
        }
      ]
    }
  ],
  "temperature": 0.2,
  "max_tokens": 1100
}

lab_report_image (text instruction plus image_url item):

{
  "model": "Vaidya-v2",
  "case": "lab_report_image",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "Read this report image, extract values, and explain abnormalities. Mention if OCR confidence is low."
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/reports/report-photo.jpg"
          }
        }
      ]
    }
  ],
  "temperature": 0.2,
  "max_tokens": 1200
}
