Chat Completions API
OpenAI-compatible chat endpoint with Vaidya-specific case selector for healthcare workflows.
Endpoint
POST https://api.vaidya-dev.fractal.ai/chat/completions

OpenAI-compatible chat completions on the development API host. Use the optional case field to select credit-based healthcare workflows (see Case Values).
Try it (cURL)
curl --request POST \
--url https://api.vaidya-dev.fractal.ai/chat/completions \
--header "authorization: Bearer $VAIDYA_API_KEY" \
--header "content-type: application/json" \
--data '{
"model": "Vaidya-v2",
"messages": [
{
"role": "user",
"content": "Explain the pathogenesis of rheumatoid arthritis."
}
],
"max_tokens": 1000,
"temperature": 0.7
}'

Authentication
Pass your API key in the request headers:
Authorization: Bearer <VAIDYA_API_KEY>
Content-Type: application/json

Request Body
Core Fields
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model identifier. Example: Vaidya-v2 |
| case | string | No | Workflow selector for healthcare modes and credit pricing (see Case Values); omit for general chat when your deployment allows it |
| messages | array | Yes | Chat history in OpenAI-compatible role/content format |
Optional Fields
| Field | Type | Default | Description |
|---|---|---|---|
| max_tokens | integer | model default | Max tokens in the generated response |
| temperature | number | 0.7 | Controls randomness |
| top_p | number | 1.0 | Nucleus sampling |
| presence_penalty | number | 0 | Encourages novel tokens |
| frequency_penalty | number | 0 | Penalizes repetition |
| stream | boolean | false | If true, stream partial responses |
| stream_options | object | null | Streaming controls. Example: {"include_usage": true} |
| user | string | null | End-user identifier for tracing/abuse monitoring |
| metadata | object | null | Client-defined metadata |
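As a sketch of how the optional fields compose with the required ones, the payload below enables streaming with usage reporting and attaches tracing metadata. The field values (model name, user identifier, metadata keys) are illustrative, not prescribed by the API:

```python
# Illustrative request body combining required and optional fields.
# The "user" and "metadata" values here are example placeholders.
payload = {
    "model": "Vaidya-v2",
    "messages": [{"role": "user", "content": "Summarize dengue warning signs."}],
    "max_tokens": 400,
    "temperature": 0.3,
    "stream": True,
    "stream_options": {"include_usage": True},  # emit token usage in final chunk
    "user": "end-user-42",                      # for tracing/abuse monitoring
    "metadata": {"session": "abc"},             # client-defined
}
```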
Case Values
| Case | Credits | Complexity | File Upload |
|---|---|---|---|
| symptom_qa | 1 | Low | No |
| drug_lookup | 2 | Low | No |
| facility_search | 2 | Low | No |
| health_score | 5 | Medium | No |
| lab_report_text | 10 | Medium | No |
| lab_report_pdf | 15 | Medium | Yes |
| lab_report_image | 20 | High | Yes |
| health_plan | 25 | High | No |
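Clients that track credit spend can mirror the table above locally. The mapping and `estimate_credits` helper below are hypothetical client-side conveniences, not part of the API:

```python
# Client-side credit estimate mirroring the Case Values table.
# CASE_CREDITS and estimate_credits are hypothetical helpers, not API features.
CASE_CREDITS = {
    "symptom_qa": 1, "drug_lookup": 2, "facility_search": 2,
    "health_score": 5, "lab_report_text": 10, "lab_report_pdf": 15,
    "lab_report_image": 20, "health_plan": 25,
}

def estimate_credits(cases):
    """Sum credits for a batch of case values; raise on unsupported cases."""
    unknown = [c for c in cases if c not in CASE_CREDITS]
    if unknown:
        raise ValueError(f"Unsupported case value(s): {unknown}")
    return sum(CASE_CREDITS[c] for c in cases)
```

For example, `estimate_credits(["drug_lookup", "lab_report_pdf"])` returns 17.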
Message Schema
Each message follows OpenAI chat format:
{
"role": "user",
"content": "Your prompt text"
}

Supported role values: system, user, assistant.
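A minimal client-side check that messages follow this schema can catch malformed payloads before they reach the API (which would reject them with invalid_messages). `validate_messages` is a hypothetical helper sketch:

```python
# Hypothetical pre-flight validation for an OpenAI-style messages array.
ALLOWED_ROLES = {"system", "user", "assistant"}

def validate_messages(messages):
    """Return a list of problems found; an empty list means the array looks valid."""
    if not isinstance(messages, list) or not messages:
        return ["messages must be a non-empty array"]
    problems = []
    for i, msg in enumerate(messages):
        if msg.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role must be one of {sorted(ALLOWED_ROLES)}")
        content = msg.get("content")
        if not isinstance(content, (str, list)):  # list form is for multimodal parts
            problems.append(f"messages[{i}].content must be a string or array of parts")
    return problems
```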
File Inputs (PDF and Image Cases)
For lab_report_pdf and lab_report_image, pass multimodal content in messages[].content. A file part looks like:
{
"type": "file_url",
"file_url": {
"url": "https://example.com/report.pdf"
}
}

An image part looks like:

{
"type": "image_url",
"image_url": {
"url": "https://example.com/report-image.jpg"
}
}

A complete message combines a text part with a file or image part:

{
"role": "user",
"content": [
{
"type": "text",
"text": "Extract biomarkers and summarize abnormalities."
},
{
"type": "file_url",
"file_url": {
"url": "https://example.com/lab-report.pdf"
}
}
]
}

Canonical Example (Drug Lookup)
{
"model": "Vaidya-v2",
"case": "drug_lookup",
"messages": [
{
"role": "user",
"content": "Can I take ibuprofen with paracetamol? Include side effects and cautions."
}
],
"max_tokens": 300,
"temperature": 0.2
}

Python Example
import requests
import json
url = "https://api.vaidya-dev.fractal.ai/chat/completions"
payload = {
"model": "Vaidya-v2",
"case": "drug_lookup",
"messages": [
{
"role": "user",
"content": "Can I take ibuprofen with paracetamol?"
}
],
"max_tokens": 300,
"temperature": 0.2
}
headers = {
"Authorization": "Bearer <VAIDYA_API_KEY>",
"Content-Type": "application/json"
}
response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=60)
print(response.status_code)
print(response.text)

Streaming Requests
Set stream: true to receive Server-Sent Events (SSE) chunks instead of one final JSON response.
Streaming Request Example
{
"model": "Vaidya-v2",
"case": "symptom_qa",
"messages": [
{
"role": "user",
"content": "I have mild fever and sore throat. Share quick triage and red flags."
}
],
"temperature": 0.3,
"max_tokens": 300,
"stream": true,
"stream_options": {
"include_usage": true
}
}

Python Streaming Example
import requests
import json
url = "https://api.vaidya-dev.fractal.ai/chat/completions"
headers = {
"Authorization": "Bearer <VAIDYA_API_KEY>",
"Content-Type": "application/json"
}
payload = {
"model": "Vaidya-v2",
"case": "drug_lookup",
"messages": [
{"role": "user", "content": "Can I combine ibuprofen and paracetamol?"}
],
"stream": True,
"stream_options": {"include_usage": True},
"temperature": 0.2,
"max_tokens": 300
}
with requests.post(url, headers=headers, data=json.dumps(payload), stream=True, timeout=120) as r:
r.raise_for_status()
for line in r.iter_lines(decode_unicode=True):
if not line:
continue
if line.startswith("data: "):
data = line[len("data: "):]
if data == "[DONE]":
break
chunk = json.loads(data)
delta = chunk["choices"][0].get("delta", {})
text = delta.get("content")
if text:
print(text, end="", flush=True)

Compatibility Note for case
Preferred format: top-level case. If legacy clients send case inside a message object, migrate to top-level case for forward compatibility.
{
"model": "Vaidya-v2",
"case": "drug_lookup",
"messages": [
{ "role": "user", "content": "..." }
]
}

Successful Response
Non-streaming Response Shape
{
"id": "chatcmpl_abc123",
"object": "chat.completion",
"created": 1761710000,
"model": "Vaidya-v2",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "You can usually take them together in standard doses..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 145,
"completion_tokens": 120,
"total_tokens": 265
}
}

Response Fields
| Field | Type | Description |
|---|---|---|
| id | string | Unique response ID |
| object | string | Response object type |
| created | integer | Unix timestamp |
| model | string | Model used for generation |
| choices | array | One or more generated outputs |
| choices[].message.role | string | Typically assistant |
| choices[].message.content | string | Final generated text |
| choices[].finish_reason | string | stop, length, etc. |
| usage | object | Token usage stats |
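Given the non-streaming response shape described above, pulling out the reply text and token counts is a few dictionary lookups. A sketch over an already-parsed response dict (`extract_reply` is a hypothetical helper):

```python
# Hypothetical helper: unpack a parsed chat.completion response dict.
def extract_reply(response):
    """Return (text, finish_reason, total_tokens) from a chat.completion response."""
    choice = response["choices"][0]
    text = choice["message"]["content"]
    total = response.get("usage", {}).get("total_tokens")
    return text, choice.get("finish_reason"), total
```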
Streaming Response Structure (SSE)
Each SSE event is sent as a data: line containing a JSON chunk; the stream ends with data: [DONE]. The first chunk typically carries only the assistant role:
{
"id": "chatcmpl_stream_123",
"object": "chat.completion.chunk",
"created": 1761710100,
"model": "Vaidya-v2",
"choices": [
{
"index": 0,
"delta": {
"role": "assistant"
},
"finish_reason": null
}
]
}

Subsequent chunks carry content deltas:

{
"id": "chatcmpl_stream_123",
"object": "chat.completion.chunk",
"created": 1761710101,
"model": "Vaidya-v2",
"choices": [
{
"index": 0,
"delta": {
"content": "You can usually combine them in standard doses, but..."
},
"finish_reason": null
}
]
}

With include_usage enabled, the final chunk reports finish_reason and token usage:

{
"id": "chatcmpl_stream_123",
"object": "chat.completion.chunk",
"created": 1761710102,
"model": "Vaidya-v2",
"choices": [
{
"index": 0,
"delta": {},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 145,
"completion_tokens": 118,
"total_tokens": 263
}
}

vLLM-Compatible Extra Parameters
For vLLM-backed deployments, extra sampling and structured generation controls are commonly passed via extra_body (OpenAI client style).
from openai import OpenAI
client = OpenAI(
api_key="<VAIDYA_API_KEY>",
base_url="https://api.vaidya-dev.fractal.ai"
)
resp = client.chat.completions.create(
model="Vaidya-v2",
messages=[{"role": "user", "content": "Summarize this in 5 bullets."}],
extra_body={
"case": "lab_report_text",
"top_k": 40,
"min_p": 0.05,
"repetition_penalty": 1.05,
"structured_outputs": {
"choice": ["normal", "abnormal"]
}
}
)
print(resp.choices[0].message.content)

The same extra parameters can also be sent top-level in a raw JSON request body:

{
"model": "Vaidya-v2",
"case": "health_plan",
"messages": [
{
"role": "user",
"content": "Create a 2-week actionable plan for prediabetes."
}
],
"temperature": 0.3,
"max_tokens": 800,
"top_k": 40,
"min_p": 0.05,
"repetition_penalty": 1.05
}

Common vLLM extra parameters: top_k, min_p, repetition_penalty, seed, stop_token_ids, structured_outputs.
Error Responses
Error Shape
{
"error": {
"message": "Invalid case value: med_lookup",
"type": "invalid_request_error",
"code": "invalid_case"
}
}

Common Errors
| HTTP Status | Code | Meaning | Fix |
|---|---|---|---|
| 400 | invalid_case | Unsupported case value | Use supported case enum |
| 400 | invalid_messages | Malformed messages payload | Validate role/content schema |
| 400 | missing_file_input | File required for selected case | Attach PDF/image URL |
| 401 | invalid_api_key | Missing/invalid credentials | Set valid Bearer key |
| 413 | payload_too_large | Request too large | Reduce content/file size |
| 422 | unprocessable_entity | Validation failure | Fix field types and enums |
| 429 | rate_limit_exceeded | Too many requests | Retry with exponential backoff |
| 500 | internal_server_error | Server issue | Retry with request ID logging |
| 503 | service_unavailable | Temporary downtime | Retry after delay |
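Branching on the error shape and status codes above can be sketched as follows. `classify_error` is a hypothetical helper, and the retryable status set follows the retry guidance in this document:

```python
# Hypothetical helper: classify a Vaidya error response for retry handling.
RETRYABLE_STATUSES = {429, 500, 503}  # per the retry guidance in this document

def classify_error(status, body):
    """Return (code, message, retryable) from an HTTP status and parsed error body."""
    err = body.get("error", {})
    return err.get("code"), err.get("message"), status in RETRYABLE_STATUSES
```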
LiteLLM clients
If you use LiteLLM against this API, HTTP errors are mapped to OpenAI-style Python exceptions (see LiteLLM Exception Mapping). The JSON body from Vaidya (including error.code values in the table above) usually appears in the exception message; use status_code for branching and retries.
| HTTP status | LiteLLM exception (import from litellm or catch openai equivalent) |
|---|---|
| 400 | BadRequestError — subclasses may include ContextWindowExceededError, ContentPolicyViolationError, UnsupportedParamsError, ImageFetchError when applicable |
| 401 | AuthenticationError |
| 403 | PermissionDeniedError |
| 404 | NotFoundError |
| 408 | APITimeoutError |
| 413 | Usually BadRequestError; always check e.status_code |
| 422 | UnprocessableEntityError |
| 429 | RateLimitError |
| 500 | APIError, APIConnectionError, or InternalServerError |
| 503 | ServiceUnavailableError |
Exceptions also expose llm_provider and provider_specific_fields when the upstream response includes extra detail. On LiteLLM Proxy, team budgets may raise BudgetExceededError before the request reaches Vaidya.
Rate Limiting and Retries
Recommended retry strategy:
- Retry on 429, 500, and 503.
- Use exponential backoff with jitter.
- Honor the Retry-After header when present.
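The strategy above can be sketched as a pure delay calculation to plug into whatever HTTP client you use. `retry_delay` is a hypothetical helper; it handles the seconds form of Retry-After only, not the HTTP-date form:

```python
import random

# Hypothetical helper: compute the sleep before retry number `attempt` (0-based).
def retry_delay(attempt, retry_after=None, base=0.5, cap=30.0):
    """Honor a Retry-After header value (seconds form) when present;
    otherwise use exponential backoff with full jitter, capped at `cap` seconds."""
    if retry_after is not None:
        try:
            return float(retry_after)
        except ValueError:
            pass  # unparseable (e.g. HTTP-date form): fall back to backoff
    return random.uniform(0, min(cap, base * (2 ** attempt)))
```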
Best Practices
For end-to-end guidance (system prompts, temperature by use case, multi-turn context, tokens, retries, security, and UI disclaimers), see Best Practices.
API-specific reminders:
- Keep temperature low for medical/factual use cases.
- Keep prompts structured and explicit.
- Use top-level case consistently when you rely on workflow routing.
- Include enough prior conversation context in messages.
- For facility_search, always include a location.
- For file-based cases, include a clear task instruction plus the file item.
Use-Case Example Requests
Symptom Q&A:
{
"model": "Vaidya-v2",
"case": "symptom_qa",
"messages": [
{
"role": "user",
"content": "Fever and sore throat for 2 days, no breathing issue. Please share likely causes, home care, and red flags."
}
],
"temperature": 0.3,
"max_tokens": 400
}

Facility search:

{
"model": "Vaidya-v2",
"case": "facility_search",
"messages": [
{
"role": "user",
"content": "Find nearby internal medicine clinics in Koramangala, Bengaluru, open after 7 PM."
}
],
"temperature": 0.2,
"max_tokens": 500
}

Lab report (PDF):

{
"model": "Vaidya-v2",
"case": "lab_report_pdf",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Extract biomarkers, flag abnormal ranges, and summarize key implications in simple language."
},
{
"type": "file_url",
"file_url": {
"url": "https://example.com/reports/lipid-cbc-report.pdf"
}
}
]
}
],
"temperature": 0.2,
"max_tokens": 1100
}

Lab report (image):

{
"model": "Vaidya-v2",
"case": "lab_report_image",
"messages": [
{
"role": "user",
"content": [
{
"type": "text",
"text": "Read this report image, extract values, and explain abnormalities. Mention if OCR confidence is low."
},
{
"type": "image_url",
"image_url": {
"url": "https://example.com/reports/report-photo.jpg"
}
}
]
}
],
"temperature": 0.2,
"max_tokens": 1200
}
