# Chat Completions API

OpenAI-compatible chat endpoint for healthcare AI. It supports Symptom Checker, Wellness Management, Emergency Assist, Patient Journey Assist, and Administrator Assist.
## Endpoint

```
POST https://api.vaidya.ai/vaidya/chat/completions
```

OpenAI-compatible chat completions on the development API host. The model infers the intended healthcare use case from the request and responds accordingly.
## Try it (cURL)

```shell
curl --request POST \
  --url https://api.vaidya.ai/vaidya/chat/completions \
  --header "authorization: Bearer $VAIDYA_API_KEY" \
  --header "content-type: application/json" \
  --data '{
    "model": "Vaidya-v2",
    "messages": [
      {
        "role": "user",
        "content": "Explain the pathogenesis of rheumatoid arthritis."
      }
    ],
    "temperature": 0.2
  }'
```

## Authentication
Pass the API key in the request headers:

```
Authorization: Bearer <VAIDYA_API_KEY>
Content-Type: application/json
```

## Request Body

### Required Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier. Here `Vaidya-v2` |
| `messages` | array | Yes | Chat history in OpenAI-compatible role/content format |
### Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `temperature` | number | 1.0 | Controls randomness |
| `top_p` | number | 1.0 | Nucleus sampling |
| `n` | integer | 1 | Number of completions to generate |
| `presence_penalty` | number | 0.0 | Encourages novel tokens |
| `frequency_penalty` | number | 0.0 | Penalizes repetition |
| `stream` | boolean | false | If true, stream partial responses |
| `stream_options` | object | null | Streaming controls. Example: `{"include_usage": true}` |
| `user` | string | null | End-user identifier for tracing/abuse monitoring |
| `metadata` | object | null | Client-defined metadata |
Note: `max_tokens` is overridden by Vaidya-v2 since it is a reasoning model; token allocation is managed internally.
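When `stream` is true, partial responses arrive as a series of chunk events whose `choices[].delta.content` fragments concatenate into the final text (assuming OpenAI-compatible streaming semantics). A minimal sketch of client-side accumulation, with the SSE transport/decoding layer omitted for brevity:

```python
def accumulate_stream(chunks):
    """Concatenate delta fragments from OpenAI-style chat.completion.chunk events.

    `chunks` is any iterable of already-decoded chunk payloads (dicts).
    """
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)

# Illustrative chunk payloads (not captured from the live API):
demo = [
    {"choices": [{"delta": {"role": "assistant", "content": "Drink "}}]},
    {"choices": [{"delta": {"content": "more water."}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
print(accumulate_stream(demo))  # -> Drink more water.
```

With `stream_options: {"include_usage": true}`, the final chunk also carries the `usage` object for the whole response.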
## Message Schema

Each message follows the OpenAI chat format:

```json
{
  "role": "user",
  "content": "Your prompt text"
}
```

Supported `role` values: `system`, `user`, `assistant`.
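For multi-turn conversations, prior assistant replies are included in the `messages` array. A sketch of such a history (the prompt texts here are illustrative examples, not API-mandated values):

```python
# Multi-turn history in the OpenAI-compatible role/content format.
# The message contents are illustrative, not documented defaults.
messages = [
    {"role": "system", "content": "You are a careful clinical assistant."},
    {"role": "user", "content": "What causes iron-deficiency anemia?"},
    {"role": "assistant", "content": "Common causes include chronic blood loss and low dietary iron."},
    {"role": "user", "content": "How is it diagnosed?"},
]

# Every entry must use one of the three supported roles.
valid = all(m["role"] in {"system", "user", "assistant"} for m in messages)
print(valid)
```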
## Python Example

```python
import requests

url = "https://api.vaidya.ai/vaidya/chat/completions"

payload = {
    "model": "Vaidya-v2",
    "messages": [
        {
            "role": "user",
            "content": "Can I take ibuprofen with paracetamol?"
        }
    ],
    "temperature": 0.2,
}

headers = {
    "Authorization": "Bearer <VAIDYA_API_KEY>",
}

# `json=` serializes the payload and sets the Content-Type header automatically.
response = requests.post(url, headers=headers, json=payload, timeout=60)
print(response.status_code)
print(response.text)
```

## Successful Response
```json
{
  "id": "chatcmpl_abc123",
  "object": "chat.completion",
  "created": 1761710000,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are some home remedies..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 720,
    "total_tokens": 765
  }
}
```

## Response Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique response ID |
| `object` | string | Response object type |
| `created` | integer | Unix timestamp |
| `model` | string | Model used for generation |
| `choices` | array | One or more generated outputs |
| `choices[].message.role` | string | Typically `assistant` |
| `choices[].message.content` | string | Final generated text |
| `choices[].finish_reason` | string | `stop`, `length` (when the internal limit is reached), etc. |
| `usage` | object | Token usage stats |
- The `choices` array may contain multiple completions if `n > 1` is specified in the request.
- The `completion_tokens` field is the sum of reasoning and output tokens generated in this response.
- The `prompt_tokens` field counts the input tokens from the `messages` array.
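Putting the schema above together, a small helper (hypothetical, not part of any SDK) that extracts the first completion and flags truncation:

```python
def first_completion(response_json):
    """Return (text, truncated) for the first choice of a chat.completion payload."""
    choice = response_json["choices"][0]
    # finish_reason == "length" means the internal token limit was reached.
    truncated = choice["finish_reason"] == "length"
    return choice["message"]["content"], truncated

# The documented sample response from above:
sample = {
    "id": "chatcmpl_abc123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Here are some home remedies..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 45, "completion_tokens": 720, "total_tokens": 765},
}
text, truncated = first_completion(sample)
print(text, truncated)
```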
## vLLM-Compatible Extra Parameters

For vLLM-backed deployments, extra sampling and structured generation controls are commonly passed via `extra_body` (OpenAI client style).
```python
from openai import OpenAI

client = OpenAI(
    api_key="<VAIDYA_API_KEY>",
    base_url="https://api.vaidya.ai/vaidya",
)

resp = client.chat.completions.create(
    model="Vaidya-v2",
    messages=[{"role": "user", "content": "Summarize this in 5 bullets."}],
    extra_body={
        "top_k": 40,
        "min_p": 0.05,
        "repetition_penalty": 1.05,
        "structured_outputs": {
            "choice": ["normal", "abnormal"]
        }
    }
)
print(resp.choices[0].message.content)
```

The same parameters can also be sent directly in a raw JSON request body:

```json
{
  "model": "Vaidya-v2",
  "messages": [
    {
      "role": "user",
      "content": "Create a 2-week actionable plan for prediabetes."
    }
  ],
  "temperature": 0.3,
  "top_k": 40,
  "min_p": 0.05,
  "repetition_penalty": 1.05
}
```

Common vLLM extra parameters: `top_k`, `min_p`, `repetition_penalty`, `seed`, `stop_token_ids`, `structured_outputs`.
## Error Responses

### Error Shape

```json
{
  "error": {
    "message": "Invalid request: malformed messages payload",
    "type": "invalid_request_error",
    "code": "invalid_messages"
  }
}
```

### Common Errors
| HTTP Status | Code | Meaning | Fix |
|---|---|---|---|
| 400 | invalid_messages | Malformed messages payload | Validate role/content schema |
| 401 | invalid_api_key | Missing/invalid credentials | Set valid Bearer key |
| 422 | unprocessable_entity | Validation failure | Fix field types and enums |
| 429 | rate_limit_exceeded | Too many requests | Retry with exponential backoff |
| 500 | internal_server_error | Server issue | Retry with request ID logging |
| 503 | service_unavailable | Temporary downtime | Retry after delay |
## Rate Limiting and Retries

Recommended retry strategy:

- Retry on `429`, `500`, and `503`.
- Use exponential backoff with jitter.
- Honor the `Retry-After` header when present.
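A sketch of that policy in Python. `post_with_retries` takes a zero-argument `send` callable (e.g. a `requests.post` closure) so the retry logic stays client-agnostic; the attempt cap and 30-second delay ceiling are illustrative choices, not documented limits.

```python
import random
import time

RETRYABLE = {429, 500, 503}

def backoff_delay(attempt, retry_after=None, cap=30.0):
    """Delay before the next attempt: honor Retry-After, else exponential + jitter."""
    if retry_after is not None:
        return float(retry_after)
    return min(2.0 ** attempt, cap) + random.uniform(0.0, 1.0)

def post_with_retries(send, max_attempts=5):
    """Call `send()` until a non-retryable status or the attempt cap is hit.

    `send` is a hypothetical callable returning an object with `.status_code`
    and `.headers`; any HTTP client can be adapted to this shape.
    """
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE:
            return resp
        time.sleep(backoff_delay(attempt, resp.headers.get("Retry-After")))
    return resp
```

Logging the response `id` of failed requests alongside each retry makes `500`-class incidents much easier to report to support.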

