# Chat Completions API

OpenAI-compatible chat endpoint for healthcare AI. It supports Symptom Checker, Wellness Management, Emergency Assist, Patient Journey Assist, and Administrator Assist.
## Endpoint

```
POST https://api.vaidya.ai/vaidya/chat/completions
```

OpenAI-compatible chat completions on the development API host. The model infers the intended healthcare use case from the request and responds accordingly.
## Try it (cURL)

```shell
curl --request POST \
  --url https://api.vaidya.ai/vaidya/chat/completions \
  --header "authorization: Bearer $VAIDYA_API_KEY" \
  --header "content-type: application/json" \
  --data '{
    "model": "Vaidya-v2",
    "messages": [
      {
        "role": "user",
        "content": "Explain the pathogenesis of rheumatoid arthritis."
      }
    ],
    "temperature": 0.2
  }'
```

## Authentication
Pass the API key in the request headers:

```
Authorization: Bearer <VAIDYA_API_KEY>
Content-Type: application/json
```

## Request Body

### Required Fields
| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | Model identifier. Here `Vaidya-v2` |
| `messages` | array | Yes | Chat history in OpenAI-compatible role/content format |
### Optional Fields

| Field | Type | Default | Description |
|---|---|---|---|
| `temperature` | number | 1.0 | Controls randomness |
| `top_p` | number | 1.0 | Nucleus sampling |
| `n` | integer | 1 | Number of completions to generate |
| `presence_penalty` | number | 0.0 | Encourages novel tokens |
| `frequency_penalty` | number | 0.0 | Penalizes repetition |
| `stream` | boolean | false | If true, stream partial responses |
| `stream_options` | object | null | Streaming controls. Example: `{"include_usage": true}` |
| `user` | string | null | End-user identifier for tracing/abuse monitoring |
| `metadata` | object | null | Client-defined metadata |
Note: `max_tokens` is overridden by Vaidya-v2 since it is a reasoning model; token allocation is managed internally.
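When `stream` is true, partial responses arrive as a series of chunk events whose `choices[].delta.content` fragments concatenate into the final text (assuming OpenAI-compatible streaming semantics). A minimal sketch of client-side accumulation, with the SSE transport/decoding layer omitted for brevity:

```python
def accumulate_stream(chunks):
    """Concatenate delta fragments from OpenAI-style chat.completion.chunk events.

    `chunks` is any iterable of already-decoded chunk payloads (dicts).
    """
    parts = []
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)

# Illustrative chunk payloads (not captured from the live API):
demo = [
    {"choices": [{"delta": {"role": "assistant", "content": "Drink "}}]},
    {"choices": [{"delta": {"content": "more water."}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
print(accumulate_stream(demo))  # -> Drink more water.
```

With `stream_options: {"include_usage": true}`, the final chunk also carries the `usage` object for the whole response.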
## Message Schema

Each message follows the OpenAI chat format:

```json
{
  "role": "user",
  "content": "Your prompt text"
}
```

Supported `role` values: `system`, `user`, `assistant`.
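For multi-turn conversations, prior assistant replies are included in the `messages` array. A sketch of such a history (the prompt texts here are illustrative examples, not API-mandated values):

```python
# Multi-turn history in the OpenAI-compatible role/content format.
# The message contents are illustrative, not documented defaults.
messages = [
    {"role": "system", "content": "You are a careful clinical assistant."},
    {"role": "user", "content": "What causes iron-deficiency anemia?"},
    {"role": "assistant", "content": "Common causes include chronic blood loss and low dietary iron."},
    {"role": "user", "content": "How is it diagnosed?"},
]

# Every entry must use one of the three supported roles.
valid = all(m["role"] in {"system", "user", "assistant"} for m in messages)
print(valid)
```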
## Python Example

```python
import requests

url = "https://api.vaidya.ai/vaidya/chat/completions"

payload = {
    "model": "Vaidya-v2",
    "messages": [
        {
            "role": "user",
            "content": "Can I take ibuprofen with paracetamol?"
        }
    ],
    "temperature": 0.2,
}

headers = {
    "Authorization": "Bearer <VAIDYA_API_KEY>",
}

# `json=` serializes the payload and sets the Content-Type header automatically.
response = requests.post(url, headers=headers, json=payload, timeout=60)
print(response.status_code)
print(response.text)
```

## Successful Response
```json
{
  "id": "chatcmpl_abc123",
  "object": "chat.completion",
  "created": 1761710000,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are some home remedies..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 720,
    "total_tokens": 765
  }
}
```

## Response Fields
| Field | Type | Description |
|---|---|---|
| `id` | string | Unique response ID |
| `object` | string | Response object type |
| `created` | integer | Unix timestamp |
| `model` | string | Model used for generation |
| `choices` | array | One or more generated outputs |
| `choices[].message.role` | string | Typically `assistant` |
| `choices[].message.content` | string | Final generated text |
| `choices[].finish_reason` | string | `stop`, `length` (when the internal limit is reached), etc. |
| `usage` | object | Token usage stats |
- The `choices` array may contain multiple completions if `n > 1` is specified in the request.
- The `completion_tokens` field is the sum of reasoning and output tokens generated in this response.
- The `prompt_tokens` field counts the input tokens from the `messages` array.
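Putting the schema above together, a small helper (hypothetical, not part of any SDK) that extracts the first completion and flags truncation:

```python
def first_completion(response_json):
    """Return (text, truncated) for the first choice of a chat.completion payload."""
    choice = response_json["choices"][0]
    # finish_reason == "length" means the internal token limit was reached.
    truncated = choice["finish_reason"] == "length"
    return choice["message"]["content"], truncated

# The documented sample response from above:
sample = {
    "id": "chatcmpl_abc123",
    "object": "chat.completion",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Here are some home remedies..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 45, "completion_tokens": 720, "total_tokens": 765},
}
text, truncated = first_completion(sample)
print(text, truncated)
```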
## vLLM-Compatible Extra Parameters

For vLLM-backed deployments, extra sampling and structured generation controls are commonly passed via `extra_body` (OpenAI client style).
```python
from openai import OpenAI

client = OpenAI(
    api_key="<VAIDYA_API_KEY>",
    base_url="https://api.vaidya.ai/vaidya",
)

resp = client.chat.completions.create(
    model="Vaidya-v2",
    messages=[{"role": "user", "content": "Summarize this in 5 bullets."}],
    extra_body={
        "top_k": 40,
        "min_p": 0.05,
        "repetition_penalty": 1.05,
        "structured_outputs": {
            "choice": ["normal", "abnormal"]
        }
    }
)
print(resp.choices[0].message.content)
```

The same parameters can also be sent directly in a raw JSON request body:

```json
{
  "model": "Vaidya-v2",
  "messages": [
    {
      "role": "user",
      "content": "Create a 2-week actionable plan for prediabetes."
    }
  ],
  "temperature": 0.3,
  "top_k": 40,
  "min_p": 0.05,
  "repetition_penalty": 1.05
}
```

Common vLLM extra parameters: `top_k`, `min_p`, `repetition_penalty`, `seed`, `stop_token_ids`, `structured_outputs`.
## Error Responses

### Error Shape

```json
{
  "error": {
    "message": "Invalid request: malformed messages payload",
    "type": "invalid_request_error",
    "code": "invalid_messages"
  }
}
```

### Common Errors
| HTTP Status | Code | Meaning | Fix |
|---|---|---|---|
| 400 | invalid_messages | Malformed messages payload | Validate role/content schema |
| 401 | invalid_api_key | Missing/invalid credentials | Set valid Bearer key |
| 422 | unprocessable_entity | Validation failure | Fix field types and enums |
| 429 | rate_limit_exceeded | Too many requests | Retry with exponential backoff |
| 500 | internal_server_error | Server issue | Retry with request ID logging |
| 503 | service_unavailable | Temporary downtime | Retry after delay |
## Rate Limiting and Retries

Recommended retry strategy:

- Retry on `429`, `500`, and `503`.
- Use exponential backoff with jitter.
- Honor the `Retry-After` header when present.
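A sketch of that policy in Python. `post_with_retries` takes a zero-argument `send` callable (e.g. a `requests.post` closure) so the retry logic stays client-agnostic; the attempt cap and 30-second delay ceiling are illustrative choices, not documented limits.

```python
import random
import time

RETRYABLE = {429, 500, 503}

def backoff_delay(attempt, retry_after=None, cap=30.0):
    """Delay before the next attempt: honor Retry-After, else exponential + jitter."""
    if retry_after is not None:
        return float(retry_after)
    return min(2.0 ** attempt, cap) + random.uniform(0.0, 1.0)

def post_with_retries(send, max_attempts=5):
    """Call `send()` until a non-retryable status or the attempt cap is hit.

    `send` is a hypothetical callable returning an object with `.status_code`
    and `.headers`; any HTTP client can be adapted to this shape.
    """
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code not in RETRYABLE:
            return resp
        time.sleep(backoff_delay(attempt, resp.headers.get("Retry-After")))
    return resp
```

Logging the response `id` of failed requests alongside each retry makes `500`-class incidents much easier to report to support.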

