Vaidya.ai

Chat Completions API

OpenAI-compatible chat endpoint for healthcare AI — supports Symptom Checker, Wellness Management, Emergency Assist, Patient Journey Assist, and Administrator Assist.

Endpoint

POST https://api.vaidya.ai/vaidya/chat/completions

OpenAI-compatible chat completions on the development API host. The model infers the relevant healthcare use case from the request and responds accordingly.

Try it (cURL)

curl --request POST \
  --url https://api.vaidya.ai/vaidya/chat/completions \
  --header "authorization: Bearer $VAIDYA_API_KEY" \
  --header "content-type: application/json" \
  --data '{
  "model": "Vaidya-v2",
  "messages": [
    {
      "role": "user",
      "content": "Explain the pathogenesis of rheumatoid arthritis."
    }
  ],
  "temperature": 0.2
}'

Authentication

Pass your API key in the request headers:

Authorization: Bearer <VAIDYA_API_KEY>
Content-Type: application/json

Request Body

Required Fields

Field | Type | Required | Description
model | string | Yes | Model identifier; use Vaidya-v2
messages | array | Yes | Chat history in OpenAI-compatible role/content format

Optional Fields

Field | Type | Default | Description
temperature | number | 1.0 | Controls randomness
top_p | number | 1.0 | Nucleus sampling
n | integer | 1 | Number of completions to generate
presence_penalty | number | 0.0 | Encourages novel tokens
frequency_penalty | number | 0.0 | Penalizes repetition
stream | boolean | false | If true, stream partial responses (see the streaming sketch below)
stream_options | object | null | Streaming controls. Example: {"include_usage": true}
user | string | null | End-user identifier for tracing/abuse monitoring
metadata | object | null | Client-defined metadata

Note: max_tokens is overridden by Vaidya-v2 since it is a reasoning model. Token allocation is managed internally.
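A minimal streaming sketch using the OpenAI Python client, assuming the endpoint emits OpenAI-style server-sent events when stream is true; the prompt text below is illustrative only.

from openai import OpenAI

client = OpenAI(
    api_key="<VAIDYA_API_KEY>",
    base_url="https://api.vaidya.ai/vaidya",
)

stream = client.chat.completions.create(
    model="Vaidya-v2",
    messages=[{"role": "user", "content": "List common migraine triggers."}],
    stream=True,
    stream_options={"include_usage": True},
)

for chunk in stream:
    # Each chunk carries a content delta; skip chunks with no text.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    # With include_usage set, the final chunk typically carries usage stats.
    if getattr(chunk, "usage", None):
        print()
        print(chunk.usage)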

Message Schema

Each message follows OpenAI chat format:

{
  "role": "user",
  "content": "Your prompt text"
}

Supported role values: system, user, assistant.
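Because the endpoint is OpenAI-compatible, prior turns are passed back in the messages array on each request. A short illustrative sketch (the system prompt and turns below are examples, not required values):

messages = [
    {"role": "system", "content": "You are a clinical wellness assistant."},
    {"role": "user", "content": "I was diagnosed with prediabetes last week."},
    {"role": "assistant", "content": "Understood. Would you like diet and activity guidance?"},
    {"role": "user", "content": "Yes, focus on diet changes first."}
]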

Python Example

import requests
import json

url = "https://api.vaidya.ai/vaidya/chat/completions"

payload = {
    "model": "Vaidya-v2",
    "messages": [
        {
            "role": "user",
            "content": "Can I take ibuprofen with paracetamol?"
        }
    ],
    "temperature": 0.2
}

headers = {
    "Authorization": "Bearer <VAIDYA_API_KEY>",
    "Content-Type": "application/json"
}

response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=60)
print(response.status_code)
print(response.text)

Successful Response

{
  "id": "chatcmpl_abc123",
  "object": "chat.completion",
  "created": 1761710000,
  "model": "Vaidya-v2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here are some home remedies..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 45,
    "completion_tokens": 720,
    "total_tokens": 765
  }
}

Response Fields

Field | Type | Description
id | string | Unique response ID
object | string | Response object type
created | integer | Unix timestamp
model | string | Model used for generation
choices | array | One or more generated outputs
choices[].message.role | string | Typically assistant
choices[].message.content | string | Final generated text
choices[].finish_reason | string | stop, length (when the internal limit is reached), etc.
usage | object | Token usage stats
  • The choices array may contain multiple completions if n > 1 is specified in the request.
  • The completion_tokens field is the sum of reasoning and output tokens generated in this response.
  • The prompt_tokens field counts the input tokens from the messages array.
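A short sketch of reading these fields from the parsed body, continuing the requests example above (response is the object returned by requests.post):

data = response.json()

for choice in data["choices"]:                  # more than one entry when n > 1
    print(choice["index"], choice["finish_reason"])
    print(choice["message"]["content"])

usage = data["usage"]
print(usage["prompt_tokens"], "prompt +",
      usage["completion_tokens"], "completion =",
      usage["total_tokens"], "total tokens")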

vLLM-Compatible Extra Parameters

For vLLM-backed deployments, extra sampling and structured generation controls are commonly passed via extra_body (OpenAI client style).

from openai import OpenAI

client = OpenAI(
    api_key="<VAIDYA_API_KEY>",
    base_url="https://api.vaidya.ai/vaidya",
)

resp = client.chat.completions.create(
    model="Vaidya-v2",
    messages=[{"role": "user", "content": "Summarize this in 5 bullets."}],
    extra_body={
        "top_k": 40,
        "min_p": 0.05,
        "repetition_penalty": 1.05,
        "structured_outputs": {
            "choice": ["normal", "abnormal"]
        }
    }
)
print(resp.choices[0].message.content)

The same parameters can also be sent as top-level fields in a raw JSON request body:

{
  "model": "Vaidya-v2",
  "messages": [
    {
      "role": "user",
      "content": "Create a 2-week actionable plan for prediabetes."
    }
  ],
  "temperature": 0.3,
  "top_k": 40,
  "min_p": 0.05,
  "repetition_penalty": 1.05
}

Common vLLM extra parameters: top_k, min_p, repetition_penalty, seed, stop_token_ids, structured_outputs.

Error Responses

Error Shape

{
  "error": {
    "message": "Invalid request: malformed messages payload",
    "type": "invalid_request_error",
    "code": "invalid_messages"
  }
}
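A small helper sketch for surfacing this error object from a requests response; the function name is illustrative, and it assumes the failure body is JSON in the shape above.

import requests

def raise_for_api_error(response: requests.Response) -> None:
    # Raise with the structured error details when the call fails.
    if response.ok:
        return
    err = response.json().get("error", {})
    raise RuntimeError(
        f'{response.status_code} {err.get("code")}: {err.get("message")}'
    )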

Common Errors

HTTP Status | Code | Meaning | Fix
400 | invalid_messages | Malformed messages payload | Validate the role/content schema
401 | invalid_api_key | Missing or invalid credentials | Set a valid Bearer key
422 | unprocessable_entity | Validation failure | Fix field types and enums
429 | rate_limit_exceeded | Too many requests | Retry with exponential backoff
500 | internal_server_error | Server issue | Retry and log the request ID
503 | service_unavailable | Temporary downtime | Retry after a delay

Rate Limiting and Retries

Recommended retry strategy (see the sketch after this list):

  • Retry on 429, 500, 503.
  • Use exponential backoff with jitter.
  • Honor Retry-After header when present.
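A minimal retry sketch under these assumptions: the helper name is illustrative, retries apply only to 429/500/503, and Retry-After is assumed to carry a value in seconds when present.

import random
import time

import requests

RETRYABLE_STATUS = {429, 500, 503}

def post_with_retries(url, headers, payload, max_attempts=5):
    response = None
    for attempt in range(max_attempts):
        response = requests.post(url, headers=headers, json=payload, timeout=60)
        if response.status_code not in RETRYABLE_STATUS:
            return response
        retry_after = response.headers.get("Retry-After")
        if retry_after is not None:
            delay = float(retry_after)                             # honor the server's hint
        else:
            delay = min(2 ** attempt, 30) + random.uniform(0, 1)   # exponential backoff with jitter
        time.sleep(delay)
    return response                                                # last response after retries are exhausted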
