
# Streaming

Get responses in real time, as they're generated.

***

## Overview

Streaming delivers the model's response piece by piece instead of waiting for the complete answer. This provides:

* **Better user experience** - Show progress immediately
* **Lower perceived latency** - Users see the first tokens as soon as they are generated, not after the full response is ready
* **Efficient for long responses** - Don't wait for the entire generation to finish

***

## Basic Usage

Add `stream: true` to your request:

{% tabs %}
{% tab title="Python" %}

```python
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.tensorx.ai/v1"
)

stream = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "Write a haiku about coding"}],
    stream=True
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```

{% endtab %}

{% tab title="JavaScript" %}

```javascript
import OpenAI from 'openai';

const client = new OpenAI({
  apiKey: 'YOUR_API_KEY',
  baseURL: 'https://api.tensorx.ai/v1'
});

const stream = await client.chat.completions.create({
  model: 'deepseek/deepseek-chat-v3.1',
  messages: [{ role: 'user', content: 'Write a haiku about coding' }],
  stream: true
});

for await (const chunk of stream) {
  const content = chunk.choices[0]?.delta?.content;
  if (content) {
    process.stdout.write(content);
  }
}
```

{% endtab %}

{% tab title="curl" %}

```bash
curl -N https://api.tensorx.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "deepseek/deepseek-chat-v3.1",
    "messages": [{"role": "user", "content": "Write a haiku about coding"}],
    "stream": true
  }'
```

{% endtab %}
{% endtabs %}

***

## Getting Token Usage

To track token usage with streaming, add `stream_options`:

```python
stream = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "Hello!"}],
    stream=True,
    stream_options={"include_usage": True}
)

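for chunk in stream:
    # The chunk that carries usage may have an empty `choices` list, so guard the index
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

    # Usage appears in the final chunk
    if chunk.usage:
        print(f"\n\nTokens: {chunk.usage.total_tokens}")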
```

{% hint style="info" %}
Token usage is included in the final chunk when `include_usage: true`.
{% endhint %}

***

## Server-Sent Events Format

Under the hood, streaming uses Server-Sent Events (SSE):

```
data: {"id":"chatcmpl-xyz","choices":[{"delta":{"role":"assistant"},"index":0}]}

data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":"Hello"},"index":0}]}

data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":" there"},"index":0}]}

data: {"id":"chatcmpl-xyz","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}

data: [DONE]
```

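In this format:

* Each event is a single line that starts with `data:`, followed by a blank line
* The payload is either a JSON chunk or the `[DONE]` sentinel, which marks the end of the stream
* `delta` contains the new content piece
* `finish_reason` appears in the final content chunk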
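
As a minimal sketch of these rules, the helper below parses SSE lines that are already in memory and yields each `delta.content` piece. The function name `iter_sse_content` is illustrative and not part of any SDK. The sample events are copied from the example above, so the snippet runs without a network call.

```python
import json
from typing import Iterable, Iterator


def iter_sse_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text of each `delta.content` found in a sequence of SSE lines."""
    for line in lines:
        line = line.strip()
        # Skip blank separator lines and anything that is not a data field
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # End-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Sample events copied from the SSE example in this section
sample = [
    'data: {"id":"chatcmpl-xyz","choices":[{"delta":{"role":"assistant"},"index":0}]}',
    '',
    'data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":"Hello"},"index":0}]}',
    '',
    'data: {"id":"chatcmpl-xyz","choices":[{"delta":{"content":" there"},"index":0}]}',
    '',
    'data: {"id":"chatcmpl-xyz","choices":[{"delta":{},"index":0,"finish_reason":"stop"}],"usage":{"prompt_tokens":10,"completion_tokens":5,"total_tokens":15}}',
    '',
    'data: [DONE]',
]

text = "".join(iter_sse_content(sample))
print(text)  # Hello there
```

Iterating over `choices` instead of indexing `choices[0]` means a chunk with an empty `choices` list is skipped rather than raising an error.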

***

## Raw HTTP Streaming

If you're not using an SDK:

```python
import requests
import json

response = requests.post(
    'https://api.tensorx.ai/v1/chat/completions',
    headers={
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json'
    },
    json={
        'model': 'deepseek/deepseek-chat-v3.1',
        'messages': [{'role': 'user', 'content': 'Count to 5'}],
        'stream': True
    },
    stream=True  # Important!
)

for line in response.iter_lines():
    if line:
        line = line.decode('utf-8')
        if line.startswith('data: '):
            data = line[6:]  # Remove 'data: ' prefix
            if data == '[DONE]':
                break
            chunk = json.loads(data)
            content = chunk['choices'][0]['delta'].get('content', '')
            if content:
                print(content, end='', flush=True)
```

***

## Accumulating the Full Response

Sometimes you need both the streamed display and the complete text:

```python
full_response = ""

stream = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "Explain recursion"}],
    stream=True
)

for chunk in stream:
    content = chunk.choices[0].delta.content
    if content:
        full_response += content
        print(content, end="", flush=True)

# Now you have the complete response
print(f"\n\nTotal length: {len(full_response)} characters")
```

***

## When to Use Streaming

### ✅ Use Streaming For

| Use Case                    | Why                              |
| --------------------------- | -------------------------------- |
| Chat interfaces             | Users see responses immediately  |
| Long-form content           | Articles, stories, documentation |
| Code generation             | See code as it's written         |
| Real-time applications      | Live transcription, assistants   |
| Large outputs (>500 tokens) | Better user experience           |

### ❌ Don't Use Streaming For

| Use Case                      | Why                            |
| ----------------------------- | ------------------------------ |
| Batch processing              | Adds complexity, no UX benefit |
| Background jobs               | No one watching                |
| JSON mode / structured output | Need complete valid JSON       |
| Function calling              | Wait for complete tool\_calls  |
| Short responses (<100 tokens) | Negligible difference          |
| Validation required           | Need full response first       |
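
The JSON row in the table above can be illustrated with a small sketch. The fragments below are hypothetical stand-ins for successive `delta.content` values, and `try_parse` is an illustrative helper, not an SDK function. The buffer only becomes parseable once the last fragment has arrived, which is why streaming offers little benefit for structured output.

```python
import json
from typing import Any, Optional


def try_parse(buffer: str) -> Optional[Any]:
    """Return the parsed JSON value, or None while the buffer is still incomplete.

    Note: a literal JSON `null` is indistinguishable from "incomplete" in this sketch.
    """
    try:
        return json.loads(buffer)
    except json.JSONDecodeError:
        return None


# Hypothetical fragments, as they might arrive across three chunks
fragments = ['{"city": "Par', 'is", "temp_c"', ': 18}']

buffer = ""
states = []
for fragment in fragments:
    buffer += fragment
    states.append(try_parse(buffer))

print(states)  # [None, None, {'city': 'Paris', 'temp_c': 18}]
```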

***

## Handling Finish Reasons

Check `finish_reason` in the final chunk:

```python
for chunk in stream:
    if chunk.choices[0].finish_reason:
        reason = chunk.choices[0].finish_reason
        
        if reason == "stop":
            print("\n✓ Complete")
        elif reason == "length":
            print("\n⚠️ Truncated (max_tokens reached)")
        elif reason == "content_filter":
            print("\n⚠️ Content filtered")
        elif reason == "tool_calls":
            print("\n→ Function call requested")
```

***

## Error Handling

Streams can fail mid-response. Always handle errors:

```python
try:
    stream = client.chat.completions.create(
        model="deepseek/deepseek-chat-v3.1",
        messages=[{"role": "user", "content": "Hello"}],
        stream=True
    )
    
    for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="")
            
except Exception as e:
    print(f"\n\nStream error: {e}")
    # Optionally retry without streaming
```
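
One way to implement the "retry without streaming" idea is sketched below. To keep the example runnable without network access, the two API calls are passed in as plain callables. `stream_with_fallback`, `flaky_stream`, and `full_response` are hypothetical names, and the simulated `ConnectionError` stands in for whatever exception your client raises.

```python
from typing import Callable, Iterable


def stream_with_fallback(
    start_stream: Callable[[], Iterable[str]],
    complete_once: Callable[[], str],
) -> str:
    """Consume a stream of text pieces; on failure, fall back to one non-streaming call."""
    received = []
    try:
        for piece in start_stream():
            received.append(piece)
        return "".join(received)
    except Exception as exc:
        # Partial output is discarded so the result never mixes two responses
        print(f"Stream failed after {len(received)} piece(s): {exc}")
        return complete_once()


# Stand-ins for real API calls (hypothetical; replace with client calls)
def flaky_stream():
    yield "Hel"
    raise ConnectionError("connection reset")


def full_response():
    return "Hello!"


result = stream_with_fallback(flaky_stream, full_response)
print(result)  # Hello!
```

In a user interface you would probably also clear any partial text already shown before displaying the fallback response.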

***

## Streaming with Function Calling

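When using tools, the model may return tool calls. In a stream, a tool call is not necessarily delivered as one complete object: the `arguments` string in particular can be split across several chunks, so accumulate the fragments by `index`: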

```python
stream = client.chat.completions.create(
    model="deepseek/deepseek-chat-v3.1",
    messages=[{"role": "user", "content": "What's the weather?"}],
    tools=tools,
    stream=True
)

tool_calls = []

for chunk in stream:
    delta = chunk.choices[0].delta
    
    # Accumulate tool calls
    if delta.tool_calls:
        for tc in delta.tool_calls:
            # Tool calls stream in pieces too
            if tc.index >= len(tool_calls):
                tool_calls.append({"id": "", "function": {"name": "", "arguments": ""}})
            if tc.id:
                tool_calls[tc.index]["id"] = tc.id
            if tc.function:
                if tc.function.name:
                    tool_calls[tc.index]["function"]["name"] = tc.function.name
                if tc.function.arguments:
                    tool_calls[tc.index]["function"]["arguments"] += tc.function.arguments
```

{% hint style="info" %}
For simpler code, consider using non-streaming requests when function calling is involved.
{% endhint %}
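
Once the stream ends, the accumulated `arguments` string still has to be parsed as JSON before you can call your function. The sketch below shows that last step with simulated fragments, so it runs without an API call. The `get_weather` tool, the `call_1` id, and the `merge_tool_call_deltas` helper are all hypothetical, and `SimpleNamespace` objects stand in for the SDK's delta objects.

```python
import json
from types import SimpleNamespace as NS


def merge_tool_call_deltas(deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    calls = {}
    for tc in deltas:
        call = calls.setdefault(tc.index, {"id": "", "name": "", "arguments": ""})
        if getattr(tc, "id", None):
            call["id"] = tc.id
        fn = getattr(tc, "function", None)
        if fn is not None:
            if getattr(fn, "name", None):
                call["name"] = fn.name
            if getattr(fn, "arguments", None):
                # Argument text arrives in fragments and must be concatenated
                call["arguments"] += fn.arguments
    return [calls[i] for i in sorted(calls)]


# Hypothetical fragments for a single call to a made-up `get_weather` tool
deltas = [
    NS(index=0, id="call_1", function=NS(name="get_weather", arguments="")),
    NS(index=0, id=None, function=NS(name=None, arguments='{"city": ')),
    NS(index=0, id=None, function=NS(name=None, arguments='"Paris"}')),
]

calls = merge_tool_call_deltas(deltas)
args = json.loads(calls[0]["arguments"])  # Parse only after the stream has ended
print(calls[0]["name"], args)  # get_weather {'city': 'Paris'}
```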

***

## See Also

* [Chat Completions](/api-reference/chat-completions.md) - Full API reference
* [Function Calling](/api-reference/function-calling.md) - Using tools
* [Rate Limits](/api-reference/rate-limits.md) - Request limits

