OpenAI Chat Completions API Deep Dive: Messages, Roles, and Parameters
Understand the message format, system/user/assistant roles, temperature, max_tokens, top_p, and other parameters that control OpenAI chat completion behavior.
The Anatomy of a Chat Completion Request
Every interaction with OpenAI's chat models goes through the Chat Completions API. Understanding how messages, roles, and parameters work together is essential for getting consistent, high-quality outputs from your applications. This post breaks down every component you need to master.
Message Roles Explained
The messages array is the core of every request. Each message has a role and content:
```python
from openai import OpenAI

client = OpenAI()

messages = [
    {"role": "system", "content": "You are a senior Python developer who writes concise, production-ready code."},
    {"role": "user", "content": "Write a function to validate email addresses."},
    {"role": "assistant", "content": "Here is a robust email validator using regex..."},
    {"role": "user", "content": "Now add support for checking MX records."},
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
)
```
Here is what each role does:
- system — Sets the assistant's personality, behavior, and constraints. Processed first and given special weight. Use it for instructions that should persist across the entire conversation.
- user — Messages from the human. These are the questions, prompts, and inputs.
- assistant — Previous responses from the model. Including these creates multi-turn conversations.
Building Multi-Turn Conversations
The API is stateless. You must send the full conversation history with each request:
```python
conversation = [
    {"role": "system", "content": "You are a helpful math tutor. Show your work step by step."},
]

def chat(user_message: str) -> str:
    conversation.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=conversation,
    )
    assistant_message = response.choices[0].message.content
    conversation.append({"role": "assistant", "content": assistant_message})
    return assistant_message

print(chat("What is the derivative of x^3 + 2x?"))
print(chat("Now integrate the result."))
```
Each call sends the growing conversation list, so the model sees the full context.
Key Parameters
temperature and top_p
These control randomness. Use one or the other, not both simultaneously:
```python
# Deterministic output — great for code generation, data extraction
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    temperature=0.0,
)

# Creative output — good for brainstorming, creative writing
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    temperature=1.2,
)
```
temperature ranges from 0 to 2. At 0, the model is nearly deterministic. At higher values, outputs become more varied and creative.
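To build intuition for why this happens, here is a minimal pure-Python sketch of how temperature scaling reshapes a probability distribution before sampling. This is an illustration of the general technique, not OpenAI's internal implementation:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to sampling probabilities, scaled by temperature."""
    if temperature == 0:
        # Temperature 0 would divide by zero; treat it as greedy argmax.
        probs = [0.0] * len(logits)
        probs[logits.index(max(logits))] = 1.0
        return probs
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
print(softmax_with_temperature(logits, 0.2))  # sharp: mass concentrates on the top token
print(softmax_with_temperature(logits, 1.5))  # flat: probabilities spread out
```

Low temperatures sharpen the distribution toward the most likely token; high temperatures flatten it, which is why outputs get more varied.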
max_tokens
Limits the length of the generated response:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    max_tokens=500,  # cap response at 500 tokens
)

# Check if the response was cut off
if response.choices[0].finish_reason == "length":
    print("Warning: response was truncated")
```
stop sequences
Tell the model to stop generating when it encounters specific strings:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "List 5 Python web frameworks, one per line."}],
    stop=["6."],  # stop before a 6th item
)
```
n — Multiple Completions
Generate multiple responses in a single request:
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    n=3,
    temperature=0.8,
)

for i, choice in enumerate(response.choices):
    print(f"Option {i + 1}: {choice.message.content}")
```
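With n > 1 you usually need a selection step afterwards. Here is a minimal sketch that picks a candidate by a simple heuristic (shortest text); the helper name and scoring rule are my own — real systems often use a validator or reward model instead:

```python
def pick_best(candidates, score=len):
    """Return the candidate with the lowest score (shortest by default)."""
    return min(candidates, key=score)

drafts = [
    "A very long rambling answer that repeats itself...",
    "Concise answer.",
    "A medium length answer here.",
]
print(pick_best(drafts))  # → "Concise answer."
```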
Practical Parameter Combinations
| Use Case | temperature | max_tokens | Notes |
|---|---|---|---|
| Code generation | 0.0 | 2000 | Deterministic, longer output |
| Classification | 0.0 | 10 | Short, consistent labels |
| Creative writing | 1.0 | 1000 | Varied, expressive |
| Summarization | 0.3 | 300 | Slightly varied but focused |
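The table above maps naturally onto reusable presets you can splat into the API call. A hypothetical helper (the preset names and function are my own, not part of the OpenAI SDK):

```python
PRESETS = {
    "code": {"temperature": 0.0, "max_tokens": 2000},
    "classification": {"temperature": 0.0, "max_tokens": 10},
    "creative": {"temperature": 1.0, "max_tokens": 1000},
    "summarization": {"temperature": 0.3, "max_tokens": 300},
}

def completion_kwargs(use_case, model="gpt-4o", **overrides):
    """Merge a use-case preset with any per-call overrides."""
    kwargs = {"model": model, **PRESETS[use_case]}
    kwargs.update(overrides)
    return kwargs

print(completion_kwargs("classification"))
```

Usage is then a straight pass-through: `client.chat.completions.create(messages=messages, **completion_kwargs("code"))`.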
FAQ
Should I always include a system message?
It is not required, but strongly recommended. Without a system message, the model uses a generic helpful assistant persona. A well-crafted system message dramatically improves consistency and output quality.
What happens when the conversation exceeds the model's context window?
The API returns an error if total tokens (messages + response) exceed the model's limit. You need to implement conversation trimming — removing older messages or summarizing them to stay within the token budget.
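A simple trimming strategy keeps the system message and drops the oldest turns first. Here is a minimal sketch using a rough character-based budget — production code should count real tokens with a tokenizer such as tiktoken:

```python
def trim_conversation(messages, max_chars=12000):
    """Keep the system message plus the most recent turns within a budget."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    kept = []
    used = sum(len(m["content"]) for m in system)
    for msg in reversed(rest):  # walk newest to oldest
        cost = len(msg["content"])
        if used + cost > max_chars:
            break  # oldest remaining turns are dropped
        kept.append(msg)
        used += cost
    return system + list(reversed(kept))
```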
Is temperature=0 truly deterministic?
Nearly, but not perfectly. OpenAI has noted that identical requests may occasionally produce slightly different outputs due to floating-point computation differences across their infrastructure. For most practical purposes, temperature=0 is effectively deterministic.
#OpenAI #ChatCompletions #APIParameters #Python #LLM #AgenticAI #LearnAI #AIEngineering
Written by
CallSphere Team
Expert insights on AI voice agents and customer communication automation.