OpenAI Fine-Tuning API: Training Custom Models Step by Step
A complete walkthrough of fine-tuning models through the OpenAI API, covering data preparation in JSONL format, file upload, training job creation, evaluation, and deploying your custom model.
Why Fine-Tune Through OpenAI
The OpenAI fine-tuning API lets you train a customized version of GPT-4o-mini, GPT-4o, or other supported models on your own data. The result is a model that behaves the way you want without requiring a long system prompt on every request. OpenAI manages the training infrastructure, GPU allocation, and model hosting. You provide the training data and configuration.
Fine-tuning through the API is particularly valuable when you need consistent output formatting, domain-specific terminology, or a particular reasoning style that few-shot prompting cannot reliably achieve.
Step 1: Prepare Your Training Data
Training data must be in JSONL format, where each line is a JSON object containing a messages array. Each message has a role (system, user, or assistant) and a content field.
import json
training_examples = [
    {
        "messages": [
            {
                "role": "system",
                "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
            },
            {
                "role": "user",
                "content": "Patient presents with acute bronchitis."
            },
            {
                "role": "assistant",
                "content": "J20.9: Acute bronchitis, unspecified"
            }
        ]
    },
    {
        "messages": [
            {
                "role": "system",
                "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
            },
            {
                "role": "user",
                "content": "Diagnosis: Type 2 diabetes with diabetic chronic kidney disease, stage 3."
            },
            {
                "role": "assistant",
                "content": "E11.22: Type 2 diabetes mellitus with diabetic chronic kidney disease\nN18.3: Chronic kidney disease, stage 3 (moderate)"
            }
        ]
    },
]

# Write to JSONL
with open("training_data.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
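Step 4 uploads a separate validation_data.jsonl, which this article otherwise assumes you already have. A minimal sketch of holding out a random slice of your examples — the 90/10 split, the fixed seed, and the helper names are illustrative choices, not OpenAI requirements:

```python
import json
import random

def split_dataset(examples: list, val_fraction: float = 0.1, seed: int = 42):
    """Shuffle and split examples into (train, validation) lists."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def write_jsonl(path: str, examples: list) -> None:
    with open(path, "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")

# Example: 20 dummy examples -> 18 train, 2 validation
dummy = [{"messages": [{"role": "user", "content": f"q{i}"},
                       {"role": "assistant", "content": f"a{i}"}]}
         for i in range(20)]
train, val = split_dataset(dummy)
write_jsonl("training_data.jsonl", train)
write_jsonl("validation_data.jsonl", val)
```

A fixed seed keeps the split reproducible across runs, so train and validation examples never leak into each other between experiments.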
Step 2: Validate Your Data
Before uploading, validate that every line parses correctly and follows the expected schema. OpenAI provides a data preparation utility, but you can also validate manually.
import json
def validate_training_file(filepath: str) -> dict:
    errors = []
    valid_count = 0
    line_num = 0
    with open(filepath, "r") as f:
        for line_num, line in enumerate(f, 1):
            try:
                data = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"Line {line_num}: Invalid JSON")
                continue
            if "messages" not in data:
                errors.append(f"Line {line_num}: Missing 'messages' key")
                continue
            messages = data["messages"]
            roles = [m.get("role") for m in messages]
            if "assistant" not in roles:
                errors.append(f"Line {line_num}: No assistant message")
                continue
            # Use a flag here: a bare `continue` inside the inner loop only
            # skips to the next message, so the example would still be
            # counted as valid despite the error
            has_empty = False
            for msg in messages:
                if not isinstance(msg.get("content"), str) or not msg["content"].strip():
                    errors.append(f"Line {line_num}: Empty content in {msg.get('role')}")
                    has_empty = True
            if has_empty:
                continue
            valid_count += 1
    return {
        "total_lines": line_num,
        "valid": valid_count,
        "errors": errors[:20],  # cap the report at the first 20 errors
    }
result = validate_training_file("training_data.jsonl")
print(f"Valid examples: {result['valid']}/{result['total_lines']}")
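Duplicate lines are worth checking alongside schema validation, since an example that appears twice gets twice the weight during training. A quick sketch — the function name is my own, not an OpenAI utility:

```python
import json

def find_duplicates(filepath: str) -> int:
    """Count JSONL lines whose content is identical to an earlier line."""
    seen = set()
    duplicates = 0
    with open(filepath, "r") as f:
        for line in f:
            # Re-serialize with sorted keys so key order can't hide duplicates
            key = json.dumps(json.loads(line), sort_keys=True)
            if key in seen:
                duplicates += 1
            else:
                seen.add(key)
    return duplicates

# Demo with a small file containing one exact duplicate
with open("dup_check_demo.jsonl", "w") as f:
    for row in [{"a": 1}, {"a": 1}, {"a": 2}]:
        f.write(json.dumps(row) + "\n")
print(find_duplicates("dup_check_demo.jsonl"))  # 1
```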
Step 3: Upload the Training File
from openai import OpenAI
client = OpenAI()
# Upload training file
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)
print(f"File ID: {training_file.id}")
# Output: File ID: file-abc123...

# Optionally upload a validation file
validation_file = client.files.create(
    file=open("validation_data.jsonl", "rb"),
    purpose="fine-tune",
)
Step 4: Create the Fine-Tuning Job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    validation_file=validation_file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={
        "n_epochs": 3,
        "batch_size": "auto",
        "learning_rate_multiplier": "auto",
    },
    suffix="medical-coder",  # Custom name suffix
)
print(f"Job ID: {job.id}")
print(f"Status: {job.status}")
The suffix parameter adds a custom label to your model name, making it easy to identify: ft:gpt-4o-mini-2024-07-18:your-org:medical-coder:abc123.
Step 5: Monitor Training Progress
import time
def monitor_job(client, job_id: str, poll_interval: int = 30):
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        print(f"Status: {job.status}")
        if job.status == "succeeded":
            print(f"Fine-tuned model: {job.fine_tuned_model}")
            return job.fine_tuned_model
        if job.status in ("failed", "cancelled"):
            print(f"Error: {job.error}")
            return None
        # List recent events
        events = client.fine_tuning.jobs.list_events(
            fine_tuning_job_id=job_id, limit=5
        )
        for event in events.data:
            print(f"  [{event.created_at}] {event.message}")
        time.sleep(poll_interval)
model_name = monitor_job(client, job.id)
Step 6: Use Your Fine-Tuned Model
Once training succeeds, use the fine-tuned model exactly like any other OpenAI model.
response = client.chat.completions.create(
    model=model_name,  # ft:gpt-4o-mini-2024-07-18:your-org:medical-coder:abc123
    messages=[
        {
            "role": "system",
            "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
        },
        {
            "role": "user",
            "content": "Patient diagnosed with essential hypertension and hyperlipidemia."
        },
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
# I10: Essential (primary) hypertension
# E78.5: Hyperlipidemia, unspecified
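Because the model was trained to emit CODE: DESCRIPTION lines, downstream code can parse its completions into structured pairs. A small sketch — the regex assumes the ICD-10 shape used in this article's examples:

```python
import re

def parse_icd10_output(text: str) -> list[tuple[str, str]]:
    """Parse 'CODE: DESCRIPTION' lines into (code, description) tuples."""
    pairs = []
    for line in text.strip().splitlines():
        # ICD-10 codes: a letter, two digits, optional dot plus 1-4 more digits
        match = re.match(r"^([A-Z]\d{2}(?:\.\d{1,4})?):\s*(.+)$", line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

output = "I10: Essential (primary) hypertension\nE78.5: Hyperlipidemia, unspecified"
print(parse_icd10_output(output))
# [('I10', 'Essential (primary) hypertension'), ('E78.5', 'Hyperlipidemia, unspecified')]
```

Lines that do not match the trained format are silently skipped, which also gives you a cheap signal: if the parser drops lines, the model has drifted from the output format it was fine-tuned for.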
Step 7: Evaluate Against the Base Model
Always compare your fine-tuned model against the base model on a held-out test set.
import json
def evaluate_model(client, model: str, test_file: str) -> dict:
    correct = 0
    total = 0
    with open(test_file, "r") as f:
        for line in f:
            example = json.loads(line)
            messages = example["messages"]
            expected = messages[-1]["content"]
            prompt = messages[:-1]
            response = client.chat.completions.create(
                model=model,
                messages=prompt,
                temperature=0.0,
            )
            predicted = response.choices[0].message.content.strip()
            if predicted == expected:
                correct += 1
            total += 1
    return {"model": model, "accuracy": correct / total, "total": total}
base_results = evaluate_model(client, "gpt-4o-mini", "test_data.jsonl")
ft_results = evaluate_model(client, model_name, "test_data.jsonl")
print(f"Base model accuracy: {base_results['accuracy']:.1%}")
print(f"Fine-tuned accuracy: {ft_results['accuracy']:.1%}")
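Exact string match is a strict criterion: it fails when the model returns the right codes with slightly different description wording or ordering. For this task, comparing only the extracted code sets is often a more informative metric; a sketch, with the regex mirroring the ICD-10 format used throughout this article:

```python
import re

CODE_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,4})?\b")

def codes_match(predicted: str, expected: str) -> bool:
    """Compare the sets of ICD-10 codes, ignoring descriptions and order."""
    return set(CODE_PATTERN.findall(predicted)) == set(CODE_PATTERN.findall(expected))

# Same codes, different description wording and order -> still a match
pred = "E78.5: Hyperlipidemia\nI10: Hypertension, essential"
exp = "I10: Essential (primary) hypertension\nE78.5: Hyperlipidemia, unspecified"
print(codes_match(pred, exp))  # True
```

To use it, swap `predicted == expected` in evaluate_model for `codes_match(predicted, expected)`.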
FAQ
How much does fine-tuning cost on the OpenAI API?
Training costs depend on the model and the number of tokens in your training data. For GPT-4o-mini, training costs approximately $3.00 per million tokens. A dataset of 500 examples at 500 tokens each totals about 250K tokens per epoch — roughly $0.75 per epoch. With 3 epochs, that is about $2.25 total for training. Inference on a fine-tuned model is billed at its own per-token rates, which for GPT-4o-mini are higher than the base model's; check OpenAI's pricing page for current figures.
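The arithmetic above generalizes into a quick estimator. A sketch using the roughly $3.00 per million training tokens figure cited for GPT-4o-mini (verify current pricing before relying on it, since rates change):

```python
def estimate_training_cost(n_examples: int, avg_tokens: int,
                           n_epochs: int = 3,
                           price_per_million: float = 3.00) -> float:
    """Rough training cost: total tokens seen across all epochs times the rate."""
    total_tokens = n_examples * avg_tokens * n_epochs
    return total_tokens / 1_000_000 * price_per_million

# 500 examples x 500 tokens x 3 epochs = 750K tokens -> $2.25
print(f"${estimate_training_cost(500, 500):.2f}")
```

For a real dataset, replace avg_tokens with an actual token count (e.g. via the tiktoken library) rather than a guess — token counts per example vary widely.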
How long does a fine-tuning job take?
Most fine-tuning jobs complete in 15 minutes to 2 hours, depending on dataset size and the number of epochs. Smaller datasets with 3 epochs typically finish in under 30 minutes. The OpenAI platform queues jobs, so there may be additional wait time during peak demand.
Can I fine-tune a fine-tuned model further with new data?
Yes. You can use a previously fine-tuned model as the base for a new fine-tuning job. This is useful for iterative improvement — train on your initial dataset, evaluate, then fine-tune again on a curated set of examples where the model performed poorly. Just reference the fine-tuned model ID as the model parameter.
CallSphere Team