OpenAI Fine-Tuning API: Training Custom Models Step by Step
A complete walkthrough of fine-tuning models through the OpenAI API, covering data preparation in JSONL format, file upload, training job creation, evaluation, and deploying your custom model.
Why Fine-Tune Through OpenAI
The OpenAI fine-tuning API lets you train a customized version of GPT-4o-mini, GPT-4o, or other supported models on your own data. The result is a model that behaves the way you want without requiring a long system prompt on every request. OpenAI manages the training infrastructure, GPU allocation, and model hosting. You provide the training data and configuration.
Fine-tuning through the API is particularly valuable when you need consistent output formatting, domain-specific terminology, or a particular reasoning style that few-shot prompting cannot reliably achieve.
Step 1: Prepare Your Training Data
Training data must be in JSONL format, where each line is a JSON object containing a messages array. Each message has a role (system, user, or assistant) and a content field.
import json
training_examples = [
    {
        "messages": [
            {
                "role": "system",
                "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
            },
            {
                "role": "user",
                "content": "Patient presents with acute bronchitis."
            },
            {
                "role": "assistant",
                "content": "J20.9: Acute bronchitis, unspecified"
            }
        ]
    },
    {
        "messages": [
            {
                "role": "system",
                "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
            },
            {
                "role": "user",
                "content": "Diagnosis: Type 2 diabetes with diabetic chronic kidney disease, stage 3."
            },
            {
                "role": "assistant",
                "content": "E11.22: Type 2 diabetes mellitus with diabetic chronic kidney disease\nN18.3: Chronic kidney disease, stage 3 (moderate)"
            }
        ]
    },
]

# Write to JSONL
with open("training_data.jsonl", "w") as f:
    for example in training_examples:
        f.write(json.dumps(example) + "\n")
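Step 4 uploads a separate validation_data.jsonl, which this article otherwise assumes you already have. A minimal sketch of holding out a random slice of your examples — the 90/10 split, the fixed seed, and the helper names are illustrative choices, not OpenAI requirements:

```python
import json
import random

def split_dataset(examples: list, val_fraction: float = 0.1, seed: int = 42):
    """Shuffle and split examples into (train, validation) lists."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]

def write_jsonl(path: str, examples: list) -> None:
    with open(path, "w") as f:
        for example in examples:
            f.write(json.dumps(example) + "\n")

# Example: 20 dummy examples -> 18 train, 2 validation
dummy = [{"messages": [{"role": "user", "content": f"q{i}"},
                       {"role": "assistant", "content": f"a{i}"}]}
         for i in range(20)]
train, val = split_dataset(dummy)
write_jsonl("training_data.jsonl", train)
write_jsonl("validation_data.jsonl", val)
```

A fixed seed keeps the split reproducible across runs, so train and validation examples never leak into each other between experiments.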
Step 2: Validate Your Data
Before uploading, validate that every line parses correctly and follows the expected schema. OpenAI provides a data preparation utility, but you can also validate manually.
import json
def validate_training_file(filepath: str) -> dict:
    errors = []
    valid_count = 0
    line_num = 0
    with open(filepath, "r") as f:
        for line_num, line in enumerate(f, 1):
            try:
                data = json.loads(line)
            except json.JSONDecodeError:
                errors.append(f"Line {line_num}: Invalid JSON")
                continue
            if "messages" not in data:
                errors.append(f"Line {line_num}: Missing 'messages' key")
                continue
            messages = data["messages"]
            roles = [m.get("role") for m in messages]
            if "assistant" not in roles:
                errors.append(f"Line {line_num}: No assistant message")
                continue
            # Use a flag here: a bare `continue` inside the inner loop only
            # skips to the next message, so the example would still be
            # counted as valid despite the error
            has_empty = False
            for msg in messages:
                if not isinstance(msg.get("content"), str) or not msg["content"].strip():
                    errors.append(f"Line {line_num}: Empty content in {msg.get('role')}")
                    has_empty = True
            if has_empty:
                continue
            valid_count += 1
    return {
        "total_lines": line_num,
        "valid": valid_count,
        "errors": errors[:20],  # cap the report at the first 20 errors
    }
result = validate_training_file("training_data.jsonl")
print(f"Valid examples: {result['valid']}/{result['total_lines']}")
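Duplicate lines are worth checking alongside schema validation, since an example that appears twice gets twice the weight during training. A quick sketch — the function name is my own, not an OpenAI utility:

```python
import json

def find_duplicates(filepath: str) -> int:
    """Count JSONL lines whose content is identical to an earlier line."""
    seen = set()
    duplicates = 0
    with open(filepath, "r") as f:
        for line in f:
            # Re-serialize with sorted keys so key order can't hide duplicates
            key = json.dumps(json.loads(line), sort_keys=True)
            if key in seen:
                duplicates += 1
            else:
                seen.add(key)
    return duplicates

# Demo with a small file containing one exact duplicate
with open("dup_check_demo.jsonl", "w") as f:
    for row in [{"a": 1}, {"a": 1}, {"a": 2}]:
        f.write(json.dumps(row) + "\n")
print(find_duplicates("dup_check_demo.jsonl"))  # 1
```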
Step 3: Upload the Training File
from openai import OpenAI
client = OpenAI()
# Upload training file
training_file = client.files.create(
    file=open("training_data.jsonl", "rb"),
    purpose="fine-tune",
)
print(f"File ID: {training_file.id}")
# Output: File ID: file-abc123...

# Optionally upload a validation file
validation_file = client.files.create(
    file=open("validation_data.jsonl", "rb"),
    purpose="fine-tune",
)
Step 4: Create the Fine-Tuning Job
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    validation_file=validation_file.id,
    model="gpt-4o-mini-2024-07-18",
    hyperparameters={
        "n_epochs": 3,
        "batch_size": "auto",
        "learning_rate_multiplier": "auto",
    },
    suffix="medical-coder",  # Custom name suffix
)
print(f"Job ID: {job.id}")
print(f"Status: {job.status}")
The suffix parameter adds a custom label to your model name, making it easy to identify: ft:gpt-4o-mini-2024-07-18:your-org:medical-coder:abc123.
Step 5: Monitor Training Progress
import time
def monitor_job(client, job_id: str, poll_interval: int = 30):
    while True:
        job = client.fine_tuning.jobs.retrieve(job_id)
        print(f"Status: {job.status}")
        if job.status == "succeeded":
            print(f"Fine-tuned model: {job.fine_tuned_model}")
            return job.fine_tuned_model
        if job.status in ("failed", "cancelled"):
            print(f"Error: {job.error}")
            return None
        # List recent events
        events = client.fine_tuning.jobs.list_events(
            fine_tuning_job_id=job_id, limit=5
        )
        for event in events.data:
            print(f"  [{event.created_at}] {event.message}")
        time.sleep(poll_interval)
model_name = monitor_job(client, job.id)
Step 6: Use Your Fine-Tuned Model
Once training succeeds, use the fine-tuned model exactly like any other OpenAI model.
response = client.chat.completions.create(
    model=model_name,  # ft:gpt-4o-mini-2024-07-18:your-org:medical-coder:abc123
    messages=[
        {
            "role": "system",
            "content": "You are a medical coding assistant. Output ICD-10 codes in the format CODE: DESCRIPTION."
        },
        {
            "role": "user",
            "content": "Patient diagnosed with essential hypertension and hyperlipidemia."
        },
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
# I10: Essential (primary) hypertension
# E78.5: Hyperlipidemia, unspecified
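Because the model was trained to emit CODE: DESCRIPTION lines, downstream code can parse its completions into structured pairs. A small sketch — the regex assumes the ICD-10 shape used in this article's examples:

```python
import re

def parse_icd10_output(text: str) -> list[tuple[str, str]]:
    """Parse 'CODE: DESCRIPTION' lines into (code, description) tuples."""
    pairs = []
    for line in text.strip().splitlines():
        # ICD-10 codes: a letter, two digits, optional dot plus 1-4 more digits
        match = re.match(r"^([A-Z]\d{2}(?:\.\d{1,4})?):\s*(.+)$", line.strip())
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

output = "I10: Essential (primary) hypertension\nE78.5: Hyperlipidemia, unspecified"
print(parse_icd10_output(output))
# [('I10', 'Essential (primary) hypertension'), ('E78.5', 'Hyperlipidemia, unspecified')]
```

Lines that do not match the trained format are silently skipped, which also gives you a cheap signal: if the parser drops lines, the model has drifted from the output format it was fine-tuned for.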
Step 7: Evaluate Against the Base Model
Always compare your fine-tuned model against the base model on a held-out test set.
import json
def evaluate_model(client, model: str, test_file: str) -> dict:
    correct = 0
    total = 0
    with open(test_file, "r") as f:
        for line in f:
            example = json.loads(line)
            messages = example["messages"]
            expected = messages[-1]["content"]
            prompt = messages[:-1]
            response = client.chat.completions.create(
                model=model,
                messages=prompt,
                temperature=0.0,
            )
            predicted = response.choices[0].message.content.strip()
            if predicted == expected:
                correct += 1
            total += 1
    return {"model": model, "accuracy": correct / total, "total": total}
base_results = evaluate_model(client, "gpt-4o-mini", "test_data.jsonl")
ft_results = evaluate_model(client, model_name, "test_data.jsonl")
print(f"Base model accuracy: {base_results['accuracy']:.1%}")
print(f"Fine-tuned accuracy: {ft_results['accuracy']:.1%}")
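Exact string match is a strict criterion: it fails when the model returns the right codes with slightly different description wording or ordering. For this task, comparing only the extracted code sets is often a more informative metric; a sketch, with the regex mirroring the ICD-10 format used throughout this article:

```python
import re

CODE_PATTERN = re.compile(r"\b[A-Z]\d{2}(?:\.\d{1,4})?\b")

def codes_match(predicted: str, expected: str) -> bool:
    """Compare the sets of ICD-10 codes, ignoring descriptions and order."""
    return set(CODE_PATTERN.findall(predicted)) == set(CODE_PATTERN.findall(expected))

# Same codes, different description wording and order -> still a match
pred = "E78.5: Hyperlipidemia\nI10: Hypertension, essential"
exp = "I10: Essential (primary) hypertension\nE78.5: Hyperlipidemia, unspecified"
print(codes_match(pred, exp))  # True
```

To use it, swap `predicted == expected` in evaluate_model for `codes_match(predicted, expected)`.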
FAQ
How much does fine-tuning cost on the OpenAI API?
Training costs depend on the model and the number of tokens in your training data. For GPT-4o-mini, training costs approximately $3.00 per million tokens. A dataset of 500 examples at 500 tokens each totals about 250K tokens per epoch — roughly $0.75 per epoch. With 3 epochs, that is about $2.25 total for training. Inference on a fine-tuned model is billed at its own per-token rates, which for GPT-4o-mini are higher than the base model's; check OpenAI's pricing page for current figures.
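The arithmetic above generalizes into a quick estimator. A sketch using the roughly $3.00 per million training tokens figure cited for GPT-4o-mini (verify current pricing before relying on it, since rates change):

```python
def estimate_training_cost(n_examples: int, avg_tokens: int,
                           n_epochs: int = 3,
                           price_per_million: float = 3.00) -> float:
    """Rough training cost: total tokens seen across all epochs times the rate."""
    total_tokens = n_examples * avg_tokens * n_epochs
    return total_tokens / 1_000_000 * price_per_million

# 500 examples x 500 tokens x 3 epochs = 750K tokens -> $2.25
print(f"${estimate_training_cost(500, 500):.2f}")
```

For a real dataset, replace avg_tokens with an actual token count (e.g. via the tiktoken library) rather than a guess — token counts per example vary widely.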
How long does a fine-tuning job take?
Most fine-tuning jobs complete in 15 minutes to 2 hours, depending on dataset size and the number of epochs. Smaller datasets with 3 epochs typically finish in under 30 minutes. The OpenAI platform queues jobs, so there may be additional wait time during peak demand.
Can I fine-tune a fine-tuned model further with new data?
Yes. You can use a previously fine-tuned model as the base for a new fine-tuning job. This is useful for iterative improvement — train on your initial dataset, evaluate, then fine-tune again on a curated set of examples where the model performed poorly. Just reference the fine-tuned model ID as the model parameter.
CallSphere Team