I've fine-tuned models. I've optimized prompts. I've done both extensively in production. Here's what nobody tells you: the cost comparison isn't even close.
The Prompting Path
Prompting has essentially zero fixed costs. You're just paying for API calls. Let's say:
- 100k prompts/month
- Average 1k tokens per prompt
- $3/1M input tokens (Claude 3.5 Sonnet)
- $15/1M output tokens
Monthly cost: ~$450. That's $300 for 100M input tokens, plus roughly $150 more if responses average ~100 output tokens.
That's it. No training infrastructure, no dataset preparation, no fine-tuning expertise required.
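The arithmetic above is simple enough to sketch as a function. This is a back-of-the-envelope sketch, not a billing tool; the prices are the post's assumptions, and the ~100 output tokens per response is an assumed figure that makes the total land at ~$450.

```python
def monthly_api_cost(prompts_per_month, input_tokens_per_prompt,
                     output_tokens_per_prompt,
                     input_price_per_m=3.0,    # $/1M input tokens (assumed)
                     output_price_per_m=15.0): # $/1M output tokens (assumed)
    """Rough monthly API spend in dollars."""
    input_tokens = prompts_per_month * input_tokens_per_prompt
    output_tokens = prompts_per_month * output_tokens_per_prompt
    return (input_tokens / 1e6) * input_price_per_m \
         + (output_tokens / 1e6) * output_price_per_m

# 100k prompts/month, 1k input tokens each, ~100 output tokens each
print(monthly_api_cost(100_000, 1_000, 100))  # → 450.0
```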
The Fine-Tuning Path
Fine-tuning has multiple cost components:
Dataset preparation: Cleaning, formatting, quality control. Maybe 40 hours at $100/hour = $4,000 one-time.
Training compute: let's say you fine-tune a smaller open-weights model. ~$500-2,000 per training run.
Inference costs: Fine-tuned models often need more tokens for good outputs. Let's say 20% more.
Maintenance: When models update, you often need to retrain. This is recurring.
Monthly cost: ~$2,000-5,000 (once you're up and running)
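Rolling those components into a first-year total makes the gap concrete. This is a hedged sketch: the midpoints and the assumed retrain cadence (three retrains a year, tracking base-model updates) are illustrative numbers, not measurements.

```python
def first_year_finetune_cost(prep_hours=40, hourly_rate=100,
                             training_run_cost=1_250,   # midpoint of $500-2,000
                             retrains_per_year=3,       # assumed cadence
                             monthly_inference=3_500):  # midpoint of $2k-5k
    """First-year fine-tuning spend in dollars: one-time prep,
    initial training plus retrains, and twelve months of inference."""
    one_time = prep_hours * hourly_rate
    training = training_run_cost * (1 + retrains_per_year)
    return one_time + training + monthly_inference * 12

print(first_year_finetune_cost())  # → 51000
```

Even with generous assumptions, the fixed costs dominate until volume is high.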
The Quality Question
But what about quality? Doesn't fine-tuning produce better results?
Sometimes. Here's the honest breakdown:
Prompting wins when:
- The base model already knows the domain
- You have limited training data
- You need flexibility to change behavior
- Speed of iteration matters
Fine-tuning wins when:
- You have large amounts of domain-specific data
- The task is narrowly defined
- You need consistent style/format
- Latency matters (smaller models can match larger ones)
When We Use Each
At Orochi:
- Most tasks: prompting. The base models are incredibly capable.
- Specific tone/style: prompting with extensive examples.
- Domain knowledge: fine-tuning only when we have significant proprietary data.
The ratio is probably 90% prompting, 10% fine-tuning.
The Break-Even Point
Fine-tuning makes sense when:
- You have >10k high-quality examples
- The task is stable (won't change frequently)
- The cost savings from using a smaller model offset training costs
Before fine-tuning, run the math. For most use cases, prompting is cheaper and produces comparable results.
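"Run the math" can be made literal. A minimal break-even sketch, assuming the only recurring saving is cheaper inference from serving a smaller fine-tuned model; the dollar figures in the example are hypothetical, not from the post.

```python
def break_even_months(fixed_cost, prompting_monthly, finetuned_monthly):
    """Months until fine-tuning's fixed costs (dataset prep + training)
    are recovered by lower monthly inference spend.
    Returns None if fine-tuning never pays back."""
    monthly_saving = prompting_monthly - finetuned_monthly
    if monthly_saving <= 0:
        return None
    return fixed_cost / monthly_saving

# E.g. $9,000 fixed (prep + training), $5,000/mo prompting a large model
# at high volume, $3,500/mo serving a smaller fine-tuned model
print(break_even_months(9_000, 5_000, 3_500))  # → 6.0
```

If the function returns None, or a number longer than the task's expected lifetime, the math says stick with prompting.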
The Bottom Line
Fine-tuning is powerful. It's not always necessary.
The AI industry has a fine-tuning obsession because it's technically interesting. But the practical answer is often simpler: write a better prompt.
Test prompting first. Only fine-tune when you have evidence prompting can't scale.
The expensive solution isn't always the better one.