The short version
A model like Claude or GPT-4 is trained on a massive, general dataset. It can do many things reasonably well. But if you need it to do one thing very well, consistently, in a specific style or domain, you have two options: prompt engineering (writing better instructions) or fine-tuning (additional training).
Prompt engineering is faster, cheaper, and good enough for most use cases. Fine-tuning is for when prompting alone can't get you there.
How it works
Fine-tuning starts with a pre-trained base model and trains it further on a smaller, specialised dataset. The process:
- Prepare training data. This is usually a set of example input-output pairs: "Given this customer email, here's the ideal response." The quality and consistency of this data matter more than anything else for results.
- Run the training job. The model's weights are adjusted to perform better on your examples. This typically takes minutes to hours depending on the dataset size, and costs tens to hundreds of pounds.
- Evaluate the result. Test the fine-tuned model against held-out examples to check whether it actually improved.
- Deploy. The fine-tuned model gets its own identifier and you call it through the same API as the base model.
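Step 1 above can be sketched in code. This is a minimal sketch assuming the JSONL chat format used by OpenAI-style fine-tuning endpoints (one JSON object per line, each holding a `messages` list); other providers use similar but not identical formats, and the example pairs and system message here are invented.

```python
import json

# Hypothetical example pairs: (customer email, ideal response).
pairs = [
    ("Where is my order #1042?",
     "Thanks for reaching out. Order #1042 shipped yesterday and "
     "tracking details are on their way to you."),
    ("Can I get a refund for a damaged item?",
     "Sorry about that. Yes: reply with a photo of the damage and "
     "we'll process your refund within 3 working days."),
]

def to_training_line(user_msg: str, assistant_msg: str) -> str:
    """Encode one input-output pair as a JSONL line in chat format."""
    record = {
        "messages": [
            {"role": "system",
             "content": "You are a customer-support assistant for Acme Ltd."},
            {"role": "user", "content": user_msg},
            {"role": "assistant", "content": assistant_msg},
        ]
    }
    return json.dumps(record)

lines = [to_training_line(u, a) for u, a in pairs]

# One record per line; this file is what you upload as training data.
with open("training.jsonl", "w") as f:
    f.write("\n".join(lines) + "\n")
```

The resulting `training.jsonl` is what a training job consumes; a real dataset would contain hundreds or thousands of lines like these, all in the same consistent shape.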
Important distinctions:
- Fine-tuning vs prompting: prompting gives the model instructions at runtime; fine-tuning changes the model itself. A fine-tuned model doesn't need those instructions repeated on every request because it has already learned the pattern.
- Fine-tuning vs RAG: RAG gives the model access to external information at query time. Fine-tuning changes how the model behaves. They solve different problems and can be combined.
- Fine-tuning vs training from scratch: fine-tuning adjusts an existing model. Training from scratch builds one from nothing and requires vastly more data, compute, and money.
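The first distinction above can be made concrete by comparing request payloads. This is illustrative only: the model names are placeholders, and the dicts mirror the general shape of a chat-completions request rather than any specific provider's API.

```python
TASK = "Summarise this customer email in one sentence: ..."

# Prompting: the behaviour lives in the instructions,
# which must be sent with every single request.
prompted_request = {
    "model": "base-model",  # placeholder name
    "messages": [
        {"role": "system", "content": (
            "You are a support assistant. Always answer in one sentence, "
            "in a formal tone, and never promise refunds without approval."
        )},
        {"role": "user", "content": TASK},
    ],
}

# Fine-tuning: the behaviour lives in the weights. The request just
# names the fine-tuned model (placeholder id) and sends the input.
finetuned_request = {
    "model": "ft:base-model:acme:support:abc123",  # placeholder id
    "messages": [
        {"role": "user", "content": TASK},
    ],
}
```

Both requests go to the same API; the only differences are the model identifier and the absence of the repeated instructions.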
When fine-tuning makes sense: you have a consistent format or style you need the model to match, you have hundreds or thousands of examples, and prompting alone produces inconsistent results.
When it doesn't: you need the model to know new facts (use RAG instead), you have fewer than 50 examples, or prompting is already getting good results.
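Whether prompting is "already getting good results" is best checked the same way as step 3 of the process: score outputs against held-out examples. A minimal sketch, using exact-match scoring and an invented keyword-rule stand-in for the model call so it runs offline; real evaluations usually need fuzzier scoring such as similarity metrics or rubric grading.

```python
# Held-out examples: inputs paired with the expected output.
# All data here is invented for illustration.
held_out = [
    ("Where is order #7?", "shipped"),
    ("Item arrived broken", "refund"),
    ("Change my address", "updated"),
]

def model(prompt: str) -> str:
    """Stand-in for a call to the (fine-tuned) model."""
    # A trivial keyword rule so the sketch runs without an API.
    if "order" in prompt.lower():
        return "shipped"
    if "broken" in prompt.lower():
        return "refund"
    return "unknown"

def exact_match_rate(examples) -> float:
    """Fraction of held-out examples the model answers exactly right."""
    hits = sum(1 for prompt, expected in examples
               if model(prompt) == expected)
    return hits / len(examples)

score = exact_match_rate(held_out)
print(f"exact-match: {score:.2f}")  # 2 of 3 correct here
```

Running the same score over the base model with your best prompt, then over the fine-tuned model, tells you whether the training job actually bought you anything.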
Why it matters
Most people building with AI will never need to fine-tune a model. Prompting and RAG cover the vast majority of use cases. But understanding what fine-tuning is, and when it's the right tool versus overkill, helps you make better decisions about how to build. It's one of those concepts that's worth understanding even if you never use it yourself.