What Is Fine-tuning?
Fine-tuning is the process of taking a pre-trained large language model and continuing its training on a smaller, curated dataset specific to a domain, task, or behavioral style. The model's weights are updated — but from a much better starting point than random initialization — allowing it to adapt efficiently with far less compute and data than pre-training from scratch.
The result is a model that retains the broad capabilities of the base model while excelling at the specific task you care about.
Fine-tuning vs. Prompting
Before committing to fine-tuning, it is worth asking whether prompt engineering can achieve the same goal:
| Approach | Effort | Flexibility | Cost per request |
|---|---|---|---|
| Prompt engineering | Low | High | Higher (large prompts) |
| Few-shot prompting | Medium | Medium | Medium |
| Fine-tuning | High (upfront) | Lower | Lower (shorter prompts) |
Fine-tuning is usually warranted when:
- You need consistent formatting or tone that prompts alone cannot enforce reliably.
- You have hundreds of labeled examples demonstrating the desired input/output behavior.
- You are sensitive to latency or cost and need shorter system prompts.
- Your domain vocabulary is highly specialized and underrepresented in the base model.
Types of Fine-tuning
Supervised Fine-tuning (SFT)
The most common form. You provide a dataset of (prompt, ideal_response) pairs and train the model to maximize the likelihood of the ideal response. Used to teach models new tasks or styles.
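In practice, "maximize the likelihood of the ideal response" means computing cross-entropy loss only on the response tokens, not the prompt. A minimal sketch of that label masking, assuming the Hugging Face convention where label `-100` is ignored by the loss (the token IDs below are illustrative placeholders, not real tokenizer output):

```python
# Sketch: build SFT training labels so the loss covers only the response.
# Assumes the Hugging Face convention that label -100 is ignored by
# cross-entropy; token IDs here are placeholders for illustration.

def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions with -100
    so the model is trained only to predict the ideal response tokens."""
    input_ids = prompt_ids + response_ids
    labels = [-100] * len(prompt_ids) + response_ids
    return input_ids, labels

input_ids, labels = build_labels([101, 7592], [2023, 2003, 102])
# input_ids: [101, 7592, 2023, 2003, 102]
# labels:    [-100, -100, 2023, 2003, 102]
```

Without this masking, the model would also be trained to reproduce the prompt, which wastes capacity and can distort behavior.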
Instruction Tuning
A variant of SFT where the dataset is composed of diverse natural language instructions. This is how base models (e.g., Llama base) become instruction-following chat models.
RLHF (Reinforcement Learning from Human Feedback)
A second stage that uses human preference data to further align the model. See the RLHF glossary entry for details.
Parameter-Efficient Fine-tuning (PEFT)
Techniques like LoRA (Low-Rank Adaptation) freeze most model weights and train only small adapter matrices, dramatically reducing GPU memory requirements:

```python
from transformers import AutoModelForCausalLM
from peft import get_peft_model, LoraConfig

# Load a base model, then wrap it so only the low-rank adapters are trainable.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
config = LoraConfig(r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(base_model, config)
```
Preparing a Fine-tuning Dataset
Data quality matters far more than quantity. A clean dataset of 500 examples typically outperforms a noisy dataset of 5,000.
Steps:
- Define the task clearly — what input goes in, what output should come out.
- Collect or generate examples — human-written, synthetically generated via GPT-4, or mined from logs.
- Clean and deduplicate — remove duplicates, fix formatting inconsistencies, filter low-quality samples.
- Format for your provider — OpenAI fine-tuning expects JSONL with `messages` arrays; Hugging Face expects prompt/response strings.
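The last two steps can be sketched together: a simple exact-match dedup pass followed by conversion into OpenAI's fine-tuning JSONL format. The example pairs below are made up for illustration.

```python
import json

# Illustrative (prompt, response) pairs; real data would come from
# human writers, synthetic generation, or mined logs.
pairs = [
    ("What is LoRA?", "LoRA trains small low-rank adapter matrices."),
    ("What is LoRA?", "LoRA trains small low-rank adapter matrices."),  # duplicate
    ("What is SFT?", "Supervised fine-tuning on (prompt, response) pairs."),
]

seen = set()
lines = []
for prompt, response in pairs:
    key = (prompt.strip(), response.strip())
    if key in seen:
        continue  # drop exact duplicates after normalizing whitespace
    seen.add(key)
    # One JSON object per line, each with a "messages" array,
    # as OpenAI's fine-tuning endpoint expects.
    lines.append(json.dumps({
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": response},
        ]
    }))

with open("train.jsonl", "w") as f:
    f.write("\n".join(lines))
```

Exact-match dedup is only a first pass; near-duplicate detection (e.g., via embedding similarity) catches paraphrased repeats that this misses.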
KnowledgeSDK can accelerate dataset creation. Use /v1/extract to pull structured knowledge from documentation, product pages, or support articles, then use that clean content to generate high-quality synthetic training examples.
```javascript
const extracted = await sdk.extract("https://docs.yourproduct.com");
// Use extracted.content to generate fine-tuning pairs with GPT-4
```
When Not to Fine-tune
Fine-tuning does not make a model more knowledgeable about recent events — it encodes behaviors and patterns, not factual knowledge. For knowledge retrieval tasks, Retrieval-Augmented Generation (RAG) is almost always faster to iterate on, easier to update, and less prone to overfitting.
Use fine-tuning for how the model responds; use RAG for what it knows.