SFT (Supervised Fine-Tuning) means continuing to train a pretrained model on labeled task data.
Core idea
- Start with a pretrained model.
- Use labeled input-output pairs for a target task.
- Optimize a supervised objective (for classification, usually cross-entropy).
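To make "labeled input-output pairs" concrete, here is a minimal sketch of what such pairs might look like for a 3-way NLI-style task. The premise/hypothesis texts below are invented for illustration, not real MNLI rows:

```python
# Toy labeled input-output pairs for a 3-way NLI-style task.
# The texts are illustrative examples, not drawn from the actual MNLI data.
LABELS = ["entailment", "neutral", "contradiction"]

dataset = [
    {"premise": "A man is playing a guitar.",
     "hypothesis": "A person is making music.",
     "label": "entailment"},
    {"premise": "A man is playing a guitar.",
     "hypothesis": "The man is a professional musician.",
     "label": "neutral"},
    {"premise": "A man is playing a guitar.",
     "hypothesis": "Nobody is playing an instrument.",
     "label": "contradiction"},
]

# Supervised fine-tuning consumes these as (input, target-class-index) pairs.
pairs = [((ex["premise"], ex["hypothesis"]), LABELS.index(ex["label"]))
         for ex in dataset]
print([y for _, y in pairs])
```

The string labels are mapped to fixed class indices once, so the model's 3-way output head and the loss always agree on which index means which class.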
Objective
- Transfer general language knowledge into task-specific behavior.
- Improve task metrics with relatively little additional training.
Typical training signal
- Input: task examples.
- Target: human-provided labels.
- Loss: cross-entropy between the model's predicted class distribution and the gold label.
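The loss above can be computed by hand for a single example. A minimal sketch for one 3-class prediction (the probabilities are made up for illustration):

```python
import math

# Model's predicted distribution over the three NLI classes
# (illustrative numbers, not real model output).
probs = [0.7, 0.2, 0.1]   # entailment, neutral, contradiction
gold = 0                  # human label: entailment (class index 0)

# Cross-entropy for one example: negative log-probability of the gold class.
loss = -math.log(probs[gold])
print(round(loss, 4))     # -> 0.3567
```

The better the model's probability on the gold class, the smaller the loss; a perfect prediction (probability 1.0 on the gold class) gives a loss of 0.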
Generic workflow
pretrained model
-> task dataset (labeled)
-> supervised optimization
-> fine-tuned model for the target task
In this project (MNLI)
- Base model: DistilBERT.
- Task: MNLI sentence-pair classification.
- Input: premise + hypothesis.
- Labels: entailment, neutral, contradiction.
- Output: a 3-way classifier tuned for NLI behavior.
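Putting the workflow and the MNLI setup together: below is a toy stand-in for the real pipeline, where a small softmax classifier over hand-made features plays the role of the pretrained model. All numbers, features, and data here are invented for illustration; the actual project would fine-tune DistilBERT on MNLI with a deep-learning library rather than this hand-rolled loop:

```python
import math
import random

random.seed(0)
C, D = 3, 4  # classes (entailment/neutral/contradiction), toy feature dimension

# "Pretrained" weights: a random stand-in for a pretrained encoder plus a fresh head.
W = [[random.uniform(-0.1, 0.1) for _ in range(D)] for _ in range(C)]

# Toy labeled data: (feature vector, gold class index). Invented for illustration.
data = [([1, 0, 0, 1], 0), ([0, 1, 0, 1], 1), ([0, 0, 1, 1], 2),
        ([1, 0, 1, 1], 0), ([0, 1, 1, 1], 1)]

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def forward(x):
    # Linear scores per class, turned into a probability distribution.
    return softmax([sum(W[c][j] * x[j] for j in range(D)) for c in range(C)])

def avg_loss():
    # Mean cross-entropy over the labeled dataset.
    return sum(-math.log(forward(x)[y]) for x, y in data) / len(data)

loss_before = avg_loss()

# Supervised fine-tuning: gradient descent on cross-entropy over labeled pairs.
lr = 0.5
for _ in range(200):
    for x, y in data:
        p = forward(x)
        for c in range(C):
            g = p[c] - (1.0 if c == y else 0.0)  # dLoss/dlogit_c for softmax + CE
            for j in range(D):
                W[c][j] -= lr * g * x[j]

loss_after = avg_loss()
print(loss_before > loss_after)  # fine-tuning should reduce the training loss
```

The shape of the loop is the point: start from existing weights, iterate over labeled pairs, and update against the cross-entropy gradient until the task loss drops. The real DistilBERT run differs only in scale and in using minibatched, library-provided optimization.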
Why SFT is useful
- Simple and stable optimization.
- Clear objective and evaluation.
- Strong baseline before trying more advanced paradigms.
Limitations
- Limited by label quality and label coverage.
- Can overfit narrow data distributions.
- Usually does not encode preference-style behavior by itself.
When to use
- You have reliable labeled data.
- The task objective is explicit and measurable.
- You need a strong, fast baseline.