SFT (Supervised Fine-Tuning) means continuing to train a pretrained model on labeled task data.

Core idea

  • Start with a pretrained model.
  • Use labeled input-output pairs for a target task.
  • Optimize a supervised objective (for classification, usually cross-entropy).
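The labeled input-output pairs above can be sketched as plain data. This is a hypothetical toy dataset (the sentences and pairing format are illustrative, not from a real corpus):

```python
# Hypothetical toy dataset: each example pairs an input with a label.
# For sentence-pair tasks like NLI, the input is a premise/hypothesis pair.
examples = [
    (("The cat sat on the mat.", "A cat is on a mat."), "entailment"),
    (("The cat sat on the mat.", "A dog is barking."), "neutral"),
    (("The cat sat on the mat.", "The mat is empty."), "contradiction"),
]

for (premise, hypothesis), label in examples:
    print(f"{label}: {premise} / {hypothesis}")
```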

Objective

  • Transfer general language knowledge into task-specific behavior.
  • Improve task metrics with relatively little additional training.

Typical training signal

  • Input: task examples.
  • Target: human-provided labels.
  • Loss: a direct comparison between the model's prediction and the label (e.g., cross-entropy).
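For a 3-way classifier, the loss is typically cross-entropy: softmax the model's logits, then take the negative log-probability of the true label. A minimal stdlib-only sketch (logit values are made up for illustration):

```python
import math

def cross_entropy(logits, label):
    """Cross-entropy for one example: -log softmax(logits)[label]."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    return -math.log(exps[label] / sum(exps))

# Model favors class 0; true label is 0 -> low loss.
loss_correct = cross_entropy([2.0, 0.5, 0.1], label=0)
# Same logits, but true label is 2 -> much higher loss.
loss_wrong = cross_entropy([2.0, 0.5, 0.1], label=2)
print(loss_correct, loss_wrong)
```

Minimizing this loss directly pushes probability mass toward the human-provided label, which is the whole training signal in SFT.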

Generic workflow

pretrained model
  -> task dataset (labeled)
  -> supervised optimization
  -> fine-tuned model for the target task
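The "supervised optimization" step above can be sketched end to end. This is a toy stand-in, assuming a frozen encoder that yields fixed 2-dim features and a trainable linear head over 3 classes (all numbers hypothetical); real SFT updates a full transformer with an optimizer like AdamW, but the gradient logic is the same:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Trainable linear head: 3 classes x 2 features, zero-initialized.
W = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
b = [0.0, 0.0, 0.0]
lr = 0.5
# (encoder features, label) pairs -- the "task dataset (labeled)".
data = [([1.0, 0.0], 0), ([0.0, 1.0], 2)]

for _ in range(50):  # supervised optimization loop (plain SGD)
    for x, y in data:
        logits = [sum(w * xi for w, xi in zip(row, x)) + bk
                  for row, bk in zip(W, b)]
        p = softmax(logits)
        # Gradient of cross-entropy w.r.t. logits is p - one_hot(y).
        for k in range(3):
            g = p[k] - (1.0 if k == y else 0.0)
            b[k] -= lr * g
            for j in range(2):
                W[k][j] -= lr * g * x[j]

# The fine-tuned head should now classify both toy examples correctly.
for x, y in data:
    logits = [sum(w * xi for w, xi in zip(row, x)) + bk
              for row, bk in zip(W, b)]
    print("predicted:", logits.index(max(logits)), "label:", y)
```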

In this project (MNLI)

  • Base model: DistilBERT.
  • Task: MNLI sentence-pair classification.
  • Input: premise + hypothesis.
  • Labels: entailment, neutral, contradiction.
  • Output: a 3-way classifier tuned for NLI behavior.
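The MNLI setup above boils down to a fixed label mapping and a paired input. A small sketch, assuming the standard MNLI label ids and a BERT-style separator token (a real tokenizer inserts special tokens itself; the `format_example` helper and `[SEP]` literal here are illustrative):

```python
# Standard 3-way NLI label mapping used for MNLI.
label2id = {"entailment": 0, "neutral": 1, "contradiction": 2}
id2label = {i: lab for lab, i in label2id.items()}

def format_example(premise, hypothesis, sep="[SEP]"):
    """Pack a premise/hypothesis pair into one input sequence.
    Illustrative only: a real tokenizer handles special tokens."""
    return f"{premise} {sep} {hypothesis}"

text = format_example("A man is playing guitar.", "A person makes music.")
print(text, "->", label2id["entailment"])
```

The classifier head then predicts one of the three label ids for each packed pair.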

Why SFT is useful

  • Simple and stable optimization.
  • Clear objective and evaluation.
  • Strong baseline before trying more advanced paradigms.

Limitations

  • Limited by label quality and label coverage.
  • Can overfit to narrow data distributions.
  • Usually does not encode preference-style behavior by itself.

When to use

  • You have reliable labeled data.
  • The task objective is explicit and measurable.
  • You need a strong, fast baseline.