CLI Entry

python forge.py --config ...
or
python forge.py --resume ...

Optional overrides:

  • --max_steps
  • --epochs
  • --run_name
  • --debug
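
The CLI surface above could be sketched with argparse as follows; only the listed flags come from the source, everything else (help strings, mutual exclusion of --config/--resume) is an assumption:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of forge.py's CLI.
    p = argparse.ArgumentParser(prog="forge.py")
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("--config", help="YAML config for a new run")
    mode.add_argument("--resume", help="existing run directory to resume")
    p.add_argument("--max_steps", type=int, help="override total training steps")
    p.add_argument("--epochs", type=int, help="override number of epochs")
    p.add_argument("--run_name", help="override the run name")
    p.add_argument("--debug", action="store_true", help="verbose debug output")
    return p

args = build_parser().parse_args(["--config", "cfg.yaml", "--max_steps", "10"])
```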

Run Setup

forge.py:

  • parse args
  • resolve output dir
  • create or resume run
  • write run_metadata.json
  • register incomplete job
  • init W&B/output.log
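
A minimal sketch of the run-setup step, assuming this directory layout and metadata shape (the real forge.py additionally registers the job and initializes W&B):

```python
import json
import time
from pathlib import Path
from tempfile import mkdtemp

def create_run(output_root: str, run_name: str) -> Path:
    # Resolve the output dir, create it, and drop the run metadata
    # plus an empty log file; field names here are illustrative.
    run_dir = Path(output_root) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    metadata = {"run_name": run_name, "created_at": time.time(), "status": "incomplete"}
    (run_dir / "run_metadata.json").write_text(json.dumps(metadata, indent=2))
    (run_dir / "output.log").touch()
    return run_dir

run_dir = create_run(mkdtemp(), "mnli_sft_demo")
```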

Config Load

load YAML config
with inheritance

Decides:

  • task
  • model
  • teacher
  • training mode
  • hyperparameters
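
The inheritance part of config loading likely reduces to a recursive merge: a child config overrides its parent key by key. How the parent is referenced in YAML is not specified here, so this sketch only shows the merge semantics on plain dicts:

```python
def deep_merge(base: dict, override: dict) -> dict:
    # Override wins; nested dicts merge recursively instead of
    # being replaced wholesale.
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

base = {"task": "mnli", "training": {"lr": 5e-5, "epochs": 3}}
child = {"training": {"epochs": 1}, "model": "distilbert"}
cfg = deep_merge(base, child)
```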

Task Preparation

Load tokenizer or processor
Load dataset

Examples:

  • MNLI -> tokenizer + GLUE MNLI
  • OCR -> processor + multimodal data
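
The two examples suggest a task registry that maps a task name to the right text frontend and dataset. A hypothetical version (names and keys are assumptions; real loading would go through transformers/datasets):

```python
# Illustrative registry: text-only tasks get a tokenizer,
# multimodal tasks get a processor.
TASKS = {
    "mnli": {"frontend": "tokenizer", "dataset": "glue/mnli"},
    "ocr":  {"frontend": "processor", "dataset": "multimodal"},
}

def prepare_task(name: str) -> tuple[str, str]:
    spec = TASKS[name]
    return spec["frontend"], spec["dataset"]
```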

Path Selection

Config chooses path:

  • SFT -> FinetuningTrainer
  • KD -> DistillationTrainer
  • RL -> trl.GRPOTrainer

RL is selected by the presence of a top-level rl: block in the config
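
The selection rule above can be sketched as a dispatch on the config; the trainer names are from the source, but the kd/alpha keys used to separate SFT from KD are assumptions:

```python
def select_trainer(config: dict) -> str:
    # A top-level "rl" block wins; otherwise a nonzero distillation
    # weight picks KD, and plain SFT is the default.
    if "rl" in config:
        return "trl.GRPOTrainer"
    if config.get("kd", {}).get("alpha", 0.0) > 0.0:
        return "DistillationTrainer"
    return "FinetuningTrainer"
```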

Model Loading

Load student model
Load teacher if needed
Resolve adapter if needed
Apply LoRA if configured

The teacher can be skipped when alpha=0.0 (no distillation signal)
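
The teacher-skip rule could look like this; the config keys are illustrative, and real code would call the HF loaders where the names are stored here:

```python
def load_models(config: dict) -> dict:
    # Only materialize the teacher when the distillation weight
    # alpha is nonzero; with alpha=0.0 the run is CE-only.
    models = {"student": config["student"]}
    if config.get("alpha", 0.0) > 0.0:
        models["teacher"] = config["teacher"]
    return models
```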

Trainer Build

Create:

  • TrainingArguments or GRPOConfig
  • trainer instance
  • callbacks

Callbacks include:

  • step metrics
  • checkpoint marker
  • SLURM signal handler
  • teacher eval / peak metrics
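
Callback assembly is presumably conditional on the run configuration. A sketch using placeholder names (the callback list echoes the bullets above; the config keys are assumptions):

```python
def build_callbacks(config: dict) -> list[str]:
    # Always-on callbacks plus conditional ones for SLURM jobs
    # and distillation runs.
    callbacks = ["StepMetricsCallback", "CheckpointMarkerCallback"]
    if config.get("on_slurm"):
        callbacks.append("SlurmSignalCallback")
    if config.get("alpha", 0.0) > 0.0:
        callbacks.append("TeacherEvalCallback")
    return callbacks
```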

Pre-Training Evaluation

Optional baseline eval
before training starts

Common in classification runs:

  • student pre metrics
  • teacher pre metrics if relevant

Training Loop

trainer.train(resume_from_checkpoint=...)

SFT:

  • CE loss

KD:

  • CE + distillation losses

RL:

  • prompt generation + rewards + GRPO updates
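
The SFT and KD losses above are commonly combined as a convex mix weighted by alpha; whether forge uses exactly this weighting is an assumption, but it is consistent with alpha=0.0 reducing to CE-only SFT:

```python
def kd_loss(ce: float, distill: float, alpha: float) -> float:
    # (1 - alpha) * cross-entropy + alpha * distillation term.
    return (1.0 - alpha) * ce + alpha * distill

# alpha = 0.0 recovers plain SFT (CE-only).
assert kd_loss(0.7, 0.2, 0.0) == 0.7
```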

Resume Logic

TrainingState:

  • check pipeline_complete
  • find latest safe checkpoint
  • read latest_checkpoint
  • fall back to checkpoint-* scan
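
The checkpoint-* fallback scan amounts to picking the directory with the highest step number. A self-contained sketch (the marker-file path is not shown; only the scan is):

```python
import re
from pathlib import Path
from tempfile import mkdtemp

def latest_checkpoint(run_dir: Path):
    # Scan for checkpoint-N directories and return the highest N,
    # or None when no checkpoint exists.
    candidates = [
        (int(m.group(1)), p)
        for p in run_dir.glob("checkpoint-*")
        if (m := re.fullmatch(r"checkpoint-(\d+)", p.name))
    ]
    return max(candidates)[1] if candidates else None

run_dir = Path(mkdtemp())
for step in (100, 500, 250):
    (run_dir / f"checkpoint-{step}").mkdir()
```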

During Training

Happening continuously:

  • logs -> output.log
  • metrics -> W&B
  • checkpoints -> checkpoint-N/
  • latest_checkpoint marker updates
  • SLURM signal can trigger safe stop + save
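
The safe-stop mechanism is typically a signal handler that flips a flag the training loop checks before each step. A sketch assuming SLURM sends SIGUSR1 before the time limit (the actual signal number is site-specific):

```python
import os
import signal

STOP = {"requested": False}

def handle_preempt(signum, frame):
    # Request a safe stop: the loop should checkpoint and exit
    # cleanly instead of being killed mid-step.
    STOP["requested"] = True

signal.signal(signal.SIGUSR1, handle_preempt)
os.kill(os.getpid(), signal.SIGUSR1)  # simulate the SLURM signal
```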

Finalize Run

After training:

  • sync W&B config/metadata
  • save final model
  • save tokenizer/processor
  • save base ref / merged model if needed
  • post-training evaluation

Run Output Directory

outputs/.../

Contains:

  • config.yaml
  • run_metadata.json
  • training_state.json
  • output.log
  • checkpoint-*/
  • final model files

Concrete Example: Your First Run

Config:
configs/classification/mnli/distilbert/00_sft.yaml

Meaning:

  • task = MNLI
  • student = DistilBERT
  • alpha = 0.0
  • path = SFT / CE-only
  • --max_steps 10 overrides the configured epochs