CLI Entry
python forge.py --config ...
or
python forge.py --resume ...
Optional overrides:
- --max_steps
- --epochs
- --run_name
- --debug
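The entry point above can be sketched with argparse. Flag names come from this outline; the mutually exclusive --config/--resume grouping is an assumption:

```python
import argparse

def build_parser():
    # Sketch of the forge.py CLI; not the actual implementation.
    p = argparse.ArgumentParser(prog="forge.py")
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("--config", help="YAML config for a fresh run")
    mode.add_argument("--resume", help="existing run directory to resume")
    p.add_argument("--max_steps", type=int, help="override: stop after N steps")
    p.add_argument("--epochs", type=int, help="override: number of epochs")
    p.add_argument("--run_name", help="override: human-readable run name")
    p.add_argument("--debug", action="store_true", help="verbose / small run")
    return p

args = build_parser().parse_args(["--config", "demo.yaml", "--max_steps", "10"])
```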
Run Setup
forge.py:
- parse args
- resolve output dir
- create or resume run
- write run_metadata.json
- register incomplete job
- init W&B/output.log
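A minimal sketch of the setup steps above, assuming run_metadata.json carries a status field that marks the job incomplete until finalization (the exact schema is hypothetical):

```python
import json
import tempfile
import time
from pathlib import Path

def create_run(base_dir, run_name):
    # Create the output dir, write run_metadata.json, register as incomplete.
    run_dir = Path(base_dir) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    metadata = {
        "run_name": run_name,
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "status": "incomplete",  # flipped to "complete" during finalization
    }
    (run_dir / "run_metadata.json").write_text(json.dumps(metadata, indent=2))
    return run_dir

run_dir = create_run(tempfile.mkdtemp(), "demo_run")
```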
Config Load
Load YAML config with inheritance
Decides:
- task
- model
- teacher
- training mode
- hyperparameters
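Config inheritance usually boils down to a recursive dict merge where the child config overrides the inherited base. A sketch of that merge (forge.py's actual mechanism may differ):

```python
def deep_merge(base, override):
    # Recursively merge dicts: child values win, nested dicts merge key-wise.
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

base = {"task": "mnli", "training": {"epochs": 3, "lr": 5e-5}}
child = {"training": {"lr": 2e-5}, "model": "distilbert"}
config = deep_merge(base, child)
```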
Task Preparation
Load tokenizer or processor
Load dataset
Examples:
- MNLI -> tokenizer + GLUE MNLI
- OCR -> processor + multimodal data
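The task decides both the preprocessor and the data. An illustrative dispatch only; the real forge.py loads concrete tokenizer, processor, and dataset objects:

```python
def prepare_task(task):
    # Hypothetical task dispatch; names mirror the examples in this outline.
    if task == "mnli":
        return {"preprocessor": "tokenizer", "dataset": "glue/mnli"}
    if task == "ocr":
        return {"preprocessor": "processor", "dataset": "multimodal OCR data"}
    raise ValueError(f"unknown task: {task}")
```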
Path Selection
Config chooses path:
- SFT -> FinetuningTrainer
- KD -> DistillationTrainer
- RL -> trl.GRPOTrainer
RL is selected by the presence of a top-level rl: section in the config
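Path selection can be sketched as a simple dispatch. The alpha > 0 condition for KD is an assumption, consistent with the note that the teacher is skipped when alpha=0.0:

```python
def select_trainer(config):
    # Hypothetical dispatch; trainer names from this outline.
    if "rl" in config:                               # top-level rl: block
        return "trl.GRPOTrainer"
    if config.get("teacher") and config.get("alpha", 0.0) > 0.0:
        return "DistillationTrainer"                 # KD path
    return "FinetuningTrainer"                       # plain SFT / CE path
```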
Model Loading
Load student model
Load teacher if needed
Resolve adapter if needed
Apply LoRA if configured
The teacher can be skipped when alpha=0.0 (no distillation term in the loss)
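A hypothetical config fragment tying these pieces together; every key name here is illustrative, not forge.py's actual schema:

```yaml
model: distilbert-base-uncased
teacher: roberta-large-mnli     # loaded only if a distillation term is active
alpha: 0.5                      # 0.0 => teacher skipped, pure CE training
lora:
  r: 8
  lora_alpha: 16
  target_modules: [q_lin, v_lin]
```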
Trainer Build
Create:
- TrainingArguments or GRPOConfig
- trainer instance
- callbacks
Callbacks include:
- step metrics
- checkpoint marker
- SLURM signal handler
- teacher eval / peak metrics
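The callback pattern itself is simple: the trainer fans each event out to every registered callback. A minimal stand-in (real callbacks hook the trainer's event API, e.g. on_log/on_save):

```python
class StepMetricsCallback:
    # Toy version of a step-metrics callback: record every logged metric.
    def __init__(self):
        self.history = []

    def on_log(self, step, metrics):
        self.history.append((step, dict(metrics)))

class CallbackList:
    # The trainer broadcasts each event to all registered callbacks.
    def __init__(self, callbacks):
        self.callbacks = callbacks

    def fire_log(self, step, metrics):
        for cb in self.callbacks:
            cb.on_log(step, metrics)

metrics_cb = StepMetricsCallback()
CallbackList([metrics_cb]).fire_log(10, {"loss": 1.23})
```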
Pre-Training Evaluation
Optional baseline evaluation before training starts
Common in classification runs:
- student pre metrics
- teacher pre metrics if relevant
Training Loop
trainer.train(resume_from_checkpoint=...)
SFT:
- CE loss
KD:
- CE + distillation losses
RL:
- prompt generation + rewards + GRPO updates
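For the KD path, a common form of "CE + distillation losses" is the Hinton-style objective: (1-alpha)·CE + alpha·T²·KL(teacher ∥ student) over temperature-softened distributions. The exact loss in forge.py may differ; this is a plain-Python sketch:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, T=2.0):
    # (1-alpha) * cross-entropy + alpha * T^2 * KL(teacher || student).
    ce = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return (1 - alpha) * ce + alpha * (T ** 2) * kl
```

With alpha=0.0 this reduces to plain CE, matching the note above that the teacher can be skipped in that case.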
Resume Logic
TrainingState:
- check pipeline_complete
- find latest safe checkpoint
- read latest_checkpoint
- fall back to checkpoint-* scan
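The checkpoint-resolution steps above can be sketched with pathlib: prefer the latest_checkpoint marker, then fall back to scanning checkpoint-* directories for the highest step (file names assumed from this outline):

```python
import tempfile
from pathlib import Path

def find_latest_checkpoint(run_dir):
    # Prefer the latest_checkpoint marker file; fall back to a directory scan.
    run_dir = Path(run_dir)
    marker = run_dir / "latest_checkpoint"
    if marker.exists():
        candidate = run_dir / marker.read_text().strip()
        if candidate.exists():
            return candidate
    ckpts = [p for p in run_dir.glob("checkpoint-*") if p.is_dir()]
    return max(ckpts, key=lambda p: int(p.name.split("-")[1]), default=None)

run_dir = Path(tempfile.mkdtemp())
(run_dir / "checkpoint-100").mkdir()
(run_dir / "checkpoint-250").mkdir()
latest = find_latest_checkpoint(run_dir)
```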
During Training
Happening continuously:
- logs -> output.log
- metrics -> W&B
- checkpoints -> checkpoint-N/
- latest_checkpoint marker updates
- SLURM signal can trigger safe stop + save
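The SLURM safe-stop typically works by catching a pre-termination signal and flipping a flag the training loop checks each step, so the run can checkpoint and exit cleanly. A sketch assuming SIGUSR1 (the actual signal is cluster-configurable):

```python
import os
import signal

class StopFlag:
    # Register a handler; the training loop polls should_stop every step.
    def __init__(self):
        self.should_stop = False
        signal.signal(signal.SIGUSR1, self._handle)

    def _handle(self, signum, frame):
        self.should_stop = True  # loop will save a checkpoint and exit

flag = StopFlag()
os.kill(os.getpid(), signal.SIGUSR1)  # simulate SLURM's pre-termination signal
```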
Finalize Run
After training:
- sync W&B config/metadata
- save final model
- save tokenizer/processor
- save base ref / merged model if needed
- post-training evaluation
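Finalization also needs to mark the pipeline complete so the resume logic (which checks pipeline_complete) knows not to restart it. A sketch, with a hypothetical training_state.json schema:

```python
import json
import tempfile
from pathlib import Path

def finalize_run(run_dir):
    # Mark the pipeline complete; resume logic reads this flag on startup.
    state_path = Path(run_dir) / "training_state.json"
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state["pipeline_complete"] = True
    state_path.write_text(json.dumps(state, indent=2))

run_dir = tempfile.mkdtemp()
finalize_run(run_dir)
```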
Run Output Directory
outputs/.../
Contains:
- config.yaml
- run_metadata.json
- training_state.json
- output.log
- checkpoint-*/
- final model files
Concrete Example: Your First Run
Config:
configs/classification/mnli/distilbert/00_sft.yaml
Meaning:
- task = MNLI
- student = DistilBERT
- alpha = 0.0
- path = SFT / CE-only
- --max_steps 10 overrides epochs
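Putting the pieces from this outline together, the invocation for this first run would look like:

```
python forge.py \
  --config configs/classification/mnli/distilbert/00_sft.yaml \
  --max_steps 10
```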