CLI Entry
python forge.py --config ...
or
python forge.py --resume ...
Optional overrides:
- --max_steps
- --epochs
- --run_name
- --debug
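The entry point above can be sketched with argparse. Flag names come from this outline; the mutually exclusive --config/--resume grouping is an assumption:

```python
import argparse

def build_parser():
    # Sketch of the forge.py CLI; not the actual implementation.
    p = argparse.ArgumentParser(prog="forge.py")
    mode = p.add_mutually_exclusive_group(required=True)
    mode.add_argument("--config", help="YAML config for a fresh run")
    mode.add_argument("--resume", help="existing run directory to resume")
    p.add_argument("--max_steps", type=int, help="override: stop after N steps")
    p.add_argument("--epochs", type=int, help="override: number of epochs")
    p.add_argument("--run_name", help="override: human-readable run name")
    p.add_argument("--debug", action="store_true", help="verbose / small run")
    return p

args = build_parser().parse_args(["--config", "demo.yaml", "--max_steps", "10"])
```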
Run Setup
forge.py:
- parse args
- resolve output dir
- create or resume run
- write run_metadata.json
- register incomplete job
- init W&B/output.log
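A minimal sketch of the setup steps above, assuming run_metadata.json carries a status field that marks the job incomplete until finalization (the exact schema is hypothetical):

```python
import json
import tempfile
import time
from pathlib import Path

def create_run(base_dir, run_name):
    # Create the output dir, write run_metadata.json, register as incomplete.
    run_dir = Path(base_dir) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    metadata = {
        "run_name": run_name,
        "created_at": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "status": "incomplete",  # flipped to "complete" during finalization
    }
    (run_dir / "run_metadata.json").write_text(json.dumps(metadata, indent=2))
    return run_dir

run_dir = create_run(tempfile.mkdtemp(), "demo_run")
```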
Config Load
Load YAML config with inheritance
Decides:
- task
- model
- teacher
- training mode
- hyperparameters
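Config inheritance usually boils down to a recursive dict merge where the child config overrides the inherited base. A sketch of that merge (forge.py's actual mechanism may differ):

```python
def deep_merge(base, override):
    # Recursively merge dicts: child values win, nested dicts merge key-wise.
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)
        else:
            out[key] = value
    return out

base = {"task": "mnli", "training": {"epochs": 3, "lr": 5e-5}}
child = {"training": {"lr": 2e-5}, "model": "distilbert"}
config = deep_merge(base, child)
```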
Task Preparation
Load tokenizer or processor
Load dataset
Examples:
- MNLI -> tokenizer + GLUE MNLI
- OCR -> processor + multimodal data
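The task decides both the preprocessor and the data. An illustrative dispatch only; the real forge.py loads concrete tokenizer, processor, and dataset objects:

```python
def prepare_task(task):
    # Hypothetical task dispatch; names mirror the examples in this outline.
    if task == "mnli":
        return {"preprocessor": "tokenizer", "dataset": "glue/mnli"}
    if task == "ocr":
        return {"preprocessor": "processor", "dataset": "multimodal OCR data"}
    raise ValueError(f"unknown task: {task}")
```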
Path Selection
Config chooses path:
- SFT -> FinetuningTrainer
- KD -> DistillationTrainer
- RL -> trl.GRPOTrainer
RL is selected by the presence of a top-level rl: section in the config
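Path selection can be sketched as a simple dispatch. The alpha > 0 condition for KD is an assumption, consistent with the note that the teacher is skipped when alpha=0.0:

```python
def select_trainer(config):
    # Hypothetical dispatch; trainer names from this outline.
    if "rl" in config:                               # top-level rl: block
        return "trl.GRPOTrainer"
    if config.get("teacher") and config.get("alpha", 0.0) > 0.0:
        return "DistillationTrainer"                 # KD path
    return "FinetuningTrainer"                       # plain SFT / CE path
```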
Model Loading
Load student model
Load teacher if needed
Resolve adapter if needed
Apply LoRA if configured
The teacher can be skipped when alpha=0.0 (no distillation term in the loss)
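A hypothetical config fragment tying these pieces together; every key name here is illustrative, not forge.py's actual schema:

```yaml
model: distilbert-base-uncased
teacher: roberta-large-mnli     # loaded only if a distillation term is active
alpha: 0.5                      # 0.0 => teacher skipped, pure CE training
lora:
  r: 8
  lora_alpha: 16
  target_modules: [q_lin, v_lin]
```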
Trainer Build
Create:
- TrainingArguments or GRPOConfig
- trainer instance
- callbacks
Callbacks include:
- step metrics
- checkpoint marker
- SLURM signal handler
- teacher eval / peak metrics
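The callback pattern itself is simple: the trainer fans each event out to every registered callback. A minimal stand-in (real callbacks hook the trainer's event API, e.g. on_log/on_save):

```python
class StepMetricsCallback:
    # Toy version of a step-metrics callback: record every logged metric.
    def __init__(self):
        self.history = []

    def on_log(self, step, metrics):
        self.history.append((step, dict(metrics)))

class CallbackList:
    # The trainer broadcasts each event to all registered callbacks.
    def __init__(self, callbacks):
        self.callbacks = callbacks

    def fire_log(self, step, metrics):
        for cb in self.callbacks:
            cb.on_log(step, metrics)

metrics_cb = StepMetricsCallback()
CallbackList([metrics_cb]).fire_log(10, {"loss": 1.23})
```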
Pre-Training Evaluation
Optional baseline evaluation before training starts
Common in classification runs:
- student pre metrics
- teacher pre metrics if relevant
Training Loop
trainer.train(resume_from_checkpoint=...)
SFT:
- CE loss
KD:
- CE + distillation losses
RL:
- prompt generation + rewards + GRPO updates
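For the KD path, a common form of "CE + distillation losses" is the Hinton-style objective: (1-alpha)·CE + alpha·T²·KL(teacher ∥ student) over temperature-softened distributions. The exact loss in forge.py may differ; this is a plain-Python sketch:

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax over a list of logits.
    exps = [math.exp(l / T) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kd_loss(student_logits, teacher_logits, label, alpha=0.5, T=2.0):
    # (1-alpha) * cross-entropy + alpha * T^2 * KL(teacher || student).
    ce = -math.log(softmax(student_logits)[label])
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    return (1 - alpha) * ce + alpha * (T ** 2) * kl
```

With alpha=0.0 this reduces to plain CE, matching the note above that the teacher can be skipped in that case.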
Resume Logic
TrainingState:
- check pipeline_complete
- find latest safe checkpoint
- read latest_checkpoint
- fall back to checkpoint-* scan
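The checkpoint-resolution steps above can be sketched with pathlib: prefer the latest_checkpoint marker, then fall back to scanning checkpoint-* directories for the highest step (file names assumed from this outline):

```python
import tempfile
from pathlib import Path

def find_latest_checkpoint(run_dir):
    # Prefer the latest_checkpoint marker file; fall back to a directory scan.
    run_dir = Path(run_dir)
    marker = run_dir / "latest_checkpoint"
    if marker.exists():
        candidate = run_dir / marker.read_text().strip()
        if candidate.exists():
            return candidate
    ckpts = [p for p in run_dir.glob("checkpoint-*") if p.is_dir()]
    return max(ckpts, key=lambda p: int(p.name.split("-")[1]), default=None)

run_dir = Path(tempfile.mkdtemp())
(run_dir / "checkpoint-100").mkdir()
(run_dir / "checkpoint-250").mkdir()
latest = find_latest_checkpoint(run_dir)
```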
During Training
Happening continuously:
- logs -> output.log
- metrics -> W&B
- checkpoints -> checkpoint-N/
- latest_checkpoint marker updates
- SLURM signal can trigger safe stop + save
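The SLURM safe-stop typically works by catching a pre-termination signal and flipping a flag the training loop checks each step, so the run can checkpoint and exit cleanly. A sketch assuming SIGUSR1 (the actual signal is cluster-configurable):

```python
import os
import signal

class StopFlag:
    # Register a handler; the training loop polls should_stop every step.
    def __init__(self):
        self.should_stop = False
        signal.signal(signal.SIGUSR1, self._handle)

    def _handle(self, signum, frame):
        self.should_stop = True  # loop will save a checkpoint and exit

flag = StopFlag()
os.kill(os.getpid(), signal.SIGUSR1)  # simulate SLURM's pre-termination signal
```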
Finalize Run
After training:
- sync W&B config/metadata
- save final model
- save tokenizer/processor
- save base ref / merged model if needed
- post-training evaluation
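Finalization also needs to mark the pipeline complete so the resume logic (which checks pipeline_complete) knows not to restart it. A sketch, with a hypothetical training_state.json schema:

```python
import json
import tempfile
from pathlib import Path

def finalize_run(run_dir):
    # Mark the pipeline complete; resume logic reads this flag on startup.
    state_path = Path(run_dir) / "training_state.json"
    state = json.loads(state_path.read_text()) if state_path.exists() else {}
    state["pipeline_complete"] = True
    state_path.write_text(json.dumps(state, indent=2))

run_dir = tempfile.mkdtemp()
finalize_run(run_dir)
```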
Run Output Directory
outputs/.../
Contains:
- config.yaml
- run_metadata.json
- training_state.json
- output.log
- checkpoint-*/
- final model files
Concrete Example: Your First Run
Config:
configs/classification/mnli/distilbert/00_sft.yaml
Meaning:
- task = MNLI
- student = DistilBERT
- alpha = 0.0
- path = SFT / CE-only
- --max_steps 10 overrides epochs
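Putting the pieces from this outline together, the invocation for this first run would look like:

```
python forge.py \
  --config configs/classification/mnli/distilbert/00_sft.yaml \
  --max_steps 10
```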