First run command

python forge.py --config configs/classification/mnli/distilbert/00_sft.yaml --max_steps 10

Why this is the first run

  • It is a short smoke test.
  • It verifies that your environment, config loading, data pipeline, and training loop all work together.
  • It uses a small step budget (--max_steps 10) so failures appear quickly.

Command breakdown

  • python forge.py
    • Runs the main Forge training entrypoint.
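  • --config configs/classification/mnli/distilbert/00_sft.yaml
    • Selects the experiment config file to load (judging by its path, an SFT config for DistilBERT on MNLI classification).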
  • --max_steps 10
    • Limits training to 10 steps for fast validation.

What should happen if the run is healthy

  • Config is parsed successfully.
  • Dataset and tokenizer load without errors.
  • Model initializes correctly.
  • Training starts and runs for 10 steps.
  • Metrics/logs are emitted.
  • Process exits cleanly without runtime errors.
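The last check in the list above (a clean exit) can be automated. The following is a minimal sketch, not part of Forge itself: the helper names build_smoke_command and run_smoke_test are hypothetical, and it assumes that forge.py signals failure through a non-zero exit code, which this document does not confirm.

```python
import subprocess
import sys


def build_smoke_command(config_path, max_steps=10, python=sys.executable):
    """Return the argv list for the smoke-test command shown at the top.

    Hypothetical helper: only the --config and --max_steps flags named in
    this document are used.
    """
    return [python, "forge.py", "--config", config_path,
            "--max_steps", str(max_steps)]


def run_smoke_test(config_path, max_steps=10):
    """Run the command and return True if the process exits with code 0.

    Assumption: forge.py returns a non-zero exit code on runtime errors.
    Output is not captured, so logs and metrics still print to the terminal.
    """
    result = subprocess.run(build_smoke_command(config_path, max_steps))
    return result.returncode == 0


# Example call, from the directory that contains forge.py:
# ok = run_smoke_test("configs/classification/mnli/distilbert/00_sft.yaml")
```

A wrapper like this only confirms the exit status. It is still worth reading the emitted logs to confirm that training actually ran for 10 steps.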

Common failure points

  • Missing environment variables in .env.
  • Conda environment not activated.
  • Dependency mismatch or missing package.
  • Wrong config path.
  • Data path or dataset access issues.
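Several of the failure points above can be checked before launching the run. The sketch below is illustrative only: the preflight function is hypothetical, the default package names (torch, transformers) are assumptions about what the config needs, and it relies on CONDA_DEFAULT_ENV, which Conda sets when an environment is activated. It checks only that the .env file exists, not which variables it defines, and it does not check dataset access.

```python
import importlib.util
import os
from pathlib import Path


def preflight(config_path, env_file=".env",
              required_packages=("torch", "transformers"), environ=None):
    """Return a list of problems found; an empty list means none were detected.

    Hypothetical helper. The default package list is an assumption and
    should be replaced with the project's real dependencies.
    """
    environ = os.environ if environ is None else environ
    problems = []
    # Wrong config path
    if not Path(config_path).is_file():
        problems.append(f"config not found: {config_path}")
    # Missing .env file (its contents are not inspected here)
    if not Path(env_file).is_file():
        problems.append(f"env file not found: {env_file}")
    # Conda environment not activated
    if not environ.get("CONDA_DEFAULT_ENV"):
        problems.append("no active Conda environment (CONDA_DEFAULT_ENV is unset)")
    # Missing packages (top-level module names only)
    for name in required_packages:
        if importlib.util.find_spec(name) is None:
            problems.append(f"package not importable: {name}")
    return problems


# Example call:
# for problem in preflight("configs/classification/mnli/distilbert/00_sft.yaml"):
#     print(problem)
```

An empty result does not guarantee a healthy run. Version mismatches between installed packages, for example, would only surface during the smoke test itself.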

What to do after this passes

  1. Remove or increase --max_steps.
  2. Run the full experiment from the same config.
  3. Monitor metrics and checkpoint outputs.