First run command
python forge.py --config configs/classification/mnli/distilbert/00_sft.yaml --max_steps 10
Why this is the first run
- It is a short smoke test.
- It verifies that your environment, config loading, data pipeline, and training loop all work together.
- It uses a small step budget (--max_steps 10) so failures appear quickly.
Command breakdown
python forge.py
- Runs the main Forge training entrypoint.
--config configs/classification/mnli/distilbert/00_sft.yaml
- Selects the experiment config to load for this run.
--max_steps 10
- Limits training to 10 steps for fast validation.
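To make the breakdown concrete, here is a minimal sketch of how an entrypoint could accept these two flags with argparse. This is an illustration only: build_parser is a hypothetical helper, and Forge's real argument handling may differ (for example, it may treat --max_steps as a config override rather than a dedicated flag).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser: it mirrors only the two flags shown in this section
    # and is not taken from forge.py itself.
    parser = argparse.ArgumentParser(prog="forge.py")
    parser.add_argument("--config", required=True,
                        help="Path to the experiment YAML config")
    parser.add_argument("--max_steps", type=int, default=None,
                        help="Optional cap on the number of training steps")
    return parser


# Parse the same arguments used in the first run command.
args = build_parser().parse_args([
    "--config", "configs/classification/mnli/distilbert/00_sft.yaml",
    "--max_steps", "10",
])
print(args.config, args.max_steps)
```

In this sketch, omitting --max_steps leaves the value as None, which a training loop could interpret as "use the step count from the config".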
What a healthy run looks like
- Config is parsed successfully.
- Dataset and tokenizer load without errors.
- Model initializes correctly.
- Training starts and runs for 10 steps.
- Metrics/logs are emitted.
- Process exits cleanly without runtime errors.
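The last check (a clean exit) is the easiest one to automate. The sketch below wraps a command with subprocess.run and treats exit code 0 as a pass. The names run_and_check and SMOKE_CMD are hypothetical, and it assumes forge.py returns a non-zero exit code on failure, which this section does not confirm.

```python
import subprocess
import sys

# The smoke-test command from this section as an argument list.
# sys.executable stands in for "python" so the current interpreter is used.
SMOKE_CMD = [
    sys.executable, "forge.py",
    "--config", "configs/classification/mnli/distilbert/00_sft.yaml",
    "--max_steps", "10",
]


def run_and_check(cmd, timeout=None):
    """Run cmd and return (passed, combined_output).

    passed is True only when the process exits with code 0.
    """
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.returncode == 0, result.stdout + result.stderr


# Demonstrate the helper on a trivial command that is known to succeed.
ok, out = run_and_check([sys.executable, "-c", "print('hello')"])
print(ok, out.strip())
```

From the repository root, calling run_and_check(SMOKE_CMD) would run the actual smoke test; on failure, the returned output is where to look for the traceback.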
Common failure points
- Missing environment variables in .env.
- Conda environment not activated.
- Dependency mismatch or missing package.
- Wrong config path.
- Data path or dataset access issues.
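Several of these failure points can be checked before launching anything. The following is a rough preflight sketch, not part of Forge: preflight is a hypothetical helper, the variable and package names passed to it are placeholders you would replace with the ones your setup needs, and the Conda check relies on CONDA_DEFAULT_ENV, which conda sets when an environment is activated. It does not cover dataset access.

```python
import importlib.util
import os
from pathlib import Path


def preflight(config_path, required_env=(), required_packages=()):
    """Return a list of human-readable problems; an empty list means
    none of the checked failure points were detected."""
    problems = []
    # Wrong config path.
    if not Path(config_path).is_file():
        problems.append(f"config not found: {config_path}")
    # Conda environment not activated.
    if not os.environ.get("CONDA_DEFAULT_ENV"):
        problems.append("no active Conda environment detected")
    # Missing environment variables (names are supplied by the caller).
    for name in required_env:
        if not os.environ.get(name):
            problems.append(f"missing environment variable: {name}")
    # Missing packages (top-level import names, supplied by the caller).
    for pkg in required_packages:
        if importlib.util.find_spec(pkg) is None:
            problems.append(f"package not importable: {pkg}")
    return problems


# Placeholder inputs: EXAMPLE_API_KEY and the package list are assumptions.
issues = preflight(
    "configs/classification/mnli/distilbert/00_sft.yaml",
    required_env=("EXAMPLE_API_KEY",),
    required_packages=("json",),
)
for issue in issues:
    print(issue)
```

Note that this checks the process environment only; if your variables live in .env, they must already be loaded into the environment for the check to see them.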
What to do after this passes
- Remove or increase --max_steps.
- Run the full experiment from the same config.
- Monitor metrics and checkpoint outputs.