
tri train — HSLM Training Monitor

Monitor and control HSLM (Hyper-Sparse Language Model) training runs. Supports both local and Railway-hosted training.

Subcommands

Command                          Arguments                    Description
tri train status                 [--json] [--host railway]    Live dashboard or JSON status
tri train start                  [options] [--host railway]   Launch training (local or remote)
tri train logs                   [--host railway]             Tail training logs
tri train loss [dir]             [checkpoint-dir]             Parse checkpoint loss curve
tri train diagnose [dir]         [checkpoint-dir]             Auto-diagnose training anomalies
tri train compare <d1> <d2>      <dir1> <dir2>                Side-by-side run comparison
tri train checkpoint list [dir]  [checkpoint-dir]             List checkpoints with metrics
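
The --json flag makes status output scriptable. A minimal sketch of polling a run from the shell, assuming the standard watch and jq utilities are available (the JSON schema is not documented here, so the output is only refreshed and pretty-printed):

watch -n 30 'tri train status --json'              # Re-poll local status every 30 seconds
tri train status --json --host railway | jq .      # One-shot check of a Railway-hosted run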

Options for tri train start

Option                    Default                                   Description
--steps <N>               100000                                    Total training steps
--lr <value>              3e-4 (local), 1e-4 (railway)              Learning rate
--warmup <N>              5000                                      Warmup steps
--batch <N>               64                                        Batch size
--optimizer <type>        adamw                                     Optimizer: adamw/lamb
--ste <mode>              none                                      STE mode: none/vanilla/twn/progressive
--wd <value>              0.1                                       Weight decay
--checkpoint-dir <path>   data/checkpoints                          Checkpoint directory
--resume <path>           (none)                                    Resume from checkpoint
--data <path>             data/tinystories/real_tinystories.txt    Training data file
--grad-accum <N>          1                                         Gradient accumulation steps
--context <N>             81                                        Context length
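
A fuller local launch that overrides several of the defaults above; the hyperparameter values and the run-a checkpoint directory are illustrative, not recommendations:

tri train start \
  --steps 200000 \
  --lr 3e-4 \
  --warmup 5000 \
  --batch 32 \
  --grad-accum 2 \
  --optimizer lamb \
  --ste progressive \
  --checkpoint-dir data/checkpoints/run-a \
  --data data/tinystories/real_tinystories.txt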

Examples

tri train status                      # Live training dashboard
tri train start --lr 1e-4             # Start local training
tri train start --host railway        # Start Railway training
tri train loss data/checkpoints       # Show loss curve
tri train diagnose data/checkpoints   # Diagnose issues
tri train compare run1/ run2/         # Compare two runs
tri train checkpoint list             # List saved checkpoints
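
These pieces compose into small experiments. For example, to compare two STE modes, train each into its own checkpoint directory and then diff the runs (the ste-none and ste-twn directories are hypothetical):

tri train start --ste none --checkpoint-dir data/checkpoints/ste-none
tri train start --ste twn  --checkpoint-dir data/checkpoints/ste-twn
tri train compare data/checkpoints/ste-none data/checkpoints/ste-twn
tri train diagnose data/checkpoints/ste-twn   # Check the quantized run for anomalies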

Environment Variables

Variable           Required        Description
HSLM_OPTIMIZER     Yes (remote)    Optimizer type
HSLM_LR            Yes (remote)    Learning rate
HSLM_LR_SCHEDULE   Yes (remote)    LR schedule (ALWAYS cosine)
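
A minimal sketch of supplying these for a Railway-hosted run, assuming they are read as ordinary environment variables; depending on the deployment they may instead need to be set in the Railway service configuration:

export HSLM_OPTIMIZER=adamw      # adamw or lamb, mirroring --optimizer
export HSLM_LR=1e-4              # Railway default learning rate from the options table
export HSLM_LR_SCHEDULE=cosine   # Schedule is always cosine
tri train start --host railway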

Handler

File: src/tri/tri_train.zig