End-to-end model training
ML Training Pipeline
Train reward models, run SFT, DPO, and PPO pipelines, and evaluate results -- all from preference data collected on the platform. Managed GPU infrastructure means no ops overhead.
Key Capabilities
Reward Model Training
Train reward models directly from pairwise comparison data. The platform handles data preprocessing, train/val splits, checkpoint selection, and evaluation against held-out gold sets.
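Reward-model training on pairwise comparisons typically minimizes a Bradley-Terry style objective: the loss is low when the model scores the chosen response above the rejected one. The sketch below illustrates that objective on toy scores; the record fields (`prompt`, `chosen`, `rejected`) are illustrative, not the platform's actual schema.

```python
import math

# Illustrative pairwise comparison records; field names are hypothetical,
# not the platform's actual data schema.
comparisons = [
    {"prompt": "Summarize the ticket.", "chosen": "Concise summary.", "rejected": "Off-topic reply."},
]

def pairwise_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry pairwise objective: -log sigmoid(r_chosen - r_rejected)."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Loss shrinks as the reward model widens the margin in favor of the
# human-preferred response, and grows when preferences are violated.
print(pairwise_loss(2.0, 0.5))   # positive margin: small loss
print(pairwise_loss(0.5, 2.0))   # negative margin: large loss
```

Training drives the scored margin positive across the comparison set; the held-out gold-set evaluation mentioned above then checks that those margins generalize.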
SFT, DPO & PPO
Run supervised fine-tuning, direct preference optimization, or proximal policy optimization pipelines. Configure hyperparameters through the API or dashboard and monitor loss curves in real time.
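Of the three, DPO has the most compact objective: it rewards the policy for preferring the chosen response more strongly than a frozen reference model does. A toy numeric sketch of that loss for a single comparison pair (not the platform's implementation):

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one pair:
    -log sigmoid(beta * ((logp_c - ref_c) - (logp_r - ref_r)))."""
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The policy assigns the chosen response a higher log-prob relative to the
# reference than it does the rejected one, so the loss is below log(2).
print(dpo_loss(-10.0, -14.0, -12.0, -12.0))
```

The `beta` hyperparameter here plays the same role as the KL-strength knob you would set through the API or dashboard: higher values penalize drifting from the reference model more sharply.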
Managed GPU Infrastructure
Training runs execute on A100 and H100 GPUs provisioned on demand. No cloud accounts to manage. Jobs queue automatically and scale down when complete to minimize cost.
Evaluation & Versioning
Every training run produces a versioned model artifact with evaluation metrics. Compare runs side by side, promote the best model to production, and roll back if alignment regresses.
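A promotion gate built on those versioned metrics might look like the following sketch. The run records and field names (`run_id`, `model_version`, `gold_accuracy`) are hypothetical, standing in for whatever metrics your evaluation produces:

```python
# Hypothetical versioned-run records, as a dashboard or API might expose them.
runs = [
    {"run_id": "run_001", "model_version": "v3", "eval": {"gold_accuracy": 0.81}},
    {"run_id": "run_002", "model_version": "v4", "eval": {"gold_accuracy": 0.86}},
]

def pick_promotion(runs: list, current_accuracy: float) -> str:
    """Promote the best run only if it beats the production model;
    otherwise keep (or roll back to) the current version."""
    best = max(runs, key=lambda r: r["eval"]["gold_accuracy"])
    if best["eval"]["gold_accuracy"] > current_accuracy:
        return best["model_version"]
    return "keep-current"

print(pick_promotion(runs, current_accuracy=0.83))  # -> "v4"
```

Because every artifact is versioned, the same comparison run in reverse is a rollback: if the promoted model regresses on the gold set, the previous version is still addressable.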
Usage
curl -X POST https://api.commandagi.com/v1/training/reward-model \
  -H "Authorization: Bearer $COMMANDAGI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "profile_id": "prof_main",
    "base_model": "meta-llama/Llama-3-8B",
    "data": {
      "comparison_min": 5000,
      "val_split": 0.1
    },
    "compute": {
      "gpu": "A100-40GB",
      "max_hours": 4
    },
    "callbacks": {
      "on_complete": "https://your-app.com/hooks/training-done"
    }
  }'
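The same request is straightforward from Python. This sketch only builds the payload (endpoint and fields are taken from the curl example above) and prints it; sending it is one call with any HTTP client, e.g. `requests.post(url, headers=headers, json=payload)`.

```python
import json
import os

# Endpoint and payload mirror the curl example above.
url = "https://api.commandagi.com/v1/training/reward-model"
headers = {
    "Authorization": f"Bearer {os.environ.get('COMMANDAGI_API_KEY', '')}",
    "Content-Type": "application/json",
}
payload = {
    "profile_id": "prof_main",
    "base_model": "meta-llama/Llama-3-8B",
    "data": {"comparison_min": 5000, "val_split": 0.1},
    "compute": {"gpu": "A100-40GB", "max_hours": 4},
    "callbacks": {"on_complete": "https://your-app.com/hooks/training-done"},
}

# Inspect the exact JSON body before sending it with your HTTP client of choice.
print(json.dumps(payload, indent=2))
```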