What CommandAGI Is
The Platform
CommandAGI collects structured human preference data and returns calibrated taste profiles. You submit stimuli—images, video frames, text, code, audio, designs. We collect preference judgments from human annotators through three complementary modalities. We fit Bradley-Terry models to the resulting pairwise comparison data, producing differentiable functions over stimulus space that predict human preference for novel stimuli.
Elicit → Version → Ship
How Pairwise Comparison Works
[Interactive demo: two stimuli, A and B, with the prompt "Which image has better composition?"]
Each comparison is a forced choice: the annotator sees two stimuli and selects the one they prefer. No scales, no deliberation prompts, no verbal justification. The signal we care about is pre-reflective. Comparisons are presented with sub-second stimulus exposure to capture the response that precedes conscious rationalization.
Under the Bradley-Terry model, the probability that stimulus i is preferred to stimulus j is P(i > j) = σ(u_i − u_j), where u_i is the latent utility of stimulus i and σ is the logistic function. The resulting utility surface over stimulus space is the taste profile: a differentiable function that generalizes to novel stimuli through learned embeddings.

Each comparison yields one bit of ordinal information about the local topology of preference space. A hundred well-chosen comparisons can recover a detailed preference surface. Comparisons are selected adaptively: the system picks the pair that maximizes expected information gain given the responses so far, which is equivalent to choosing the pair where the current model is most uncertain.
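Both steps can be sketched in a few lines. This is a minimal illustration, assuming numpy and a toy dataset of (winner, loser) index pairs; the function names are illustrative, not CommandAGI's API:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fit_bradley_terry(n_items, comparisons, lr=0.5, steps=2000):
    """Fit latent utilities u so that P(i beats j) = sigmoid(u_i - u_j).

    comparisons: list of (winner, loser) index pairs.
    """
    u = np.zeros(n_items)
    for _ in range(steps):
        grad = np.zeros(n_items)
        for w, l in comparisons:
            p = sigmoid(u[w] - u[l])   # predicted P(winner beats loser)
            grad[w] += 1.0 - p         # gradient of the log-likelihood
            grad[l] -= 1.0 - p
        u += lr * grad / len(comparisons)
        u -= u.mean()                  # utilities identified only up to a constant
    return u

def most_uncertain_pair(u):
    """Adaptive selection: the pair whose predicted outcome is closest to 50/50."""
    n = len(u)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    return min(pairs, key=lambda p: abs(sigmoid(u[p[0]] - u[p[1]]) - 0.5))

# Simulated annotator with a consistent taste ordering 0 > 1 > 2 > 3.
comparisons = [(0, 1), (0, 2), (1, 2), (0, 3), (1, 3), (2, 3)]
u = fit_bradley_terry(4, comparisons)
next_pair = most_uncertain_pair(u)
```

Picking the pair whose predicted outcome is nearest 0.5 is the simplest form of uncertainty sampling; a production system would maximize expected information gain over a posterior rather than a point estimate.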
Applications
The immediate applications are concrete. Automate frame selection for video production—score every frame against a taste profile and extract the best. Filter AI-generated images by aesthetic fit before they reach a human reviewer. Build recommendation systems that curate by taste geometry rather than engagement metrics. Personalize creative tools to individual aesthetic sensibilities with ~100 pairwise comparisons.
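Frame selection against a taste profile reduces to scoring and ranking. A sketch, assuming the profile is exposed as a plain scalar-valued scoring function (the `score_fn` below is a stand-in, not CommandAGI's API):

```python
import numpy as np

def select_best_frames(frames, score_fn, top_k=5):
    """Score every frame with a taste profile and return the top-k frame indices.

    score_fn maps a frame (any array-like) to a scalar utility.
    """
    scores = np.array([score_fn(f) for f in frames])
    return np.argsort(scores)[::-1][:top_k]   # indices, best first

# Stand-in taste profile that simply prefers brighter frames (illustrative only).
frames = [np.full((4, 4), v, dtype=float) for v in (0.2, 0.9, 0.5, 0.7)]
best = select_best_frames(frames, score_fn=lambda f: f.mean(), top_k=2)
# best → array([1, 3]): the two brightest frames
```

The same pattern applies to filtering generated images: score a batch, threshold or rank, and pass only the top fraction to a human reviewer.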
The Annotation Marketplace
Preference data collection requires both scale and diversity. The marketplace connects developers who submit content for evaluation with professional annotators who provide preference judgments. Each modality—images, text, code, audio, websites, design files—has a purpose-built interface optimized for rapid, honest evaluation.
The Deeper Purpose
The data we collect—millions of structured preference judgments across diverse stimuli, modalities, and populations—is also something else: the empirical foundation for calibrating formal models of experiential geometry.
When someone says “I prefer A to B,” they report one bit of ordinal information about their experiential landscape. That bit is noisy but honest—it's harder to confabulate a forced choice than an essay response. Aggregated across sufficient comparisons, the data reveals topology: which experiential states are close to each other, which are far apart, how many independent dimensions of variation exist.
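The dimensionality question is measurable. One common estimator is the participation ratio of the covariance eigenvalue spectrum; a sketch, under the assumption that stimuli are represented as points in a learned embedding space:

```python
import numpy as np

def effective_rank(X):
    """Effective dimensionality of X (n_samples x n_features) via the
    participation ratio of the covariance eigenvalue spectrum."""
    X = X - X.mean(axis=0)
    eig = np.linalg.eigvalsh(np.cov(X, rowvar=False))
    eig = np.clip(eig, 0.0, None)             # guard against tiny negative eigenvalues
    return eig.sum() ** 2 / (eig ** 2).sum()  # = n_features iff spectrum is flat

# Toy check: points varying along a single axis have effective rank ≈ 1.
rng = np.random.default_rng(0)
line = np.outer(rng.normal(size=200), [1.0, 0.0, 0.0])
er = effective_rank(line)  # ≈ 1.0
```

The participation ratio equals the ambient dimension when variance is spread evenly and approaches 1 as it concentrates on a single axis, which is what "how many independent dimensions of variation exist" asks operationally.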
The theoretical frameworks matured first: the equations for integration, valence, and effective rank exist. The data to calibrate them didn't. That's the gap we fill.