Product · February 2025

What CommandAGI Is

The Platform

CommandAGI collects structured human preference data and returns calibrated taste profiles. You submit stimuli—images, video frames, text, code, audio, designs. We collect preference judgments from human annotators through three complementary modalities. We fit Bradley-Terry models to the resulting pairwise comparison data, producing differentiable functions over stimulus space that predict human preference for novel stimuli.

Stage 1

Elicit

Extract preferences through three modalities. Structured questions (~10) partition preference space coarsely. Reference labels (~30) provide absolute anchoring—where “good enough” sits for this person. Pairwise comparisons (~100, adaptively chosen) give ordinal precision through forced choices.
Stage 2

Version

Taste profiles are versioned artifacts. Each profile has immutable commits, branches for exploring variations, and merges when an experiment works. You can diff two versions to see what changed. Preference evolution is itself data.
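A diff between two profile versions can be sketched as follows. The representation here is an illustrative assumption (a profile modeled as a dict of dimension weights), not the platform's actual storage format:

```python
def diff_profiles(old, new, tol=1e-6):
    """Compare two versions of a taste profile, modeled here as
    dicts mapping preference dimension -> weight, and report every
    dimension whose weight changed (as an (old, new) pair)."""
    dims = set(old) | set(new)
    return {
        d: (old.get(d, 0.0), new.get(d, 0.0))
        for d in dims
        if abs(old.get(d, 0.0) - new.get(d, 0.0)) > tol
    }

# Two hypothetical commits of the same profile
v1 = {"warmth": 0.8, "symmetry": 0.2, "contrast": 0.5}
v2 = {"warmth": 0.8, "symmetry": 0.6, "contrast": 0.5, "grain": 0.3}
changes = diff_profiles(v1, v2)
```

Because each commit is immutable, a diff like this is stable: comparing the same two versions always yields the same answer, which is what makes preference evolution usable as data.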
Stage 3

Ship

The API exposes calibrated profiles for evaluation. Score any content against any profile. Rank, filter, and select programmatically. Latency is in the millisecond range—the model runs inference, not the annotator.
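A client-side sketch of the score-then-rank flow, using only the standard library. The endpoint URL, request shape, and response field are assumptions for illustration, not the documented API:

```python
import json
import urllib.request

API_BASE = "https://api.commandagi.example/v1"  # illustrative URL, not the real endpoint

def score_batch(profile_id, items, api_key):
    """POST a batch of stimuli to a hypothetical scoring endpoint and
    return one preference score per item. Endpoint shape is assumed."""
    req = urllib.request.Request(
        f"{API_BASE}/profiles/{profile_id}/score",
        data=json.dumps({"items": items}).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req, timeout=5) as resp:
        return json.load(resp)["scores"]

def rank(items, scores):
    """Order items best-first by their profile scores."""
    return [item for item, _ in sorted(zip(items, scores), key=lambda p: -p[1])]
```

Ranking, filtering, and selection all reduce to this pattern: fetch scores once, then manipulate them locally.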

How Pairwise Comparison Works

Example comparison: two landscape images, A and B, shown with the prompt "Which image has better composition?"

Each comparison is a forced choice: the annotator sees two stimuli and selects the one they prefer. No scales, no deliberation prompts, no verbal justification. The signal we care about is pre-reflective. Comparisons are presented with sub-second stimulus exposure to capture the response that precedes conscious rationalization.

Definition
Bradley-Terry model: Given pairwise outcomes, we fit latent utility scores that maximize the likelihood of observed choices. For items i and j with utilities u_i and u_j, the probability of preferring i is P(i > j) = σ(u_i - u_j), where σ is the logistic function σ(x) = 1/(1 + e^(-x)). The resulting utility surface over stimulus space is the taste profile—a differentiable function that generalizes to novel stimuli through learned embeddings.

Each comparison yields one bit of ordinal information about the local topology of preference space. A hundred well-chosen comparisons can recover a detailed preference surface. Comparisons are selected adaptively: the system picks the pair that maximizes expected information gain given responses so far, which is equivalent to choosing the pair where the current model is most uncertain.
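The adaptive selection rule above can be sketched with a simple proxy: among pairs not yet asked, pick the one whose predicted outcome is closest to 50/50, i.e., the smallest utility gap. A full information-gain computation would also account for posterior uncertainty; this sketch assumes point estimates:

```python
import itertools

def most_informative_pair(utilities, asked):
    """Return the unasked pair where the current model is most
    uncertain: the pair with the smallest utility gap, so that
    P(i > j) is closest to 0.5."""
    best, best_gap = None, float("inf")
    for i, j in itertools.combinations(range(len(utilities)), 2):
        if (i, j) in asked or (j, i) in asked:
            continue  # never repeat a comparison
        gap = abs(utilities[i] - utilities[j])
        if gap < best_gap:
            best, best_gap = (i, j), gap
    return best

u = [2.0, 0.1, 0.0, -1.5]
nxt = most_informative_pair(u, asked={(1, 2)})  # (1, 2) already answered
```

With items 1 and 2 already compared, the next query is the remaining pair with the smallest gap: items 2 and 3.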

Applications

The immediate applications are concrete. Automate frame selection for video production—score every frame against a taste profile and extract the best. Filter AI-generated images by aesthetic fit before they reach a human reviewer. Build recommendation systems that curate by taste geometry rather than engagement metrics. Personalize creative tools to individual aesthetic sensibilities with ~100 pairwise comparisons.
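The frame-selection and filtering applications both reduce to a few lines once a scoring function exists. Here `score_fn` is a stand-in for a call to a calibrated profile:

```python
def best_frames(frames, score_fn, k=5):
    """Score every frame against a taste profile and keep the
    top k, best first."""
    return sorted(frames, key=score_fn, reverse=True)[:k]

def passes_fit(items, score_fn, threshold):
    """Filter generated items by aesthetic fit, so only items above
    the threshold reach a human reviewer."""
    return [x for x in items if score_fn(x) >= threshold]
```

In practice the scoring would be batched through the API rather than called per item, but the selection logic is exactly this simple.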

The Annotation Marketplace

Preference data collection requires both scale and diversity. The marketplace connects developers who submit content for evaluation with professional annotators who provide preference judgments. Each modality—images, text, code, audio, websites, design files—has a purpose-built interface optimized for rapid, honest evaluation.

Definition
Pricing: 1¢ per frame label, 2¢ per pairwise comparison. Quality is monitored in real time through internal consistency checks (transitivity fraction), test-retest reliability, and inter-annotator agreement on calibration stimuli.

The Deeper Purpose

The data we collect—millions of structured preference judgments across diverse stimuli, modalities, and populations—is also something else: the empirical foundation for calibrating formal models of experiential geometry.

When someone says “I prefer A to B,” they report one bit of ordinal information about their experiential landscape. That bit is noisy but honest—it's harder to confabulate a forced choice than an essay response. Aggregated across sufficient comparisons, the data reveals topology: which experiential states are close to each other, which are far apart, how many independent dimensions of variation exist.

The theoretical frameworks matured first—the equations for integration, valence, and effective rank already exist. The data to calibrate them didn't. That's the gap we fill.
Key Takeaway
CommandAGI is a preference data platform. The commercial application is taste intelligence—scoring and filtering content against calibrated aesthetic profiles. The research application is mapping the geometry of human experience through structured preference judgments. Both use the same infrastructure.