commandAGI

The Calibration Problem

We have formal descriptions of what different experiences look like structurally. Joy is characterized by positive valence, high integration, high effective rank, low self-model salience. Suffering by negative valence, high integration, low effective rank. Curiosity by positive valence toward uncertainty with high counterfactual weight and high branch entropy.

These are precise structural claims. But “high integration” means what, numerically? How does integration interact with effective rank to produce the qualitative difference between joy and suffering—both highly integrated, but one expansive and the other collapsed? Where in the structural space does curiosity shade into fear—same high counterfactual weight, opposite valence orientation?

The answers require empirical calibration, for the same reason that the equations of motion cannot tell you the value of the gravitational constant: the form of a law does not fix its constants. You need measurement.

Structure

Joy

V+ (positive valence), Φ↑ (high integration), r_eff↑ (high effective rank), SM↓ (low self-model salience). Coherent expansion: the self feels light because the world cooperates. Structurally, many eigenvalues of the state covariance are non-negligible, and the viability gradient is positive along the predicted trajectory.
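To make "many eigenvalues of the state covariance are non-negligible" concrete, here is a minimal sketch of one standard effective-rank measure: the entropy-based definition of Roy and Vetterli, applied to the covariance eigenvalues. Choosing this particular definition is an assumption (the participation ratio (Σλ)²/Σλ² is a common alternative), the function name `effective_rank` is hypothetical, and the eigenvalues are assumed to be computed already, for example with `numpy.linalg.eigvalsh`.

```python
import math


def effective_rank(eigenvalues):
    """Entropy-based effective rank of a covariance spectrum.

    Normalizes the non-negative eigenvalues into a distribution p_i and
    returns exp(H(p)), where H is the Shannon entropy in nats. The result
    ranges from 1 (all variance in one direction) to n (variance spread
    evenly over all n directions).
    """
    lams = [max(lam, 0.0) for lam in eigenvalues]  # clip tiny negative round-off
    total = sum(lams)
    if total == 0:
        raise ValueError("spectrum has no variance")
    entropy = 0.0
    for lam in lams:
        if lam > 0:
            p = lam / total
            entropy -= p * math.log(p)
    return math.exp(entropy)


# An expansive, evenly spread spectrum versus a fully collapsed one.
print(effective_rank([1.0, 1.0, 1.0, 1.0]))   # close to 4
print(effective_rank([10.0, 0.0, 0.0, 0.0]))  # 1.0
```

Under this measure a joy-like state would sit near the top of the range and a collapsed state near 1; where exactly "high" begins is the calibration question the section raises.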


Curiosity

V+ toward uncertainty, CF↑ (high counterfactual weight) with high branch entropy. The mind reaching toward what it doesn't know. Formally, the expected information gain from future observations is positive, and rollout compute is high, with probability spread across many branches rather than concentrated on one.
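Both formal conditions can be written down for a discrete toy model. This minimal sketch assumes a finite set of hypotheses with a prior, a likelihood table P(y | h) for a future observation y, and a probability distribution over rollout branches; the names `branch_entropy` and `expected_information_gain` are hypothetical. Expected information gain is computed as the mutual information between hypothesis and observation, in bits.

```python
import math


def branch_entropy(branch_probs):
    """Shannon entropy (bits) of the distribution over rollout branches.

    High when probability is spread across many branches, zero when it
    is concentrated on a single branch.
    """
    return sum(p * math.log2(1.0 / p) for p in branch_probs if p > 0)


def expected_information_gain(prior, likelihood):
    """Mutual information I(H; Y) in bits.

    prior[h] is P(h); likelihood[h][y] is P(y | h). This is the expected
    reduction in uncertainty about h from observing y.
    """
    n_obs = len(likelihood[0])
    # Marginal probability of each observation: P(y) = sum_h P(h) P(y | h).
    p_y = [sum(prior[h] * likelihood[h][y] for h in range(len(prior)))
           for y in range(n_obs)]
    gain = 0.0
    for h, p_h in enumerate(prior):
        for y in range(n_obs):
            joint = p_h * likelihood[h][y]
            if joint > 0:
                gain += joint * math.log2(likelihood[h][y] / p_y[y])
    return gain


print(branch_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0: spread across branches
print(branch_entropy([1.0, 0.0, 0.0, 0.0]))      # 0.0: one branch dominates
# An observation that perfectly reveals which of two hypotheses holds:
print(expected_information_gain([0.5, 0.5], [[1.0, 0.0], [0.0, 1.0]]))  # 1.0
```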


Suffering

V−, Φ↑, r_eff↓. The system is unified but trapped. The covariance eigenspectrum is dominated by one or two large eigenvalues: the variance has collapsed into a narrow subspace that the system cannot exit because of high causal coupling.
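The phrase "dominated by one or two large eigenvalues" suggests a simple diagnostic: the share of total variance carried by the top k eigenvalues. This is an illustrative assumption rather than a quantity the theory defines; the name `top_k_variance_share` and the default k = 2 are hypothetical.

```python
def top_k_variance_share(eigenvalues, k=2):
    """Fraction of total variance carried by the k largest eigenvalues.

    Values near 1.0 mean the state covariance has collapsed into a narrow
    k-dimensional subspace; values near k/n mean variance is spread evenly.
    """
    lams = sorted((max(lam, 0.0) for lam in eigenvalues), reverse=True)
    total = sum(lams)
    if total == 0:
        raise ValueError("spectrum has no variance")
    return sum(lams[:k]) / total


print(top_k_variance_share([5.0, 4.0, 0.5, 0.5]))  # 0.9: collapsed
print(top_k_variance_share([1.0, 1.0, 1.0, 1.0]))  # 0.5: spread evenly
```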

Why Preference Data

There are several approaches to calibration. Neuroimaging gives structural correlates but not the quantities the theory cares about directly. Introspective report is unreliable and culturally contaminated—we learned our emotion concepts from a culture, and we cannot know what we would report without that framework. Behavioral measures are confounded by motor constraints and strategic considerations.

Preference data has four specific advantages.

Relative, Not Absolute

Pairwise comparisons don't require calibrated introspective scales. You don't rate on a 1-to-10 scale; you pick which of two alternatives you prefer. This removes the noise that comes from people using rating scales differently. Information-theoretically, each comparison yields at most one bit of ordinal information about the local preference topology, and exactly one bit only when both outcomes were equally likely beforehand.
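One standard way to turn many pairwise choices into a preference scale is the Bradley-Terry model, in which item i is preferred over item j with probability s_i / (s_i + s_j). The text does not name a model, so this is a swapped-in technique: a minimal sketch that fits Bradley-Terry strengths with the classic minorization-maximization (MM) update. The function names and toy data are hypothetical, and the update assumes every item wins at least once.

```python
from collections import Counter


def fit_bradley_terry(comparisons, n_iter=500):
    """Fit Bradley-Terry strengths from (winner, loser) pairs via MM updates.

    Returns a dict mapping item -> strength s_i, normalized to mean 1.
    Assumes every item has at least one win (otherwise its strength -> 0).
    """
    items = sorted({x for pair in comparisons for x in pair})
    wins = Counter(winner for winner, _ in comparisons)
    # Number of times each unordered pair was compared.
    pair_counts = Counter(frozenset(pair) for pair in comparisons)
    s = {i: 1.0 for i in items}
    for _ in range(n_iter):
        updated = {}
        for i in items:
            # MM denominator: sum over opponents j of n_ij / (s_i + s_j).
            denom = sum(
                count / (s[i] + s[j])
                for pair, count in pair_counts.items() if i in pair
                for j in pair if j != i
            )
            updated[i] = wins[i] / denom if denom > 0 else s[i]
        mean = sum(updated.values()) / len(items)
        s = {i: v / mean for i, v in updated.items()}
    return s


def win_probability(s, i, j):
    """Model probability that item i is preferred over item j."""
    return s[i] / (s[i] + s[j])


# Toy data: A beats B 3 of 4 times, B beats C 3 of 4, A beats C 3 of 4.
data = ([("A", "B")] * 3 + [("B", "A")]
        + [("B", "C")] * 3 + [("C", "B")]
        + [("A", "C")] * 3 + [("C", "A")])
strengths = fit_bradley_terry(data)
print(sorted(strengths, key=strengths.get, reverse=True))  # ['A', 'B', 'C']
```

No participant ever states a number; the scale is recovered entirely from which option won each comparison.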

Pre-Reflective

Forced choice with sub-second exposure captures the response before conscious rationalization. You can confabulate in an essay. You can perform for an experimenter. But the 800 ms preference response is closer to the raw geometry.

Structured

Each comparison constrains the preference surface. Adaptively chosen comparisons (maximizing expected information gain) are information-theoretically efficient. ~100 well-chosen comparisons recover a detailed preference surface.
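A minimal sketch of adaptive selection, assuming current Bradley-Terry strengths (item i beats j with probability s_i / (s_i + s_j)) from earlier comparisons. Full expected information gain would require a posterior over the strengths; this sketch instead uses a common simplification, uncertainty sampling, which queries the pair whose predicted outcome is closest to a coin flip because that outcome carries the most information (up to one bit). The function names are hypothetical.

```python
import math
from itertools import combinations


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def next_comparison(strengths):
    """Pick the pair whose outcome the current model is least sure about.

    Uses predicted-outcome entropy under point-estimate Bradley-Terry
    strengths as a cheap stand-in for full expected information gain.
    """
    def outcome_entropy(pair):
        a, b = pair
        return binary_entropy(strengths[a] / (strengths[a] + strengths[b]))

    return max(combinations(sorted(strengths), 2), key=outcome_entropy)


# A and C are nearly tied, so comparing them is the most informative query.
print(next_comparison({"A": 4.0, "B": 1.0, "C": 3.9}))  # ('A', 'C')
```

Pairs the model already predicts confidently (A versus B here) are skipped, which is why adaptive designs need far fewer comparisons than exhaustive ones.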

Scalable

Millions of comparisons across thousands of people. The aggregate reveals shared structure—the universal geometry—while individual profiles reveal idiosyncratic variation. The cross-population signal is where universal calibration comes from.
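The split between shared structure and individual variation can be illustrated with the simplest possible decomposition: average each item's score across people to get a population profile, and treat each person's residual as their idiosyncratic profile. This is a sketch under stated assumptions, not a method the text specifies; a real analysis would more likely fit a hierarchical model with partial pooling. The function name is hypothetical.

```python
def split_shared_and_individual(profiles):
    """Split per-person preference scores into shared and idiosyncratic parts.

    profiles: list of dicts, one per person, mapping item -> score (for
    example, fitted Bradley-Terry log-strengths). All dicts must share the
    same items. Returns (population_mean, deviations), where each person's
    profile equals population_mean + their deviation.
    """
    items = list(profiles[0])
    population = {i: sum(p[i] for p in profiles) / len(profiles) for i in items}
    deviations = [{i: p[i] - population[i] for i in items} for p in profiles]
    return population, deviations


people = [{"x": 1.0, "y": 3.0}, {"x": 3.0, "y": 1.0}, {"x": 2.0, "y": 2.0}]
shared, individual = split_shared_and_individual(people)
print(shared)         # {'x': 2.0, 'y': 2.0}
print(individual[0])  # {'x': -1.0, 'y': 1.0}
```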

What Calibration Produces

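With sufficient data, we can answer the specific questions raised above: what “high integration” means numerically, how integration and effective rank jointly separate joy from suffering, and where in the structural space curiosity shades into fear.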

The Uncontaminated Measurement Problem

Every human affect report is contaminated. We learned to introspect within a linguistic framework. We cannot know what we would report if we had developed without human language, without human emotion concepts.

Preference data partially escapes this. You're not asked to label your experience with a word from your culture's emotion taxonomy. You're asked to make a choice. The choice reveals structure without requiring articulation.

A person who has no word for “saudade” can still prefer stimuli that evoke it.

This is why forced-choice methods are central. They probe the geometry without passing through the linguistic filter. The topology revealed by choices is closer to the underlying structure than any taxonomy of named emotions.

Cross-Modal Validation

Cross-modal data provides the strongest test of geometric universality. If what someone finds beautiful in images predicts what they find beautiful in sound, the shared structure reflects underlying experiential geometry rather than modality-specific processing.
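Operationally, "predicts" could be tested as a correlation across people: estimate each person's position on one structural dimension separately from each modality, then correlate the two estimates. This minimal sketch uses made-up illustrative numbers (not data from the platform) and a hand-rolled Pearson correlation; a real test would also need held-out data and a confidence interval.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical per-person scores on one structural dimension, estimated
# separately from image comparisons and from audio comparisons.
image_scores = [0.2, 0.5, 0.9, 0.4, 0.7]
audio_scores = [0.1, 0.6, 0.8, 0.3, 0.9]
r = pearson(image_scores, audio_scores)
print(f"cross-modal r = {r:.2f}")
```

A correlation near zero would suggest modality-specific processing; a reliably positive one is the kind of cross-modal coherence the argument looks for.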

The platform supports annotation across modalities—images, video, text, code, audio, websites. Each modality probes different structural dimensions. Visual aesthetics probes spatial integration and valence. Music probes temporal dynamics and arousal. Narrative probes counterfactual engagement and self-model salience. Code aesthetics probes structural elegance—a dimension of experience that programmers know intimately but that has received almost no formal study.

Key Takeaway

Cross-modal coherence is the signature of geometric universality. If preferences in one modality predict preferences in another, both are sampling the same underlying experiential structure—not modality-specific processing but something deeper.