Quality Control
Ensure high-quality annotations through built-in quality measures, agreement checks, and annotator performance tracking.
Why Quality Matters
Taste profiles are only as good as the data used to train them. Low-quality annotations introduce noise and can lead to inconsistent or unreliable scoring.
Common Quality Issues
- Random clicking: Annotators submitting without looking
- Inconsistency: Same annotator giving different answers to similar items
- Misunderstanding: Annotators not understanding the task
- Gaming: Optimizing for speed over accuracy
Built-in Quality Measures
Gold Questions
Inject pre-labeled "gold" items that have known correct answers. Annotators who fail gold questions are flagged for review.
- Setting: gold_question_rate = 0.1 (10% of items)
- Threshold: min_gold_accuracy = 0.8 (80% correct)
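As a rough sketch of how this check behaves (the data shapes and variable names here are illustrative, not the platform's API), gold accuracy is simply the fraction of gold items an annotator answered correctly:

```python
# Illustrative sketch of the gold-question check; names and data shapes are hypothetical.
MIN_GOLD_ACCURACY = 0.80   # annotators below this accuracy are flagged for review

def gold_accuracy(responses, gold_answers):
    """responses / gold_answers: {item_id: label}; returns accuracy on gold items seen."""
    graded = [item for item in responses if item in gold_answers]
    if not graded:
        return None  # annotator has not seen any gold items yet
    correct = sum(responses[item] == gold_answers[item] for item in graded)
    return correct / len(graded)

gold = {"g1": "cat", "g2": "dog", "g3": "cat"}
answers = {"g1": "cat", "g2": "cat", "item_42": "dog", "g3": "cat"}
acc = gold_accuracy(answers, gold)                      # 2 of 3 gold items correct
flagged = acc is not None and acc < MIN_GOLD_ACCURACY
print(f"gold accuracy {acc:.0%}, flagged: {flagged}")   # gold accuracy 67%, flagged: True
```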
Inter-Annotator Agreement
Have multiple annotators label the same items and measure agreement. High disagreement may indicate ambiguous items or low-quality annotators.
- Setting: overlap_rate = 3 (3 annotators per item)
- Metric: Krippendorff's alpha or Cohen's kappa
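For a sense of how the agreement metric is computed, here is a minimal sketch for the two-annotator case using scikit-learn's `cohen_kappa_score` (the labels are made up; for three or more annotators or items with missing labels, Krippendorff's alpha, e.g. via the `krippendorff` package, is the usual choice):

```python
# Minimal sketch: Cohen's kappa between two annotators on the same six items.
from sklearn.metrics import cohen_kappa_score

annotator_a = ["pos", "neg", "pos", "pos", "neg", "neg"]
annotator_b = ["pos", "neg", "neg", "pos", "neg", "pos"]

kappa = cohen_kappa_score(annotator_a, annotator_b)
print(f"Cohen's kappa: {kappa:.2f}")  # 1.0 = perfect agreement, 0.0 = chance-level
```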
Response Time Monitoring
Track how long annotators spend on each item. Suspiciously fast responses (under 1 second) often indicate random clicking.
- Minimum time: 1.5 seconds for labels, 2.5 seconds for comparisons
- Action: Auto-flag responses under the minimum
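A sketch of the fast-response filter (the record layout and field names are illustrative; the thresholds mirror the settings above):

```python
# Sketch of the fast-response filter; record layout and field names are assumptions.
MIN_TIME_MS = {"label": 1500, "comparison": 2500}

def flag_fast_responses(responses):
    """responses: iterable of {"id", "type", "elapsed_ms"} dicts -> flagged records."""
    return [r for r in responses if r["elapsed_ms"] < MIN_TIME_MS[r["type"]]]

responses = [
    {"id": "r1", "type": "label", "elapsed_ms": 800},        # too fast -> flagged
    {"id": "r2", "type": "label", "elapsed_ms": 4200},
    {"id": "r3", "type": "comparison", "elapsed_ms": 1900},  # too fast -> flagged
]
print([r["id"] for r in flag_fast_responses(responses)])     # ['r1', 'r3']
```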
Consistency Checks
Re-show items that were previously labeled and check if the annotator gives the same answer. Inconsistent annotators may not be paying attention.
- Setting: consistency_check_rate = 0.05 (5% retest)
- Threshold: min_self_agreement = 0.85 (85%)
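A sketch of the self-agreement computation (the data shapes are illustrative, not the platform's API):

```python
# Sketch of the self-consistency check; data shapes and names are assumptions.
MIN_SELF_AGREEMENT = 0.85

def self_agreement(first_pass, retest):
    """first_pass / retest: {item_id: label} from the same annotator."""
    shared = set(first_pass) & set(retest)
    if not shared:
        return None
    matches = sum(first_pass[item] == retest[item] for item in shared)
    return matches / len(shared)

first_pass = {"item_1": "cat", "item_2": "dog", "item_3": "cat"}
retest = {"item_1": "cat", "item_3": "dog"}
score = self_agreement(first_pass, retest)  # 1 of 2 retested items match -> 0.5
if score is not None and score < MIN_SELF_AGREEMENT:
    print(f"Flag annotator: self-agreement {score:.0%}")
```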
Annotator Performance Tracking
We track annotator performance metrics across all their work to identify and promote high-quality annotators.
| Metric | Description | Target |
|---|---|---|
| Gold Accuracy | Percentage of gold questions answered correctly | >80% |
| Agreement Rate | How often they agree with other annotators | >70% |
| Self-Consistency | Agreement with their own previous answers | >85% |
| Skip Rate | Percentage of items skipped | <20% |
| Avg Response Time | Average time spent per item | 2-30 sec |
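If you pull these metrics for your annotators, a simple check against the targets might look like the sketch below (the stats dictionary and its keys are illustrative, not a platform response format):

```python
# Sketch: compare one annotator's metrics against the targets in the table above.
# The stats dict and its keys are hypothetical, not a platform response format.
TARGETS = {
    "gold_accuracy":    lambda v: v > 0.80,
    "agreement_rate":   lambda v: v > 0.70,
    "self_consistency": lambda v: v > 0.85,
    "skip_rate":        lambda v: v < 0.20,
    "avg_response_sec": lambda v: 2 <= v <= 30,
}

def failing_metrics(stats):
    """Return the names of metrics that miss their target."""
    return [name for name, meets in TARGETS.items()
            if name in stats and not meets(stats[name])]

stats = {"gold_accuracy": 0.91, "agreement_rate": 0.65,
         "skip_rate": 0.05, "avg_response_sec": 7.4}
print(failing_metrics(stats))  # ['agreement_rate']
```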
Handling Bad Annotations
Automatic Rejection
Annotations that fail quality checks (fast responses, failed gold questions) are automatically rejected and not charged to your budget.
Majority Voting
When using overlap, the final label is determined by majority vote. Outlier annotations are discarded. Annotators who frequently disagree with the majority are flagged.
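A minimal sketch of the voting step (the label values are made up):

```python
# Minimal sketch of majority voting over one item's overlapping annotations.
from collections import Counter

def majority_label(labels):
    """Return (winning label, vote share) for one item's annotations."""
    label, votes = Counter(labels).most_common(1)[0]
    return label, votes / len(labels)

labels = ["relevant", "relevant", "not_relevant"]  # 3x overlap on one item
winner, share = majority_label(labels)
print(winner, f"{share:.0%}")  # relevant 67%
```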
Annotator Blocking
You can block specific annotators if they consistently provide low-quality work. Blocked annotators cannot participate in any of your projects.
Manual Review
Download your annotations for manual review. Flag suspicious patterns and report them to us for platform-wide action.
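As one example of what a manual-review pass could look like, the sketch below scans an exported annotations file for annotators who give the same answer almost every time (the JSONL layout and field names are assumptions, not the platform's export format):

```python
# Sketch: flag annotators whose answers are suspiciously uniform in an export.
# The JSONL layout and field names are assumptions about the exported file.
import json
from collections import Counter, defaultdict

per_annotator = defaultdict(Counter)
with open("annotations.jsonl") as f:
    for line in f:
        record = json.loads(line)
        per_annotator[record["annotator_id"]][record["label"]] += 1

for annotator, counts in per_annotator.items():
    total = sum(counts.values())
    top_label, top_count = counts.most_common(1)[0]
    if total >= 20 and top_count / total > 0.95:
        print(f"{annotator}: {top_count}/{total} answers are '{top_label}' - review")
```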
Configuring Quality Settings
Configure quality settings per project to balance quality with speed and cost.
```bash
curl -X PATCH https://api.commandAGI.com/api/projects/proj_abc123 \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "qualitySettings": {
      "goldQuestionRate": 0.1,
      "minGoldAccuracy": 0.8,
      "overlapCount": 3,
      "minResponseTimeMs": 1500,
      "consistencyCheckRate": 0.05,
      "minSelfAgreement": 0.85,
      "autoRejectFastResponses": true,
      "useMajorityVoting": true
    }
  }'
```

Best Practices
Start with High Overlap
Begin with 3-5x overlap to identify high-quality annotators, then reduce overlap for trusted annotators.
Provide Clear Instructions
Many quality issues stem from unclear task definitions. Include examples of good and bad labels in your project description.
Create Good Gold Questions
Gold questions should have unambiguous answers. If experts disagree on a gold question, it's not a good gold question.
Review Early, Review Often
Check the first 100 annotations manually to catch issues early. Adjust instructions or gold questions if you see systematic problems.
Pay Fair Rates
Low payment rates attract low-effort annotators. Higher rates attract professionals who take pride in their work.