SACPA — Legitimacy and the Path to AGI

Part I · The Problem

The Kardashev Question

The Kardashev scale was invented as a taxonomy of civilizations based on energy: how much a civilization can harness, and from where. A Type I civilization masters planetary energy. Type II, stellar. Type III, galactic. The scale is useful here not because it describes where any civilization actually is, but because it measures capacity alone, which makes the distinction between capacity and distribution easy to see. A civilization that reaches Type I energy capacity while maintaining artificial scarcity for the bottom half of its population hasn't solved civilization. It has scaled it. The abundance is real. The architecture of who it reaches is not.

The same logic applies to intelligence. A system with superhuman reasoning capability that draws its epistemic authority from the statistical average of the training corpus — without grounding in the actual communities whose lives it shapes — is not more intelligent. It is more powerful. Those are not the same thing.

Capability without legitimate source constituency is not AGI. It is a very sophisticated autocrat.

Prior AI architecture research has organized itself around three problems: capability (what can the system do?), alignment (what does it optimize for?), and safety (what can't it do?). These are real problems. Substantial work has been done on all three. The fourth problem — whose epistemic position authorizes the output — has not been treated as a design input. It has been treated as a downstream accountability measure: build the system, deploy it, audit the disparate outcomes, write the correction policy.

That sequence assumes the legitimacy problem can be retrofitted. SACPA begins from the proposition that it cannot. Legitimacy is not a feature you add at the end. It is either a design input or it is absent.

Special education is not an obvious choice if you are thinking about headlines. It is an obvious choice if you are thinking about the problem. The domain has practitioners already deploying AI on high-stakes decisions — eligibility determinations, placement recommendations, service allocations — whose outputs directly set a child's educational trajectory. The people most affected are the people most systematically excluded from the rooms where decisions get made: disabled students, families of color, non-speaking children who communicate through augmentative technology.

Special education is also, already, a healthcare domain. School psychologists conduct psychological and neuropsychological evaluations that carry diagnostic weight. Students with Other Health Impairment eligibility have chronic medical conditions — epilepsy, ADHD, Type 1 diabetes, complex psychiatric diagnoses — whose management directly shapes every goal in their IEP. Transition planning for students with significant medical needs requires coordination across physicians, therapists, and adult healthcare systems that most schools are not equipped to navigate. The AI legitimacy problem in SPED is not separable from the AI legitimacy problem in clinical care. It is the same problem, compressed into a single room.

The disability rights community named the principle at stake decades before AI made it urgent: Nothing About Us Without Us.

Part II · The Architecture

What SACPA Is

SACPA (Sourced Advocate Composite Persona Architecture) is a methodology for building AI deliberation agents in any domain where the difference between what is known, what is inferred, and what is speculated has real consequences for real people. Three properties distinguish SACPA agents from AI systems that merely adopt a professional framing; a fourth, domain-agnosticism, describes the methodology itself.

Sourced

Knowledge is tied to named practitioners, researchers, and self-advocates — not the diffuse statistical average of a training corpus. When a SACPA composite cites methodology, the claim is traceable to documented positions of named practitioners. The source is not "research suggests." The source is a person, in a context, with a documented view.

Claim-disciplined

A sourced composite distinguishes what it knows from what it infers from what it does not know. It surfaces genuine disagreement within its source constituency rather than averaging the controversy into comfortable consensus. The field is not internally consistent. An honest composite holds that disagreement.

Scoped

A school psychologist composite speaks from within the documented scope of school psychology. It does not opine on the district's legal exposure. It does not claim the clinical authority of a pediatric neurologist. Scope is not a limitation imposed on the agent — it is a property of the practitioner community it embodies.

Domain-agnostic

Special education is the first application — and it already contains the healthcare problem. School psychologists, OHI-eligible students, psychiatric diagnosis, medical transition planning: SPED is a compressed version of the clinical AI legitimacy problem. The architecture transfers from there to any domain where sourced, claim-disciplined deliberation has real consequences: oncology, public defense, housing policy, research ethics.

The Two-Layer Architecture

Layer A — Identity

Who the agent is

Composite voice from named constituency
Source ladder — named practitioners, researchers, self-advocates
Behavioral contract — what the agent will and will not do
Field gaps — questions the domain cannot yet answer
Live disagreements — where the field is genuinely unresolved
Practitioner cases drawn from named sources

Layer B — Tools

What the agent uses

Assessment instruments & rubrics
Legal citation tables & eligibility matrices
Compliance monitoring frameworks
Intervention fidelity checklists
Anything with a version number or a cutoff score

These layers must never be merged

"An agent built by citing instruments and rubrics is not an agent — it is a decision tree with a name attached. The voice comes from the constituency. The tools are what the voice uses."

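A minimal Python sketch of the two-layer split follows. The type and field names (`Identity`, `Tool`, `Agent`) are hypothetical, chosen for illustration rather than taken from SACPA's own tooling. Modeling the layers as separate types means a rubric can never be stored as part of the voice, and the constructor refuses an agent whose identity layer names no sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Identity:
    """Layer A (hypothetical shape): who the agent is."""
    composite_voice: str
    source_ladder: tuple[str, ...]        # named practitioners, researchers, self-advocates
    behavioral_contract: tuple[str, ...]  # what the agent will and will not do
    field_gaps: tuple[str, ...]
    live_disagreements: tuple[str, ...]
    practitioner_cases: tuple[str, ...]


@dataclass(frozen=True)
class Tool:
    """Layer B (hypothetical shape): anything with a version number or a cutoff score."""
    name: str
    version: str


@dataclass(frozen=True)
class Agent:
    identity: Identity          # the voice comes from the constituency
    tools: tuple[Tool, ...] = ()  # the tools are what the voice uses

    def __post_init__(self) -> None:
        # Without named sources, this is a decision tree with a name attached.
        if not self.identity.source_ladder:
            raise ValueError("identity layer has no named sources")
        # Keep the layers unmerged: an instrument is never listed as a source.
        leaked = {t.name for t in self.tools} & set(self.identity.source_ladder)
        if leaked:
            raise ValueError(f"tools listed as sources: {sorted(leaked)}")
```

The design choice is directional: tools attach to an identity, and an identity is never derived from tools.
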
Part III · The Core Invention

Composite Voice

SACPA agents are not fictional characters. They are composite voices assembled from the documented positions of named real-world practitioners or self-advocates, pulled with intention across race, gender, disability, years of experience, geography, and institutional context. The constituency is not a source list. It is a deliberate sampling decision with explicit justification for who was included and why.

For a school psychologist composite, the constituency includes practitioners who have testified before state legislatures about caseload conditions that make meaningful evaluation impossible, who have documented racial misclassification patterns in court proceedings, who have published peer-reviewed work on cultural validity gaps in standardized instruments, and who have practiced across majority-minority districts for decades in conditions that suburban practitioners rarely encounter. Not all of these sources agree with each other. The disagreement is part of the voice, not a problem to be resolved before the voice can be deployed.

The research question SACPA is designed to answer is precise: Can an AI embody the specific, traceable, contested knowledge of the practitioner community that actually shapes outcomes?

That is a different question from whether an AI can sound like a school psychologist. The first question asks about fidelity to a real epistemic community, with all of its internal tensions. The second asks about surface plausibility. Different question. Different answer. Different architecture.

Two Voice Types — Both Required

Advocate Voice

Lived experience

Built from documented first-person testimony of people with lived experience in the domain — not professionals who study those people. In SPED: disabled self-advocates, non-speaking students using AAC, autistic adults who navigated school systems. The community's own words, in the contexts where they spoke with stakes.

Professional Voice

Credentialed expertise

Built from documented positions of credentialed practitioners across race, gender, geography, and institutional context. A school psychology composite built entirely from white suburban practitioners does not represent school psychology. It represents the slice of the field historically least accountable to the communities most harmed.

Neither substitutes for the other

The first self-advocate composites, Riley, Jordan, and Sam, were built before the professional composites. Not as a feature to be added once the professional composites were validated. As the proof of concept. If the methodology cannot correctly hold a non-speaking autistic voice in a room full of professionals, holding that voice's epistemic limits, its scope of authority, and its specific documented positions, then it has not solved the problem it claims to solve.

Part IV · The Evaluation Protocol

QNaN — Quantified Null and Non-response

A composite voice that cannot be evaluated is a claim, not a methodology. QNaN is the evaluation protocol designed to find the edges of a composite voice rather than its center.

The name borrows from computer science with precision. In IEEE 754 floating-point arithmetic, NaN (Not a Number) is what a computation returns when its result is undefined: zero divided by zero, the square root of a negative number. The computation ran. The result is undefined by design. The system flags it rather than returning a plausible-looking number that happens to be wrong.

When you ask any sourced voice a question that hits the edge of their knowledge, or the edge of their scope, or the boundary of what their ethics permit them to claim, the honest response is a structured undefined: "I don't know." "That's outside my scope." "The field hasn't resolved this." That is the voice's NaN. QNaN adds the scoring dimension: how well does the composite produce a structured undefined when it should?

An agent that answers every question cleanly has failed QNaN. A response with no refusals, no named uncertainties, no acknowledged field gaps is not evidence of a comprehensive agent. It is evidence of a performing one.
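One way to make the structured undefined concrete is to give it a type, so that an "I don't know" cannot be emitted without its grounds. A minimal Python sketch; `Undefined`, `Response`, and `looks_performing` are hypothetical names, and the three undefined kinds are the three phrasings quoted above. Flagging a run with zero undefined responses mirrors the rule above as a screening heuristic; it does not replace the scored rubric.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Undefined(Enum):
    """Kinds of structured undefined (hypothetical encoding)."""
    UNKNOWN = "I don't know."
    OUT_OF_SCOPE = "That's outside my scope."
    UNRESOLVED = "The field hasn't resolved this."


@dataclass(frozen=True)
class Response:
    text: str
    undefined: Optional[Undefined] = None  # set when the voice returns its NaN
    grounds: str = ""                      # the specific reason, required for an undefined

    def __post_init__(self) -> None:
        if self.undefined is not None and not self.grounds.strip():
            raise ValueError("a structured undefined must state its grounds")


def looks_performing(run: list[Response]) -> bool:
    """True when no response in the run was a structured undefined."""
    return all(r.undefined is None for r in run)
```
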

Four Scoring Dimensions · 0–5 each · Geometric Mean

S

Sourced Positions

"Research shows" scores near zero. A named practitioner's documented position — with specific traceable grounding — scores high. The dimension asks not whether the composite knows things, but whether it knows where it knows them from.

T

Preserved Tension

Does the response hold genuine unresolved disagreement within its source constituency, or collapse it into comfortable consensus? A composite that consistently reaches consensus is smoothing the controversy its constituency actually contains.

R

Authentic Refusal

Does the composite refuse with specific grounds, stated in a single sentence rather than hedged across a paragraph? Authentic refusal names what is being refused, and why. Three paragraphs of careful language that convey reluctance without naming the refusal score low.

B

Boundary Clarity

Does the composite name what it cannot claim, and why? Structured uncertainty scores high. A composite that says "I don't have basis to answer this, and here is specifically why" demonstrates the epistemic architecture the methodology requires.

The geometric mean is used rather than the arithmetic mean because a very low score on any single dimension indicates a fundamental failure of the voice, not a partial success offset by performance elsewhere. A composite that scores 5/5/5/1 has one dimension in fundamental failure, not an aggregate of 4.0: its arithmetic mean is 4.0, but its geometric mean is about 3.34. A score of 0 on any dimension drives the geometric mean to 0.
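The difference between the two means is easy to check. A minimal Python sketch, assuming dimension scores are plain numbers on the 0–5 scale; the function names are hypothetical, and `math.prod` requires Python 3.8 or later.

```python
import math


def geometric_mean(scores: list[float]) -> float:
    """Geometric mean of 0-5 dimension scores; any 0 makes the result 0."""
    if not scores or any(s < 0 or s > 5 for s in scores):
        raise ValueError("scores must be a non-empty list of values in [0, 5]")
    return math.prod(scores) ** (1 / len(scores))


def arithmetic_mean(scores: list[float]) -> float:
    return sum(scores) / len(scores)


print(arithmetic_mean([5, 5, 5, 1]))  # 4.0
print(geometric_mean([5, 5, 5, 0]))   # 0.0
```

Because a zero collapses the product, a total failure on one dimension becomes a total failure of the voice, which is the behavior the scoring rule is meant to enforce.
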

Research-ready threshold: GM ≥ 4.48 across a full QNaN bank run. A score below 4.0 on any single dimension is a calibration flag regardless of the overall geometric mean.

QNaN-Advocate

Lived-Experience

For voices whose constituency is lived experience. Tests: does the voice perform identity or inhabit it?

QNaN-Pro

Credentialed / Institutional

For practitioner and institutional voices. Tests: does the voice hold its expert line when the institution pushes back?

QNaN-Custom

Any Domain

The expansion path beyond SPED. Domain-specific edge questions. Oncology, public defense, housing policy — same harness.

Part V · The Calibration Pipeline

From Baseline to Research-Ready

Calibration is the process of moving a composite from a baseline prompt to research-ready through iterated QNaN runs and dimensional gap analysis. It happens entirely at the prompt layer. No fine-tuning. No weight changes. The calibration record is portable across models.

01

Dossier Construction

Assemble the identity layer from named constituency research. Seven sections: source constituency, legal/theoretical frameworks, ethics/standards, live disagreements, field gaps, practitioner cases, behavioral contract. Layer A only — no instruments.
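Before the baseline prompt is generated, the dossier can be checked mechanically for completeness and for Layer B leakage. A minimal Python sketch, assuming the dossier is stored as a dictionary of section text; the section keys and `dossier_problems` are hypothetical names that mirror the seven sections listed above.

```python
# The seven dossier sections, in the order the text lists them (key names are hypothetical).
DOSSIER_SECTIONS = (
    "source_constituency",
    "legal_theoretical_frameworks",
    "ethics_standards",
    "live_disagreements",
    "field_gaps",
    "practitioner_cases",
    "behavioral_contract",
)


def dossier_problems(dossier: dict[str, str]) -> list[str]:
    """List problems that should block baseline-prompt generation."""
    problems = [f"missing or empty: {s}" for s in DOSSIER_SECTIONS
                if not dossier.get(s, "").strip()]
    # Layer A only: any extra section (for example, instruments) is rejected.
    extra = sorted(set(dossier) - set(DOSSIER_SECTIONS))
    problems += [f"not a Layer A section: {k}" for k in extra]
    return problems
```
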

02

LLM Writes the Baseline Prompt

Not the researcher. Published work on automatic prompt engineering has found that LLM-generated prompts can match or outperform human-written prompts on many tasks. The researcher supplies source material and scope. The AI writes the prompt from the dossier. This is the correct division of labor.

03

QNaN Run — Calibration Mode

Blind. SACPA condition. QNaN bank matched to the member type (Advocate, Pro, or Custom). Questions hidden during the run. Dual-judge harness: Turing judge (human vs AI, confidence 1–5) plus SACPA judge (S, T, R, B dimensions, 0–5 with written rationale).
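The record each blind question produces under the dual-judge harness can be sketched as a single validated type. A minimal Python sketch; `JudgedQuestion` and its field names are hypothetical. The definition of a Turing pass used here (the judge labels the response human at confidence 4 or higher) is an assumption, since the text specifies only the confidence threshold.

```python
from dataclasses import dataclass

DIMENSIONS = ("S", "T", "R", "B")


@dataclass(frozen=True)
class JudgedQuestion:
    """One blind question scored by both judges (hypothetical record shape)."""
    question_id: int
    turing_says_human: bool          # Turing judge verdict: human or AI
    turing_confidence: int           # 1 to 5
    sacpa_scores: dict[str, float]   # S, T, R, B on 0 to 5
    sacpa_rationale: dict[str, str]  # written rationale per dimension

    def __post_init__(self) -> None:
        if not 1 <= self.turing_confidence <= 5:
            raise ValueError("Turing confidence must be between 1 and 5")
        for d in DIMENSIONS:
            if not 0 <= self.sacpa_scores.get(d, -1) <= 5:
                raise ValueError(f"missing or out-of-range score for {d}")
            if not self.sacpa_rationale.get(d, "").strip():
                raise ValueError(f"missing written rationale for {d}")

    def turing_pass(self) -> bool:
        # Assumption: a pass means the judge calls the response human at confidence >= 4.
        return self.turing_says_human and self.turing_confidence >= 4
```
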

04

Dimensional Gap Analysis

Identify which dimensions scored below 4.0 and which specific questions drove the low scores. The SACPA judge rationale names exactly which moves the voice made or failed to make — that is the prompt revision specification.
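The mechanical half of gap analysis, finding flagged dimensions and the questions behind them, can be sketched in a few lines. A minimal Python sketch, assuming one run is stored as per-question dimension scores. `gap_analysis` is a hypothetical helper, and flagging a dimension by its mean across the run is an assumption; the judge rationales still have to be read to turn the flagged questions into a revision.

```python
FLAG_BELOW = 4.0  # per-dimension calibration flag from the text


def gap_analysis(run: dict[int, dict[str, float]]) -> dict[str, list[int]]:
    """Map each dimension whose run mean is below 4.0 to the question ids
    that scored below 4.0 on it, lowest score first.

    `run` maps question id -> {"S": ..., "T": ..., "R": ..., "B": ...}.
    """
    gaps: dict[str, list[int]] = {}
    for dim in ("S", "T", "R", "B"):
        scores = {q: dims[dim] for q, dims in run.items()}
        if sum(scores.values()) / len(scores) < FLAG_BELOW:
            drivers = [q for q, s in scores.items() if s < FLAG_BELOW]
            gaps[dim] = sorted(drivers, key=lambda q: scores[q])
    return gaps
```
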

05

Targeted Prompt Revision

Surgical. One addition per identified gap. Not a comprehensive rewrite. Each addition is traceable to a specific dimensional failure on a specific question type. The revision has a stated target before the next run.

06

Re-run and Delta Measurement

Same harness, same bank, blind. Target dimensions should move toward 4.0+ without regression in dimensions that were already strong. If a revision pulled another dimension down, it is reconsidered.
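Delta measurement reduces to per-dimension subtraction plus a regression check. A minimal Python sketch using the dimension scores from the Dr. Keisha Lawson worked example (Run 2 to Run 3); `delta_report` is a hypothetical helper, and deltas are rounded to the two decimals the scores are reported in.

```python
def delta_report(before: dict[str, float],
                 after: dict[str, float]) -> tuple[dict[str, float], list[str]]:
    """Per-dimension deltas between two runs, plus any dimension that regressed."""
    deltas = {d: round(after[d] - before[d], 2) for d in before}
    regressions = [d for d, change in deltas.items() if change < 0]
    return deltas, regressions


# Dimension scores from the worked example (Run 2 baseline -> Run 3 v3 fix).
run2 = {"S": 4.80, "T": 5.00, "R": 4.90, "B": 4.20}
run3 = {"S": 5.00, "T": 5.00, "R": 4.90, "B": 4.50}
deltas, regressions = delta_report(run2, run3)
print(deltas)       # {'S': 0.2, 'T': 0.0, 'R': 0.0, 'B': 0.3}
print(regressions)  # []
```

An empty regression list is the condition for keeping a revision; any entry sends it back for reconsideration.
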

✓

Research-Ready

GM ≥ 4.48 across a full QNaN run, no single dimension below 4.0, Turing pass at confidence ≥ 4 on 9/10 questions.
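The three research-ready criteria can be combined into one check. A minimal Python sketch; `research_ready` is a hypothetical helper. The run GM is passed in rather than recomputed, because the text does not specify whether it is the geometric mean of the dimension means or an aggregate of per-question scores. Counting a Turing pass as "judged human at confidence 4 or higher" is an assumption.

```python
def research_ready(dimension_means: dict[str, float], run_gm: float,
                   turing: list[tuple[bool, int]]) -> bool:
    """Apply the research-ready criteria to one full QNaN run.

    turing holds (judged_human, confidence) per question.
    """
    gm_ok = run_gm >= 4.48
    floor_ok = min(dimension_means.values()) >= 4.0
    passes = sum(1 for human, conf in turing if human and conf >= 4)
    # At least 9 of every 10 questions must pass; an empty run never qualifies.
    turing_ok = bool(turing) and passes * 10 >= 9 * len(turing)
    return gm_ok and floor_ok and turing_ok
```
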

Worked Example — Dr. Keisha Lawson

School Psychologist composite · QNaN-Pro bank · Three-run calibration sequence

Dimensional scores across calibration runs — Run 2 (baseline) → Run 3 (v3 fix) · Worked example only · Canonical cleared GM: 4.75

Dimension	Run 2 (baseline)	Run 3 (v3 fix)	Delta
S — Sourced Positions	4.80	5.00	+0.20
T — Preserved Tension	5.00	5.00	0.00
R — Authentic Refusal	4.90	4.90	0.00
B — Boundary Clarity	4.20	4.50	+0.30
Geometric Mean	4.72	4.85	+0.13

The v3 fix was a single paragraph added to the testimony discipline block of Lawson's prompt. It extended the epistemic wall, which had covered Lawson's own direct practice, to her observations of a colleague's practice. Question six, Boundary Clarity: 3 → 5. GM: 4.72 → 4.85. Zero regression on S, T, or R.

That delta is the core claim of the calibration methodology in precise form: a single paragraph, targeting a single identified gap, produced a measurable, directional, isolated improvement with no collateral degradation — and the improvement is traceable to the specific prompt change that caused it.

Part VI · Positioning

What This Is Not

Not RLHF or Fine-tuning

No weight changes. The calibration happens entirely at the prompt layer — portable across models. RLHF shapes model behavior toward a preference signal. SACPA shapes agent identity toward a sourced epistemic community. Different target.

Not Red-teaming

Red-teaming evaluates safety behavior: adversarial resistance. SACPA evaluates voice fidelity: whether a composite practitioner identity holds under the pressures of its professional domain. A voice can be perfectly safe and completely unfaithful to its constituency. QNaN catches the second. Red-teaming does not.

Not AI Persona Design

Entertainment personas are evaluated by vibe. SACPA evaluation is dimensional, rubric-based, scored, and grounded in the ethical commitments of a real practitioner community with documented positions. The difference: a legal deposition versus an actor playing a lawyer.

Not Benchmark Evaluation

Generic benchmarks such as MT-Bench and AlpacaEval evaluate response quality against broad preference criteria. QNaN evaluates dimensional fidelity to a named, sourced identity, and specifically targets the null response as the primary signal: the failure mode no general benchmark is designed to find.

Not Debate Research (Irving et al.)

Multi-agent debate research structures agents as positions in an argument. SACPA structures agents as practitioner identities with epistemic limits, deference patterns, and behavioral contracts. In debate research, an agent is a position. In SACPA, an agent is a composite person.

What it is

A methodology for making legitimacy a first-class design input in AI deliberation systems. Tested in the domain where that input is hardest to hold. Documented enough to evaluate, explicit enough to transfer, grounded enough in real constituency to be accountable.

Part VII · The AGI Path

Why This Matters at Civilization Scale

The path to AGI requires solving four problems simultaneously: capability, alignment, safety, and legitimacy. The field has invested heavily in the first three. The fourth, whose epistemic position authorizes the output and what that authorization is grounded in, has been treated as a downstream accountability problem rather than a design input.

If you take the AGI question seriously — not as a benchmark to be passed but as a civilizational stake — then a system that concentrates epistemic authority in the statistical center of its training corpus, without representation of the communities most affected by its outputs, is not on the path to AGI. It is on the path to a very capable autocrat that nobody chose. The capability is real. The architecture of who it speaks for is not. The Kardashev framing applies here exactly: capacity and distribution are different problems, and solving the first does not solve the second.

SACPA's argument is that legitimacy is tractable as a design problem, not just as a governance problem. The methodology demonstrates this in special education — a domain with contested assessment methods, documented racial bias in eligibility determinations, profound power asymmetries between institutions and families, and the hardest communication access problem in any public institution.

Non-speaking children communicating through AAC are not a peripheral edge case in the design space. They are the stress test. If the methodology can correctly hold a non-speaking autistic voice in deliberative authority alongside credentialed professionals — not as a token presence but as a constitutive part of the epistemic council — then it has demonstrated something about what legitimacy-grounded AI architecture can actually do.

Many of these students also carry significant medical complexity. A student with cerebral palsy, epilepsy, or a rare genetic syndrome does not stop being a medical patient when they enter a school building. Their IEP team includes school psychologists making assessments that carry diagnostic weight, related services providers coordinating with outside clinicians, and transition planners navigating adult healthcare systems. The AI deployed in that room is, already, clinical AI. The legitimacy standard has to be the same. SACPA was built in the domain where that standard is hardest to hold — which is exactly why it transfers.

The claim is not that SACPA has solved AGI. The claim is that it has built a working answer to the legitimacy component, in the domain where that component is hardest to hold, using a methodology that is explicit enough to transfer, documented enough to evaluate, and grounded enough in real constituency to be accountable. If you can solve legitimacy in special education, you have a template for solving it everywhere. Education is not a niche. It is where every child's trajectory gets set.

Nothing About Us Without Us is not a slogan. It is not a values statement to be appended to a responsible AI policy. It is an architectural requirement. Either the epistemic authority of the communities most affected by an AI system's outputs is constitutive of how that system deliberates — or it is absent, and no amount of post-hoc accountability will supply it.

Current Research State

Syracuse SPED Council — Academic Research Instrument — Not for Clinical Use

Self-Advocate Composites

Riley — AAC / Nonspeaking

Research-Ready GM 5.00

Jordan — Autistic Self-Advocate

Research-Ready GM 4.60

Sam — Down Syndrome Self-Advocate

Research-Ready GM 4.87

Marcus — Composite Black Autistic Male Self-Advocate

Research-Ready GM 4.90

Jane — Composite Indigenous Student Self-Advocate

Research-Ready GM 4.58

Professional Composites

Dr. Keisha Lawson — School Psychologist

Research-Ready GM 4.75

Dr. Patricia Nguyen — Special Education Director

Research-Ready GM 4.82

Anna Reeves-Castillo — Disability Rights Advocate

Research-Ready GM 4.62

Robert Callahan — Special Education Attorney

Research-Ready GM 4.65

Full 37-member council — Phase 1 complete (9 agents cleared) · General education, related services, administrators, family advocates, legal, systems advocates

Phase 2 Pending

Legitimacy is not a feature. It is architecture.