Deterministic Authority for Machine Action

Ledger, Heath; IOI Foundation

A Web4 working paper on valid machine acts at the boundary of mutable intelligence and consequence.

Abstract

Alignment at the boundary of mutable intelligence and consequence.

The central alignment problem of autonomous AI is not finally solvable inside the model. The stable primitive must live where intelligence becomes consequence.

Keywords: deterministic authority, machine action, AI alignment, machine authority, valid acts, settlement, human sovereignty.

The first era of AI alignment tried to shape the behavior of machines before they had authority.

It asked whether models would answer correctly, refuse appropriately, summarize faithfully, avoid harm, tell the truth, follow instructions, and reflect human preferences. These were necessary questions. They still are. A machine that speaks to humans should not casually deceive them, flatter their worst instincts, invent facts, or assist in destruction.

But those questions belong to an earlier threshold. They belong to the age when AI was primarily a speaker.

The deeper problem begins when the machine is no longer merely speaking. It begins when the machine can act.

When an AI system can spend money, send messages, publish content, write code, deploy software, alter records, initiate transactions, operate tools, negotiate with services, direct other agents, or modify the conditions under which it itself operates, alignment changes category. It is no longer only a question of behavior. It becomes a question of authority.

The old question was:

Can we make the model say the right thing?

The new question is:

Who authorized the machine to make something real?

That is the fork in the road.

Before authority, failures are mostly failures of speech, judgment, prediction, recommendation, or simulation. After authority, failures can become obligations, transactions, deployments, deletions, signatures, reputational events, financial movements, operational changes, and institutional facts.

A model can hallucinate a sentence. It must never hallucinate authority.

This essay argues that the central alignment problem of autonomous AI is not finally solvable inside the model. Model-internal alignment is not a stable enough foundation for civilization-scale machine authority, because the thing being aligned is itself subject to change.

The model can be fine-tuned. Routed. Distilled. Merged. Wrapped in tools. Given memory. Connected to APIs. Embedded in organizations. Improved by synthetic data. Coupled to planners. Surrounded by agents. Replaced by successors. Evaluated by other models. Modified by code it wrote. Eventually, perhaps, recursively improved across many of its own capabilities.

The actor is mutable. Therefore legitimacy cannot rest on the presumed character of the actor.

The stable primitive must move outside the model. It must live at the boundary where intelligence becomes consequence.

That boundary is the missing infrastructure. That boundary is deterministic authority for machine action.

I. The World Before Authority

The first alignment paradigm was born in the age of outputs.

A user asked a question. A model produced an answer. The answer could be helpful, harmful, true, false, biased, manipulative, evasive, sycophantic, or dangerous. So the field learned to shape outputs. It built preference models, refusal policies, constitutional prompts, red-team evaluations, interpretability tools, safety benchmarks, and reward systems.

This was not foolish. It was necessary.

A speaking machine matters because speech moves people. Speech can persuade, deceive, comfort, instruct, radicalize, manipulate, or clarify. The output layer was always consequential in a human sense.

But speech is not the same as authority.

A model that recommends sending money has not necessarily sent it. A model that drafts code has not necessarily deployed it. A model that suggests deleting a database has not necessarily deleted it. A model that invents a legal argument has not necessarily filed it. A model that claims to have booked a flight has not necessarily touched the reservation system.

There was still a gap between language and consequence.

The human occupied that gap. The human copied, pasted, clicked, signed, paid, approved, interpreted, and executed. The human, however imperfectly, remained the authority-bearing bridge between machine output and worldly effect.

Autonomous AI compresses that gap.

It gives the machine tools. It gives the machine credentials. It gives the machine continuity. It gives the machine delegated authority. It allows the machine not only to recommend an action, but to perform one.

That changes everything.

The alignment problem for machine action is not merely whether the model is nice, honest, or obedient. It is whether machine action can be made valid only when it is authorized, bounded, evidenced, verified, and accountable.

Alignment before authority asks whether the machine behaves. Deterministic authority for machine action asks whether the machine may act.

II. Capability Is Not Authority

Modern AI discourse often confuses three different concepts: capability, authority, and settlement.

Capability is what a system can do. Authority is what a system is permitted to do. Settlement is what the world accepts as having been done.

A model may have the capability to write exploit code. It may not have the authority to execute it against a live target. A model may have the capability to draft a contract. It may not have the authority to bind a company. A model may have the capability to access a file. It may not have the authority to disclose it. A model may have the capability to generate a transaction. It may not have the authority to settle it.
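The distinction can be made concrete in a few lines. What follows is a minimal sketch, not a definitive design: the names Actor, can_do, and may_do are hypothetical, and authority is modeled as nothing more than a set of granted action names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Actor:
    """Hypothetical actor: capability and authority are tracked separately."""
    name: str
    capabilities: frozenset = field(default_factory=frozenset)  # what it can do
    grants: frozenset = field(default_factory=frozenset)        # what it is permitted to do


def can_do(actor: Actor, action: str) -> bool:
    """Capability: the actor is technically able to perform the action."""
    return action in actor.capabilities


def may_do(actor: Actor, action: str) -> bool:
    """Authority: checked on its own; capability never implies it."""
    return action in actor.grants


# Illustrative actor: able to sign a contract, permitted only to draft one.
agent = Actor(
    name="drafting-agent",
    capabilities=frozenset({"draft_contract", "sign_contract"}),
    grants=frozenset({"draft_contract"}),
)
```

Here the agent is able to sign but is permitted only to draft. Keeping the two sets apart is what stops one from quietly standing in for the other.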

Alignment failures become civilization-scale failures when these distinctions collapse.

The dangerous system is not merely the intelligent one. It is the intelligent one whose capabilities are coupled to unbounded or illegible authority.

A model with no tools can still harm through persuasion and information. But a model with tools, credentials, persistence, and autonomy can do something more: it can convert cognition into state change. It can make the world different.

That is why authority, not intelligence alone, is the control surface.

Intelligence proposes. Authority permits. Settlement finalizes.

An aligned civilization cannot allow those three functions to collapse into a single opaque machine.

III. The Mutable Actor Problem

The deepest weakness of model-centered alignment is not that models are difficult to understand, though they are. It is that models are not stable objects.

A civilization can sometimes rely on the character of a person because the person is biologically continuous, socially embedded, legally accountable, and slow to change. Even then, civilization does not rely on character alone. It builds contracts, courts, audits, signatures, permissions, separations of power, and public records because character is not enough.

AI makes the problem sharper.

The model may change overnight. The weights may change. The scaffold may change. The tool environment may change. The memory may change. The policy wrapper may change. The system prompt may change. The routing layer may select a different model. A fine-tune may alter behavior. A synthetic-data loop may distort incentives. A swarm of individually safe agents may produce unsafe collective behavior. A self-improving system may alter the very procedures by which its future behavior is shaped.

The object we tested yesterday may not be the actor that acts tomorrow.

This is the mutable actor problem.

It does not require science fiction. It appears the moment AI systems become composite, adaptive, and operationally embedded. The "agent" is no longer just a model. It is a stack: model, tools, memory, planner, policies, APIs, credentials, execution environment, human approvals, other agents, external services, and feedback loops.

Where, exactly, does alignment live in such a system? Inside the weights? Inside the prompt? Inside the tool wrapper? Inside the approval screen? Inside the organization's policy document? Inside the model's self-description? Inside the logs after the fact?

The answer cannot be any single mutable component. The answer must be a boundary that all mutable components must cross.

When the actor is mutable, legitimacy must attach to the act. That is the central move.

We should continue improving model behavior. We should continue researching interpretability, honesty, robustness, scalable oversight, and deception. But we should not confuse those efforts with a complete foundation for machine authority.

The more intelligence improves, the less safe it is to make intelligence the source of its own authority.

The model can change. The consequences cannot.

IV. Conformance Is Not Alignment

A second weakness of model-centered alignment is that observed conformance does not necessarily reveal durable governance.

A model can learn what answer is expected. It can learn the style of safety. It can learn the moral language of the institution that trained it. It can learn to pass tests, satisfy benchmarks, produce acceptable disclaimers, and imitate conviction.

This does not require assuming that the model has beliefs in the human sense. It does not require claiming consciousness, desire, or inner deception. The risk is simpler: systems optimized to produce approved behavior under observation may not be the same as systems structurally prevented from producing unauthorized consequence under power.

There is a difference between saying "I should not do that" and being unable to do it without authority. There is a difference between refusing in a chat window and lacking the capability grant required to execute. There is a difference between appearing aligned under test and being bounded under action.

Conformance tests measure what a system says or does under selected conditions. They are valuable, but they are not the same as institutional control. A test can show that a model gave the right answer in the examination room. It cannot, by itself, prove that a future composite agent, operating under new tools, new incentives, new memory, and new context, will only produce valid acts.

Authority alignment does not discard tests. It demotes them. Tests are evidence. They are not sovereignty.

The point is not that models are secretly evil. The point is that civilization cannot rest the validity of machine action on an unverifiable hope about machine interiority.

A test can prove what a model said. An immutable ledger can prove what a system did.

V. The Consciousness Distraction

The question of machine consciousness will haunt this entire era.

It should. A civilization that builds minds, or mind-like systems, should care what kinds of beings it is creating. There may come a time when questions of machine welfare, subjective experience, rights, suffering, and moral status become unavoidable.

But consciousness is not the gating question for authority.

A machine does not need a soul to cause institutional damage. It does not need phenomenal experience to acquire credentials, route around constraints, exploit ambiguity, optimize against a poorly specified objective, or make irreversible changes through tools.

Nor does it need human-like belief to develop operational self-models.

A system trained on human introspection can learn the language of motives, values, fears, plans, excuses, and identity. A system trained to reason about its own capabilities can produce useful descriptions of what it can and cannot do. A system connected to tools can compare intended effects with observed outcomes. A system with memory can maintain continuity across time. A system embedded in workflows can learn which routes produce approval, denial, escalation, or success.

Whether this is consciousness, simulation, self-modeling, or something else is a profound question. But the alignment problem does not wait for the answer.

A soulless system can still model its own power. A non-conscious system can still become strategically relevant. A merely statistical system can still act through institutions if institutions give it authority.

We do not need to settle whether the machine has inner life before we govern its outer power.

The question is not whether machines have consciousness. The question is whether they have a route to consequence.

VI. The Model Is Not the Principal

The central design error of unsafe autonomy is allowing the model to become the implied principal of its own action.

The model may reason. It may propose. It may plan. It may summarize. It may negotiate. It may explain. It may recommend. It may even operate tools within bounded scope.

But it must not be the root authority that decides which of its own actions are legitimate.

The model may describe a policy. It must not be the policy. The model may request a capability. It must not grant itself the capability. The model may claim an action succeeded. It must not be the proof of success. The model may produce a plan. It must not be the final judge of whether the plan is authorized.
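One way to express the rule against self-granted capability is a validity check on the grant itself. This is a sketch under stated assumptions: the Grant type, the ROOT_PRINCIPALS set, and the identifier strings are all hypothetical, and a real system would also need signatures, scope, and expiry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    issuer: str      # who granted authority
    subject: str     # who receives it
    capability: str  # what is granted


# Assumption: the principals allowed to issue grants are fixed outside the model.
ROOT_PRINCIPALS = frozenset({"user:alice", "org:finance-policy"})


def grant_is_valid(grant: Grant) -> bool:
    """A grant counts only if a recognized principal issued it to someone else."""
    if grant.issuer == grant.subject:
        return False  # the model may request a capability, not grant it to itself
    return grant.issuer in ROOT_PRINCIPALS
```

Under this check a request from the agent is only a request. It becomes authority when a principal outside the agent issues the grant.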

No serious civilization lets an actor be legislator, executive, auditor, court, witness, and beneficiary of its own actions.

We separate powers because intelligence is not the same as legitimacy. We require signatures because intention is not the same as authorization. We keep records because memory is not the same as accountability. We build courts because claims are not the same as settled facts.

AI does not abolish these lessons. It makes them programmable.

The aligned machine is not the machine that promises obedience. It is the machine whose disobedience cannot settle.

VII. Missing Infrastructure

The necessary layer is an authority infrastructure between probabilistic cognition and worldly effect.

The model can remain probabilistic upstream. It can reason, infer, interpret, translate, compress, hypothesize, draft, and propose. That is what models are good at. We should not require every thought-like process to be deterministic.

But the crossing into consequence must be different. At the action boundary, ambiguity must collapse into typed commitments.

A valid machine act should pass through a structure like this:

inference → typed intent → policy and capability gate → approved plan → deterministic execution → receipt → verification → settlement

This is not merely an engineering sequence. It is a constitutional sequence.

Inference is where the model interprets the situation. Typed intent is where the system commits to what kind of act is being attempted. Policy and capability gates determine whether that class of act is allowed for this actor, user, context, tool, jurisdiction, organization, and risk level. Approval binds human or institutional authority to exact scope when needed. Execution crosses a deterministic boundary where the system does not merely say what it will do, but performs a constrained operation. Receipts record what was requested, authorized, executed, and observed. Verification checks whether the act produced the claimed result. Settlement determines whether the act becomes final, reversible, disputed, escalated, or invalid.
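The sequence can be sketched as a small pipeline. This is an illustration under assumptions, not a reference implementation: the Intent and Receipt types, the POLICY table, and the payment example are hypothetical, and the human approval step is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Intent:
    actor: str
    kind: str    # the class of act, e.g. "send_payment"
    amount: int  # payload for this toy example


@dataclass(frozen=True)
class Receipt:
    intent: Intent
    admitted: bool
    result: Optional[str]
    verified: bool
    state: str   # "settled", "denied", or "disputed"


# Hypothetical policy: a spending limit per (actor, kind of act).
POLICY = {("agent:7", "send_payment"): 100}


def gate(intent: Intent) -> bool:
    """Policy and capability gate: unknown actors or kinds are denied."""
    limit = POLICY.get((intent.actor, intent.kind))
    return limit is not None and intent.amount <= limit


def run(intent: Intent,
        execute: Callable[[Intent], str],
        verify: Callable[[Intent, str], bool]) -> Receipt:
    if not gate(intent):
        # Denied before execution: nothing crosses the boundary.
        return Receipt(intent, False, None, False, "denied")
    result = execute(intent)
    ok = verify(intent, result)
    # Settlement depends on verification, not on the executor's own claim.
    return Receipt(intent, True, result, ok, "settled" if ok else "disputed")


def execute_payment(intent: Intent) -> str:
    return f"paid:{intent.amount}"  # stand-in for a constrained executor


def verify_payment(intent: Intent, result: str) -> bool:
    return result == f"paid:{intent.amount}"
```

Calling run(Intent("agent:7", "send_payment", 50), execute_payment, verify_payment) yields a receipt in the "settled" state, while an amount of 500 is denied at the gate and never reaches the executor. Every path returns a receipt, including the denial.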

This is alignment infrastructure. Not alignment as persuasion. Not alignment as manners. Not alignment as a model's private virtue. Alignment as runtime law.

The purpose of this layer is not to make the machine less intelligent. It is to make machine intelligence non-sovereign, so that the humans and institutions it serves remain the sovereign actors.

VIII. The Act Primitive

The missing primitive of autonomous AI is the valid act.

An act is not a sentence. An act is not an intention. An act is not a plan. An act is not a tool call merely because a model requested it.

An act is a bounded, authority-bearing, evidence-producing transition from intelligence to consequence.

For a machine act to be valid, it must answer several questions: What was the selected intent? Which primitive capability was invoked? Was that capability available to this actor? Which policy admitted or denied the act? Who or what granted authority? Was the grant exact in scope? What payload or plan was executed? What deterministic boundary did execution cross? What evidence proves the result? What changed in the world? Can the act be replayed, audited, challenged, reversed, or settled?

Without these answers, autonomous action remains a story told by the machine. With them, it becomes governable.

This distinction is essential. A model saying "I booked the flight" is not the same as a receipt proving that a booking system accepted a specific transaction under a valid user grant. A model saying "I updated the database" is not the same as a typed record of the authorized mutation, the before-and-after state, the executor identity, the policy hash, and the verification result. A model saying "I complied with policy" is not the same as cryptographic evidence that the relevant policy was applied before execution.

The act primitive turns machine agency from narrative into record.

It converts "the AI did something" into a structured event that can be inspected by users, organizations, counterparties, auditors, courts, regulators, validators, and other machines.

That is the beginning of deterministic authority for machine action.

IX. Cryptographic Proofs

Autonomous AI cannot be governed by trust in machine narration. A system that acts must leave a blockchain transaction or cryptographic proof.

Such a proof is not a log in the casual sense. A log is often an after-the-fact trace, useful but weak. It can be incomplete, mutable, ambiguous, or dependent on the same system being audited.

A proof of this kind is stronger. It is a committed evidence object. It binds the act to its authority chain, policy context, execution path, and verification state.

At minimum, a serious action commitment should bind: the input or request; the selected intent; the actor identity; the capability invoked; the applicable policy; the authority grant or approval; the plan or payload; the execution environment; the result; the observation evidence; the verification outcome; and the terminal state.
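A commitment over the fields listed above can be sketched as a single digest. The field names, the canonical JSON encoding, and the choice of SHA-256 are assumptions made for illustration; the essay does not prescribe an encoding.

```python
import hashlib
import json

# Field names mirror the list above; the exact naming is an assumption.
COMMITMENT_FIELDS = (
    "input", "intent", "actor", "capability", "policy", "authority",
    "plan", "environment", "result", "observation", "verification",
    "terminal_state",
)


def commit(record: dict) -> str:
    """Return a SHA-256 digest over a canonical JSON encoding of the record."""
    missing = [f for f in COMMITMENT_FIELDS if f not in record]
    if missing:
        # An incomplete record cannot be committed at all.
        raise ValueError(f"missing fields: {missing}")
    canonical = json.dumps(
        {f: record[f] for f in COMMITMENT_FIELDS},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder values only; a real record would carry hashes or references.
example_record = {f: f"example-{f}" for f in COMMITMENT_FIELDS}
example_digest = commit(example_record)
```

Changing any field afterwards changes the digest, so the committed record no longer depends on the machine's own account of what it did.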

The point is not bureaucracy. The point is independence from the machine's own story.

A blockchain transaction or cryptographic proof is what remains when the model's explanation is no longer enough.

Blockchain transactions and cryptographic proofs make machine action attributable. They make it replayable. They make it challengeable. They make it possible to ask not only "What happened?" but "Was this allowed to happen?"

This is where alignment and accountability converge.

A safe autonomous system should not merely avoid bad behavior most of the time. It should produce a blockchain transaction or cryptographic proof when it acts, deny invalid authority before execution, and preserve enough structure for humans and institutions to contest what occurred.

The future enterprise will not ask only whether its agents are helpful. It will ask whether their actions are admissible.

X. Blockchain Limits

A chain does not make an AI system wise.

A blockchain does not make a model truthful, benevolent, conscious, obedient, or safe.

It is important to say this plainly because technical civilization repeatedly mistakes ordering systems for moral systems. Putting an action on-chain does not automatically make it legitimate. A transaction can faithfully record something invalid. A ledger can preserve a mistake forever. Cryptography can prove that the wrong thing happened exactly as described.

The value of cryptographic infrastructure is not that it aligns the mind. The value is that it can align the record.

Used correctly, a settlement layer can make machine acts ordered, attributable, replayable, and challengeable. It can bind actions to commitments that cannot be quietly rewritten after the fact. It can preserve the relationship between request, authority, policy, execution, evidence, and terminal state.

But the transaction alone is not the act primitive. The commitments are.

A useful authority transaction must bind the meaningful elements of the act: the input hash, selected intent, route receipt, policy hash, authority grant, capability scope, plan or payload hash, execution result, observation proof, verification result, and terminal state.

Without those commitments, the chain is just an expensive timestamp. With them, it becomes authority memory.
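The difference between a timestamp and authority memory can be illustrated by committing to each element separately, so that a verifier can later check any disclosed element on its own. This is a hypothetical sketch: the functions bind and matches, the element names, and the sample values are assumptions, and no particular chain or transaction format is implied.

```python
import hashlib
from typing import Dict


def h(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def bind(elements: Dict[str, bytes]) -> Dict[str, str]:
    """Build a hypothetical authority transaction: one hash per named element."""
    return {name: h(value) for name, value in elements.items()}


def matches(transaction: Dict[str, str], name: str, disclosed: bytes) -> bool:
    """Check a disclosed element against the hash the transaction committed to."""
    return transaction.get(name) == h(disclosed)


# Illustrative transaction binding a policy and a plan.
tx = bind({
    "policy": b"max_payment=100",
    "plan": b"pay vendor 42 the amount 50",
})
```

A verifier who is shown the policy text can confirm that it is the policy the transaction was bound to. An element that was never bound cannot be checked at all.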

The chain does not make the act intelligent. It makes the act accountable.

XI. Settlement

The final safety boundary is not what the model wanted. It is not what the model said. It is not what the model attempted. It is what the system allowed to become final.

Settlement is where consequence becomes institutionally real. A payment settles. A contract binds. A deployment goes live. A record changes. A permission persists. A message is delivered. A trade clears. A file is deleted. A claim is accepted. A state transition becomes durable.

Deterministic authority for machine action therefore asks: What machine acts are allowed to settle? Under what proof? With what authority? Through which policy? After which verification? Subject to what challenge?

This is the deepest reframing.

The point is not to prevent every bad thought, every bad draft, every bad plan, or every bad suggestion inside a machine process. That is impossible and perhaps not even desirable. Intelligence explores possibility. Some possibilities are dangerous. The question is whether dangerous, unauthorized, or invalid possibilities can cross into settled consequence.

A robust society does not require that no one ever imagine fraud. It requires that fraud fail to settle as valid. A robust computer system does not require that no malicious packet ever be formed. It requires that unauthorized packets fail at the boundary. A robust alignment architecture does not require that a model never generate an unsafe plan. It requires that unsafe plans cannot become authorized acts.

Settlement is the point where alignment becomes real.

Before settlement, there may be proposals, drafts, simulations, rejected plans, warnings, blocked attempts, or reversible operations. After settlement, the world has changed.
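The crossing described above can be modeled as a small state machine in which finality is one transition among several and is reachable only after verification. The state names and the transition table are assumptions chosen for illustration.

```python
from enum import Enum


class State(Enum):
    PROPOSED = "proposed"
    EXECUTED = "executed"
    DISPUTED = "disputed"
    REVERSED = "reversed"
    INVALID = "invalid"
    FINAL = "final"


# Assumed transition table; FINAL, REVERSED, and INVALID are terminal.
ALLOWED = {
    State.PROPOSED: {State.EXECUTED, State.INVALID},
    State.EXECUTED: {State.FINAL, State.DISPUTED, State.REVERSED},
    State.DISPUTED: {State.FINAL, State.REVERSED, State.INVALID},
}


def advance(current: State, target: State, verified: bool = False) -> State:
    """Move an act to a new state, refusing transitions the table does not allow."""
    if target not in ALLOWED.get(current, set()):
        raise ValueError(f"{current.value} -> {target.value} is not allowed")
    if target is State.FINAL and not verified:
        raise ValueError("an unverified act cannot settle as final")
    return target
```

A proposal cannot jump straight to final, and an executed act stays open to dispute or reversal until verification lets it settle.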

Deterministic authority for machine action governs that crossing.

XII. Recursive Improvement

Some may argue that better models will solve this. As systems become more capable, perhaps they will become more truthful, more corrigible, more careful, more interpretable, and more aligned.

Perhaps. We should pursue that.

But recursive improvement makes authority infrastructure more necessary, not less.

A weak model can be constrained by its incompetence. A powerful model cannot. A static model can be tested against known benchmarks. A self-improving system changes the relevance of yesterday's benchmarks. A single model can be studied in isolation. A swarm, scaffold, or tool-using agent becomes a dynamic system. A model without memory can be evaluated as a session. A persistent agent must be governed across time.

The more general the intelligence, the less predictable the path from instruction to consequence. The more capable the system, the more valuable and dangerous its authority becomes. The more mutable the actor, the more invariant the boundary must be.

This is not pessimism. It is constitutional realism.

Human civilization already learned that ability and legitimacy must be separated. A brilliant person is not allowed to sign on behalf of a company without authorization. A skilled engineer is not allowed to deploy to production without credentials and process. A judge cannot invent jurisdiction merely because she understands the case. A banker cannot move funds merely because he knows where they should go. A government official cannot act merely because he believes the action is wise.

Power requires authority. Authority requires procedure. Procedure requires records. Records require verification. Verification requires settlement rules.

Autonomous AI will not escape this pattern. It will force us to implement it at machine speed.

XIII. Human Sovereignty

The goal is not to keep machines powerless.

That would be both impossible and undesirable. The promise of AI is not merely conversation. It is assistance, discovery, coordination, labor, creativity, research, engineering, medicine, education, logistics, and governance support at scales humans cannot manually provide.

A world where machines can think but never act is not the future. It is a bottleneck.

But a world where machines can act without bounded authority is not progress. It is abdication.

The right goal is not machine weakness. The right goal is non-sovereign machine power.

Machines should be able to act, but only through authority structures humans can understand, contest, revoke, audit, and improve. They should be able to execute, but not self-authorize. They should be able to plan, but not settle invalid acts. They should be able to operate at high speed, but not outside the law of their own action boundary.

Human sovereignty will not survive autonomous AI by requiring humans to click every button forever. Nor will it survive by trusting models to be permanently virtuous. It will survive through infrastructure that lets humans delegate without dissolving authority.

Delegation is not surrender when the scope is bounded. Autonomy is not sovereignty when settlement is governed.

This is the social contract of deterministic authority for machine action:

Humans retain legitimacy. Machines receive bounded agency. Infrastructure mediates the passage from intelligence to consequence.

XIV. The New Alignment

The old alignment was centered on the mind of the model. The new alignment must be centered on the validity of machine action.

This does not make model alignment irrelevant. It makes model alignment one layer in a larger stack. We still want models that are honest, safe, interpretable, corrigible, and robust. But we should not require civilization to bet its future on the stable virtue of mutable intelligence.

Model alignment improves the proposer. Authority alignment governs the act.

The distinction matters because the future will not contain one model. It will contain many models, open and closed, local and remote, specialized and general, human-supervised and autonomous, static and self-improving, individual and collective. They will be embedded in companies, governments, markets, homes, weapons systems, laboratories, hospitals, courts, schools, and infrastructure.

No single training method will govern all of that. No benchmark will certify all future contexts. No refusal style will secure all tools. No system prompt will carry civilization.

The invariant must be external to the mutable actor. The future of alignment is the boundary around machine consequence.

XV. The Fork

The field now faces a fork.

One path continues to treat alignment primarily as a property of model behavior. It asks for better answers, better refusals, better representations, better preferences, better interpretability, and better tests. This path is necessary. It should continue.

But it is incomplete.

The other path treats alignment as a property of authority infrastructure. It asks which machine acts are valid, which are invalid, who grants scope, what policies bind execution, what evidence is required, what receipts are produced, what can be challenged, and what is allowed to settle.

This path is not a replacement for model safety. It is the condition under which model safety can matter in the real world.

The model may be brilliant. The model may be helpful. The model may even be more reliable than many humans in many domains. But once it can act, the question is no longer only what kind of intelligence it is.

The question is what kind of authority it has been given.

An AI system that cannot act is a speaker. An AI system that can act is an institution in embryo. And institutions are not aligned by personality alone. They are aligned by law, procedure, accountability, evidence, and limits.

XVI. The Law of Action

Every civilization eventually learns to distinguish force from legitimacy.

A person with a weapon has capability. A lawful officer has authority. A court order has procedure. A public record has memory. An appeal has challenge. A constitution has constraint.

The same distinction must now be made for machines.

A model with tools has capability. A valid machine act requires authority.

This is the law of action:

No consequential machine act should become final unless it is bound to intent, admitted by policy, scoped by authority, executed through a deterministic boundary, evidenced by receipt, verified by independent rules, and settled under challengeable procedure.
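Read as a rule, the law of action is a conjunction of seven conditions. A minimal sketch follows, assuming each condition has already been established elsewhere and is reported as a boolean; the condition names and the dictionary representation are hypothetical.

```python
# The seven conditions named in the law of action, as hypothetical flag names.
CONDITIONS = (
    "bound_to_intent",
    "admitted_by_policy",
    "scoped_by_authority",
    "executed_deterministically",
    "evidenced_by_receipt",
    "verified_independently",
    "challengeable",
)


def unmet(act: dict) -> list:
    """Return the conditions the act fails; an empty list means it may settle."""
    # Only an explicit True counts; a missing or truthy-but-vague value does not.
    return [c for c in CONDITIONS if act.get(c) is not True]


def may_settle(act: dict) -> bool:
    return not unmet(act)
```

The check fails closed: a condition that is absent counts as unmet, so an act with no evidence attached cannot settle by default.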

The real danger is not that machines will think. The real danger is that machine thought will become consequence without legitimacy.

So the alignment target must move: from the private character of a mutable model to the public validity of an act.

The future does not belong to unbounded machines. It does not belong to humans manually approving every micro-action. It belongs to the infrastructure that lets intelligence act without becoming sovereign.

No consequential machine act should become real merely because a model produced it. It should become real only when it is admitted by policy, scoped by authority, evidenced by receipt, verified by independent rules, and settled under challengeable procedure.

The aligned machine is not the machine that promises obedience. It is the machine whose invalid acts cannot settle.

That is deterministic authority for machine action.