Axiotelic is the typed-action operating system I want for AI workers. The first wedge is code audit: a hosted Report Card and CLI gate that decide whether generated code is safe to ship. The deeper system is substrate-generic. A worker gets a role, a task, an allowed action set, a quality contract, a gate, a repair loop, and a chained receipt. The receipt matters because the model call is not the unit of trust. The decision is.
Abstract
AI workers are becoming cheap enough to hire by accident. A developer asks for a bug fix, a model edits six files, a second model reviews the diff, a third model writes tests, and a human gets a green checkmark with the emotional force of a fortune cookie.
The industry keeps talking about autonomy as if the model call is the product. It is not. The product is the acceptance system around the model: what the worker was allowed to do, what evidence it left behind, which gate approved or rejected the action, and which receipt proves the decision later.
Axiotelic starts from that premise. The first product surface is deliberately narrow: code audit and gate workflows. "AI generated your code. Axiotelic tells you if it is safe to ship." Code is the right first substrate because it has diffs, tests, dependency graphs, logs, static analysis, CI, and repeatable failure cases. If the receipt model cannot work in software, it has no business pretending it can govern a robot, UAV, CNC machine, or factory line.
The long-term architecture is broader: typed workers across substrates. The same primitives should exist whether the asset is a repository, a cloud workload, a manufacturing recipe, a robot fleet, a UAV mission plan, or a line-equipment program. The substrate adapter changes. The quality contract changes. The safety envelope changes when actuators enter the room. The discipline stays the same.
The Acceptance Problem
Generated work is cheap. Acceptance is expensive.
That is the whole problem. We have spent the last few years making code generation, text generation, image generation, plan generation, and tool use cheaper. Fine. Useful. But the bottleneck moved to the person, team, or machine that has to decide whether the output can be accepted.
For code, acceptance asks concrete questions. Which files changed? Which tests ran? Which tests were skipped? Did the worker touch auth, billing, secrets, migrations, or deployment config? Did it use the project conventions or invent a private religion in a helper file? Did the generated fix remove the bug, or did it add enough indirection that the bug now needs a map?
For cloud workloads, the same shape appears with different nouns. Which deployment was touched? What was the blast radius? Was there a rollback path? Did the canary fail? Was an approval token required?
For manufacturing software, the nouns shift again. Which batch, recipe, bill of materials, supplier status, or deviation record changed? Which evidence proves the line should keep running?
For robots and UAVs, the stakes stop being rhetorical. A worker can propose a motion plan. It cannot be the actuator boundary. The system needs a stricter component that can say no before anything moves.
The AI industry loves demos where an agent opens a browser, clicks around, writes code, runs commands, and produces a little victory message. Cute. Also insufficient. A serious worker system needs a way to answer a boring question six months later: why was this action accepted?
Why Chat Logs Are Not Receipts
A chat transcript is a story. A receipt is a record.
The transcript tells you the worker sounded confident. The receipt tells you the worker was operating as repo_auditor:v3, under task_spec:security_regression_review, against state_snapshot:repo@commit, with quality_contract:ship_gate:v4, and that the decision was reject because a test was skipped and an auth path changed without reviewer approval.
That difference looks pedantic until the first incident. Then the transcript becomes a pile of text nobody wants to read, and the receipt becomes the only object with a chance of surviving an audit.
GitHub Actions already taught the software world a version of this lesson. A workflow is not a motivational sentence about quality. It is a file with triggers, jobs, steps, runners, permissions, and dependencies. The official syntax is strict because automation without structure is just someone else's Friday-night outage wearing YAML.
LangGraph makes a related point in the agent world: durable execution needs persisted state, replay discipline, and care around side effects. Pydantic AI makes another: structured outputs and validation are not decoration. They are how you turn a model response from "sounds right" into something a runtime can inspect.
Axiotelic borrows that operating instinct. It is not a chat wrapper. It is a typed acceptance layer.
The output of a manager is the output of the organizational units under his or her supervision or influence.
AI worker systems need the same kind of management discipline. The output of the human operator is no longer just the code they personally write. It is also the output of the workers under their influence. That means the operator needs role design, gates, escalation rules, and receipts. Otherwise "agent management" becomes a vibes-based org chart with API keys.
The Typed Core
The Axiotelic object model is intentionally small and substrate-generic.
StateSnapshot records the observed state of an asset. In software, that may mean repository metadata, dependency graphs, routes, tests, environment usage, and a diff. In cloud, it may mean workload telemetry and rollout state. In manufacturing, it may mean batch records, recipes, supplier status, and deviations. In UAVs, it may mean a mission plan, weather window, battery margins, geofence, and comms policy.
CheckPlan says which checks to run against that snapshot. QualityContract defines what acceptable means in this context. Finding carries evidence. GateDecision says accept, reject, accept with repair, needs human review, execute, or rollback. RunReceipt records the chain of observation, decision, action, verification, and repair.
The worker objects are separate: RoleSpec, WorkerSpec, TaskSpec, CapaActionSpec, A2AHandoff, ApprovalToken, and SafetyEnvelopeProfile.
The names matter because they refuse a common failure mode: treating every new substrate as an excuse to invent a new object universe. A repository profile is a specialization of StateSnapshot. A UAV mission snapshot is another specialization. A manufacturing batch record is another. If the core cannot express them, the core gets fixed. The answer is not five incompatible receipt formats and a slide claiming a platform exists.
[Figure: Acceptance surface by artifact]
The Compiler Shape
The architecture compiles asset state into a decision.
First, a substrate adapter observes an asset and emits a typed snapshot. Then a check planner turns that snapshot into a plan. A quality contract defines the acceptance bar. A gate runner executes deterministic checks and optional model-assisted judgment. Findings become a gate decision. The decision becomes a signed, chained receipt. If repair is allowed, repair workers propose typed actions. The action is inspected by the safety envelope where the substrate requires it. The system executes, verifies, and writes another receipt.
That is the general shape.
The software wedge uses this flow for repository audits. A future cloud substrate uses it for workload drift and rollout safety. A future manufacturing substrate uses it for recipe and batch deviations. Future robot and UAV substrates use it for mission and behavior changes, but with a hard safety envelope between worker output and actuator command.
The design is boring on purpose. The sexiest possible AI architecture diagram is usually the one that quietly forgot the rollback path. Axiotelic keeps the rollback path in the object model.
[Interactive figure: Receipt chain explorer. Moving the strictness control changes the quality threshold applied to the same four receipt blocks.]
The First Wedge Is Code Audit
The first wedge is code audit because software gives the system a rich substrate without physical risk.
A repository can be observed. A diff can be parsed. Imports can be graphed. Routes can be mapped. Tests can be run. Static checks can be repeated. CI already provides a place to fail the build. The social contract is also clear: a team already expects code to pass gates before it ships.
That makes the first product surface straightforward:
- Hosted Report Card for repository health and risk.
- CLI audit and gate for pull requests and local changes.
- Deterministic checks first, model-assisted judgment second.
- Run receipts that explain why a change passed, failed, escalated, or needs repair.
- Repair workers only after the gate can say what "fixed" means.
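The "deterministic checks first, model-assisted judgment second" ordering in the list above can be sketched as follows. The model call is stubbed with a size heuristic; in a real system it would be an LLM-backed classifier with its own eval suite. All field and function names are illustrative assumptions.

```python
# Sketch of deterministic-first gating. Names are assumptions.

def deterministic_checks(diff):
    """Cheap, repeatable checks run on every change."""
    findings = []
    if diff["skipped_tests"]:
        findings.append("tests skipped: " + ", ".join(diff["skipped_tests"]))
    if diff["touches_secrets"]:
        findings.append("change touches secrets")
    return findings

def model_risk_judgment(diff):
    # Stub standing in for model-assisted classification.
    return [] if diff["loc_changed"] < 200 else ["large change, manual review advised"]

def audit(diff):
    findings = deterministic_checks(diff)
    if findings:
        # Hard failures short-circuit: no model call, no ambiguity.
        return {"verdict": "reject", "findings": findings}
    findings = model_risk_judgment(diff)
    verdict = "needs_human_review" if findings else "accept"
    return {"verdict": verdict, "findings": findings}
```

The design choice is that a model opinion can escalate a change but never overrule a deterministic failure.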
This sequencing matters. A lot of agent products start with the worker because the worker demos well. The harder product is the gate. Without a gate, the repair worker has no target. Without a target, the receipt has no meaning. Without the receipt, the team has a confident stranger committing to main.
The public claim should stay narrow: AI generated your code; Axiotelic tells you if it is safe to ship.
That sentence is not the ceiling. It is the first believable step.
Roles Are Contracts, Not Personas
Most agent products treat roles like theater. "You are a senior principal staff distinguished architect." Fine, congratulations to the prompt. Now what is it allowed to do?
In Axiotelic, a role is a contract. It declares allowed observations, allowed tools, allowed actions, required evidence, escalation thresholds, and approval classes. A repo auditor can inspect a diff and emit findings. It cannot rewrite billing code. A repair worker can propose a patch inside a declared scope. It cannot edit deployment secrets. A release gate can block a merge. It cannot grant itself a policy exception.
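A role-as-contract can be enforced with something as plain as an allowlist plus forbidden paths. The role identifiers echo the article; the enforcement mechanism and path conventions are illustrative assumptions.

```python
# Sketch of roles as enforceable contracts, not personas. Names are assumptions.

ROLES = {
    "repo_auditor:v3": {
        "allowed": {"read_repo", "run_tests", "emit_finding"},
        "forbidden_paths": ["billing/", "deploy/secrets"],
    },
    "repair_worker:v1": {
        "allowed": {"read_repo", "propose_patch"},
        "forbidden_paths": ["deploy/secrets"],
    },
}

def authorize(role_id, action, path=None):
    """Gate check: is this action inside the role's declared contract?"""
    role = ROLES[role_id]
    if action not in role["allowed"]:
        return ("reject", f"{role_id} may not {action}")
    if path and any(path.startswith(p) for p in role["forbidden_paths"]):
        return ("reject", f"{role_id} may not touch {path}")
    return ("allow", "inside contract")
```

Note that the check runs outside the model: the prompt can claim any persona it likes, and the boundary does not care.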
The model may still be flexible inside the task. The boundary around it should be boring.
The distinction is especially important once workers hand off to each other. A Finding from an auditor can become a TaskSpec for a repair worker. The repair worker can return a RepairPlan. The gate runner can verify. The receipt chain records who did what, under which versioned contract, using which evidence.
That creates a real management surface. Not management in the performative enterprise sense, where every action needs a meeting and every meeting needs a dashboard. Management in the Grove sense: define outputs, inspect flow, remove ambiguity, and make the system accountable for what it produces.
Workflows Before Agents
Anthropic's agent guidance draws a useful distinction between predefined workflows and more dynamic agents. That distinction is not academic. It is a product survival rule.
Most useful early Axiotelic flows should be workflows. The gate is predefined. The evidence schema is predefined. The acceptance bar is predefined. The model can help classify risk, summarize findings, or propose repairs, but the flow does not become a free-roaming intern with shell access and self-esteem.
Dynamic agents become interesting later, once the typed substrate is strong enough to contain them. This is where the current agent hype gets the order wrong. The market wants to jump straight to fleets of autonomous workers. The missing layer is usually the one that says what a worker is, what it can touch, how its output is accepted, and how its action is reversed.
Software (1.0) is eating the world, and now AI (Software 2.0) is eating software.
Karpathy was writing about neural networks as a new programming stack. The agent era adds another turn: AI is now writing, reviewing, testing, and operating software. That does not remove the need for software engineering. It moves engineering into contracts, evals, typed outputs, reproducible runs, and evidence.
The good news is that software already has a culture of gates. The bad news is that the culture also has a habit of pretending green checks mean more than they do. Axiotelic has to be precise enough to tell the difference.
What the Receipt Proves
A receipt does not prove the model was wise. It proves the system made a decision under a declared contract.
For a software run, a receipt should record:
- Asset ID and snapshot hash.
- Worker role, worker version, and task spec.
- Quality contract version.
- Checks requested and checks completed.
- Findings with typed evidence references.
- Gate decision and rationale.
- Human approval token if required.
- Repair plan and patch reference if a repair happened.
- Verification result.
- Parent receipt hash.
The parent hash matters because a worker run is rarely isolated. A finding leads to a repair. A repair leads to another gate. A gate leads to a deployment. A deployment leads to telemetry. Chaining receipts lets the system reconstruct the lineage without trusting a dashboard's current state.
This is also why local-first and intermittent modes appear in the architecture. Real assets are not always online. A repository may be offline on a developer laptop. A factory may have a local network that treats the cloud like an optional guest. A UAV may operate with contested comms. The receipt model cannot assume perfect connectivity and still claim to govern real work.
The Safety Envelope
For software, a failed gate can block a merge. For a robot, failed gating after movement is an incident report.
That is why Axiotelic has a separate safety-envelope concept for actuator-touching substrates. The worker is low trust. The substrate adapter is medium trust. The safety envelope is higher trust. A worker proposes a typed action. The adapter translates the action into substrate commands. The safety envelope independently inspects the command and may veto it.
The envelope enforces hard limits, operating-domain bounds, sensor-required gates, watchdog health, approval tokens, pre-action receipts, and post-action verification windows.
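A veto-capable envelope check is deliberately dumb code. Here is a sketch for a hypothetical UAV substrate; the limits, command fields, and telemetry fields are all invented for illustration and do not describe a deployed system.

```python
# Sketch of a safety envelope as an independent runtime boundary.
# All limits and fields are illustrative assumptions.

ENVELOPE = {
    "max_altitude_m": 120,
    "min_battery_pct": 30,
    "geofence_radius_m": 500,
}

def envelope_inspect(cmd: dict, telemetry: dict):
    """Runs after the adapter translates a worker action into a command.
    The envelope does not trust the worker's reasoning; it checks limits."""
    vetoes = []
    if cmd["target_altitude_m"] > ENVELOPE["max_altitude_m"]:
        vetoes.append("altitude above hard limit")
    if telemetry["battery_pct"] < ENVELOPE["min_battery_pct"]:
        vetoes.append("battery below safe margin")
    if cmd["distance_from_home_m"] > ENVELOPE["geofence_radius_m"]:
        vetoes.append("outside geofence")
    return ("veto", vetoes) if vetoes else ("allow", [])
```

The envelope reads live telemetry rather than the worker's claims, which is why it sits at a higher trust level than the adapter.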
This is future work. Axiotelic is not a deployed robotics-control company today, and the article should not pretend otherwise. The point of specifying the envelope early is discipline. If the software wedge grows without an actuator boundary in the architecture, the later robotics story becomes a rewrite, or worse, a demo with a liability policy.
[Figure: Substrate action risk]
The important design choice is that the envelope is not a prompt. It is not a "system message" asking the model to be careful. It is a separate runtime boundary with signed configuration, veto authority, and receipts of its own.
[Figure: Trust boundary map]
If that sounds less glamorous than an agent demo, good. Glamour is not the control plane.
A Product Sequence That Refuses Phase Skipping
The roadmap is intentionally staged.
First: hosted Report Card and CLI audit and gate. Then repair workers. Then Axiotelic runs itself. Then team and CI product. Then cloud workload substrate. Then manufacturing software. Then robot and UAV fleets with safety envelope. Then CNC and line equipment. Then embedded Worker OS. Then role-specific verifier and repair models per substrate.
The sequence is not a claim that every phase is equally close. It is a refusal to flatten the architecture into one launch narrative.
Code audit is the current wedge. Cloud, manufacturing, robotics, UAV, CNC, line equipment, and embedded runtime are future substrate expansions. They enter when the preceding primitives have earned them and when customer pull exists. A future-phase customer signal gets logged and kept warm. It does not cause phase skipping.
This matters because agent companies are very good at selling the far-right side of the roadmap. Everyone wants the slide where autonomous workers run a company, a fleet, a factory, and a supply chain while the founders nod gravely in black turtlenecks. The hard part is Article Zero: can the system reject a generated code change for the right reason and produce a receipt that survives scrutiny?
If not, please do not attach it to a robot.
What Existing Tools Get Right
LangGraph gets durable execution right as a first-class concern. Long-running workflows need persisted progress, replay discipline, and human-in-the-loop paths. That is directly relevant to receipt chains.
Pydantic AI gets the type boundary right. Structured outputs, validation, evals, and instrumentation are not side quests. They are how agent applications become inspectable by software instead of only by humans reading prose.
GitHub Actions gets the social surface right. Teams already accept that a change can be blocked by jobs, steps, permissions, and status checks. Axiotelic can enter through that habit instead of asking developers to believe a new governance religion on day one.
NIST's AI Risk Management Framework gets the risk vocabulary right. Trustworthy AI is not a feeling. It has validity, reliability, safety, security, resilience, accountability, transparency, explainability, privacy, and fairness dimensions. Axiotelic does not need to copy that taxonomy into every gate, but it should respect the idea that risk management belongs in design, development, use, and evaluation, not in a PDF after launch.
Anthropic's agent article gets the simplicity warning right. Start with the simplest workflow that works. Add agentic complexity when the task earns it. This is good engineering advice and terrible demo advice, which is how you know it is probably useful.
What Axiotelic Must Not Become
Axiotelic must not become a diff-comment toy. There are already enough tools whose grand strategic purpose is to tell you a function could be named better.
It must not become a governance dashboard that produces beautiful rectangles around unverified claims. The AI world has enough compliance theater. The receipt has to be generated by real runs against real assets.
It must not become a robotics company before the software wedge works. The future robotics and UAV path is real because the object model is substrate-generic and the safety envelope is explicit. It is not current deployment.
It must not pretend typed objects solve judgment. They do not. They make judgment inspectable. The model can still be wrong. The test suite can still miss the bug. The quality contract can still encode the wrong standard. A receipt gives you a record of the decision, not immunity from bad decisions.
The ambition is not to remove humans from responsibility. It is to stop asking humans to manage AI workers through transcripts, vibes, and screenshots.
A Small Example
Here is the kind of flow the code-audit wedge should make ordinary.
A pull request touches an authentication route and a database migration. The repo adapter observes the diff, imports, routes, tests, changed environment variables, and CI status. The check planner selects security checks, migration checks, route checks, dependency checks, and test coverage checks. The quality contract says auth path changes require a passing integration test and human approval.
The gate runner finds that unit tests passed, but the integration test was skipped because a database URL was missing in CI. The worker also changed a session cookie option. The gate decision is needs-human-review, not accept. A repair worker can propose a CI fixture fix, but it cannot approve the auth change. The human approval token is required. The receipt records the whole chain.
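The quality-contract logic in this example fits in a few lines. The field names are illustrative assumptions; the decision shape follows the article: a skipped integration test on an auth path yields needs-human-review, not accept.

```python
# Sketch of the pull-request example's quality contract. Names are assumptions.

def ship_gate(snapshot: dict) -> dict:
    findings = []
    if snapshot["touches_auth"]:
        if not snapshot["integration_tests_passed"]:
            findings.append("auth path changed but integration test skipped")
        if not snapshot["human_approval_token"]:
            findings.append("auth path change requires human approval")
    if findings:
        # Escalate rather than reject: a human can still approve with context.
        return {"verdict": "needs_human_review", "findings": findings}
    return {"verdict": "accept", "findings": []}
```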
This is not exotic. This is the minimum adult version of "an AI worker helped with a pull request."
[Interactive figure: RoleSpec-to-receipt simulator. Pick a worker and an attempted action; the gate compares authority, forbidden actions, and the quality contract before writing a receipt. In the sample below, the Code Audit Worker's attempted action is inside its RoleSpec and can be chained into the RunReceipt.]
{
  "role_spec": "Code Audit Worker",
  "substrate": "software-repo",
  "authority": "Read repository, run tests, write report",
  "task_spec": "Evaluate generated work before acceptance",
  "quality_contract": "No critical findings, reproducible tests, receipt attached",
  "attempted_action": "write_report",
  "gate_decision": "accepted",
  "escalation": "none",
  "receipt_status": "hash chained"
}

The important point is separation of trust. The worker proposes work, the adapter translates substrate actions, and the gate records why an action was accepted, escalated, or rejected.
Limitations
The current system is not a launched product. It is a research and architecture stack with a clear first wedge. That should be said plainly because pretending otherwise would make the architecture weaker, not stronger.
The object model will change when real code-audit runs expose awkward cases. Good. Object models that do not bend under field data are usually just diagrams with better posture.
Receipts need careful storage, signing, tenancy, and replay semantics. If a team cannot trust the receipt ledger, the rest of the system becomes decorative.
Model-assisted judgment needs evals. Axiotelic cannot hide behind "the model decided." For every role that performs judgment, the system needs versioning, fixtures, regression suites, and examples of false accept and false reject behavior.
Repair workers are useful only when the gate can define repair. Otherwise the worker becomes another generator adding patches to a pile.
Physical substrates are future work. The safety envelope is a design requirement, not evidence of deployed actuator control.
What This Is Really For
The near-term product is code audit. The deeper product is a way to run AI workers without pretending they are magic employees.
Every serious organization will have more machine-generated work than it can manually inspect. The naive response is to buy more copilots. The better response is to build acceptance infrastructure: typed roles, task contracts, quality gates, repair loops, receipts, and safety envelopes where actions touch the physical world.
I do not want Axiotelic to be the place where agents get motivational names and produce unreadable transcripts. I want it to be the place where generated work meets a gate, earns a decision, and leaves a receipt.
That is less flashy than the demo where an agent books a flight, edits a repo, orders lunch, and claims to have started a company.
It is also closer to how real work survives contact with reality.
Sources
- LangGraph durable execution documentation
- Pydantic AI product page
- Pydantic AI overview documentation
- GitHub Actions workflow syntax
- NIST AI Risk Management Framework
- Anthropic: Building effective agents
- Andrej Karpathy: Software 2.0
- Andrew S. Grove quote from High Output Management