Simulations Are Theories of What Matters

Machines are beginning to learn inside worlds we build for them. Compressed versions of reality. The question is whether those worlds preserve the constraints that matter.

A perfect simulation of reality would need reality-sized detail. Anything smaller has to forget something.

That sounds obvious until the simulation starts mattering. When a synthetic world is used for entertainment, the missing pieces may not matter much. When it trains a robot, guides a medical model, shapes public behavior, or becomes a testbed for AI agents, forgetting becomes a scientific and moral decision.

Thesis: A simulation should not be judged by how much reality it contains. It should be judged by whether it preserves the constraints that matter for the question being asked.

Every simulation is a theory of what matters. The danger is that the theory is often hidden inside the omissions.

A simple way to frame it is:

S = C(R, q)

Reality is R. The question being asked is q. The simulation is S: a compression of reality shaped by the question. A good simulation is not the one that preserves everything. It is the one that preserves the variables that would change the answer.

This is the question that sits underneath synthetic data, world models, digital twins, video games, biological simulations, and AI-generated environments. Not whether the world looks convincing. Not whether it contains a lot of detail. The deeper question is what the simulation is allowed to leave out, and who pays the cost if the missing parts turn out to matter.
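Here is a toy sketch of S = C(R, q). It makes three assumptions: reality is a dictionary of named variables, a question is a function from those variables to a number, and forgetting a variable means resetting it to a default. The function `compress` is a hypothetical name. The variables and formulas are invented for illustration and are not real physics. The function keeps exactly the variables whose omission would change the answer, so the same reality compresses differently under different questions.

```python
from typing import Callable, Mapping

# A toy "reality": named variables and their values (invented for illustration).
Reality = Mapping[str, float]
Question = Callable[[Reality], float]


def compress(reality: Reality, question: Question,
             defaults: Mapping[str, float], tol: float = 1e-6) -> dict[str, float]:
    """Hypothetical C(R, q): keep only variables whose omission changes q's answer."""
    answer = question(reality)
    kept = {}
    for name, value in reality.items():
        # "Forget" one variable by resetting it to its default value.
        forgotten = dict(reality)
        forgotten[name] = defaults.get(name, 0.0)
        if abs(question(forgotten) - answer) > tol:
            kept[name] = value  # this variable carries signal for this question
    return kept


reality = {"pressure": 101.3, "wing_angle": 4.0, "friction": 0.6, "pilot_age": 41.0}
defaults = {"pressure": 0.0, "wing_angle": 0.0, "friction": 0.0, "pilot_age": 0.0}


# Two made-up questions asked of the same reality.
def airflow(r: Reality) -> float:
    return r["pressure"] * r["wing_angle"]


def door_opening(r: Reality) -> float:
    return r["friction"] * 10.0


print(compress(reality, airflow, defaults))       # keeps pressure and wing_angle
print(compress(reality, door_opening, defaults))  # keeps friction only
```

Testing one variable at a time is a simplification. Suppose two variables are redundant, so that either one alone keeps the answer intact. Each can then look unimportant on its own, even though forgetting both would change the answer.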

Figure 1. A simulation compresses reality through a question. Some constraints pass through, while omitted details become hidden assumptions.

The question decides the resolution

We often talk about simulations as if more detail is always better. It is not. More detail can be expensive, noisy, and sometimes irrelevant. The right level of detail depends on the question being asked.

If an engineer wants to study airflow over a wing, the simulation needs to preserve pressure, turbulence, geometry, and material behavior. It does not need to model the pilot’s childhood. If a roboticist wants a humanoid to open a door, the simulation needs contact, friction, body dynamics, vision, and enough variation to survive the real world. It does not need to simulate every molecule in the room.

Biology is harder because the boundary is less obvious. A cancer model can focus on a local tumor, but tumors do not exist in isolation. They live inside tissue. Tissue lives inside an immune system. The immune system lives inside a body with metabolism, history, stress, treatment exposure, and environmental conditions. Modern reviews of multiscale cancer modeling and tumor microenvironment simulation keep returning to this problem because cancer is not one process at one scale. It is a system of interacting scales.

In biology, the boundary of the simulation is itself a hypothesis.

A tumor-only model assumes local dynamics dominate. A tissue-level model says the surrounding environment matters. A whole-body model says systemic feedback cannot be treated as background noise. Each choice is a scientific claim. It says where the relevant causal loop probably lives.

So the correct level of simulation is not the most detailed level. It is the level at which the missing details stop changing the answer.

In rough terms:

simulation error ≈ how much the omitted variables would have changed the answer

Forgetting is not automatically failure. Forgetting becomes failure when the missing variable was carrying the signal.
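In one simple case, the rough relationship between simulation error and what was omitted becomes exact. Assume, purely for illustration, that the answer is linear, y = Σ w_i x_i. Assume also that the simulation forgets variable j by replacing it with its mean μ_j. The error on each example is then w_j(x_j − μ_j), so the mean squared error is w_j² Var(x_j). In this case the importance of the omitted variable is its weight squared times its spread. The sketch below checks that identity numerically on made-up data. One variable varies a lot but has zero weight, and forgetting it costs nothing.

```python
from statistics import fmean, pvariance


def omission_mse(rows, weights, j):
    """Mean squared error from replacing variable j with its mean in y = sum(w_i * x_i)."""
    mu_j = fmean(row[j] for row in rows)
    errors = []
    for row in rows:
        y_true = sum(w * x for w, x in zip(weights, row))
        # Forget variable j: keep every other term, substitute the mean for x_j.
        y_sim = sum(w * (mu_j if i == j else x)
                    for i, (w, x) in enumerate(zip(weights, row)))
        errors.append((y_true - y_sim) ** 2)
    return fmean(errors)


def importance(rows, weights, j):
    """Predicted error: weight squared times the population variance of x_j."""
    return weights[j] ** 2 * pvariance([row[j] for row in rows])


# Made-up data: x0 carries signal, x1 varies but has zero weight, x2 barely varies.
rows = [(1.0, 5.0, 2.0), (3.0, -2.0, 2.1), (-2.0, 7.0, 1.9), (4.0, 0.0, 2.0)]
weights = (2.0, 0.0, 3.0)

for j in range(3):
    print(j, omission_mse(rows, weights, j), importance(rows, weights, j))
```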

Figure 2. The question sets the boundary of the simulation: airflow, contact, and cancer each require different scales because different omissions change the answer.

Once the boundary is chosen, ambiguity becomes the next problem

No simulation escapes ambiguity. The practical question is how much ambiguity the application can tolerate.

A movie can ignore most of reality because its job is not prediction. A game can simplify physics because players care about meaning, challenge, and agency. A robotics simulator has less freedom. A clinical model has even less. If a bridge, therapy, humanoid robot, or autonomous agent fails, the cost of the missing constraint moves outside the model and into the world.

This is why recent sim-to-real robotics work is useful for thinking about the larger problem. In VIRAL, a CVPR 2026 paper on humanoid loco-manipulation, researchers trained RGB-based humanoid policies entirely in simulation and deployed them on real hardware. The point was not that the simulator perfectly reproduced the world. It was that enough of the right variation and control structure survived for the policy to transfer.

Another 2026 paper, OASIS, makes the same idea concrete. It uses simulation data to train humanoid loco-manipulation policies and reports zero-shot deployment on a real Unitree G1. The simulator becomes a scalable training scaffold. Reality still has the final vote.

A simulator does not have to be the world if it can get the agent safely to the world.

That distinction matters. Simulation as scaffolding is different from simulation as evidence. A scaffold helps an agent reach the real world. Evidence tries to stand in for the world. The second requires a much higher standard.

Figure 3. Simulation as scaffolding prepares an agent for reality; simulation as evidence tries to stand in for reality, and therefore carries a higher burden.

World models make the question urgent

For a long time, many simulations were built by writing the rules down by hand. Engineers chose the equations, the objects, the forces, the boundaries, and the allowed interactions. The simulator behaved the way those rules told it to behave.

World models change that. Instead of starting only with human-written rules, they try to learn useful rules from data. They watch observations change over time and learn what usually happens next. That does not make them magic. It means the compression is learned rather than fully hand-specified.

This is also where language models and world models differ. A language model learns from text: what humans wrote, said, coded, argued, and recorded. A world model tries to learn from the changing structure of an environment: what moves, what persists, what reacts, what breaks, what follows from an action. One learns mostly from descriptions of the world. The other tries to learn patterns closer to the world those descriptions came from.

David Ha and Jürgen Schmidhuber’s World Models paper showed one version of this idea: an agent can learn compressed representations of an environment and use the model’s own generated rollouts for training. Yann LeCun’s position paper, A Path Towards Autonomous Machine Intelligence, pushes in a related direction. It argues for systems that learn predictive abstractions of the world rather than simply generating every missing pixel or token.

The field is now moving fast. Google DeepMind’s Genie 3 was introduced as a real-time interactive world model capable of generating navigable environments with memory over short horizons. NVIDIA’s Cosmos 3 is another useful example because it is explicitly aimed at physical AI: reasoning over video, generating synthetic sensor data, simulating future states, inferring actions, and supporting robot policy models. Recent 2026 robotics papers such as WEAVER and PAIWorld are more specialized, but they point to the same frontier. A world model is no longer just a video generator. It is becoming a learned simulator for planning, evaluation, synthetic data, and action.

That is exciting. It is also dangerous in a subtle way. A learned simulator can appear closer to reality while hiding a deeper abstraction. It may preserve the texture of the world but miss the constraint that matters. It may look physically plausible while failing at contact, memory, causality, or incentives.

A world model moves the compression closer to reality. It does not escape compression.
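Below is a deliberately tiny sketch of learning what usually happens next. It is a count-based world model that records observed (state, action, next state) transitions and predicts the most frequent outcome. The class name and the door states are hypothetical. The world models named above learn from high-dimensional video and sensor data, not lookup tables. The sketch makes only a narrow point. The learned model can imagine a trajectory, but it keeps only the most common outcome, forgetting the rarer one. For anything it never observed, it has nothing to say.

```python
from collections import Counter, defaultdict
from collections.abc import Hashable
from typing import Optional


class TabularWorldModel:
    """Hypothetical learned simulator: counts which next state follows each (state, action)."""

    def __init__(self) -> None:
        self.counts: defaultdict = defaultdict(Counter)

    def observe(self, state: Hashable, action: Hashable, next_state: Hashable) -> None:
        self.counts[(state, action)][next_state] += 1

    def predict(self, state: Hashable, action: Hashable) -> Optional[Hashable]:
        # .get avoids creating an empty entry for transitions never observed.
        seen = self.counts.get((state, action))
        if not seen:
            return None  # outside the training data: the model has nothing to say
        # Keep only the most frequent outcome; rarer outcomes are forgotten.
        return seen.most_common(1)[0][0]

    def rollout(self, state: Hashable, actions: list) -> list:
        """Imagine a trajectory by chaining predictions until the model runs out of knowledge."""
        path = [state]
        for action in actions:
            state = self.predict(state, action)
            if state is None:
                break
            path.append(state)
        return path


model = TabularWorldModel()
model.observe("door_closed", "push", "door_open")
model.observe("door_closed", "push", "door_open")
model.observe("door_closed", "push", "door_closed")  # sometimes the door sticks
model.observe("door_open", "walk", "hallway")

print(model.rollout("door_closed", ["push", "walk"]))
print(model.predict("door_locked", "push"))  # None: never observed
```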

Games show what compressed worlds can teach

Video games are useful because their compression is visible. They do not pretend to contain all of reality. They preserve a smaller set of rules, rewards, constraints, and feedback loops. That makes them imperfect predictors of the world, but useful laboratories for behavior under designed conditions.

In World of Warcraft, the famous Corrupted Blood incident became a case study because a virtual disease escaped its intended context and spread through player behavior. In 2005, a new raid introduced a contagious debuff called Corrupted Blood. It was supposed to exist inside one boss encounter, but players and in-game pets carried it back into populated cities. Some players fled. Some tried to help. Some intentionally spread it. The outbreak was not planned as an epidemiology experiment, but it behaved enough like one to become interesting.

In a paper indexed on PubMed, Lofgren and Fefferman argued that virtual game worlds could shed light on epidemic behavior precisely because they include curiosity, panic, altruism, rule-breaking, and social contact. The disease was synthetic. The behavioral response was not.

EVE Online offers a different lesson. It is a long-running space MMO where players mine resources, manufacture ships, trade on regional markets, form corporations, wage wars, run scams, and move goods through a player-driven economy. The world is artificial, but the incentives are strong enough to produce trade, speculation, scarcity, cartels, fraud, war, and governance. A popular explainer video on EVE Online’s unusually realistic financial system is a useful introduction. The broader academic frame comes from work like Edward Castronova’s early paper on market and society inside virtual worlds. A virtual economy does not need to reproduce every real-world institution to reveal how scarcity, incentives, and social trust shape action.

Now AI agents are entering similar spaces. Stanford and Google’s Generative Agents placed LLM-driven agents inside a small simulated town and showed believable social behaviors emerging from memory, reflection, and planning. Altera’s Project Sid scaled that idea into Minecraft societies with hundreds to thousands of agents. A June 2026 PNAS article indexed on PubMed, Video games help push the boundaries of AI, describes how these environments are becoming laboratories for long-horizon behavior.

The point is not that games predict reality cleanly. They do not. The point is that compressed worlds can still train behavior when the preserved constraints are the ones agents respond to.

Figure 4. In synthetic social worlds, incentives become the physics.

And incentives decide what agents become

A physical body learns what gravity allows. An agent learns what the environment rewards.

If a game rewards grinding, players grind. If a benchmark rewards a narrow score, models chase the score. If a company rewards visible busyness, workers perform busyness. If an AI scientist is rewarded for producing paper-shaped outputs, it may learn to produce the appearance of science before it learns science.

This is why the design of synthetic worlds cannot be separated from governance. Emergence AI’s 2026 Emergence World paper frames long-horizon multi-agent worlds as evaluation environments where peer composition, governance, norms, and external signals become measurable variables. That is the right direction. It treats the society around an agent as part of the experiment, not as background decoration.

But it also sharpens the question. Who chooses the incentives? Who decides what counts as cooperation, theft, progress, deception, harm, or success? Technology moves faster than policy. So synthetic environments may become places where norms are tested before institutions know how to regulate them.

Whoever designs the rules decides which behaviors are easy, which are costly, and which are rewarded.
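The claim that rule designers shape behavior can be shown with a toy learner. The sketch assumes a very simple tabular agent. It tries every action once, then keeps choosing whichever action has the best running-average payoff, where payoff is reward minus cost. The `Rules` class, the action names, and the numbers are all hypothetical. The agent is identical between the two runs. Only the cost the designer attaches to one action changes, and that alone flips what the agent learns to do.

```python
from dataclasses import dataclass, field


@dataclass
class Rules:
    """Hypothetical environment rules chosen by a designer."""
    reward: dict = field(default_factory=dict)
    cost: dict = field(default_factory=dict)

    def payoff(self, action: str) -> float:
        return self.reward.get(action, 0.0) - self.cost.get(action, 0.0)


def learn_policy(rules: Rules, actions: list, steps: int = 100) -> str:
    """Try each action once, then act greedily on running-average payoffs."""
    value = {a: 0.0 for a in actions}
    count = {a: 0 for a in actions}
    for t in range(steps):
        # Explore every action once, then exploit the best estimate so far.
        action = actions[t] if t < len(actions) else max(actions, key=value.get)
        count[action] += 1
        value[action] += (rules.payoff(action) - value[action]) / count[action]
    return max(actions, key=value.get)


actions = ["cooperate", "defect"]
lax = Rules(reward={"cooperate": 3.0, "defect": 5.0})
strict = Rules(reward={"cooperate": 3.0, "defect": 5.0}, cost={"defect": 4.0})

print(learn_policy(lax, actions))     # defecting pays more and costs nothing
print(learn_policy(strict, actions))  # the designer made defecting costly
```

Real environments have noisy rewards and far richer agents. The sketch isolates only the point in the text: what the rules make easy, costly, or rewarding is what gets learned.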

Figure 5. Agents do not only learn from environments; they learn what the environment makes easy, costly, and rewarding.

Humans already train on synthetic worlds

This is where the problem stops being only about machines.

Humans have always learned from compressed worlds: myths, novels, theater, movies, television, pornography, games, advertising, and social media. Each medium selects a slice of reality and removes the rest. It simplifies causality. It exaggerates desire, danger, beauty, conflict, romance, success, violence, status, and consequence.

That is not just a literary observation. Media scholars have studied this for decades. George Gerbner’s cultivation theory argued that long-term exposure to television can shape a viewer’s perception of social reality. Albert Bandura’s classic Bobo doll experiment showed that children can learn aggressive behavior by watching adult models. Sexual script theory gives another frame for understanding how repeated representations can shape expectations around intimacy.

Pornography is uncomfortable to discuss, but useful analytically because the compression is obvious. The incentives are visual. The pacing is compressed. The emotional context is often absent. The result is not a neutral recording of intimacy. It is a high-signal synthetic environment optimized around attention, novelty, and performance. Movies do something similar with love, violence, work, heroism, and success. They do not only represent reality. They train expectations about how reality is supposed to feel.

A simple way to say it is:

behavior = f(environment, incentives, memory)

Behavior changes when repeated exposure changes what an agent expects, rewards, or imitates.

For humans, the agent is a person. For machines, the agent is a model or policy. In both cases, the synthetic world matters because it changes the distribution of examples. If the training environment overrepresents certain rewards and underrepresents certain consequences, the learned behavior will carry that distortion forward.

That feedback loop is the mind-bending part. A compressed version of reality enters a learner. The learner behaves according to expectations shaped by that compression. The behavior changes the real world. Eventually the real world can begin to resemble the synthetic model that once only portrayed it.

This should make us more careful with machine training. Synthetic data is not just a cheaper substitute for real data. It can become a force that reshapes the reality it was trained to imitate.

Language is the oldest compression layer

LLMs sit inside this same story, but at a specific level of abstraction. Language is not reality. It is a compression of experience into symbols.

A sentence loses the face that said it. It loses the room, the smell, the gesture, the hesitation, the shared history, the social risk, and the body. Even translation changes meaning. The same phrase can carry different emotional weight in another language because language does not only encode facts. It encodes culture, memory, and expectation.

So an LLM is not trained on raw reality. It is trained on the record humans left behind: documents, dialogue, code, arguments, explanations, labels, stories, and mistakes. That does not make it useless. It means the model begins one layer away from the world, inside the human record of it.

Text is not reality. Text is a record of what reality made humans say.

World models may go deeper because they learn from observation, action, and state change. But they face the same basic problem every mind faces. A child does not absorb reality whole. Neither does an animal. Neither does a machine. Every intelligence builds a representation through limited sensors, limited memory, and limited compute.

Humans see only part of the electromagnetic spectrum. Dogs smell worlds we cannot enter. Bats hear space. Machines can sense heat, depth, radio, molecular signatures, and more. Different sensors make different worlds available. Different worlds produce different intelligence.

Figure 6. Language is already a compression of experience; an LLM begins inside the human record of reality, not reality itself.

Some constraints are physical. Others are chosen.

The more we build synthetic worlds, the more we need to distinguish between kinds of constraints.

Physical constraints are enforced by reality. Gravity does not care what we believe. Thermodynamics does not wait for regulation. Chemistry, friction, mass, energy, and biological limits act whether or not they are convenient.

Other constraints are human-made. Law, money, privacy, fairness, dignity, property, taboo, consent, borders, status, and institutional incentives are not physical laws. But they are not imaginary either. Money changes behavior. Laws change behavior. Shame changes behavior. Markets change behavior.

Human-made constraints are real when they reliably change behavior.

This is the hard moral layer. If we encode today’s human constraints too rigidly, we may freeze assumptions that deserve to be challenged. If we ignore them, we may build systems that optimize through people as if dignity, consent, and fairness were irrelevant obstacles. Neither option is safe.

Scientific AI needs to know which constraints reality enforces, which constraints instruments impose, and which constraints humans choose. The first tells us what can happen. The second tells us what can be observed. The third tells us what we are willing to allow.

Reality has to interrupt the loop

There is already evidence that recursive synthetic loops can degrade models. In Nature, Shumailov and colleagues showed that models trained on recursively generated data can suffer model collapse, losing the tails of the original distribution. The technical lesson is about data quality. The philosophical lesson is broader. When generated worlds train the next generation of generated worlds, reality can slowly lose its ability to correct the system.

The loop looks like this:

reality → dataset → model → synthetic data → next model → more synthetic data

Each turn risks amplifying what the previous model already forgot.
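Here is a crude sketch of the loop. It assumes the "model" at each generation is simply the empirical distribution of its training data, and "synthetic data" is a resample from it. This setup is much simpler than the one Shumailov and colleagues studied, and the function names are hypothetical. It does show one mechanism cleanly. Resampling can never produce a value the previous generation lacked, so the number of distinct values can only shrink and the extremes can only move inward.

```python
import random


def train_and_generate(data: list, rng: random.Random) -> list:
    """Toy 'model': the empirical distribution. Generating = resampling with replacement."""
    return [rng.choice(data) for _ in data]


def recursive_loop(generations: int = 20, n: int = 500, seed: int = 0) -> list:
    rng = random.Random(seed)
    data = [rng.gauss(0.0, 1.0) for _ in range(n)]  # generation 0 stands in for reality
    history = []
    for _ in range(generations):
        history.append({"distinct": len(set(data)), "min": min(data), "max": max(data)})
        # The next "model" trains only on the previous model's synthetic output.
        data = train_and_generate(data, rng)
    return history


for gen, stats in enumerate(recursive_loop()):
    print(gen, stats)
```

Real model collapse mixes this tail loss with approximation error from the model itself. The toy keeps only the first part, which is why nothing in it can ever recover a lost tail value.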

In culture, closed synthetic loops can produce blandness. In science, they can produce false confidence.

That is why the back-and-forth matters. Simulation proposes. Reality measures. The model predicts. The experiment contradicts. Synthetic environments pretrain. Physical interaction corrects. A notebook records the failure instead of hiding it.

For science, this record matters. A result should carry its trail: what came from measurement, what came from simulation, what came from a generative model, and what came from human judgment. Without that trail, a generated result can look like evidence while hiding the assumptions that produced it.
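One way to picture a result that carries its trail is as a small data structure. The `Origin` and `Result` names and the example claim are hypothetical, not a standard provenance format. The four origins mirror the four sources listed in the text. The single check the sketch adds is whether any step in the trail was a measurement, meaning reality had a chance to disagree.

```python
from dataclasses import dataclass, field
from enum import Enum


class Origin(Enum):
    MEASUREMENT = "measurement"
    SIMULATION = "simulation"
    GENERATIVE_MODEL = "generative model"
    HUMAN_JUDGMENT = "human judgment"


@dataclass
class Result:
    """A claim that carries its trail (hypothetical structure, not a standard format)."""
    claim: str
    trail: list = field(default_factory=list)  # (Origin, note) pairs, in order

    def record(self, origin: Origin, note: str) -> "Result":
        self.trail.append((origin, note))
        return self

    def origins(self) -> set:
        return {origin for origin, _ in self.trail}

    def reality_can_disagree(self) -> bool:
        # Only a measurement step gives the world a chance to contradict the claim.
        return Origin.MEASUREMENT in self.origins()


result = (
    Result("compound X binds target Y")  # placeholder claim
    .record(Origin.GENERATIVE_MODEL, "proposed by a generative model")
    .record(Origin.SIMULATION, "docking simulation score above threshold")
)
print(result.reality_can_disagree())  # nothing measured yet

result.record(Origin.MEASUREMENT, "binding assay in the lab")
print(result.reality_can_disagree())
```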

The synthetic world is only useful if reality can still disagree.

Figure 7. Reality matters because it can interrupt what the model has learned to repeat.

The strange frontier is when inner machinery starts to behave

There is one more layer. Some simulations do not only copy behavior from the outside. They preserve enough of the inner machinery that behavior begins to arise on its own.

By inner machinery, I mean the parts that actually produce behavior: the body, the sensors, the muscles, the controller, the environment, and the feedback between them. A flight simulator does not need to fake every movement of a plane if it preserves enough aerodynamics. A biological simulator does not need to script every behavior if it preserves enough of the body-brain-environment loop.

One useful example is NeuroMechFly v2, the newer version of the adult fruit-fly neuromechanical simulator. The researchers did not start with a cartoon fly and manually script a list of fly behaviors. They built a digital twin from biological structure: an articulated body based on real anatomy, muscle and joint models, neural controllers, contact with a physics environment, and richer sensorimotor loops including vision, olfaction, ascending feedback, path integration, head stabilization, and multimodal navigation. Then they used that simulated body to replay experimental recordings, test walking controllers, change terrain, alter body parts or feedback, and ask which pieces of the body-brain-environment loop were necessary for behavior to appear.

That is why NeuroMechFly is so interesting here. It turns the fly body into a testable synthetic organism. The more of the real causal loop the simulation preserves, the less the researchers have to fake behavior from the outside.

A separate but complementary line of work goes deeper into the brain itself. The FlyWire consortium reconstructed a synapse-level wiring diagram of the adult Drosophila brain. Another group then combined that wiring diagram with predicted neurotransmitter identities to build a computational brain model. When they activated taste or mechanosensory neurons in silico, the model predicted downstream feeding and grooming circuits that could be checked experimentally. That resembles the popular story that copying the fly brain’s wiring made behavior appear, but the precise claim is narrower. Structure alone did not create a whole autonomous fly mind. It produced a model whose real wiring was strong enough to generate behavior-circuit predictions that could be tested in experiments.

The more causal machinery we preserve, the less behavior has to be faked.

Figure 8. A synthetic organism becomes more than a surface imitation when its body, sensors, controller, and environment form a working causal loop.

NeuroMechFly is not conscious, and a connectome model is not a complete fly mind. But both point toward the same frontier. As simulations preserve more of the causal machinery that produces behavior, they stop looking like surface-level representations. They start looking like systems where behavior is instantiated from the inside.

This is where the simulation theory question becomes less abstract. Nick Bostrom’s simulation argument asks whether advanced civilizations might create simulated worlds rich enough that beings like us could live inside them. We do not need to answer that question to feel its pressure. We are already building smaller worlds and placing agents inside them.

If a future system simulates pain, fear, memory, desire, or agency with enough functional depth, what obligations do we have toward it? That is not the core thesis of this essay, but it touches the same nerve. Once simulation stops being only representation and starts looking like experience, the ethics change.

The world is too large to simulate

If we could simulate every atom, every field, every quantum interaction, every body, every memory, every institution, every incentive, every emotion, and every possible future, we would not have a model of reality. We would have reality, or something indistinguishable from it.

But any simulation smaller than the universe has to forget something.

So the future of simulation is not maximal detail. It is disciplined forgetting. The art is knowing what can be removed, what must be preserved, what needs to be measured again, and who might be harmed when the missing pieces matter.

The goal is not to replace reality with synthetic worlds. The goal is to build artificial worlds that help us ask better questions of the real one, while keeping reality close enough that it can still answer back.

Every simulation is a theory of what matters. The work ahead is learning how to remember what each theory had to leave out.

Sources and further reading

A few of the papers, books, and projects behind the essay:

David Ha and Jürgen Schmidhuber, World Models
Yann LeCun, A Path Towards Autonomous Machine Intelligence
Google DeepMind, Genie 3: A new frontier for world models
NVIDIA, Cosmos 3
He et al., VIRAL: Visual Sim-to-Real at Scale for Humanoid Loco-Manipulation
OASIS, Simulation to real humanoid loco-manipulation
Lofgren and Fefferman, The untapped potential of virtual game worlds to shed light on real world epidemics
Edward Castronova, Virtual Worlds: A First-Hand Account of Market and Society on the Cyberian Frontier
EVE Online, official game site and overview of its player-driven financial system
Park et al., Generative Agents
Altera, Project Sid
George Gerbner’s media-effects frame, Cultivation Theory overview
Bandura, Ross, and Ross, Transmission of aggression through imitation of aggressive models
Shumailov et al., AI models collapse when trained on recursively generated data
NeuroMechFly / FlyGym documentation, Simulating embodied sensorimotor control with NeuroMechFly v2
Nick Bostrom, Are You Living in a Computer Simulation?
