
The Optimal Amount of Violence Is Not Zero

Published at 10:00 AM


Last week I built a game theory simulation where hobbits and orcs trade with each other. The hobbits always cooperate. The orcs always defect. You can adjust how often hobbits fight back.

At 0% violence, the orcs farm the hobbits until everyone starves.

At 100% violence, the orcs die immediately. The hobbits survive but the welfare score tanks. Turns out a peaceful society of corpses isn’t actually peaceful.

Somewhere around 30%, with the right learning mechanisms enabled, something interesting happens: orcs start cooperating. Not because they’ve found Jesus, but because they’ve done the math. Defection gets you injured. Injury makes you desperate. Desperation plus memory equals behavior change.
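For concreteness, here's a minimal sketch of the baseline encounter in TypeScript. The payoff constants and field names are illustrative stand-ins, not the exact values in the component:

```typescript
// Baseline encounter: hobbits always cooperate, orcs always defect,
// and the slider decides how often a defection is met with violence.
interface Agent {
  energy: number;
  alive: boolean;
}

const STEAL_GAIN = 5; // orc's payoff for exploiting a hobbit (assumed)
const STEAL_LOSS = 2; // hobbit's loss when exploited (assumed)

// violenceRate in [0, 1] is the slider: how often hobbits fight back.
function encounter(hobbit: Agent, orc: Agent, violenceRate: number): void {
  orc.energy += STEAL_GAIN;
  hobbit.energy -= STEAL_LOSS;
  if (hobbit.energy <= 0) hobbit.alive = false; // 0%: hobbits starve

  if (Math.random() < violenceRate) {
    orc.alive = false; // 100%: every orc dies on its first offense
  }
}
```

The learning mechanisms that make the ~30% regime interesting come later.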

Anyone who’s read Axelrod or thought seriously about deterrence knows this. But there’s something about watching it play out, little emoji orcs turning from red to purple (scared) to green (reformed), that makes the theory feel less like theory.

Why I Built This

I’ve been thinking about punishment lately. Not in a philosophical “is it justified” way. That question is largely settled for anyone who’s tried to run a functional organization or raise children. Consequences matter. The interesting question is: what kind of consequences actually produce better behavior?

The rehabilitation vs. retribution debate usually gets framed as a values question. Soft-hearted liberals want to understand criminals. Hard-nosed conservatives want to punish them. But this framing is stupid. It treats “understanding” and “punishing” as opposites when they’re actually orthogonal. You can understand exactly why someone defects and still punish them for it. You can punish effectively precisely because you understand the incentives.

Game theory doesn’t care about your feelings. It just asks: what feedback loops produce what outcomes?

Pure retribution (kill every defector immediately) removes the defector from the population but teaches nothing to observers. Pure rehabilitation (no consequences ever) lets defection pay, so it spreads. The sweet spot is somewhere in between: consequences that hurt enough to change behavior, but don’t remove the opportunity to change.

In the simulation, this manifests as an injury system. First offense gets you hurt. You heal in three turns. While injured, you’re much more likely to cooperate. Call it situational wisdom. Second offense while still injured? Now you die.
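In code, the injury mechanic is roughly this. The three-turn timer and the escalation rule are as described above; the cooperation rates are placeholders (the 60% injured rate comes up again below):

```typescript
// Injury system: first offense injures, second offense while still
// injured kills. Injury sharply raises the chance of cooperating.
interface Orc {
  injuredTurnsLeft: number; // 0 means healthy
  alive: boolean;
}

function punish(orc: Orc): void {
  if (orc.injuredTurnsLeft > 0) {
    orc.alive = false; // second offense while injured: death
  } else {
    orc.injuredTurnsLeft = 3; // first offense: heals over three turns
  }
}

function healTick(orc: Orc): void {
  if (orc.injuredTurnsLeft > 0) orc.injuredTurnsLeft -= 1;
}

// "Situational wisdom": injured orcs cooperate far more often.
function cooperates(orc: Orc): boolean {
  const p = orc.injuredTurnsLeft > 0 ? 0.6 : 0.1; // healthy rate assumed
  return Math.random() < p;
}
```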

This is basically graduated sanctions, which Elinor Ostrom identified as a key feature of successful common-pool resource management. First-time fish quota violators get a warning. Repeat offenders lose their license. The system needs enough signal to learn from.

What The Simulation Actually Shows

The interesting runs happen when you turn on both “schools,” the information-sharing mechanisms for each population.

Hobbit School means hobbits share gossip. They maintain a blacklist of known defectors. They’re more cautious after the community has been burned. But they also give second chances to orcs who’ve reformed or who’ve already been punished.

Orc School means orcs learn from observation. They see other orcs get injured and killed. They accumulate fear. An orc who’s been injured personally has a 60% base chance to cooperate next time — not because the game told it to, but because the incentives shifted.
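Here's one way the two schools could look as shared state. The 60% figure is the one above; the blacklist mechanics and the fear increment are a simplified, illustrative take rather than the component's exact internals:

```typescript
// Hobbit School: shared gossip about known defectors, with forgiveness
// for orcs who have reformed or already been punished.
const blacklist = new Set<string>();

function onDefection(orcId: string): void {
  blacklist.add(orcId); // gossip spreads: the community remembers
}

function onReform(orcId: string): void {
  blacklist.delete(orcId); // second chances for reformed orcs
}

// Orc School: orcs accumulate fear from watching peers get hurt.
interface OrcMind {
  wasInjured: boolean;
  fear: number; // rises with each injury or death observed
}

function cooperationChance(mind: OrcMind): number {
  const base = mind.wasInjured ? 0.6 : 0.1; // 60% from above; 10% assumed
  return Math.min(1, base + 0.05 * mind.fear); // fear increment assumed
}
```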

With both schools enabled and moderate violence (~30%), you see something like a functioning society emerge:

  1. Early game: orcs exploit hobbits freely
  2. Mid game: hobbits start fighting back, orcs get injured
  3. Injured orcs cooperate, survive, build reputation
  4. Hobbits recognize reformed orcs, give them chances
  5. Late game: stable mixed population, mostly cooperating

The welfare score (which penalizes violence) peaks somewhere in this range. Not at zero violence (exploitation collapse) and not at maximum violence (everyone dies).
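One plausible shape for that score, assuming linear weights (the component's actual weighting may differ):

```typescript
// Rewards cooperative trades and surviving agents, penalizes each
// violent act. All three weights here are assumptions.
function welfareScore(
  cooperativeTrades: number,
  survivors: number,
  violentActs: number
): number {
  return 3 * cooperativeTrades + survivors - 5 * violentActs;
}
```

At 0% violence, trades and survivors collapse; at 100%, violent acts and deaths dominate; the peak sits in between.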

The math says: if you want the good outcome, you have to be willing to hurt people. But not too much, and not permanently, and ideally in ways they can learn from.

The Part People Don’t Want To Hear

There’s a certain kind of person who thinks punishment is inherently barbaric, a vestige of our cruel past that enlightened societies should evolve beyond. I’ve never been that person. The simulation confirms what I already suspected: purely voluntary cooperation doesn’t scale. You need enforcement. And enforcement means someone, somewhere, has to be willing to impose costs on defectors.

Cruelty has nothing to do with it. Taking incentives seriously does. A world where defection never costs anything is a world optimized for defectors. The hobbits in my simulation aren’t mean when they injure orcs. They’re just not naive.

The question was never whether to have punishment. The question is whether your punishment regime is calibrated to produce behavior change rather than just warehousing people or letting them walk.

The injury system in my simulation is a toy model, but the principle is real. A fine is better than prison for minor offenses, not because it’s nicer but because it’s a faster feedback loop. Prison with education beats prison without. Not mercy, just better ROI on the intervention. A criminal record that expires beats a permanent one because permanent exile removes any incentive to reform.

The people who want zero punishment are wrong. But so are the people who think punishment is about making offenders suffer. Both camps are optimizing for feelings rather than outcomes. The actual goal is behavior change. Everything else is theater.

What’s Next

This is the first in what I expect to be a small series. The hobbits-and-orcs model is useful for population dynamics, but it abstracts away individual psychology.

The next simulation I want to build focuses on a single defecting agent moving through a social network. One bad actor who poisons interactions as they go. The question: how does punishment (or its absence) affect the spread of toxicity?

In the population model, defectors either reform or die. In a network model, they can also leave. They can burn one community and move to the next. This changes the calculus entirely and maps more closely to how bad actors actually work in online spaces, companies, and social scenes.

What happens when a defector can simply move to a new pool of victims? Does punishment help or just accelerate their departure? Is there a way to share reputation across communities that doesn’t also enable witch hunts?

I don’t know the answers yet, but I’ll report back and open-source the next version too.

Try It Yourself

The simulation is a React component. You can adjust:

  - The violence rate: how often hobbits fight back when exploited
  - Hobbit School: whether hobbits share gossip and keep a blacklist
  - Orc School: whether orcs learn from watching other orcs get hurt

The interesting experiments:

  - Set violence to 0% and watch the orcs farm the hobbits into collapse
  - Set it to 100% and watch the welfare score tank as the orcs die off
  - Set it near 30% with both schools on and watch cooperation emerge

The code is MIT licensed. The math is older than all of us.


This is part of an ongoing project exploring game theory through interactive simulation. Built with Claude. Try it in your browser: https://orcs-vs-hobbits.joonaheino.com/. Clone and edit the code: https://github.com/lurnake/game-theory-orcs-vs-hobbits-simulator.

