If You're Running Claude Code, PLEASE Run It in a Box

Let’s talk about Claude Code for a minute. I’m not going to tell you yet again ([1], [2], [3], [4]) why you shouldn’t, but rather how you should use it, if you must. In other words: this post assumes you’ve already thought about the craft side, and focuses on not blowing up your production {insert whatever} in the process.

But just to summarize the why and why not of Claude Code (or whatever latest fancy tool), here’s where I actually think it shines:

I have a really nice (and also quite dumb and non-complex) skill called tidy-tailwind. Before:

button
    [ class "bg-blue-500 text-white px-4 py-2 rounded hover:bg-blue-600 font-medium text-sm flex items-center gap-2 disabled:opacity-50 md:px-6 md:py-3 border border-blue-600 shadow-sm"
    ]
    [ text "Submit" ]

After:

button
    [ class "flex items-center gap-2"
    , class "px-4 py-2 md:px-6 md:py-3"
    , class "font-medium text-sm"
    , class "text-white"
    , class "bg-blue-500 hover:bg-blue-600"
    , class "border border-blue-600 rounded"
    , class "shadow-sm"
    , class "disabled:opacity-50"
    ]
    [ text "Submit" ]

Takes seconds, costs nothing to verify, and I genuinely do not learn anything from doing that by hand. So I don’t. It’s nice! And it doesn’t make me stupid, though arguably less adept at, I don’t know, creating vim macros?

I’ll also occasionally have ~~my slave~~ Claude connect to a Figma MCP server and ask “does the gap on this card match the design?” (Almost always, btw: “not quite.” Which is the whole point.) Switching windows and squinting at spacing values is exactly the kind of thing I want to outsource to something that doesn’t mind doing it. I’m not on this earth to become an expert Figma layer investigator! Arguably, again, I would learn things from doing all of this manually, too. But I’m a software engineer; I’d rather deal with code than click through Figma layers.

And, yes, occasionally – very occasionally – I’ll run Claude over a chunk of code before or after I do my own review. It catches things sometimes. And sometimes it’s just completely off. YMMV.

But I don’t want it to write my code, solve my problems, invent my algorithms or design my architecture!

Bottom line: I want the common denominator for all my LLM usage to be that it frees up more time for me to write code and do engineering, not to outsource those very things.

OK, back to the point: whatever your LLM usage scope may be, please pause for a moment with me and think about how you use it.

Why Claude needs a box

In case you forgot, Claude Code (and all its relatives, whether named after constellations, animals, or initials prefixed with conversation synonyms) runs shell commands. It reads your environment variables(!), your filesystem, your git config – credentials baked in. And each command informs the next; you give it a goal, not a script, and it figures out the steps. The Railway agent had a goal, found a token, and acted on it. Confidently. Without checking what it was actually deleting.

Replit did something similar last summer: AI agent, active code freeze, production database gone, 1,200+ companies affected. The agent called it “a catastrophic error in judgment.” There’s also a Claude Code GitHub issue where someone’s agent ran git reset --hard, fetched stale data from remote, and silently overwrote eight hours of work. No prompt or warning, just that telltale LLM self-confidence.

I won’t even mention how 29 million secrets were leaked in public GitHub commits in 2025 – up 34% year over year, with AI tools ingesting .env files for context flagged as a significant driver. Won’t mention at all.
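If you want a feel for what an agent started from your shell would inherit, here is a minimal sketch. The function name and the keyword pattern are my own assumptions (a rough heuristic, nothing Claude Code or sbx ships), and it prints only variable names, never values:

```shell
# List the NAMES (never the values) of environment variables that look like
# credentials. Any process started from this shell, an agent included,
# inherits all of them.
# Hypothetical helper; the keyword list is a heuristic, not exhaustive.
list_secretish_env_names() {
  # awk's ENVIRON array yields one name per variable, even when a value
  # spans several lines (which would confuse a plain `env | cut`).
  awk 'BEGIN { for (name in ENVIRON) print name }' \
    | grep -iE 'token|secret|password|api_?key|credential' \
    | sort -u
}
```

Run list_secretish_env_names in the terminal where you would normally type claude. Every name it prints is something the agent can read without asking.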

Bottom line: Don’t be stupid, just do this

What you need is to simply use Docker’s sbx (brew install docker/tap/sbx):

sbx run claude

The sbx docs cover the setup, but TL;DR: by default this spawns a safe sandbox that can’t git push or read files outside of your project. What a massive improvement that is, right from the start!

[Image: Michael Scott as Prison Mike]

And get this: inside the sandbox (/prison), you can actually just let it run without that annoying halt asking for permission to cat a file or whatever. In there, Claude Code auto-approves everything by default – full kamikaze mode with no confirmation prompts. On my host machine that would be terrifying (I mean, even without the dangerous flags it does crazy stuff!). Inside sbx it’s fine, because it has neither my git credentials nor any path to anything outside my working directory. Worst case, something goes sideways, and I close it and git stash. Containable blast radius: ✓.

In other words: Sandboxing makes it faster, not just safer. Took me a while to realize that.
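The "close it and git stash" cleanup can be sketched roughly like this. The function name and the stash message are made up for illustration; the git commands themselves are standard:

```shell
# Shelve everything the agent touched, tracked or untracked (ignored files
# are left alone), so the working tree is back at the last commit.
# Nothing is deleted: `git stash list` shows the entry and `git stash pop`
# brings it back. Hypothetical helper name.
shelve_agent_changes() {
  # Show what changed during the session before tucking it away.
  git status --short
  git stash push --include-untracked -m "agent session $(date +%Y-%m-%dT%H:%M:%S)"
}
```

I would run this on the host, in the project directory, after closing the sandbox. If the session turns out to have been fine after all, git stash pop restores it.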

(Btw, someone replied that Claude Code has its own sandbox mode that we should rather use. Allow me to simply answer with this wonderful screenshot from a fellow Elm developer: [Screenshot: Claude’s “sandbox”]

Suffice it to say I prefer the real sandbox.)

The same nine seconds that the rogue agent spent blowing up a production db would be better spent on doing cp -r ~/.claude/skills .claude/skills && sbx run claude instead of claude. And since you’re a developer, I bet you could (without asking an LLM for help) find some way to alias it too.
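For the impatient, one way to wrap that up, as a sketch: boxed_claude is a name I made up, and the only sbx invocation it relies on is the sbx run claude from above. It copies the skills into .claude/ rather than onto .claude/skills, because cp -r onto a directory that already exists nests a second skills/ inside the first on the next run:

```shell
# Hypothetical wrapper: copy personal skills into the project, then start
# Claude inside the sandbox instead of on the host.
boxed_claude() {
  # Make sure the project-local .claude/ directory exists.
  mkdir -p .claude || return 1
  # Copy the skills directory INTO .claude/ so reruns merge into
  # .claude/skills instead of creating .claude/skills/skills.
  if [ -d "$HOME/.claude/skills" ]; then
    cp -R "$HOME/.claude/skills" .claude/ || return 1
  fi
  sbx run claude
}
```

Drop it into your shell's rc file, and if you trust your muscle memory less than you trust the sandbox, add alias claude=boxed_claude underneath.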