Unleashing Agentic Coding Tools
Run coding agents in Docker Sandboxes to reduce approvals, improve reproducibility, and contain full-access code generation.
Intro
Over the last few years, we have seen an immense boom in agentic coding tools. Their applicability is usually clear, but workflow-wise there are many ways and flavours to get the job done. At a high level, we’re talking about a trade-off among efficiency, effectiveness, autonomous vs. interactive code generation, and, of course, security.
In this article I’ll focus on how to securely improve the efficiency of autonomous coding tools, like Codex*. The approach works for individuals as well as for small-to-medium teams.
*In the examples I’ll use Codex, but the same approach works for Claude Code, Gemini, OpenCode and other interchangeable agentic CLI tools. The only important detail is that you might need to change tool-specific flags and params.
Problem
While CLI tools lean towards the autonomous side of the spectrum, by default they still require a lot of short-lived interactions from you as a user during a generative session - approving script runs, file reads, env reads (…yeah), you name it.
One way to solve it is to use tool settings: update permissions, yolo mode (danger-full-access), a sandbox, or remote execution. If you are a user of the enterprise package, most of that is likely already defined for you by the admin.
The compromises here are:
A) it’s less convenient to transfer and maintain permissions across vendors. With the industry moving this fast, it’s a good strategy to stay open to new tooling
B) you have to trust that the tool will respect the boundaries and permissions
Solution
Another, more flexible way is to constrain agentic CLI tools at the OS level. By running Codex or Claude in an isolated Docker container or microVM (a lightweight virtual machine), you get:
- a more contained environment to run the tool in full access mode
- fewer hiccups with permission requests
- reproducibility across machines
- flexibility to swap the tool without affecting existing workflows that much
Depending on your goals, you can adopt this approach at different levels. I’ll use sbx (https://docs.docker.com/ai/sandboxes/), as it is specifically designed for such use cases.
Docker Sandboxes run AI coding agents in isolated microVM sandboxes
To set it up, simply run
brew install docker/tap/sbx
sbx login
Docker Templates
Docker offers a list of maintained sandbox templates (https://docs.docker.com/ai/sandboxes/customize/templates/), which are good enough for basic tasks
Here's an example for running Codex
sbx run codex --template docker.io/docker/sandbox-templates:codex
For alternative tools, the idea is the same, but the template must match the tool.
sbx run claude --template docker.io/docker/sandbox-templates:claude-code
That command creates a workspace sandbox and starts an interactive CLI session. To run it autonomously, add the exec command
sbx run codex --template docker.io/docker/sandbox-templates:codex -- exec "create google clone, no mistakes"
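Since the only thing that changes between tools is the agent name and its matching template, the pairing can be kept in one place. A minimal sketch, assuming only the two templates shown above; the function names template_for_agent and run_agent are hypothetical helpers, not part of sbx:

```shell
#!/bin/sh
# Map an agent name to the Docker-maintained template used in this article.
# Only the two tools shown above are covered; other tools need their own entry.
template_for_agent() {
  case "$1" in
    codex)  echo "docker.io/docker/sandbox-templates:codex" ;;
    claude) echo "docker.io/docker/sandbox-templates:claude-code" ;;
    *)      echo "No template known for agent: $1" >&2; return 1 ;;
  esac
}

# Hypothetical wrapper: run the agent with its matching template.
# Any extra arguments (e.g. -- exec "...") are passed through unchanged.
run_agent() {
  local agent template
  agent="$1"
  shift
  template="$(template_for_agent "$agent")" || return 1
  sbx run "$agent" --template "$template" "$@"
}
```

With this in place, run_agent codex -- exec "create google clone, no mistakes" expands to the same command as above, and swapping the tool means changing a single word.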
Custom Templates
Docker templates are basically container images used as sandbox templates. If the agent needs libraries or tools that the image lacks, in yolo mode it will most likely just go and install them. That’s effective - it doesn’t bother you - but not efficient: the token burn rate may skyrocket.
That can be avoided with custom container templates that already have all the libs and tools. Extra perk - you can bake a reusable system prompt/config into the Dockerfile itself, or preinstall tools that you expect the agent to use often.
One way to do it - assuming the agent installed everything itself - is to call the sbx template save command right after the sbx session ends
sbx template save workspace-sandbox-name new-template-name:v1
Important: do not save/publish templates from sandboxes where the agent could have handled secrets, logged tokens, cloned private repos with credentials, or written auth config into the filesystem. Saving the template captures the filesystem state.
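Before saving, it can help to look for files that commonly hold credentials. The sketch below is an assumption-heavy illustration: the file name patterns are examples rather than an exhaustive list, the function names are hypothetical, and it is meant to be run from a shell inside the sandbox. A clean result does not prove that no secret was logged elsewhere.

```shell
#!/bin/sh
# Hypothetical pre-save check: list files that commonly hold credentials.
# The name patterns are illustrative assumptions, not a complete list.
find_credential_files() {
  local root
  root="${1:-.}"
  find "$root" -type f \( \
      -name '.env' -o -name '.env.*' \
      -o -name '.netrc' -o -name '.npmrc' -o -name '.git-credentials' \
      -o -name 'id_rsa' -o -name 'id_ed25519' \
      -o -name 'auth.json' -o -name 'credentials' \
    \) ! -path '*/node_modules/*' 2>/dev/null
}

# Returns 0 when nothing suspicious is found, 1 otherwise.
safe_to_save() {
  local hits
  hits="$(find_credential_files "$1")"
  if [ -n "$hits" ]; then
    echo "Possible secrets found, do not save this sandbox as a template:" >&2
    echo "$hits" >&2
    return 1
  fi
}
```

For example, safe_to_save /home/agent && safe_to_save /workspace would check the agent's home directory and the workspace, assuming those are the paths your sandbox uses.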
But to make it reusable, we’ll have to create a new Dockerfile. Here’s a step-by-step guide for a FastAPI + React monorepo template (pnpm, Vite, Node.js, Python, Playwright, and Poetry):
FROM docker.io/docker/sandbox-templates:codex
LABEL maintainer="levchenkod.com" \
      description="Sandbox template for Codex and Playwright, with pinned Node.js, Python, Playwright, pnpm, and Poetry"
ENV POETRY_HOME=/opt/poetry \
    PLAYWRIGHT_BROWSERS_PATH=/ms-playwright
USER root
ENV PNPM_STORE_PATH=/home/agent/.local/share/pnpm/store
ENV DEBIAN_FRONTEND=noninteractive
ENV NPM_CONFIG_PREFIX=
ENV npm_config_prefix=
ENV PNPM_HOME=/home/agent/.local/share/pnpm
ENV PATH=/home/agent/.local/bin:/home/agent/.local/share/pnpm:${PATH}
ARG NODEJS_APT_VERSION=
ARG NPM_APT_VERSION=
ARG PYTHON3_APT_VERSION=
ARG PYTHON3_PIP_APT_VERSION=
ARG PNPM_VERSION=10.24.0
ARG TYPESCRIPT_VERSION=5.4.5
ARG VITE_VERSION=5.2.11
RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        nodejs${NODEJS_APT_VERSION:+=${NODEJS_APT_VERSION}} \
        npm${NPM_APT_VERSION:+=${NPM_APT_VERSION}} \
        python-is-python3 \
        python3${PYTHON3_APT_VERSION:+=${PYTHON3_APT_VERSION}} \
        python3-pip${PYTHON3_PIP_APT_VERSION:+=${PYTHON3_PIP_APT_VERSION}} \
        sudo \
        tini \
    && rm -rf /var/lib/apt/lists/*
RUN mkdir -p /ms-playwright /home/agent/.local/bin /home/agent/.local/share/pnpm/store \
    && chown -R agent:agent /ms-playwright /home/agent/.local
USER agent
SHELL ["/bin/bash", "-lc"]
# pnpm, Vite, and TypeScript as pinned global CLIs.
RUN unset NPM_CONFIG_PREFIX npm_config_prefix \
    && npm --prefix /home/agent/.local install -g "pnpm@${PNPM_VERSION}" "vite@${VITE_VERSION}" "typescript@${TYPESCRIPT_VERSION}" \
    && /home/agent/.local/bin/pnpm config set global-bin-dir "${PNPM_HOME}" \
    && node --version \
    && npm --version \
    && pnpm --version \
    && vite --version \
    && tsc --version \
    && python --version
COPY --chown=agent:agent web/package.json web/pnpm-lock.yaml /tmp/codex-playwright-web/
# Pre-populate the pnpm store from the lockfile so later installs can work from cache.
RUN cd /tmp/codex-playwright-web \
    && pnpm fetch --frozen-lockfile --store-dir "${PNPM_STORE_PATH}" \
    && rm -rf /tmp/codex-playwright-web
ARG PLAYWRIGHT_VERSION=1.60.0
ARG PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=
ARG TARGETARCH
# Playwright package plus its matching Chromium.
RUN python -m pip install --user --break-system-packages "playwright==${PLAYWRIGHT_VERSION}" \
    && if [[ -z "${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" ]]; then \
        case "${TARGETARCH}" in \
            amd64) PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64 ;; \
            arm64) PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-arm64 ;; \
            *) echo "Unsupported TARGETARCH for Playwright: ${TARGETARCH}" >&2; exit 1 ;; \
        esac; \
    fi \
    && PLAYWRIGHT_HOST_PLATFORM_OVERRIDE="${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" python -m playwright install-deps chromium \
    && PLAYWRIGHT_HOST_PLATFORM_OVERRIDE="${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" python -m playwright install chromium \
    && touch /ms-playwright/.system-deps-installed
WORKDIR /workspace
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["sleep", "infinity"]
Build and publish the image
docker buildx build \
--platform linux/arm64 \
--push \
--provenance=false \
-t lapps/codex-playwright:0.1.0 \
-f ./Dockerfile.codex-playwright .
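Or save it locally as a tar (the image has to be in your local image store for this, so build with --load instead of --push if it isn’t)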
docker image save lapps/codex-playwright:0.1.0 -o codex-playwright.tar
If you use a local tar, load it into sbx
sbx template load codex-playwright.tar
Create a new workspace using your template
sbx create --name codex-playwright --template docker.io/lapps/codex-playwright:0.1.0 codex .
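Once the workspace exists, a quick check from a shell inside the sandbox can confirm that the preinstalled tools are actually on PATH for the agent user. This is a minimal sketch; check_tools is a hypothetical helper, and the tool list in the usage note simply mirrors what the Dockerfile above installs.

```shell
#!/bin/sh
# Report any expected CLI that is missing from PATH.
# Pass the tool names as arguments; returns non-zero if at least one is missing.
check_tools() {
  local missing tool
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "ok       $tool"
    else
      echo "MISSING  $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For this template the call would be check_tools node npm pnpm vite tsc python. If something is reported as missing, it is cheaper to fix the Dockerfile than to let the agent install it on every run.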
For context: in my system prompts I like to require an e2e video proof once a task is completed, so I can validate the behaviour even before reviewing the code. Playwright does the heavy lifting here.
To test Playwright, I created a smoke test (video recording is off by default in Playwright Test, so it has to be enabled in the config, e.g. use: { video: "on" }):
import { expect, test } from "@playwright/test";
test("records video for a trivial browser page", async ({ page }) => {
  await page.setContent("<main><h1>Playwright video smoke</h1></main>");
  await expect(
    page.getByRole("heading", { name: "Playwright video smoke" }),
  ).toBeVisible();
});
And then run
sbx run codex-playwright -- exec "run playwright video smoke spec"
Which will result in a new video file
./test-results/playwright-video-smoke-rec-6c08a--for-a-trivial-browser-page-chromium/video.webm
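Because the video is the proof I rely on, it is worth failing loudly when it is missing. A minimal sketch, assuming the default test-results directory and the video.webm file name seen in the path above; find_videos is a hypothetical helper:

```shell
#!/bin/sh
# Print every recorded video under the results directory; fail if there is none.
find_videos() {
  local results_dir videos
  results_dir="${1:-test-results}"
  videos="$(find "$results_dir" -type f -name 'video.webm' 2>/dev/null)"
  if [ -z "$videos" ]; then
    echo "No video found under $results_dir" >&2
    return 1
  fi
  echo "$videos"
}
```

Running find_videos ./test-results after the agent finishes gives a non-zero exit code when no video was produced, which makes it easy to use in a script.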
Outcome
With a few simple steps, we get a reliable, reproducible and more contained way to let generative models do what they do best - generate code changes - without stopping to ask their human for permission. We can also give the tool more freedom within the sandbox while keeping the host machine, credentials, and network access strictly constrained.
Bonus - example use cases
Running agentic loops
One way to use it efficiently is to ralph-loop the sandbox call (run it in short, bounded agentic loops) to keep the context lean, e.g.
#!/bin/bash
# ralph.sh - run the agent in a short, bounded loop inside the sandbox
iterations=5
sandbox_name="ralph-env"
resultfile="output.txt"
effort="medium"
agent_message="$1"
if [ -z "$agent_message" ]; then
  echo "Usage: ./ralph.sh \"message for agent\""
  exit 1
fi
# Run Codex non-interactively; -o writes the agent's last message to $resultfile
run_codex() {
  sbx run "$sandbox_name" -- --dangerously-bypass-approvals-and-sandbox exec \
    -c "model_reasoning_effort=\"$effort\"" \
    -o "$resultfile" "$1"
}
for ((i=1; i<=iterations; i++)); do
  echo "Running iteration $i..."
  # Feed the previous iteration's result back in (empty on the first run)
  previous_context="$(cat "$resultfile" 2>/dev/null)"
  run_codex "$agent_message
Current iteration: $i.
Previous context:
$previous_context"
  exit_code=$?
  # Convention used in this script: exit code 2 means the target was reached
  case $exit_code in
    2) echo "Target reached."; exit 0 ;;
    0) continue ;;
    *) echo "Error in iteration $i"; exit "$exit_code" ;;
  esac
done
And call it like
chmod +x ralph.sh
./ralph.sh "hey ai"
Or use an existing one: https://github.com/snarktank/ralph
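If you stay with a hand-rolled loop, capping how much of the previous result is carried into the next iteration is one way to keep the context lean. A minimal sketch; trimmed_context is a hypothetical helper and the 40-line default is an arbitrary assumption:

```shell
#!/bin/sh
# Print only the last N lines of the previous result (default: 40).
# Prints nothing and succeeds when the file does not exist yet.
trimmed_context() {
  local file max_lines
  file="$1"
  max_lines="${2:-40}"
  [ -f "$file" ] || return 0
  tail -n "$max_lines" "$file"
}
```

In the loop, previous_context="$(trimmed_context "$resultfile" 40)" would take the place of the cat call.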
Create a skill
Let your agentic tool know about the sandbox and when to use it, so it can act as an orchestrator when working on complex development tasks
https://gist.github.com/levchenkod/a92f58b58c32c531528a709913b2506b