Unleashing Agentic Coding Tools

Run coding agents in Docker Sandboxes to reduce approvals, improve reproducibility, and contain full-access code generation.


Intro

Over the last few years, agentic coding tools have boomed. Their usefulness is usually clear, but workflow-wise there are different ways and flavours to do the job. At a high level, we're talking about a trade-off among efficiency, effectiveness, autonomy versus interactivity, and, of course, security.

In this article I'll focus on how to securely improve the efficiency of autonomous coding tools, like Codex*. The approach works as well for small-to-medium teams as it does for individuals.

*The examples use Codex, but the same approach works for Claude Code, Gemini, OpenCode and other interchangeable agentic CLI tools. The only important detail is that you might need to change tool-specific flags and parameters.

Problem

Although CLI tools lean towards the autonomous side of the spectrum, by default they still interrupt a generative session with a lot of short-lived interactions: approving script runs, file reads, env reads (…yeah), you name it.

One way to solve this is through the tool's own settings: updated permissions, yolo mode (danger-full-access), a sandbox, or remote execution. If you are a user of the enterprise package, most of that is likely already defined for you by the admin.

This comes with two compromises:

A) Permissions are less convenient to transfer and maintain across vendors. With the industry moving this fast, staying open to new tooling is a good strategy.

B) You have to trust that the tool will respect the boundaries and permissions.

Solution

Another, more flexible way is to constrain agentic CLI tools at the OS level. By running Codex or Claude in an isolated Docker container or microVM (virtual machine), you get:

  • a more contained environment to run the tool in full access mode
  • fewer hiccups with permission requests
  • reproducibility across machines
  • flexibility to swap the tool without affecting existing workflows that much

How far you take this approach depends on your goals. I'll use sbx (https://docs.docker.com/ai/sandboxes/), as it is designed specifically for such use cases.

Docker Sandboxes run AI coding agents in isolated microVM sandboxes.

To set it up, run:

brew install docker/tap/sbx
sbx login

Docker Templates

Docker maintains a list of sandbox templates (https://docs.docker.com/ai/sandboxes/customize/templates/), which are good enough for basic tasks.

Here's an example for running Codex:

sbx run codex --template docker.io/docker/sandbox-templates:codex

For alternative tools, the idea is the same, but the template must match the tool.

sbx run claude --template docker.io/docker/sandbox-templates:claude-code

That command creates a workspace sandbox and starts an interactive CLI session. To run Codex autonomously, append the exec command:

sbx run codex --template docker.io/docker/sandbox-templates:codex -- exec "create google clone, no mistakes"
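
Since the template has to match the tool, a tiny lookup can keep that pairing in one place. This is only a sketch: template_for is a hypothetical helper name, and it knows nothing beyond the two template tags shown above.

```shell
# Hypothetical helper: map an agent name to the template tag used in this article.
template_for() {
    case "$1" in
        codex)  echo "docker.io/docker/sandbox-templates:codex" ;;
        claude) echo "docker.io/docker/sandbox-templates:claude-code" ;;
        *)      echo "no known template for agent: $1" >&2; return 1 ;;
    esac
}
```

With that in place, the interactive call becomes sbx run codex --template "$(template_for codex)". Anything after -- (like exec) stays tool-specific and is left out of the helper on purpose.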

Custom Templates

Docker templates are essentially container images used as sandbox templates. If a task needs additional libraries or tools, your agent needs access to them, and in yolo mode it will most likely just go and install them. That's effective, since it doesn't bother you, but not efficient: the token burn rate may skyrocket.

That can be avoided with custom template images that already contain all the libs and tools. As an extra perk, you can inject a reusable system prompt/config in the Dockerfile itself, or preinstall tools that you expect the agent to use often.

One way to do it, assuming the agent has installed everything itself, is to call the sbx template save command right after the sbx session ends:

sbx template save workspace-sandbox-name new-template-name:v1

Important: do not save/publish templates from sandboxes where the agent could have handled secrets, logged tokens, cloned private repos with credentials, or written auth config into the filesystem. Saving the template captures the filesystem state.
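
Before saving, a quick filename sweep inside the sandbox can catch the most obvious offenders. This is a heuristic sketch, not an audit: find_secret_candidates is a hypothetical helper, the list of file names is my assumption about where credentials commonly end up, and a clean result does not prove the filesystem is free of secrets (tokens in logs or shell history will not show up).

```shell
# Heuristic only: list files whose names commonly hold credentials.
# The name list is an assumption; extend it for your own tooling.
find_secret_candidates() {
    find "${1:-.}" -type f \( \
        -name '.env' -o -name '.env.*' -o \
        -name '.netrc' -o -name '.npmrc' -o -name '.pypirc' -o \
        -name '.git-credentials' -o -name 'auth.json' -o \
        -name 'id_rsa' -o -name 'id_ed25519' -o -name '*.pem' \
    \) 2>/dev/null
}
```

Run it against the home and workspace directories from a shell inside the sandbox, for example find_secret_candidates /home/agent, and treat any hit as a reason not to save the template.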

But to make it reusable and reproducible, we'll have to create a new Dockerfile. Here's an example for a FastAPI + React monorepo template (pnpm, Vite, Node.js, Python, Playwright, and Poetry):

FROM docker.io/docker/sandbox-templates:codex

LABEL maintainer="levchenkod.com" \
    description="Sandbox template for Codex and Playwright, with pinned Node.js, Python, Playwright, pnpm, and Poetry"

# Note: POETRY_HOME is only declared here; no step below installs Poetry itself.
ENV POETRY_HOME=/opt/poetry \
    PLAYWRIGHT_BROWSERS_PATH=/ms-playwright

USER root

ENV PNPM_STORE_PATH=/home/agent/.local/share/pnpm/store
ENV DEBIAN_FRONTEND=noninteractive
ENV NPM_CONFIG_PREFIX=
ENV npm_config_prefix=
ENV PNPM_HOME=/home/agent/.local/share/pnpm
ENV PATH=/home/agent/.local/bin:/home/agent/.local/share/pnpm:${PATH}

# An empty APT version means "install whatever version the configured repositories provide".
ARG NODEJS_APT_VERSION=
ARG NPM_APT_VERSION=
ARG PYTHON3_APT_VERSION=
ARG PYTHON3_PIP_APT_VERSION=
ARG PNPM_VERSION=10.24.0
ARG TYPESCRIPT_VERSION=5.4.5
ARG VITE_VERSION=5.2.11

RUN apt-get update \
    && apt-get install -y --no-install-recommends \
        ca-certificates \
        curl \
        nodejs${NODEJS_APT_VERSION:+=${NODEJS_APT_VERSION}} \
        npm${NPM_APT_VERSION:+=${NPM_APT_VERSION}} \
        python-is-python3 \
        python3${PYTHON3_APT_VERSION:+=${PYTHON3_APT_VERSION}} \
        python3-pip${PYTHON3_PIP_APT_VERSION:+=${PYTHON3_PIP_APT_VERSION}} \
        sudo \
        tini \
    && rm -rf /var/lib/apt/lists/*

RUN mkdir -p /ms-playwright /home/agent/.local/bin /home/agent/.local/share/pnpm/store \
    && chown -R agent:agent /ms-playwright /home/agent/.local

USER agent
SHELL ["/bin/bash", "-lc"]

# pnpm, Vite, and TypeScript as pinned global CLIs.
RUN unset NPM_CONFIG_PREFIX npm_config_prefix \
    && npm --prefix /home/agent/.local install -g "pnpm@${PNPM_VERSION}" "vite@${VITE_VERSION}" "typescript@${TYPESCRIPT_VERSION}" \
    && /home/agent/.local/bin/pnpm config set global-bin-dir "${PNPM_HOME}" \
    && node --version \
    && npm --version \
    && pnpm --version \
    && vite --version \
    && tsc --version \
    && python --version

COPY --chown=agent:agent web/package.json web/pnpm-lock.yaml /tmp/codex-pp-web/

RUN cd /tmp/codex-pp-web \
    && pnpm fetch --frozen-lockfile --store-dir "${PNPM_STORE_PATH}" \
    && rm -rf /tmp/codex-pp-web

ARG PLAYWRIGHT_VERSION=1.60.0
ARG PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=
ARG TARGETARCH

# Playwright package plus its matching Chromium.
RUN python -m pip install --user --break-system-packages "playwright==${PLAYWRIGHT_VERSION}" \
    && if [[ -z "${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" ]]; then \
        case "${TARGETARCH}" in \
            amd64) PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-x64 ;; \
            arm64) PLAYWRIGHT_HOST_PLATFORM_OVERRIDE=ubuntu24.04-arm64 ;; \
            *) echo "Unsupported TARGETARCH for Playwright: ${TARGETARCH}" >&2; exit 1 ;; \
        esac; \
    fi \
    && PLAYWRIGHT_HOST_PLATFORM_OVERRIDE="${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" python -m playwright install-deps chromium \
    && PLAYWRIGHT_HOST_PLATFORM_OVERRIDE="${PLAYWRIGHT_HOST_PLATFORM_OVERRIDE}" python -m playwright install chromium \
    && touch /ms-playwright/.system-deps-installed

WORKDIR /workspace

ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["sleep", "infinity"]
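
The nodejs${NODEJS_APT_VERSION:+=${NODEJS_APT_VERSION}} lines in the Dockerfile rely on shell parameter expansion: ${VAR:+word} expands to word only when VAR is set and non-empty. The sketch below shows the same trick in isolation; apt_spec is a hypothetical function name and the version string is just a placeholder.

```shell
# Build an apt package spec: "pkg" when no version is given, "pkg=version" otherwise.
apt_spec() {
    pkg="$1"
    version="$2"
    # ${version:+=$version} yields "=<version>" only if version is non-empty.
    printf '%s%s\n' "$pkg" "${version:+=$version}"
}

apt_spec nodejs ""        # prints: nodejs
apt_spec nodejs "1.2.3"   # prints: nodejs=1.2.3
```

That is why leaving the ARG values empty installs the default package, while passing something like --build-arg NODEJS_APT_VERSION=<version> to docker buildx build pins it.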

Build and publish the image:

docker buildx build \
  --platform linux/arm64 \
  --push \
  --provenance=false \
  -t lapps/codex-playwright:0.1.0 \
  -f ./Dockerfile.codex-pp .

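Or save it locally as a tar archive (the image must be in your local image store, for example built with --load instead of --push):

docker image save lapps/codex-playwright:0.1.0 -o codex-playwright.tar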

If you use a local tar, load it into sbx

sbx template load codex-playwright.tar

Create a new workspace using your template:

sbx create --name codex-playwright --template docker.io/lapps/codex-playwright:0.1.0 codex .
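
The build, save, load and create commands all have to agree on one image name, one tag and one tar file, and it is easy for them to drift apart. A minimal sketch that derives everything from two variables and prints the commands for review instead of running them (image_ref, tar_name and print_local_flow are hypothetical names, and the tar naming scheme is my own choice):

```shell
# Single source of truth for the image name and tag.
IMAGE_REPO="lapps/codex-playwright"
IMAGE_TAG="0.1.0"

# Full image reference, e.g. lapps/codex-playwright:0.1.0
image_ref() { printf '%s:%s\n' "$IMAGE_REPO" "$IMAGE_TAG"; }

# Tar file name derived from the repo basename and the tag.
tar_name() { printf '%s-%s.tar\n' "${IMAGE_REPO##*/}" "$IMAGE_TAG"; }

# Print the local (tar-based) flow so it can be checked before running.
print_local_flow() {
    echo "docker image save $(image_ref) -o $(tar_name)"
    echo "sbx template load $(tar_name)"
    echo "sbx create --name ${IMAGE_REPO##*/} --template docker.io/$(image_ref) codex ."
}
```

Calling print_local_flow prints three commands that are consistent by construction; bumping IMAGE_TAG updates all of them at once.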

For context: in my system prompts I like to require that, once a task is completed, an e2e video proof is provided, so I can validate the behaviour even before reviewing the code. Playwright does the heavy lifting here.

To test Playwright, I created a smoke test:

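import { expect, test } from "@playwright/test";

// Video capture is not configured in this spec; it is assumed to be enabled
// in the Playwright config (for example, use: { video: "on" }).
test("records video for a trivial browser page", async ({ page }) => {
  // Render a minimal page inline, so the test needs no running server.
  await page.setContent("<main><h1>Playwright video smoke</h1></main>");

  await expect(
    page.getByRole("heading", { name: "Playwright video smoke" }),
  ).toBeVisible();
});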

Then run:

sbx run codex-playwright -- exec "run playwright video smoke spec"

This produces a new video file:

./test-results/playwright-video-smoke-rec-6c08a--for-a-trivial-browser-page-chromium/video.webm
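
If the video is meant to be proof, it helps to check that one was actually written before trusting the run. A small sketch, assuming recordings land as .webm files under test-results as in the path above (list_videos and require_video are hypothetical helper names):

```shell
# List recorded videos under a results directory (defaults to test-results).
list_videos() {
    find "${1:-test-results}" -type f -name '*.webm' 2>/dev/null | sort
}

# Print the videos, or fail with a message when none were recorded.
require_video() {
    videos="$(list_videos "$1")"
    if [ -z "$videos" ]; then
        echo "no video found under ${1:-test-results}" >&2
        return 1
    fi
    printf '%s\n' "$videos"
}
```

Running require_video right after the agent finishes turns a missing recording into a non-zero exit code, which is easy to act on in a wrapper script.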

Outcome

With a few simple steps, we get a reliable, reproducible and more contained way to let generative models do what they do best: generate code changes without stopping to ask their human for permission. We can also give the tool more freedom within the sandbox while keeping the host machine, credentials, and network access strictly constrained.


Bonus - example use cases

Running agentic loops

One way to use it efficiently is to wrap the sandbox call in a ralph loop (a short, bounded agentic loop) to keep the context lean, e.g.:

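#!/bin/bash
# ralph.sh: run the agent in a short, bounded loop inside an existing sandbox.

iterations=5                 # upper bound on agent runs
sandbox_name="ralph-env"     # name of the sandbox to run in
resultfile="output.txt"      # agent output, fed into the next iteration
effort="medium"
agent_message="$1"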

if [ -z "$agent_message" ]; then
    echo "Usage: ./ralph.sh \"message for agent\""
    exit 1
fi

run_codex() {
    sbx run "$sandbox_name" -- --dangerously-bypass-approvals-and-sandbox exec \
        -c "model_reasoning_effort=\"$effort\"" \
        -o "$resultfile" "$1"
}

for ((i=1; i<=iterations; i++)); do
    echo "Running iteration $i..."

    previous_context="$(cat "$resultfile" 2>/dev/null)"

    run_codex "$agent_message

Current iteration: $i.
Previous context:
$previous_context"

    exit_code=$?

    # Convention used here: 2 means the target was reached, 0 means keep going.
    case $exit_code in
        2) echo "Target reached."; exit 0 ;;
        0) continue ;;
        *) echo "Error in iteration $i"; exit "$exit_code" ;;
    esac
done
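
The loop above stops early only when the run exits with code 2. If your agent CLI does not give you a distinct exit code for "done", one alternative is to ask the agent (in the prompt) to finish its final message with a sentinel line and check the result file for it. This is a sketch of that idea: RALPH_DONE is a made-up marker, not a Codex feature, and target_reached is a hypothetical helper.

```shell
# Hypothetical sentinel the prompt asks the agent to print when it is finished.
DONE_MARKER="RALPH_DONE"

# Succeeds when the result file contains a line that is exactly the marker.
target_reached() {
    [ -f "$1" ] || return 1
    grep -qxF "$DONE_MARKER" "$1"
}
```

Inside the loop, a check like target_reached "$resultfile" && exit 0 after each run would then replace the exit-code branch.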

Call it like this:

chmod +x ralph.sh
./ralph.sh "hey ai"

Or use an existing one: https://github.com/snarktank/ralph

Create a skill

Let your agentic tool know about the sandbox and when to use it, so it can act as an orchestrator when working on complex development tasks:

https://gist.github.com/levchenkod/a92f58b58c32c531528a709913b2506b

Ask your AI about my services

🇺🇦 in 🇨🇦