Building AI-Tailored Document Generation (React Edition)

A practical, cost-optimized guide for stable generation of templated documents using Large Language Models

Intro

If you need to generate documents with an AI assistant but have to limit data variations, strictly follow a design template, and the prompt "Hey LLMBro, generate me a PDF with an offer for a client, no slop pls" doesn't quite cut it, then here's what worked for my case and might work for you too.

But first, here are the specific constraints I had to deal with:

  1. Follow a design template
  2. Support multiple formats (PDF and HTML to start)
  3. Render in different environments: browser, server, and email
  4. Operate with defined facts and numbers

The core premise: generation structure must be handled by code. The LLM's job is to analyze user input and call deterministic tools (methods) to fine-tune the document for a given case. This keeps output stable and avoids burning tokens on document body generation.

Why Not Just Prompt Better?

Large language models tend to drift during long conversations, especially after the summarization step, so even strict instructions can get left behind. In short, there's no stability in factual output. On top of that, generating a templated document with AI is not cost efficient. A decent portion of tokens will be spent trying to replicate a design that already exists in your codebase.

Base Use Case

The user selects a document, provides client-specific details, and the AI assistant tailors the content. The user then gets options to send it as an email or download it as a PDF.

There are other scenarios where a document needs to be embedded on a webpage and be AEO/GEO/SEO-friendly. We'll keep that in mind but focus on the base case for now.

Talking Solutions

One of the trickiest parts of this challenge is multi-format rendering. There are a lot of great tools that can convert PDF to HTML to React and vice versa, but conversions come at the cost of visual artifacts and broken sizes and layouts.

The more reliable approach is to generate the target format directly, without a middleman. The pipeline looks like this:

Data -> AI Tailoring -> Template -> Format Rendering

Template

The number of templating options is overwhelming. But at a high level, you're choosing between an AST, a Virtual DOM, or a template engine.

| | AST (recommended) | [Virtual DOM-like][snabbdom] ([React][react]) | [Template engine][awesome-te] |
| --- | --- | --- | --- |
| Output | Plain data tree | Diffable node tree | Rendered string |
| Render-agnostic | Yes, one tree, many renderers | Tied to its diffing runtime | Tied to one output format |
| LLM-friendly | Easy to validate and generate | Hard, needs framework primitives | Loose, strings are hard to validate |
| Dynamic UI | Not the goal, fine for static docs | Built for it | Limited, usually re-renders the whole string |
| Bundle size | Minimal, just objects | Heavier runtime | Lightweight at runtime |
| Best for | Static, multi-target documents | Interactive apps | Single-target HTML/email |

Given there are no requirements for dynamic templates (interactivity, conditional rendering, etc.), the AST is a great candidate. It's lightweight and render-agnostic at the same time.

In practice, I have a list of simple functions that produce AST nodes, so the template looks like this:

const template = page([
  h1('Hi there, this is a DSL template'),
  p('Lightweight, simple and somewhat readable'),
  // ...
]);

Which under the hood resolves into a plain object:

const template = {
  type: "Page",
  elements: [
    { type: "Text", style: styles.h1, elements: "Hi there, this is a DSL template" },
    { type: "Text", style: styles.p, elements: "Lightweight, simple and somewhat readable" },
    // ...
  ]
}
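The factory functions behind this DSL can stay tiny. Here's a minimal sketch; the `ASTNode` shape is inferred from the resolved object above, and the `styles` values are placeholder assumptions (real values would live in your design system):

```typescript
type ASTNode = {
  type: string;
  style?: Record<string, unknown>;
  elements: ASTNode[] | string;
};

// Hypothetical style presets, not real design tokens.
const styles = {
  h1: { fontSize: 18, fontWeight: 700 },
  p: { fontSize: 11 },
};

// Each factory just returns a plain node object.
const h1 = (text: string): ASTNode => ({ type: "Text", style: styles.h1, elements: text });
const p = (text: string): ASTNode => ({ type: "Text", style: styles.p, elements: text });
const page = (elements: ASTNode[]): ASTNode => ({ type: "Page", elements });
```

Because the factories return plain objects, the template can be serialized, validated, or handed to any renderer without pulling in a framework.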

Rendering

Working with React makes it a natural render engine for CSR, SSR, and email. For PDF, I ended up using React-PDF. It lets you use JSX-like syntax to construct and render PDF documents, and having the same mental model for PDFs as for React components makes the DX noticeably nicer.

import { Document, Page, Text, View } from "@react-pdf/renderer";

const MyDocument = () => (
  <Document>
    <Page size="A4" style={styles.page}>
      <View style={styles.section}>
        <Text>Section #1</Text>
      </View>
      <View style={styles.section}>
        <Text>Section #2</Text>
      </View>
    </Page>
  </Document>
);

It also ships a <PDFDownloadLink /> component, meaning you can download the doc directly from browser memory, making storage completely optional.

Styling

Because of the less dynamic nature of PDF/Word documents, the set of available styles is quite limited. At least we get display: flex, which is already more than you'd expect (though it's a subset).

Fun fact: @react-pdf/renderer supports rem but not em, and its default font size is significantly larger than the 16px browsers use. So characters that look fine in PDF can be barely visible in the React version. Relative units are technically available, but pixels are definitely safer.
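One way to sidestep the unit quirks is a shared style map in absolute pixel values that feeds both renderers. A sketch (names and values are illustrative):

```typescript
// Plain objects work both as React inline styles and as react-pdf styles,
// so a single map with absolute pixel values keeps both outputs in sync.
const styles = {
  page: { padding: 24 },
  h1: { fontSize: 18, fontWeight: 700 },
  p: { fontSize: 11, lineHeight: 1.4 },
} as const;
```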

Putting It Together

const LeafletDocument = ({ data }) => {
  // Fill template with data
  const templateJSON = buildTemplate(data);

  // Map AST elements to components
  const ReactElementsMap = {
    Document: (props) => <div {...props} />,
    Image: (props) => <img {...props} />,
    Text: (props) => <p {...props} />,
    // ...
  };

  // Build the template
  const WebDocument = (
    <DocumentBuilder data={templateJSON} components={ReactElementsMap} />
  );

  // Or swap the map for PDF:
  // const PDFDocument = <DocumentBuilder data={templateJSON} components={PDFElementsMap} />;

  // Or Word via https://github.com/nitin42/redocx:
  // const WordDocument = <DocumentBuilder data={templateJSON} components={WordElementsMap} />;

  return WebDocument;
};

The DocumentBuilder component recursively renders the template:

import { Fragment } from "react";

const Elements = ({ elements, components }) => {
  return (
    <>
      {elements?.map((itemProps, index) =>
        typeof itemProps === "string" ? (
          <Fragment key={index}>
            {parseHTMLTags(itemProps, parseHTMLTagsOptions(components)) || ""}
          </Fragment>
        ) : (
          <Element key={index} {...itemProps} components={components} />
        )
      )}
    </>
  );
};

const Element = ({ elements, components, ...props }) => {
  const ElementComponent = components[props.type || DocumentElementType.View];
  return (
    <ElementComponent {...props}>
      <Elements elements={elements} components={components} />
    </ElementComponent>
  );
};

export const DocumentBuilder = ({ data, components }) => {
  return (
    <components.Document style={{ fontSize: 8 }}>
      <Elements elements={data?.elements} components={components} />
    </components.Document>
  );
};

AI Tailor

Regardless of your AI strategy (LLM chat or MCP), the high-level approach is the same. Build a tool that will:

  1. Provide a dataset of possible values the AI can rely on
  2. Use a prompt that matches the tailoring input with existing data
  3. Validate the AI response

Here's a simplified example using TanStack AI:

import { toolDefinition } from "@tanstack/ai";
import type { JSONSchema } from "@tanstack/ai";

const inputSchema: JSONSchema = {
  type: "object",
  properties: {
    prospectDetails: {
      type: "string",
      description: "The prospect's company details (name, industry, size, etc.)",
    },
  },
  required: ["prospectDetails"],
};

const outputSchema: JSONSchema = {
  type: "object",
  properties: {
    differentiators: {
      type: "array",
      items: {
        type: "object",
        properties: {
          id: { type: "string" },
          headline: { type: "string" },
          proofPoint: { type: "string" },
          matchedPriority: { type: "string" },
        },
        required: ["id", "headline", "proofPoint", "matchedPriority"],
      },
    },
  },
  required: ["differentiators"],
};

const tailorLeafletDef = toolDefinition({
  name: "tailor_leaflet",
  description: "Tailor the leaflet to the prospect's company details",
  inputSchema,
  outputSchema,
});

const tailorLeaflet = tailorLeafletDef.server(
  async ({ prospectDetails }, context) => {
    const genai = new GoogleGenAI({ ... });
    const leafletOptions = await getLeafletOptions();

    // Note: we're injecting leafletOptions directly into the prompt.
    // If the object grows too large, consider RAG-ifying it to avoid bloating the context.
    const prompt = `
      You are writing a sales leaflet. Given the following leaflet options and the
      prospect's company details, produce tailored leaflet content in JSON format.

      <LeafletOptions>
        ${JSON.stringify(leafletOptions)}
      </LeafletOptions>

      <ProspectDetails>
        ${prospectDetails}
      </ProspectDetails>

      Respond with a valid JSON object matching this exact shape:
      <Schema>
        ${JSON.stringify(leafletJSONSchema)}
      </Schema>

      Return ONLY the JSON, no markdown, no explanation.
    `;

    const response = await genai.models.generateContent({
      model: CHAT_MODEL,
      contents: prompt,
    });

    try {
      const parsed = validateResponse(response.text);
      return parsed;
    } catch (error) {
      console.error("Error parsing response:", error);
      return null;
    }
  }
);

And once we have the tailored data, we can render it on the client (or wherever it's needed).

Validating the Response

No matter how precise the prompt is, I highly recommend safeguarding the response. For the same reason we didn't just prompt better: there's no guarantee the model will respond exactly as instructed.

In my case, I wrote a custom check that validates the options suggested by AI against the original data (e.g. leafletOptions). If you want more flexibility, an extra call to another LLM as a judge is a solid alternative. And either way, making sure a human reads the doc before it goes anywhere (human in the loop) is always a good idea.
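Such a check can stay small. Here's a sketch, assuming the model returns the `differentiators` shape from the output schema above and that every suggested `id` must exist in the original options (all names are illustrative):

```typescript
type Differentiator = {
  id: string;
  headline: string;
  proofPoint: string;
  matchedPriority: string;
};

// Throws if the response is not valid JSON, misses the expected shape,
// or references an option id that was never offered to the model.
const validateResponse = (
  text: string,
  knownOptionIds: Set<string>
): { differentiators: Differentiator[] } => {
  const parsed = JSON.parse(text);
  if (!Array.isArray(parsed?.differentiators)) {
    throw new Error("Response is missing a differentiators array");
  }
  for (const d of parsed.differentiators) {
    if (typeof d.id !== "string" || !knownOptionIds.has(d.id)) {
      throw new Error(`Unknown or missing option id: ${d.id}`);
    }
  }
  return parsed;
};
```

Rejecting and retrying on a thrown error is usually cheaper than letting a hallucinated option reach the rendered document.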

Summary

The approach described here keeps generated documents stable and on-spec, and it might even save some tokens along the way.

It's all trade-offs, of course. This solution comes with some development overhead and will need ongoing support. But that seems like a reasonable price for the certainty.

πŸ‡ΊπŸ‡¦ in πŸ‡¨πŸ‡¦