
# Montage Turns LLM Intent Into Deterministic Server-Rendered Interfaces

We have spent the last three years watching product managers try to force every software interaction into a chat window. It is exhausting. Typing a paragraph to sort a table is not innovation; it is a regression to the command line, just with worse latency and a higher AWS bill.

Jakob Nielsen correctly identified intent-based outcome specification as the third major user-interface paradigm: we are moving from clicking rigid buttons to conversing with systems. But the execution has been abysmal. Most engineering teams approach this by having an LLM spit out raw HTML, or worse, hallucinate React components on the fly and execute them directly in the user's browser. It is a security nightmare, a state-management dumpster fire, and the user experience is universally jarring. You do not want your interface to be creative. You want it to be deterministic.

Enter Montage. Montage is not a library; it is an architectural pattern. It forces the LLM out of the presentation layer and restricts it entirely to the control plane. The model determines the *intent*, outputs a strictly typed data structure, and your server renders the exact, deterministic UI components associated with that intent. Here is exactly how you build it, why client-side generation is a dead end, and how to tune your infrastructure so the whole pipeline runs in under 400 milliseconds.

## The Illusion of Client-Side UI Generation

Let us talk about the "Renderify" approach that polluted developer blogs throughout 2025. The premise sounds magical: the LLM understands what the user wants, writes the UI code for it instantly, and executes it in the client browser.

Do not put this in production. Client-side LLM UI generation breaks almost every rule of reliable software engineering. When you allow a non-deterministic statistical model to write your presentation layer at runtime, you surrender control over accessibility, branding, and security. What happens when the model hallucinates a prop? What happens when it forgets to bind an `onClick` handler to your global state context? The component silently fails, the user is stranded, and your error tracking fills with stack traces pointing to code that never existed in your repository. You are effectively running `eval()` on the output stream of a hallucination engine.
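To make that failure mode concrete, here is a deliberately minimal sketch of the anti-pattern. The endpoint and payload shape are hypothetical, but the essential sin (executing model output as code) is the whole approach:

```typescript
// ANTI-PATTERN: a hypothetical client that asks an LLM for component
// source code and executes whatever comes back. Do not ship this.
async function renderFromPrompt(prompt: string): Promise<void> {
  const res = await fetch('/api/generate-ui', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ prompt }),
  });
  const { componentSource } = await res.json();

  // new Function() is eval() with better manners: no type checking,
  // no design-system guarantees, and an injection surface in one line.
  const mount = new Function('root', componentSource);
  mount(document.getElementById('app'));
}
```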
## The Montage Architecture: Separation of Concerns

Montage fixes this by returning to basic MVC principles. The LLM is the controller; it never touches the view.

1. The user provides unstructured intent.
2. The LLM translates that intent into a highly constrained, deterministic JSON abstract syntax tree (AST).
3. The backend reads that AST, maps it to pre-built, strictly tested UI components, and streams the rendered HTML (or serialized component state) down to the client.

### Step 1: The Intent Parser

Your prompt engineering must shift from "write a component" to "extract the parameters for this specific schema." You are not asking the model to be a frontend developer; you are asking it to be a JSON serializer.

```typescript
// Define the exact schema the LLM is allowed to use
const montageSchema = {
  "name": "render_dashboard_widget",
  "description": "Renders a specific data visualization widget",
  "parameters": {
    "type": "object",
    "properties": {
      "widget_type": {
        "type": "string",
        "enum": ["line_chart", "bar_chart", "data_table", "metric_card"]
      },
      "data_source_id": {
        "type": "string",
        "description": "The UUID of the database view"
      },
      "date_range": {
        "type": "string",
        "enum": ["7d", "30d", "90d", "ytd"]
      }
    },
    "required": ["widget_type", "data_source_id", "date_range"]
  }
};
```

### Step 2: Forcing Determinism at the Sampler Level

If your tool-use chain fails, your UI fails, so determinism is not optional here. The sampler settings on your inference server dictate the reliability of the Montage pattern: if you leave `temperature` at 0.7 for an intent-parsing task, you are playing Russian roulette with your interface. When hitting your model (whether via an API gateway or your own bare-metal vLLM instance), lock the samplers down.

```json
{
  "model": "mistral-large-2407",
  "messages": [{"role": "user", "content": "Show me our revenue for the last month."}],
  "tools": [{"type": "function", "function": montageSchema}],
  "tool_choice": {"type": "function", "function": {"name": "render_dashboard_widget"}},
  "temperature": 0.0,
  "top_p": 0.1,
  "top_k": 1
}
```

By forcing `temperature: 0.0` and aggressively clamping `top_p` and `top_k`, you strip away the model's creative variance. It picks the most statistically probable token every single time. It becomes a reliable state machine.

### Step 3: Server-Side Rendering the Intent

Once the backend receives the structured JSON from the LLM, the AI portion of the transaction is over. You are back in standard software engineering territory. Your server takes the validated JSON payload and mounts the corresponding React server component or HTML fragment.

```tsx
// Server-side rendering logic
import { LineChart, BarChart, MetricCard, DataTable } from '@/components/widgets';
import { db } from '@/lib/db'; // your data-access layer

export async function resolveMontageIntent(llmOutput) {
  const { widget_type, data_source_id, date_range } = llmOutput.arguments;

  // Fetch actual deterministic data from your primary database
  const widgetData = await db.query(data_source_id, date_range);

  // Map the strict intent to a pre-built, safe component.
  // Every widget_type in the schema enum has a case here.
  switch (widget_type) {
    case 'line_chart':
      return <LineChart data={widgetData} range={date_range} />;
    case 'bar_chart':
      return <BarChart data={widgetData} range={date_range} />;
    case 'metric_card':
      return <MetricCard data={widgetData} />;
    case 'data_table':
      return <DataTable data={widgetData} pagination={true} />;
    default:
      throw new Error(`Unsupported widget type requested: ${widget_type}`);
  }
}
```

The components being rendered are pre-compiled, type-checked, and rigorously tested. They adhere to your design system. They cannot hallucinate non-existent CSS classes or leak user session tokens.
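One note on "validated": even with clamped samplers, treat the tool-call arguments as untrusted input and verify them before they reach the resolver. A minimal hand-rolled guard is sketched below (the names are illustrative; a schema library like Zod works just as well):

```typescript
// Runtime guard for tool-call arguments: the schema constrains the model,
// but the server still refuses anything outside the contract.
const WIDGET_TYPES = ['line_chart', 'bar_chart', 'data_table', 'metric_card'] as const;
const DATE_RANGES = ['7d', '30d', '90d', 'ytd'] as const;

export type MontageIntent = {
  widget_type: (typeof WIDGET_TYPES)[number];
  data_source_id: string;
  date_range: (typeof DATE_RANGES)[number];
};

export function parseMontageIntent(raw: unknown): MontageIntent {
  // Many OpenAI-compatible servers return arguments as a JSON string
  const args: any = typeof raw === 'string' ? JSON.parse(raw) : raw;

  if (
    args === null || typeof args !== 'object' ||
    !WIDGET_TYPES.includes(args.widget_type) ||
    typeof args.data_source_id !== 'string' ||
    !DATE_RANGES.includes(args.date_range)
  ) {
    // Reject instead of rendering: a malformed intent never reaches the view
    throw new Error('LLM returned arguments outside the Montage schema');
  }
  return args as MontageIntent;
}
```

Anything that fails the guard should fall back to a manual UI, never to a best-effort render.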
## Streaming the DOM: Beyond the Loading Spinner

The biggest friction point in LLM-driven interfaces is TTFB (time to first byte). Users will tolerate a 3-second wait for a complex text generation, but they will bounce immediately if an interface takes 3 seconds to render after a click. To make the Montage architecture feel like a native application, you must master Server-Sent Events (SSE) and chunked transfer encoding. Do not wait for the entire LLM generation and database query sequence to finish before updating the DOM. Stream the UI state transitions.

### Building the SSE Pipeline

WebSockets are overkill for this. You do not need bidirectional binary framing; you need a unidirectional pipe that pushes DOM updates from the server to the client. SSE is native to the browser, handles reconnections automatically, and works flawlessly over HTTP/2. One caveat: the native `EventSource` API can only issue GET requests, so the intent travels as a query parameter rather than a POST body. Here is the Node.js implementation for a Montage streaming endpoint:

```javascript
// Render the React elements from resolveMontageIntent to HTML strings
const { renderToStaticMarkup } = require('react-dom/server');

// Express route for streaming UI updates
app.get('/api/intent/stream', async (req, res) => {
  const userIntent = req.query.prompt;

  // Set headers for SSE
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
  });

  // Push an immediate loading-state UI
  res.write(`data: ${JSON.stringify({
    action: 'MOUNT',
    componentId: 'temp-loader',
    html: '<div class="skeleton-loader">Analyzing intent...</div>'
  })}\n\n`);

  try {
    // 1. Call the LLM and wait for the JSON tool call
    const llmToolCall = await callLlmWithSchema(userIntent);

    // Update the UI to show the data-fetching state
    res.write(`data: ${JSON.stringify({
      action: 'UPDATE',
      componentId: 'temp-loader',
      html: `<div class="skeleton-loader">Fetching data for ${llmToolCall.arguments.widget_type}...</div>`
    })}\n\n`);

    // 2. Fetch DB data and render the component to an HTML string
    const renderedHtml = renderToStaticMarkup(await resolveMontageIntent(llmToolCall));

    // 3. Push the final rendered UI
    res.write(`data: ${JSON.stringify({
      action: 'REPLACE',
      componentId: 'temp-loader',
      html: renderedHtml
    })}\n\n`);
  } catch (error) {
    res.write(`data: ${JSON.stringify({
      action: 'ERROR',
      message: 'Failed to process intent. Fallback to manual selection.'
    })}\n\n`);
  } finally {
    res.write('event: close\ndata: {}\n\n');
    res.end();
  }
});
```

On the frontend, you simply consume this stream and patch the DOM. No complex state management, no bloated client-side LLM libraries. Just a clean `EventSource` connection applying server-dictated UI transitions (a client sketch appears at the end of this post).

## Infrastructure: Eradicating Inference Latency

A beautiful architecture means nothing if your inference server is choking. Deployment architecture determines cost and latency, and if you rely on public API endpoints, you will always be at the mercy of their noisy-neighbor problems. For enterprise production, running your own models is mandatory, and you must move beyond simple `ollama run` setups: if you want your Montage implementation to feel instantaneous, you need to architect for high-concurrency throughput.

### The vLLM Configuration

You should be running vLLM or TensorRT-LLM on dedicated GPU hardware. The difference between a badly configured inference server and an optimized one is the difference between 3,500 ms and 350 ms response times.

KV-cache warmth is your primary weapon. The KV (key-value) cache stores the attention tensors for previously processed tokens. If your system prompt and schema are 2,000 tokens long, recomputing them for every user request will destroy your latency budget. Configure vLLM to use prefix caching aggressively.

```bash
# Production vLLM startup command for the Montage architecture
python3 -m vllm.entrypoints.openai.api_server \
  --model mistralai/Mistral-Nemo-Instruct-2407 \
  --gpu-memory-utilization 0.90 \
  --max-num-batched-tokens 8192 \
  --enable-prefix-caching \
  --tensor-parallel-size 2 \
  --enforce-eager
```

With `--enable-prefix-caching`, the server keeps the tensors for the large shared prefix (system prompt plus Montage schema) hot in VRAM. When a user sends a 10-token request, the engine only computes attention for those 10 new tokens against the cached history, dropping time-to-first-token (TTFT) from roughly 1,200 ms to sub-100 ms.
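Prefix caching only pays off if the prefix is byte-identical across requests. Here is a sketch of the `callLlmWithSchema` helper used in the streaming endpoint above, assuming vLLM's OpenAI-compatible endpoint on localhost:8000; the prompt text and model choice are illustrative. The discipline that matters is that the system prompt and tool schema are built once and never vary per request, so only the user's tokens miss the cache.

```typescript
// Static prefix: built once at module load and reused verbatim.
// Any per-request variation here (timestamps, user names) kills the cache.
const SYSTEM_PROMPT =
  'Translate the user request into one render_dashboard_widget call.';
const STATIC_TOOLS = [{ type: 'function', function: montageSchema }];

export async function callLlmWithSchema(userIntent: string) {
  const response = await fetch('http://localhost:8000/v1/chat/completions', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'mistralai/Mistral-Nemo-Instruct-2407',
      messages: [
        { role: 'system', content: SYSTEM_PROMPT }, // cached prefix
        { role: 'user', content: userIntent },      // only new tokens
      ],
      tools: STATIC_TOOLS,
      tool_choice: { type: 'function', function: { name: 'render_dashboard_widget' } },
      temperature: 0.0,
    }),
  });

  const data = await response.json();
  const call = data.choices[0].message.tool_calls[0].function;

  // OpenAI-compatible servers return arguments as a JSON string
  return { name: call.name, arguments: JSON.parse(call.arguments) };
}
```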
### Hardware Allocation

Do not skimp on PCIe bandwidth. If you are splitting inference across multiple GPUs (tensor parallelism), the interconnect speed dictates your latency floor. Running dual RTX 4090s or A6000s on a motherboard with PCIe Gen 4 x8 lanes will bottleneck peer-to-peer communication. You need full x16 lanes, or NVLink bridges on cards that support them (the A6000 does; the RTX 4090 does not), to prevent synchronization latency between the cards.

## Comparison: Architecture Paradigms

Understanding where Montage sits in the ecosystem requires looking at the alternatives.

| Feature | Traditional SPA (React/Vue) | Client-Side Gen (Renderify) | Montage (Server-Rendered Intent) |
| :--- | :--- | :--- | :--- |
| **Input Paradigm** | Imperative (clicks) | Declarative (natural language) | Declarative (natural language) |
| **UI Reliability** | 100% deterministic | Chaotic / hallucination-prone | 100% deterministic |
| **Security Posture** | High (pre-compiled) | Very low (runtime eval) | High (pre-compiled components) |
| **Payload Size** | Massive (thick client) | Large (LLM engine + AST parser) | Minimal (streamed HTML/JSON) |
| **State Management** | Complex client stores | Unmanageable | Server-authoritative |
| **Latency (TTFB)** | ~50 ms | ~2,500 ms+ (waiting for code gen) | ~350 ms (with prefix caching) |

## Practical Takeaways

Stop letting statistical models write your frontend code. Use them for what they are actually good at: parsing messy human input into structured data.

1. **Strip the LLM out of the view layer.** Force it to return strictly typed JSON schema parameters. Your server should always own the actual rendering logic.
2. **Clamp your samplers.** Set `temperature` to 0.0 and `top_k` to 1. Creative variance has no place in UI state routing.
3. **Stream state transitions via SSE.** Do not make the user wait for the entire backend sequence to finish. Push intermediate loading states and skeleton UI components over Server-Sent Events while the LLM parses the intent (a client-side sketch follows this list).
4. **Host your own inference for the control plane.** Public APIs have unpredictable latency spikes. Use vLLM with `--enable-prefix-caching` to keep your system prompts hot in VRAM, guaranteeing sub-100 ms time-to-first-token.
5. **Build a component registry.** The Montage pattern only works if your backend has a robust library of highly generic, heavily tested UI components ready to be mapped to the LLM's structured output. Invest your engineering cycles here, not in prompt-engineering better React components (a minimal registry sketch closes the post).
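Here is that client-side sketch: a minimal consumer for the `/api/intent/stream` endpoint above. The `action` names match the server code; the root element and function name are illustrative.

```typescript
// Minimal EventSource client for the Montage streaming endpoint.
// EventSource issues a GET, so the prompt travels as a query parameter.
function streamIntent(prompt: string, root: HTMLElement): void {
  const source = new EventSource(
    `/api/intent/stream?prompt=${encodeURIComponent(prompt)}`
  );

  source.onmessage = (event) => {
    const update = JSON.parse(event.data);

    switch (update.action) {
      case 'MOUNT': {
        // Create the placeholder that later events will target
        const el = document.createElement('div');
        el.id = update.componentId;
        el.innerHTML = update.html;
        root.appendChild(el);
        break;
      }
      case 'UPDATE': {
        const el = document.getElementById(update.componentId);
        if (el) el.innerHTML = update.html;
        break;
      }
      case 'REPLACE': {
        // Swap the skeleton for the final server-rendered widget
        const el = document.getElementById(update.componentId);
        if (el) el.outerHTML = update.html;
        break;
      }
      case 'ERROR':
        console.error(update.message);
        break;
    }
  };

  // The server signals completion with a named "close" event
  source.addEventListener('close', () => source.close());
}
```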
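And the registry itself can start as something this small. The names are hypothetical, but the shape is the point: a closed map from the schema's `widget_type` enum to pre-compiled components, so adding a widget means touching the schema and the registry together.

```typescript
import type { ComponentType } from 'react';
import { LineChart, BarChart, MetricCard, DataTable } from '@/components/widgets';

// Hypothetical registry: one entry per widget_type enum value.
// resolveMontageIntent's switch statement collapses into a lookup.
type WidgetProps = { data: unknown; range?: string; pagination?: boolean };

const WIDGET_REGISTRY: Record<string, ComponentType<WidgetProps>> = {
  line_chart: LineChart,
  bar_chart: BarChart,
  metric_card: MetricCard,
  data_table: DataTable,
};

export function lookupWidget(widgetType: string): ComponentType<WidgetProps> {
  const Widget = WIDGET_REGISTRY[widgetType];
  if (!Widget) {
    // Unknown intents fail loudly, exactly like the switch's default branch
    throw new Error(`Unsupported widget type requested: ${widgetType}`);
  }
  return Widget;
}
```

Everything the model can request lives in that map; everything else throws. That is the whole pattern.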