Simon Willison Releases LLM 0.32a0 Alpha with Major CLI Changes
Most of the AI tooling ecosystem is a dumpster fire of Electron apps, VC-funded wrappers, and bloated web interfaces that consume 4GB of RAM just to POST a string to an endpoint. If you live in the terminal, this is offensive.
Enter Simon Willison’s `llm`.
Willison has been quietly building the most pragmatic suite of data tools on the internet (Datasette being the prime example). His `llm` utility brought the Unix philosophy to language models: text goes in, tokens come out, pipes connect the pieces.
This morning, Willison dropped `llm` 0.32a0. It is an alpha release, but it represents a massive, backward-compatible structural refactor of both the CLI and the underlying Python library, explicitly aimed at making the tool work better with reasoning models and frontier capabilities.
If you are writing custom bash scripts to hit the Claude API, you are wasting your time. Here is what 0.32a0 changes, why the refactor was necessary, and how to integrate it into a workflow that doesn't suck.
## The Problem with "Frontier" Capabilities
Until late last year, the abstraction for interacting with an LLM was simple. You send a prompt, you receive a stream of text. Standard input to standard output.
Then reasoning models arrived. OpenAI dropped o1, Anthropic introduced extended thinking blocks, and local models like DeepSeek started spitting out massive `<think>` tags before answering. The old abstraction broke.
If you pipe a reasoning model into a standard terminal workflow today, your pipeline gets polluted with the model's internal monologue. Feed that output into `jq` or any other script expecting structured data and the parser chokes on the preamble.
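To see the failure mode concretely, here is a minimal reproduction; the `<think>` preamble stands in for a reasoning model's raw output:
```bash
# Simulated raw reasoning-model output: thought tokens, then the answer
printf '<think>User wants a severity label...</think>{"severity":"high"}\n' | jq .
# jq rejects the entire line because the <think> preamble is not valid JSON
```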
Willison recognized that the underlying architecture of `llm` needed a rewrite to handle these asynchronous, multi-modal, thought-heavy models without breaking the hundreds of existing plugins already written for the tool.
## What 0.32a0 Actually Does
Version 0.32a0 is a refactor, not a feature-bloat update. It rewires the internals of how models are executed and how responses are yielded.
The old architecture assumed a relatively flat execution cycle. The new architecture introduces deeper hooks for response metadata, streaming chunks that can be typed (e.g., separating "thought" chunks from "content" chunks), and better session handling for models that require tool-use loops.
Because it is strictly backwards-compatible, your old scripts will not break. If you have an existing cron job piping log files into `llm -m claude-3-haiku`, it keeps running. But under the hood, the foundation is now ready for models that think for five minutes before outputting a single character.
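A job like that is a single crontab line (the paths and schedule here are placeholders, and `claude-3-haiku` assumes the Anthropic plugin is installed):
```bash
# crontab entry: summarize yesterday's errors at 06:00, append to a report
0 6 * * * grep ERROR /var/log/app.log | llm -m claude-3-haiku -s "Summarize these errors in three bullet points" >> /var/reports/errors.txt
```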
### Terminal Tooling Done Right
The beauty of `llm` is that it respects standard input. Willison uses it specifically for tasks like summarizing massive Hacker News threads.
Here is what a modern terminal workflow looks like using the utility. First, you install it cleanly via `pipx` to avoid polluting your system Python environment.
```bash
pipx install llm
llm keys set openai
# Paste your key
```
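Sanity-check the install before wiring it into anything:
```bash
llm models    # list every model the installed plugins expose
llm "Say hello in five words or fewer"
```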
Now, pull a Hacker News thread and summarize it. You don't need a heavy web crawler. You just need `curl`, `files-to-prompt` (another Willison special), or a basic text parser.
```bash
# Fetch the thread, crudely strip the HTML tags, pipe the text straight in
curl -s "https://news.ycombinator.com/item?id=40000000" | \
  sed -e 's/<[^>]*>//g' | \
  llm -m gpt-4o "Summarize the top three arguments in this thread. Be cynical."
```
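Every prompt and response also gets logged to a local SQLite database, so one-liners like this leave an auditable trail for free:
```bash
llm logs -n 1    # review the most recent prompt/response pair
llm logs path    # where the SQLite database lives
```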
With 0.32a0, when reasoning models are fully integrated, you will be able to pipe this to an `o1`-class model and explicitly tell the CLI to swallow the reasoning tokens, keeping stdout clean for the next pipe.
## The Python Library Redux
`llm` is not just a CLI tool. It is a Python library. The 0.32a0 refactor fundamentally improves how developers interact with models in their own scripts.
Previously, handling asynchronous execution or fetching rich metadata from a completion request was clunky. You were often better off just using the official OpenAI or Anthropic SDKs if you needed deep control.
Willison’s refactor makes the `llm` Python API a viable abstraction layer over all providers. You don't have to rewrite your application when you switch from OpenAI to local Llama 3 running on Ollama.
Here is how the API looks in practice for a batch processing script:
```python
import llm

# The abstraction holds regardless of the model provider;
# claude-3-5-sonnet comes from the Anthropic plugin (llm-anthropic)
model = llm.get_model("claude-3-5-sonnet")
# Optional: a key stored via `llm keys set` is picked up automatically
model.key = model.get_key()

logs = [
    "Error: Database connection timeout on port 5432",
    "Warn: Memory usage exceeds 90% in container web_01",
    "Info: User auth successful for id 8923",
]

for log in logs:
    response = model.prompt(f"Classify this log severity and output ONLY JSON: {log}")
    print(response.text())
```
The 0.32a0 changes mean that when you query a reasoning model, the `response` object will cleanly separate the generated answer from the underlying thought process, token usage statistics, and latency metrics.
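The alpha does not pin down the final accessor names, but the documented response surface already separates these concerns. A sketch of inspecting a response, assuming any installed model:
```python
import llm

model = llm.get_model("gpt-4o-mini")  # any installed model works here
response = model.prompt("Why would a container exceed 90% memory?")

print(response.text())   # the generated answer, nothing else
print(response.usage())  # input/output token counts (documented accessor)
```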
## Building on the Datasette DNA
Willison is obsessed with plugins. The architecture of `llm` mirrors Datasette. The core tool is intentionally dumb. It knows how to manage keys, handle aliases, and route prompts. Everything else is a plugin.
There are plugins for Claude, Gemini, local llama.cpp models, and MLX.
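Installing them is one command each (plugin names current as of this writing):
```bash
llm install llm-anthropic   # Claude models
llm install llm-gemini      # Google Gemini
llm install llm-mlx         # local models on Apple Silicon
llm models                  # confirm what you now have
```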
The 0.32a0 release ensures that as API providers push out radical changes to their endpoints—like Anthropic’s prompt caching or Google’s massive context windows—plugin authors have the internal API hooks necessary to support them without waiting for Willison to update the core library.
If you want to add a custom local model running on a weird architecture, you write a hook.
```python
import llm

class MyWeirdLocalModel(llm.Model):
    model_id = "weird-local"

    def execute(self, prompt, stream, response, conversation):
        # 0.32a0 gives you richer context here
        yield "This is a custom response."

@llm.hookimpl
def register_models(register):
    register(MyWeirdLocalModel())
```
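To actually load it, the plugin tutorial in the docs has you package the module with an `llm` entry point in `pyproject.toml` and install it with `llm install -e .`, so a working plugin is one file plus a few lines of boilerplate.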
## How It Compares
The CLI AI ecosystem is crowded, but mostly with junk. Here is how Willison’s utility stacks up against the noise.
| Tool | Philosophy | Speed | Extensibility | Reasoning Model Support |
| :--- | :--- | :--- | :--- | :--- |
| **LLM (0.32a0)** | Unix-native, pipes, plugins | Instant | Infinite (Python hooks) | Native structural support added |
| **Fabric** | Prompt-template heavy, opinionated | Moderate | Hardcoded patterns | Poor stdout isolation |
| **Raw Curl** | Masochistic, verbose | Network bound | None | Requires massive `jq` parsing |
| **Electron Wrappers** | GUI focused, heavy | Slow | Walled gardens | Handled via UI updates |
`llm` wins because it stays out of your way. It does exactly what standard Unix tools do: it takes text, processes it, and passes it along.
## The Reality of AI Abstraction
Writing an abstraction layer over APIs that change weekly is a nightmare. OpenAI deprecates models without warning. Anthropic changes its system prompt requirements. Local models invent new chat templates every Thursday.
Most libraries fail because they couple themselves too tightly to OpenAI's specific schema. When OpenAI sneezes, the library breaks.
Willison’s 0.32a0 refactor is an explicit rejection of tight coupling. By generalizing the concept of "frontier capabilities" rather than hardcoding "OpenAI o1 support", the tool insulates your bash scripts from the chaos of the AI provider wars.
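You can see that insulation at the command line: the prompt and the pipe stay fixed while only the model ID changes (the non-OpenAI IDs below assume the relevant plugins, including the third-party `llm-ollama`, are installed):
```bash
# Same prompt, three providers; only -m changes
cat error.log | llm -m gpt-4o "Explain the likely root cause"
cat error.log | llm -m claude-3-5-sonnet "Explain the likely root cause"
cat error.log | llm -m llama3 "Explain the likely root cause"

# Better: pin a role to an alias so scripts never hardcode a provider
llm aliases set triage claude-3-5-sonnet
cat error.log | llm -m triage "Explain the likely root cause"
```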
## Actionable Takeaways
If you are a developer, stop writing custom wrapper scripts for AI endpoints. You are incurring technical debt for no reason.
1. **Install the CLI:** Run `pipx install llm`.
2. **Configure your keys:** Add your Anthropic and OpenAI keys via `llm keys set`.
3. **Pipe everything:** Start piping your `git diff` output into `llm` to generate commit messages (see the sketch after this list). Pipe your error logs through it to get stack traces explained.
4. **Use the Alpha:** You can install the pre-release specifically with `pipx install 'llm==0.32a0'` if you are working with reasoning models and want the structural updates immediately.
5. **Ditch the GUI:** Every time you open a web tab to paste code into an AI chat window, you are breaking your context. Stay in the terminal.
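The commit-message pipe from item 3, as a shell function for your dotfiles (the function name and system prompt are mine, not anything built in):
```bash
# Turn staged changes into a commit message
aicommit() {
  git diff --staged | llm -s "Write a concise conventional commit message for this diff. Output only the message."
}
```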
Willison’s 0.32a0 is not a flashy feature release. It is infrastructure work. It is the boring, necessary plumbing required to make the next generation of reasoning models behave predictably in a Unix environment. And in a space obsessed with hype, boring plumbing is exactly what we need.