# OpenAI Release Notes

OpenAI’s release notes are usually a masterclass in corporate obfuscation. They bury breaking changes under cheerful marketing copy about "empowering creators." But if you read the actual diffs, you see the architectural tectonic plates shifting.

The latest batch of updates, spanning September 2025 to the 2026 roadmap, is a wake-up call. We are seeing the death of stateful bloat, the financialization of compute, and a direct assault on Microsoft’s enterprise moat.

Let's tear down what these changes actually mean for your production systems.

## The Assistants API is Dead. Good Riddance.

According to the API Changelog, OpenAI plans to sunset the Assistants API in 2026. All features are migrating to the "easier to use" Responses API.

Anyone who built a production system on the Assistants API knows exactly why this is happening. It was an architectural nightmare.

OpenAI tried to own your state. They wanted you to push threads, messages, and files into their opaque data stores, then write convoluted polling loops to check if a "Run" was finished.

It was the antithesis of modern, stateless web architecture. It introduced massive latency, locked you into their specific vector database implementation, and made debugging impossible because you couldn't inspect the intermediate state.

The move to the Responses API is OpenAI admitting defeat on state management.

### The Migration Path

If you have code that looks like this, you have a technical debt timebomb ticking toward 2026:

```python
# The old, broken way (Assistants API)
# Assumes `client`, `thread`, and `assistant` were created earlier.
import time

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Poll until the run leaves the queue and finishes executing
while run.status in ['queued', 'in_progress']:
    time.sleep(1)  # Praying to the latency gods
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

if run.status == 'completed':
    messages = client.beta.threads.messages.list(thread_id=thread.id)
```

You need to rip that out. The Responses API model forces you back to owning your own state.
You manage the context window. You pass the array of messages. You handle the RAG retrieval. This is how it should have been from day one.

```python
# The sane way (Responses API pattern)
import sys

response = client.responses.create(
    model="gpt-4.5-turbo",  # model name as given in this post; substitute one your account can access
    input=my_local_database.get_thread_history(user_id),  # the Responses API takes `input`, not `messages`
    tools=[search_tool, execute_sql_tool],
    stream=True
)

# Streaming yields typed events, not chat-completion chunks
for event in response:
    if event.type == "response.output_text.delta":
        sys.stdout.write(event.delta)
```

Start porting your infrastructure now. Do not wait until December 2025 when the deprecation warnings start breaking your CI/CD pipelines.

## The Compute Knob: September 2025's "Thinking Level"

We used to hack LLM reasoning by appending "think step by step" to our system prompts. It was a stupid, inexact science.

The September 2025 release notes formalize test-time compute with the "thinking level toggle." Users and developers now have a choice beyond Standard: you can dial down for lighter, faster responses, or dial up for extended reasoning.

This is not just a UI feature. It is a fundamental shift in token economics.

### Economics of Extended Reasoning

Under the hood, "extended reasoning" means the model is generating hidden Chain-of-Thought (CoT) tokens. It is running internal verification loops before it emits the final user-facing string.

From an engineering perspective, this means your latency SLAs are now dynamic. If you set `"thinking_level": "extended"`, you are trading latency for accuracy.

If you are building a real-time chatbot, extended thinking will result in terrible UX. The time-to-first-token (TTFT) will spike.

Conversely, the "lighter" mode is an acknowledgment that we don't need a massive, heavily parameterized model to parse JSON or extract dates from an email.

### Implementing Compute Control

Your application logic needs to route requests based on task complexity. Do not hardcode a single thinking level across your entire stack.
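One way to sketch that routing, assuming a crude keyword heuristic stands in for a real intent classifier: the function below picks a level per prompt. The level names mirror the toggle described above; `choose_thinking_level`, the keyword lists, and the 500-character cutoff are hypothetical and exist only for illustration.

```python
import re

# Hypothetical keyword hints. A production router would more likely use
# a classifier or per-endpoint configuration than regexes.
LIGHT_HINTS = re.compile(r"\b(extract|classify|reformat|translate|parse)\b", re.I)
EXTENDED_HINTS = re.compile(
    r"\b(prove|derive|debug|refactor|lock-free|migration|optimi[sz]e)\b", re.I
)


def choose_thinking_level(prompt: str) -> str:
    """Return 'light', 'standard', or 'extended' for a user prompt."""
    # Hard reasoning cues win, even if the prompt also looks like extraction.
    if EXTENDED_HINTS.search(prompt):
        return "extended"
    # Short, mechanical requests get the cheap path.
    if LIGHT_HINTS.search(prompt) and len(prompt) < 500:
        return "light"
    # Everything else stays on the default.
    return "standard"
```

Checking the extended hints first is a deliberate bias: misrouting a hard task to the cheap path costs correctness, while misrouting an easy task to the expensive path only costs money.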
Payload for a complex math or logic task:

```json
{
  "model": "gpt-next",
  "messages": [{"role": "user", "content": "Write a lock-free queue in C"}],
  "thinking_level": "extended",
  "max_completion_tokens": 8000
}
```

Payload for simple text extraction:

```json
{
  "model": "gpt-next",
  "messages": [{"role": "user", "content": "Extract the zip code from this text."}],
  "thinking_level": "light",
  "max_completion_tokens": 100
}
```

Treat the field names in these payloads as illustrative and check the current API reference for the exact reasoning-control parameter.

Build middleware that evaluates the intent of the user prompt and dynamically sets the `thinking_level` parameter. This cuts API spend on trivial requests and keeps your application snappy.

### The Physics of Thinking Levels

Let’s get technical about what the "thinking level toggle" actually does. This is a direct evolution of the techniques pioneered in early reasoning models.

Standard LLMs generate tokens autoregressively. Given a prompt, they predict the next most likely token based on their training weights. They do not "think." They hallucinate highly probable continuations.

The "extended" thinking level injects a hidden execution phase. The model begins generating a Chain-of-Thought sequence. It poses a hypothesis, tests it internally, recognizes an error, backtracks, and tries a different logical path.

All of this happens in a hidden context window that you, the developer, never see. You only see the final, synthesized output.

Why hide it? Because exposing the raw, messy CoT exposes the model's internal reasoning mechanics, making it vulnerable to prompt injection and distillation (training smaller models on the CoT output).

But this hidden process burns compute. Every internal token generated during the "thinking" phase costs money. While OpenAI abstracts this into a simple toggle, the underlying reality is that you are paying for the compute cycles required to resolve the internal logic tree.
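To make that trade-off concrete, here is a back-of-the-envelope estimate, assuming hidden reasoning tokens are billed at the output-token rate. The prices and token counts are made-up placeholders, not published rates, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(
    input_tokens: int,
    visible_output_tokens: int,
    reasoning_tokens: int,
    usd_per_1m_input: float,
    usd_per_1m_output: float,
) -> float:
    """Estimate one request's cost in USD.

    Assumption: hidden reasoning tokens are billed like output tokens.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = input_tokens * usd_per_1m_input + billed_output * usd_per_1m_output
    return total / 1_000_000


# Placeholder prices: $2 per 1M input tokens, $8 per 1M output tokens.
light = estimate_cost(1_000, 500, 0, 2.0, 8.0)
extended = estimate_cost(1_000, 500, 6_000, 2.0, 8.0)
print(f"light: ${light:.3f}  extended: ${extended:.3f}")
```

With these placeholder numbers the same 500-token answer costs nine times as much once 6,000 hidden reasoning tokens are added, which is why the level should be chosen per request and not set globally.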
If you are building a system that requires high reliability, like parsing legal contracts or generating complex SQL migrations, the "extended" level is worth the cost. It significantly reduces the hallucination rate for multi-step logic problems.

But do not use it for summarization. Summarization is a single-pass mapping operation. Extended thinking will not make a summary better; it will only make it slower and more expensive.

## Native Spreadsheets: Bypassing the Wrappers

The most aggressive move in these release notes is the global rollout of ChatGPT for Excel and Google Sheets (Enterprise, Edu, and K-12).

For years, developers have built entire startups essentially wrapping the OpenAI API inside a Google Sheets add-on. OpenAI just killed all of them.

By bringing a "spreadsheet-native ChatGPT sidebar" directly into Excel and Sheets, OpenAI is going after the absolute core of enterprise data gravity. Financial analysts, marketers, and HR reps do not want to export CSVs, upload them to a web UI, and download the results. They want the agent living inside the spreadsheet.

### The Threat to Microsoft Copilot

Microsoft owns Excel. Microsoft owns a massive stake in OpenAI. Yet, OpenAI is shipping a native sidebar that directly competes with Microsoft 365 Copilot. The internal politics there must be toxic.

The release notes specify that this sidebar supports "Skills and apps where available so spreadsheet work can be grounded in approved files, systems, and data sources."

This means the AI isn't just reading cells A1 through B10. It is reaching out to your internal Jira instances, your Salesforce data, and your PostgreSQL read replicas, and pulling that data directly into the spreadsheet model.

### The Spreadsheet Vector

The integration of ChatGPT directly into Excel and Google Sheets is brilliant product strategy, but it introduces a fascinating new attack surface.

Spreadsheets are Turing complete.
They are also universally abused as makeshift databases, CRMs, and application backends by non-technical staff. When you give an LLM native access to a spreadsheet, you are essentially giving it access to the most chaotic data structure in the enterprise.

The release notes mention "Skills and apps." This implies a plugin architecture. If an HR manager is using the ChatGPT sidebar in a compensation spreadsheet, and that sidebar has the "Workday Skill" enabled, the LLM can query Workday, pull salary bands, and populate the spreadsheet.

Now consider the security boundary. The LLM is acting on behalf of the authenticated user. It inherits their IAM roles.

If you are a DevOps or Security engineer, you must implement strict zero-trust policies for these agentic integrations. You cannot rely on the LLM to enforce access controls. The API endpoints that the "Skills" interact with must strictly validate the user's permissions for every single request, regardless of whether the request comes from the web frontend or the ChatGPT Excel sidebar.

```yaml
# Example IAM Policy strictly limiting Spreadsheet AI access
PolicyName: SpreadsheetAgentRestrictivePolicy
Statement:
  - Effect: Allow
    Action:
      - internal-api:ReadData
    Resource: "arn:aws:api:us-east-1:123456789012:my-api/*"
    Condition:
      StringEquals:
        "aws:PrincipalTag/SourceApplication": "openai-spreadsheet-integration"
      NumericLessThan:
        "internal:RiskScore": 50
  - Effect: Deny
    Action:
      - internal-api:WriteData
      - internal-api:DeleteData
    Resource: "*"
    Condition:
      StringEquals:
        "aws:PrincipalTag/SourceApplication": "openai-spreadsheet-integration"
```

Never allow an LLM operating inside a spreadsheet to execute write operations on core infrastructure without an explicit human-in-the-loop approval step. The risk of automated data corruption is too high.

The free preview runs through June 2, 2026. After that, it flips to usage-based billing.
If you manage enterprise IT, you need to audit exactly which "Skills" are approved for the spreadsheet environment immediately.

## Codex Evolves: Agents, Not Autocomplete

The changelog briefly mentions the "Latest updates to Codex, OpenAI’s coding agent."

Notice the terminology shift. It is no longer a "code completion model." It is a "coding agent."

We have moved past the era of pressing Tab to finish a boilerplate function. The updated Codex architecture implies autonomous loops. It reads the issue ticket, clones the repo, parses the Abstract Syntax Tree (AST), generates the diff, runs the linter, and opens the Pull Request.

### The Codex Paradigm Shift

The pivot from autocomplete to "coding agent" reflects the broader industry trend. Copilot and Cursor proved the market for AI assistance. The next phase is AI autonomy.

An autocomplete model looks at the last 50 lines of code and guesses the next 5. A coding agent looks at the GitHub issue, searches the repository for relevant files, formulates a plan, executes terminal commands to install dependencies, writes the code, runs the test suite, reads the test failures, debugs its own code, and commits the result.

This requires an entirely different infrastructure. You cannot run an autonomous agent on your local machine without serious risk. It will accidentally delete your `.git` folder or push secrets to a public registry.

If your internal developer platform does not support sandbox environments for autonomous agents to execute code and fail safely, you are already behind. You should be building ephemeral Docker containers specifically designed for Codex to thrash around in.

```bash
# Setting up an ephemeral agent workspace
docker run -d \
  --name codex-sandbox-$(uuidgen) \
  --network none \
  --memory="4g" \
  --cpus="2.0" \
  ubuntu-dev-base:latest
```

Give the agent access to the sandbox, not your bare metal.

The future of software engineering isn't writing code.
It is building the sandboxes, defining the constraints, and reviewing the diffs produced by systems like Codex.

## Architectural Shifts Comparison

To understand how these changes fit together, we need to look at the execution context.

| Feature | Old Paradigm | New Paradigm (2025/2026) | Engineering Impact |
| :--- | :--- | :--- | :--- |
| **API State** | Assistants API (OpenAI manages state) | Responses API (You manage state) | Rip out polling loops. Build robust local context management. |
| **Compute Allocation** | Static ("Think step by step" prompts) | Dynamic (Thinking Level toggle) | Write routing middleware to optimize API costs and latency. |
| **Enterprise Data** | CSV Uploads to Web UI | Native Excel/Sheets Sidebar with Skills | Audit internal API endpoints. Prepare for prompt injection via spreadsheet cells. |
| **Code Generation** | Inline autocomplete | Autonomous coding agents | Build isolated, ephemeral sandboxes for agent execution. |

## Actionable Takeaways

Stop treating AI features as magic black boxes. They are software systems with distinct I/O, latency profiles, and security perimeters.

Based on these release notes, here is your immediate backlog:

1. **Deprecate the Assistants API:** Audit your codebase today. If you are calling `client.beta.threads`, write a migration plan. You have until 2026, but the API will stagnate long before then. Move to the Responses API and take back control of your state.
2. **Implement Thinking Middleware:** Do not use the same API parameters for every request. Build a router that analyzes the prompt. Send data extraction tasks to "light" thinking. Send algorithmic logic tasks to "extended" thinking. Your API bill will thank you.
3. **Audit Spreadsheet Skills:** If your company uses the Enterprise or Edu tiers, the native ChatGPT sidebar is coming. Work with InfoSec to restrict which internal APIs the spreadsheet sidebar can access. Treat every spreadsheet cell as untrusted user input.
4. **Sandbox Your Code Agents:** As Codex transitions from autocomplete to autonomous agent, you must isolate its execution environment. Do not give it write access to production databases. Give it isolated containers and let it generate diffs.

OpenAI is shifting the burden of architectural complexity back onto the developer. They are providing raw compute toggles and direct data integrations, but they are stepping away from managing your state.

This is a good thing. It means we get to engineer systems again, rather than just writing wrappers around an API. Act accordingly.
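As a starting point for the first takeaway, here is a minimal sketch of a source scan that flags Assistants API call sites. It assumes the `client.beta.threads` and `client.beta.assistants` attribute paths used by the OpenAI Python SDK; `find_assistants_usage` is a hypothetical helper, and a regex scan like this will miss aliased or dynamically built calls.

```python
import re

# Attribute paths that indicate Assistants API usage in Python source.
ASSISTANTS_PATTERN = re.compile(r"\.beta\.(threads|assistants)\b")


def find_assistants_usage(source: str) -> list[tuple[int, str]]:
    """Return (line_number, stripped_line) for each line touching the Assistants API."""
    return [
        (number, line.strip())
        for number, line in enumerate(source.splitlines(), start=1)
        if ASSISTANTS_PATTERN.search(line)
    ]


sample = (
    "run = client.beta.threads.runs.create(thread_id=t)\n"
    "x = 1\n"
    "client.beta.assistants.list()\n"
)
for number, line in find_assistants_usage(sample):
    print(f"{number}: {line}")
```

To cover a repository, read each `.py` file's text (for example with `pathlib.Path.rglob`) and pass it through the same function, then turn the hits into migration tickets.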