Building Self-Improving Tax Agents with Codex

Quick Summary

OpenAI and Thrive Holdings collaborated to build Tax AI, a self-improving tax agent that automates tax preparation and improves over time. The system uses Codex to turn production use into structured signals that fuel autonomous improvement.

Key Points

Tax AI processed 7,000 tax returns and saved practitioners about a third of their time on tax preparation.
The system drafts returns with up to 97% accuracy and increases throughput by about 50%.
Over six weeks, the share of returns reaching 75% correct field completion rose from 25% to 86%.
The system uses a three-part loop: expert practitioner feedback, production traces, and a Codex-driven iteration loop.
Codex investigates the root cause of failures, proposes changes, and validates them against targeted and regression evals.
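
The validation step in the last point can be pictured as a simple acceptance gate. The sketch below is illustrative only: the `EvalResult` structure, the `accept_change` function, and the tolerance parameter are assumptions made for this example, not details from the source, which does not describe how Tax AI scores or accepts changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalResult:
    """Pass counts for one eval suite (hypothetical structure)."""
    passed: int
    total: int

    @property
    def pass_rate(self) -> float:
        # An empty suite counts as a 0.0 pass rate to avoid dividing by zero.
        return self.passed / self.total if self.total else 0.0


def accept_change(
    targeted_before: EvalResult,
    targeted_after: EvalResult,
    regression_before: EvalResult,
    regression_after: EvalResult,
    regression_tolerance: float = 0.0,
) -> bool:
    """Accept a proposed change only if it improves the targeted evals
    and does not lower the regression pass rate beyond the tolerance."""
    fixes_target = targeted_after.pass_rate > targeted_before.pass_rate
    holds_regression = (
        regression_after.pass_rate
        >= regression_before.pass_rate - regression_tolerance
    )
    return fixes_target and holds_regression
```

Under these assumptions, a change that lifts the targeted suite from 2/10 to 8/10 while the regression suite stays at 95/100 would be accepted; the same change would be rejected if the regression suite dropped to 90/100.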

What Actually Changed?

The system was designed to capture expert actions as structured data and use production traces to turn corrections into evals. This allows Codex to investigate the root cause of failures and propose changes.
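
One way to picture how a correction becomes an eval is shown below. This is a minimal sketch under stated assumptions: the `Correction` and `EvalCase` records, their field names, and the `correction_to_eval` helper are hypothetical and do not reflect Tax AI's actual schema, which the source does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Correction:
    """An expert edit captured as structured data (hypothetical schema)."""
    return_id: str
    field_name: str
    drafted_value: str
    corrected_value: str
    trace_id: str  # links back to the production trace that produced the draft


@dataclass(frozen=True)
class EvalCase:
    """A targeted eval derived from one correction (hypothetical schema)."""
    name: str
    return_id: str
    field_name: str
    expected_value: str
    source_trace_id: str


def correction_to_eval(correction: Correction) -> Optional[EvalCase]:
    """Turn a practitioner correction into an eval case.

    Returns None when the expert left the drafted value unchanged,
    since there is no failure to reproduce in that case.
    """
    if correction.drafted_value == correction.corrected_value:
        return None
    return EvalCase(
        name=f"{correction.return_id}:{correction.field_name}",
        return_id=correction.return_id,
        field_name=correction.field_name,
        expected_value=correction.corrected_value,
        source_trace_id=correction.trace_id,
    )
```

Keeping the trace identifier on the eval case is a design choice in this sketch: it lets whoever investigates a failing eval go back to the production context that produced the original draft.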

Coding Impact

The system uses a bounded Codex task environment that separates the writable worktree from read-only production context. This allows Codex to inspect or modify the product surface, targeted and regression evals, and reusable skills/docs.
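
The separation between a writable worktree and read-only context can be illustrated with a small path check. This is a sketch, not the mechanism Tax AI uses: the `is_write_allowed` function and the example paths are assumptions, the check is purely lexical (it does not follow symlinks), and it assumes the worktree is not the filesystem root. A real sandbox would typically enforce this at the filesystem or container level.

```python
import posixpath


def is_write_allowed(target: str, worktree: str) -> bool:
    """Return True only if `target` lands inside the writable worktree.

    Relative targets are interpreted relative to the worktree. The check
    normalizes `..` segments lexically, so it catches path traversal but
    not symlinks that point outside the worktree.
    """
    root = posixpath.normpath(worktree)
    # join() discards `root` when `target` is absolute, which is what we want:
    # an absolute path must itself fall under the worktree to be writable.
    path = posixpath.normpath(posixpath.join(root, target))
    return path == root or path.startswith(root + "/")
```

With a hypothetical worktree at `/work/tree`, a write to `src/agent.py` is allowed, while `../prod/traces.json` and `/prod/context/trace.json` are refused.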

Model / Tool Comparison

Model/Tool	Description
Tax AI	Self-improving tax agent that automates tax preparation and improves over time.
Codex	OpenAI's coding agent; in this system it investigates the root cause of failures, proposes changes, and validates them against targeted and regression evals.

Strengths

Tax AI improves over time and automates tax preparation.
The system uses a three-part loop that includes expert practitioner feedback, production traces, and a Codex-driven iteration loop.
Codex can investigate the root cause of failures and propose changes.

Limitations / Concerns

The system requires expert practitioner feedback to improve.
The system may not be able to handle complex or ambiguous cases.
The system requires a bounded Codex task environment to function.

Should I Try It?

Yes, if you are looking for a self-improving tax agent that automates tax preparation and improves over time.

Sources

https://openai.com/index/building-self-improving-tax-agents-with-codex/
