Building Self-Improving Tax Agents with Codex

Key Takeaways

  • Tax AI processed 7,000 tax returns and saved practitioners about a third of their time on tax preparation.
  • The system drafts returns with up to 97% accuracy and increases throughput by about 50%.
  • The share of returns reaching 75% correct field completion rose from 25% to 86% in six weeks.
  • The system uses a three-part loop: expert practitioner feedback, production traces, and a Codex-driven iteration loop.
  • Codex investigates the root cause of failures, proposes changes, and validates them against targeted and regression evals.
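
The validation step in the last takeaway can be pictured as an acceptance gate. The following is a minimal Python sketch under stated assumptions, not the Tax AI implementation: the name `accept_change` and the pass/fail dictionaries are hypothetical, and the source does not spell out the actual acceptance criteria.

```python
from typing import Mapping


def accept_change(
    targeted_after: Mapping[str, bool],
    regression_before: Mapping[str, bool],
    regression_after: Mapping[str, bool],
) -> bool:
    """Hypothetical gate: accept a proposed change only if it fixes the
    targeted failures and breaks nothing that previously passed."""
    # Every targeted eval (the cases the change was meant to fix) must pass.
    fixes_target = len(targeted_after) > 0 and all(targeted_after.values())

    # A regression is any eval that passed before and fails (or is missing) now.
    regressions = [
        name
        for name, passed in regression_before.items()
        if passed and not regression_after.get(name, False)
    ]
    return fixes_target and not regressions
```

Treating a missing regression result as a failure is a deliberately conservative choice in this sketch: a change that silently drops an eval should not be accepted.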

Quick Summary

OpenAI and Thrive Holdings collaborated to build Tax AI, a self-improving tax agent that automates tax preparation and improves over time. The system uses Codex to turn production use into structured signals that fuel autonomous improvement.

What Actually Changed?

The system was designed to capture expert actions as structured data and use production traces to turn corrections into evals. This allows Codex to investigate the root cause of failures and propose changes.
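
To make the idea of turning a correction into an eval concrete, here is a small Python sketch. All names and fields (`Correction`, `EvalCase`, `correction_to_eval`, `passes`, `trace_id`) are hypothetical illustrations; the source does not publish the real schema.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Correction:
    """One practitioner edit to a drafted return (hypothetical schema)."""
    return_id: str
    field: str
    drafted_value: str
    corrected_value: str
    trace_id: str  # assumed link to the production trace behind the draft


@dataclass(frozen=True)
class EvalCase:
    """A targeted eval derived from a single correction (hypothetical schema)."""
    name: str
    trace_id: str
    field: str
    expected_value: str


def correction_to_eval(c: Correction) -> EvalCase:
    # The expert's corrected value becomes the expected answer for that field.
    return EvalCase(
        name=f"{c.return_id}:{c.field}",
        trace_id=c.trace_id,
        field=c.field,
        expected_value=c.corrected_value,
    )


def passes(case: EvalCase, draft: Mapping[str, str]) -> bool:
    # A new draft passes when it reproduces the expert's value for the field.
    return draft.get(case.field) == case.expected_value
```

In this sketch, the trace identifier is what would let an investigating agent go from a failing eval back to the inputs and steps that produced the wrong draft.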

Coding Impact

The system uses a bounded Codex task environment that separates the writable worktree from read-only production context. This allows Codex to inspect or modify the product surface, targeted and regression evals, and reusable skills/docs.
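
One way to picture the separation between a writable worktree and read-only production context is a path check like the Python sketch below. The class `BoundedWorkspace` and its directory layout are assumptions made for illustration; the source does not describe how the boundary is actually enforced.

```python
from pathlib import Path


class BoundedWorkspace:
    """Hypothetical guard: writes are allowed only inside the worktree,
    reads are allowed in the worktree and the production context."""

    def __init__(self, worktree: Path, readonly_context: Path) -> None:
        # Resolve once so later comparisons are made on normalized paths.
        self.worktree = worktree.resolve()
        self.readonly_context = readonly_context.resolve()

    def can_read(self, path: Path) -> bool:
        p = path.resolve()
        return p.is_relative_to(self.worktree) or p.is_relative_to(
            self.readonly_context
        )

    def can_write(self, path: Path) -> bool:
        # Resolving first prevents "../" tricks from escaping the worktree.
        return path.resolve().is_relative_to(self.worktree)
```

A real deployment would more likely rely on operating-system or container-level permissions than on an in-process check; the sketch only shows the rule being enforced. `Path.is_relative_to` requires Python 3.9 or later.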

Model / Tool Comparison

Model/Tool | Description
Tax AI | Self-improving tax agent that automates tax preparation and improves over time.
Codex | OpenAI's coding agent; here it investigates the root cause of failures, proposes changes, and validates them against targeted and regression evals.

Strengths

  • Tax AI improves over time and automates tax preparation.
  • The system uses a three-part loop that includes expert practitioner feedback, production traces, and a Codex-driven iteration loop.
  • Codex can investigate the root cause of failures and propose changes.

Limitations / Concerns

  • The system requires expert practitioner feedback to improve.
  • The system may not be able to handle complex or ambiguous cases.
  • The system requires a bounded Codex task environment to function.

Should I Try It?

Yes, if you are looking for a self-improving tax agent that automates tax preparation and improves over time.

Tags

Tax AI, Codex, self-improving agents, tax preparation, expert practitioner feedback, production traces, bounded Codex task environment.

Sources

  1. https://openai.com/index/building-self-improving-tax-agents-with-codex/