为 Agent 设计 Office:拆开 Excel 的三层

Office for Agent: Unbundling Excel’s Three Layers

Excel 把 DOM、计算、数据存储缝在一个文件里。Agent 时代应该把它们拆开 —— HTML 负责呈现,Python 负责计算,Agent 只管值。到什么规模才值得从 CSV 换成数据库?

Excel stitches DOM, computation, and storage into one file. In the Agent era we should unbundle them — HTML renders, Python computes, the Agent manages values. And when does CSV stop being enough?

中文

Excel 是一代人生产力工具的巅峰。它把三件事塞进了同一个文件、同一个运行时、同一个 UI 里 —— 你看到的单元格、单元格之间的公式计算、还有静静躺着的那些值。对人类来说很方便:打开文件,一切都在眼前。但对 Agent 来说,这种缝合恰恰是个麻烦。

这篇文章写我们对 Agent 时代 Office 的设想 —— 为什么应该把 Excel 的三层拆开、拆给谁、怎么让它们重新联动,以及值管理那一层什么时候该用 CSV、什么时候该上数据库。

Excel 缝在一起的三层

任何电子表格本质上都在做三件事:

  • DOM(呈现层) —— 单元格、边框、合并、格式、图表。你看到的一切。
  • 计算图 —— =SUM(A1:A10)=VLOOKUP(...)。单元格之间的表达式依赖。
  • 数据存储 —— 那些最终落在磁盘上的数值和字符串。既是输入,也是快照。

三层被缝在一个 .xlsx 里,这是 Excel 的天才 —— 对人类来说很优雅。但对 Agent 来说就是灾难。

为什么对 Agent 不再合适

Agent 不“看” DOM,它读结构。它不需要可视化的边框,它需要知道“这一列是什么语义”。它也不像人类那样通过鼠标点击来触发重算 —— 它写代码。

硬把 Agent 塞进 Excel 的范式,意味着:

  • 它得先翻译视觉布局才能理解数据;
  • 它得处理工作表里那些半死不活的公式引用;
  • 它得担心一次改动触发整张表重算的副作用。

更糟的是:Excel 的公式语言是一门受限的、沙箱化的、用表格坐标思考的小语言。Agent 已经会写 Python 了,它为什么还要学那门语言?

三层重新分工

对 Agent-native 的 Office,我们的主张很简单:把三层拆开,每一层交给最合适的执行者

  • DOM(呈现) 交给 HTML / Markdown。浏览器已经是过去三十年最好的渲染引擎,不用再造一个。
  • 计算 交给 Python / JS。Agent 原生就会写,工具链完整、可调试、可复用。
  • 数据(值管理) 交给 Agent 和文件存储。Agent 最擅长读、写、合并、版本化。

Agent 真正要负责的,只有值管理那一层 —— 这份工作簿里装了什么数据、现在是什么状态、历史是什么、要不要改。

三层怎么联动

这是最关键的问题。拆开不是目的,能重新合起来才是。

我们的答案是 —— 文件即接口

# Agent 的一次真实工作流:
> 用户问: "帮我做一份这周销售的汇总"

# Agent 按三层拆解执行:
huozi_write({ file_path: "data/sales.csv", ... })        # 1. 值
huozi_write({ file_path: "scripts/summary.py", ... })    # 2. 计算
# Agent 在本地/沙箱跑 python scripts/summary.py
#   → 读 data/sales.csv
#   → 写回 data/summary.json
huozi_write({ file_path: "report.html", ... })           # 3. 呈现
#   → 这个 HTML 读 data/summary.json 渲染出表格和图

# 三个文件. 三个角色. 每一步都可 diff, 可回放, 可单独替换.

联动是通过文件的读写和版本发生的,不是通过隐藏的单元格依赖。Agent 改一个 CSV,再跑一下 Python,再重渲染 HTML —— 这三步是显式的、可追溯的、可 diff 的。没有“改了某个单元格,整个工作簿偷偷重算了”这种黑盒。

这种架构还有一个很美好的副作用:每一层都可以被单独替换。今天计算用 pandas,明天想换 Polars;今天呈现用纯 HTML,明天想加个 React 交互层 —— 数据文件不动,上下游都可以重来。Excel 里你做不到这件事。

瓶颈和限制:CSV 到什么时候就顶不住

这套设计最直接的质疑是:值管理就靠一堆 CSV?到什么规模就塌了?

经验法则:

  • 百行到十万行 —— CSV 够用。pandas 读进来几十 MB 眼都不眨,Agent 也能一次性 grep / diff 全文件。这个范围覆盖了绝大多数实际业务场景。
  • 十万到千万行 —— 改用 Parquet 或分片 CSV。仍然是文件,仍然不需要服务器。Agent 用 duckdb 做列式查询即可。
  • 千万行以上,或强并发写 —— 这时候才上真正的数据库。Postgres、ClickHouse、或者对象存储 + 查询引擎。Agent 通过 SQL 或 MCP 连过去。

换句话说 —— 只有当“单机内存装不下”或“多个写入者要抢锁”时,才值得引入数据库。在那之前,文件就是最好的数据库:可读、可 diff、可版本化、可分享。

什么时候用 CSV,什么时候用数据库

一个更操作化的清单。

用 CSV / 文件,如果:

  • 只有一个 Agent 在写,或者写入是串行的;
  • 数据是“一次产出、多次引用”的快照性质;
  • 你希望变更能被人类 review(git diff 一个 CSV 是最舒服的事之一);
  • 行数在百万级以下。

用数据库,如果:

  • 多个 Agent / 多个用户并发写;
  • 需要事务:要么全部成功,要么全部回滚;
  • 需要在线查询(实时仪表盘、后端 API);
  • 数据量已经让 Python 吃不消。

大多数 Agent 场景 —— 写报告、做分析、维护一份状态清单、跑一次性的数据整理 —— 都落在文件这一侧。数据库是一个严肃的升级决定,不是默认选项。

一句话总结

Excel 的天才,在于把三层缝合给人类。Office for Agent 的天才,应该在于把三层拆开给 Agent

Agent 不需要一个假装是纸、实际是隐藏状态机的文件格式。它需要三样直白的东西:一个值(文件)、一段计算(代码)、一张呈现(HTML)。然后让它自己把它们串起来 —— 这恰恰是 Agent 最擅长的事。

值为文件,计算为代码,呈现为网页。

双 · Eng
English

Excel was the peak productivity tool of a generation. It stitched three things into one file, one runtime, one UI — the cells you see, the formulas between them, and the values that quietly sit on disk. For humans that’s elegant: open the file and everything is right there. For an Agent, the same stitching is exactly the problem.

This post lays out our design for an Agent-era Office — why Excel’s three layers should come apart, who each one should go to, how to make them talk again, and when the value-management layer should stop being CSVs and become a real database.

The three layers Excel stitches together

Any spreadsheet is, underneath, doing three things:

  • The DOM (presentation) — cells, borders, merges, formatting, charts. Everything you see.
  • The computation graph=SUM(A1:A10), =VLOOKUP(...). Expression-level dependencies between cells.
  • Data storage — the values and strings that end up on disk. Input and snapshot in one.

Three layers welded into one .xlsx. That is Excel’s genius — and it is elegant, for humans. For Agents, it’s a trap.

Why it no longer fits the Agent

An Agent doesn’t see a DOM; it reads structure. It doesn’t need cell borders. It needs to know what each column means. And it doesn’t trigger recalculation by clicking — it writes code.

Forcing an Agent into Excel’s paradigm means:

  • it has to translate visual layout before it can understand data;
  • it has to pick its way through half-alive formula references;
  • it has to worry about one edit silently recalculating the whole sheet.

And worst of all: Excel’s formula language is a small, sandboxed DSL that thinks in grid coordinates. The Agent already knows Python. Why would it learn a second, weaker language?

Redistribute the three layers

Our proposal for an Agent-native Office is simple: unbundle the three layers and hand each one to whoever is best at it.

  • DOM (presentation) goes to HTML / Markdown. The browser is the best rendering engine of the last thirty years. We don’t need to build another one.
  • Computation goes to Python / JS. The Agent already writes it natively. The toolchain is mature, debuggable, reusable.
  • Data (value management) goes to the Agent and file storage. Reading, writing, merging, versioning — that is exactly what the Agent is best at.

The Agent’s actual job is value management, and nothing else: what’s in this workbook, what state is it in, what’s the history, does it need to change?

How the three layers talk

This is the part that matters. Unbundling isn’t the goal — being able to recompose is.

Our answer is the file as the interface.

# A real Agent turn:
> "Make me a summary of this week's sales."

# The Agent executes across three layers:
huozi_write({ file_path: "data/sales.csv", ... })        # 1. values
huozi_write({ file_path: "scripts/summary.py", ... })    # 2. computation
# The Agent runs python scripts/summary.py in a sandbox:
#   reads  data/sales.csv
#   writes data/summary.json
huozi_write({ file_path: "report.html", ... })           # 3. presentation
#   this HTML reads data/summary.json and renders the table and charts.

# Three files. Three roles. Every step diffable, replayable, replaceable.

The layers coordinate through file reads, writes, and versions — not through hidden cell dependencies. The Agent edits a CSV, runs a Python script, re-renders the HTML. Three explicit steps, each traceable and diffable. No “I changed one cell and the whole workbook silently recalculated” black box.

There’s a lovely side-effect to this architecture: each layer is independently replaceable. Computation in pandas today, Polars tomorrow. Presentation as static HTML today, React for interactivity tomorrow. The data files don’t move; the layers above and below can be swapped freely. You can’t do that inside an .xlsx.

Bottlenecks and limits: when does CSV give out?

The most immediate objection to this design is: value management is just a pile of CSVs? At what scale does that collapse?

Rules of thumb:

  • Hundreds to hundreds of thousands of rows — CSV is fine. pandas reads tens of MB without blinking, and the Agent can grep or diff the whole file in one pass. This range covers the overwhelming majority of real business tasks.
  • 100K to 10M rows — switch to Parquet, or shard the CSVs. Still files, still no server. The Agent uses duckdb for columnar queries.
  • 10M+ rows, or heavy concurrent writes — this is where an actual database earns its keep. Postgres, ClickHouse, or object storage plus a query engine. The Agent connects over SQL or MCP.

Put another way: a database is warranted only when a single machine’s memory can’t hold the data, or multiple writers are contending for locks. Before that, the file is the best database: readable, diffable, versionable, shareable.

When to use CSV, when to reach for a database

A more operational checklist.

Use CSV / files if:

  • only one Agent is writing, or writes are serialised;
  • the data is “produce once, reference many times” in shape;
  • you want humans to review changes (git diff on a CSV is one of the most satisfying things in engineering);
  • row count is under a million.

Reach for a database if:

  • multiple Agents / multiple users are writing concurrently;
  • you need transactions — all-or-nothing commits;
  • there’s an online query path (live dashboard, backend API);
  • the data volume has outgrown what Python can chew on.

Most Agent work — writing reports, doing analysis, maintaining a status list, one-off data wrangling — lives firmly on the file side of that line. A database is a serious upgrade decision, not a default.

The one-line version

Excel’s genius is that it stitched three layers together for humans. An Agent-era Office’s genius should be that it takes those three layers apart for the Agent.

The Agent doesn’t need a file format that pretends to be paper but is secretly a state machine. It needs three plain things: a value (a file), a computation (some code), a presentation (an HTML page). Then it strings them together itself — which is exactly what the Agent is best at.

Values as files. Computation as code. Presentation as the web.