Hacker News new | ask | show | jobs
by lewisjoe 101 days ago
Excellent project! I see that the agent modifies the google docs using an interesting technique: convert doc to html, AI operates over the HTML and then diff the original html with ai-modified html, send the diff as batchUpdate to gdocs.

IMO, this is a better approach than the one used by Anthropic docx editing skill.

1. Did you compare this one with other document editing agents? Did you have any other ideas on how to make AI see and make edits to documents?

2. What happens if the document is a big book? How do you manage context when loading big documents?

PS:I'm working on an AI agent for Zoho Writer(gdocs alternative) and I've landed on a similar html based approach. The difference is I ask the AI to use my minimal commands (addnode, replacenode, removenode) to operate over the HTML and convert them into ops.

This works pretty well for me.

1 comments

re. comparing with other editing agents - actually, I didn't find any that could work with google docs. Many workflows were basically "replace the whole document" - and that was a non-starter.

re. what happens if its a big book - each "tab" in the google doc is a folder with its own document.xml. A top-level index.xml captures the table of contents across tabs. The agent reads index.xml and then decides what else to read. I am now improving this by giving it xpath expressions so it can directly pick the specific sections of interest.

Philosophically, we wanted "declarative" instead of "imperative". Our key design - the agent needs to "think" in terms of the business, and not worry about how to edit the document. We move all the reconcilliation logic in the library, and free the agent from worrying about the google doc. Same approach in other libraries as well.