I don’t use AI as a search engine. I use it as a system. That distinction matters more than which model you’re using or which interface you prefer.
The shift happened gradually and then all at once. For a while, AI tools were something I reached for occasionally: to help draft something, to summarize a document, to check my thinking on a decision. Useful, but episodic. Then agents became capable enough to rely on, and my relationship with the technology changed entirely. Now AI is woven into most of what I do during a working day, not because I sought that out as an end in itself, but because the friction of not using it became higher than the friction of using it.
Here’s how it actually works in practice.
The shift from tools to agents
The difference between an AI tool and an AI agent is the difference between a calculator and an assistant. A tool does the specific thing you tell it to do. An agent maintains context, can take a sequence of actions, can access external information, and can operate with a degree of autonomy over a longer horizon than a single prompt.
That sounds abstract. In practice it means the difference between “summarize this document” and “go through my inbox, pull the three most time-sensitive things, draft responses based on my previous communication style with each of these people, and flag anything that looks like it needs a decision before Friday.” The first is a tool interaction. The second is more like delegating to someone who knows your work.
Getting to the second type of interaction takes setup. It requires that the agent has enough context about how you work that it can make reasonable decisions without constant input. That investment in setup is where most people stop. It’s also where most of the value is.
What my actual workflow looks like
The largest single shift in how I work has been in software development. I use AI coding agents (Claude Code, Codex, and others) as the primary way I write and modify code. This isn’t about generating boilerplate. It’s about having something that can hold the context of an entire codebase, understand what I’m trying to accomplish at a product level, and propose and implement changes that I then review and direct. The back-and-forth is collaborative in a way that feels different from using a coding assistant. I’m operating more as an architect and reviewer than as someone writing every line.
For communication, I use AI to draft email responses based on threads and context, then edit rather than write from scratch. The time savings compound over a day in a way that’s hard to convey until you’re doing it. The cognitive load of switching from a complex technical problem to drafting a thoughtful email is also genuinely reduced when the first version is already there.
Research and synthesis is another area where agents have changed my workflow significantly. Rather than reading through ten sources to get a sense of a topic, I can delegate that initial survey and get a structured summary that I then interrogate. AI-generated summaries deserve real skepticism: they miss things, and they sometimes confidently state things that are wrong. But as a first pass that narrows what I need to read carefully, they earn their place.
Health and daily tracking sounds mundane but it’s a good example of where small frictions matter. I log food, exercise, and other health data through natural language. I speak or type what I ate, the agent extracts the relevant data and logs it. The difference between doing this consistently and not doing it was, for me, friction. Removing the friction changed the behavior.
What agents are still bad at
Judgment under genuine uncertainty. When a decision requires weighing factors that aren’t well-defined, or where the right answer depends on context that’s hard to specify, agents perform poorly. They’ll give you an answer, and it will often sound reasonable, but it won’t reliably account for the context that actually decides the question. The cases where I’ve been most burned are the cases where I asked an agent to make a call it wasn’t equipped to make and I didn’t scrutinize the output carefully enough.
Long-horizon autonomy with real stakes. Agents that operate over long sequences of actions without human checkpoints will eventually make a mistake that compounds. The longer the chain, the higher the chance of drift from what you actually wanted. I use agents for long-horizon tasks but I build in review points.
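A minimal sketch of what a review point can look like in code, not a description of any particular agent framework. The names run_with_checkpoints, approve, and every are hypothetical, and the steps are plain functions standing in for agent actions.

```python
from typing import Callable, List

# Each step takes the working state and returns an updated copy.
Step = Callable[[dict], dict]


def run_with_checkpoints(
    steps: List[Step],
    state: dict,
    approve: Callable[[int, dict], bool],
    every: int = 3,
) -> dict:
    """Run steps in order, pausing for review after every `every` steps."""
    for index, step in enumerate(steps, start=1):
        state = step(state)
        # No checkpoint after the final step: the caller reviews the result anyway.
        is_checkpoint = index % every == 0 and index < len(steps)
        if is_checkpoint and not approve(index, state):
            # The reviewer rejected the intermediate result, so stop here
            # instead of letting later steps build on it.
            return {**state, "halted_at": index}
    return state
```

In practice approve would show the intermediate state to a person. The point of the sketch is only that the chain cannot run past a checkpoint without a yes.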
Novel creative work that needs genuine originality. AI-assisted writing that builds on what exists produces competent output. But the work I’m most proud of, the thinking that feels like it comes from a specific vantage point that only I have, doesn’t come from agents. They can help structure and refine it. They can’t originate it.
How to think about building your own workflow
Don’t start with the tool. Start with the task you find most tedious, most context-switching, or most high-volume in your working day. Then ask whether that task is the kind of thing that can be decomposed into a sequence of steps that could be automated or assisted with enough context. If yes, that’s where you start.
The setup investment is real. Getting an agent to work well for a specific workflow takes time. Writing the context documents, the instructions, the constraints, the edge cases. Most people try an agent once with minimal setup, get mediocre results, and conclude the technology isn’t useful. The technology is useful. The setup is the work.
Build incrementally. I didn’t switch to a fully agent-driven workflow in a week. Each piece was added when the benefit was clear and the friction of setup was worth it. That’s still how I add things.
The personal OS idea
About a year ago I started building what I call a personal operating system: a custom software environment where AI is integrated into how I manage tasks, communication, health, and knowledge. Not because I wanted to build software for its own sake, but because the off-the-shelf tools available weren’t integrating the way I needed them to, and the marginal cost of building something custom had dropped to almost nothing once AI could handle most of the implementation work. This is the same principle behind how voice AI agents work at a product level: the components are cheap, the orchestration is the value.
The core of it is a task and inbox management system that ingests text from multiple sources: voice transcriptions, emails, meeting notes, messages. It classifies the content using Claude and surfaces the things that actually need my attention. The classification piece took some iteration to get right, but the core principle is simple: not all inputs are equal and the system should have opinions about what’s urgent, what can wait, and what’s noise.
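The three-way split can be sketched roughly as follows. This is an illustration under assumptions, not the actual implementation: the prompt wording, the InboxItem type, and the label names are invented here, and complete is a stand-in for whatever function sends a prompt to the model and returns its text reply.

```python
from dataclasses import dataclass
from typing import Callable

LABELS = ("urgent", "can_wait", "noise")

# Hypothetical prompt; a real one would carry far more personal context.
PROMPT = (
    "Classify the following item as exactly one of: urgent, can_wait, noise.\n"
    "Reply with the label only.\n\n"
    "Source: {source}\n"
    "Text: {text}"
)


@dataclass
class InboxItem:
    source: str  # e.g. "voice", "email", "meeting_notes"
    text: str


def classify(item: InboxItem, complete: Callable[[str], str]) -> str:
    """Ask the model for a label and normalise whatever comes back."""
    reply = complete(PROMPT.format(source=item.source, text=item.text))
    label = reply.strip().lower()
    # Anything unexpected is treated as "can_wait": it is not dropped as
    # noise, and it does not raise a false alarm either.
    return label if label in LABELS else "can_wait"
```

Passing the model call in as a function keeps the triage logic testable without a network connection, and the fallback label is a design choice worth revisiting for your own inbox.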
I log health data through the same system. Food, exercise, weight, sleep, all via natural language. I speak or type what I did, the system extracts the structured data and stores it. The implementation is less impressive than it sounds but the behavioral effect is real: the friction of the logging itself is low enough that I actually do it. Habit formation is usually a friction problem.
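One way to picture the extraction step: the model is asked to reply with JSON, and the system validates that reply before storing anything. The sketch below covers only the validation half. The HealthEntry fields and the allowed kinds are assumptions made for illustration, not the actual schema.

```python
import json
from dataclasses import dataclass
from typing import Optional

ALLOWED_KINDS = {"food", "exercise", "weight", "sleep"}


@dataclass
class HealthEntry:
    kind: str
    description: str
    quantity: Optional[float] = None
    unit: Optional[str] = None


def parse_entry(model_reply: str) -> HealthEntry:
    """Turn the model's JSON reply into a validated record.

    Raises ValueError for malformed JSON or an unexpected shape, so a bad
    extraction never reaches storage silently.
    """
    data = json.loads(model_reply)  # json.JSONDecodeError is a ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    kind = data.get("kind")
    if kind not in ALLOWED_KINDS:
        raise ValueError(f"unexpected kind: {kind!r}")
    quantity = data.get("quantity")
    return HealthEntry(
        kind=kind,
        description=str(data.get("description", "")),
        quantity=float(quantity) if quantity is not None else None,
        unit=data.get("unit"),
    )
```

The spoken input can be as messy as it likes. The strictness lives at the boundary where the model’s output becomes stored data.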
Email is handled by an agent that watches my inbox, classifies messages, surfaces things that need responses, and drafts responses for the ones that are routine. I review and send, or I edit and send. The share of emails I write from scratch has dropped significantly. The time saved isn’t just the writing time. It’s the mental overhead of holding “I need to respond to that” in working memory while doing something else.
Voice as the primary input method
I use voice transcription as the primary way I input information into the system. Not just for logging. For inboxing tasks, for capturing thoughts during a commute or a walk, for dictating context that an agent needs to act on something later.
This changes the design constraint for the AI layer. When you’re typing, you self-edit as you go. When you’re speaking, you don’t. The input is messier, more conversational, more likely to contain tangents and false starts and corrections mid-sentence. The system has to handle natural language generously or it breaks constantly. This is a solvable problem. The language models are very good at extracting structured information from messy speech. But it requires designing for voice from the start rather than bolting it on.
The practical upside is that voice input is fast and low-friction in contexts where typing isn’t practical. Driving, walking, standing at a whiteboard. The volume of information I capture that would otherwise be lost has gone up significantly since I started doing this.
How to write context documents that actually work
The single biggest lever for improving agent performance in a specific workflow is the quality of the context documents the agent has access to. Most people skip this step or do it minimally. It’s most of the work.
A context document for an agent that handles your email should include: how you communicate with different types of people (clients versus vendors versus colleagues versus new contacts), the specific recurring scenarios and how you typically handle each, your hard rules (always reply-all, never use certain phrases, always include a call to action), and the exceptions that would look like patterns but aren’t. The agent uses this to produce first drafts that are close enough to what you’d write that editing takes two minutes instead of ten.
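The four ingredients above can be kept as structured data and rendered into the plain text an agent reads. This is a sketch under assumptions: the EmailContext class, its field names, and the section titles are all hypothetical, and a hand-written text file works just as well.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EmailContext:
    tone_by_audience: Dict[str, str]     # e.g. "clients" -> "warm, brief"
    recurring_scenarios: Dict[str, str]  # scenario -> how it is usually handled
    hard_rules: List[str]                # things always or never done
    exceptions: List[str] = field(default_factory=list)

    def render(self) -> str:
        """Render the context as plain text for the agent's instructions."""
        sections = [
            ("How I write to different people",
             [f"{who}: {tone}" for who, tone in self.tone_by_audience.items()]),
            ("Recurring scenarios",
             [f"{case}: {handling}"
              for case, handling in self.recurring_scenarios.items()]),
            ("Hard rules", list(self.hard_rules)),
            ("Exceptions that look like patterns but are not",
             list(self.exceptions)),
        ]
        lines: List[str] = []
        for title, entries in sections:
            if not entries:
                continue  # skip empty sections rather than emit a bare heading
            lines.append(title)
            lines.extend(f"- {entry}" for entry in entries)
            lines.append("")
        return "\n".join(lines).strip()
```

Keeping the sections explicit makes gaps visible: an empty exceptions list is a prompt to go and write some down.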
For coding agents, the equivalent is the project context. What the codebase is, how it’s structured, what the conventions are, what’s currently being worked on, what the known constraints and edge cases are. Claude Code, the tool I use most heavily for this, operates much better with a well-maintained CLAUDE.md file that gives it this context than it does making inferences from the code structure alone.
The maintenance discipline is important. Context documents go stale. When your communication style shifts, when a project’s architecture changes, when your priorities change, the context needs updating. Agents operating from stale context will produce output that feels slightly off in ways that are hard to diagnose if you don’t understand the underlying cause. Keep the documents current and the agent performance stays high.
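A small check can at least flag documents nobody has touched in a while. The sketch below uses file modification time as a rough proxy and a 30-day threshold, both of which are assumptions. A recently edited file can still be wrong, so this prompts a review rather than replacing one.

```python
import time
from pathlib import Path
from typing import Iterable, List

SECONDS_PER_DAY = 86400


def is_stale(modified_at: float, now: float, max_age_days: float = 30) -> bool:
    """True when the last modification is older than the allowed age."""
    return now - modified_at > max_age_days * SECONDS_PER_DAY


def stale_documents(paths: Iterable[Path], max_age_days: float = 30) -> List[Path]:
    """Return the existing context files that have not been edited recently."""
    now = time.time()
    return [
        path
        for path in paths
        if path.exists() and is_stale(path.stat().st_mtime, now, max_age_days)
    ]
```

Run against something like a project’s CLAUDE.md and an email context file, the result is a short list of documents worth rereading.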
MCP servers and tool use at scale
Model Context Protocol (MCP) is the standard that’s emerged for connecting AI agents to external tools and data sources. The pattern is: you run a small server that exposes specific capabilities (read from a database, send an email, query an API), and the language model can call those capabilities as tools in the course of generating a response.
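The shape of that pattern, stripped of the protocol itself, looks roughly like this. To be clear about the assumptions: this is not the MCP SDK or its wire format, only a standard-library illustration of "named capabilities the model can discover and call". ToolServer, register, list_tools, and call are names invented for the sketch.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Tool:
    name: str
    description: str  # what the model reads when deciding whether to call it
    handler: Callable[..., Any]


class ToolServer:
    """Holds a set of named capabilities and dispatches calls to them."""

    def __init__(self) -> None:
        self._tools: Dict[str, Tool] = {}

    def register(self, name: str, description: str):
        def decorator(func: Callable[..., Any]) -> Callable[..., Any]:
            self._tools[name] = Tool(name, description, func)
            return func
        return decorator

    def list_tools(self) -> List[Dict[str, str]]:
        # The model sees names and descriptions, never the implementation.
        return [
            {"name": tool.name, "description": tool.description}
            for tool in self._tools.values()
        ]

    def call(self, name: str, arguments: Dict[str, Any]) -> Any:
        if name not in self._tools:
            raise KeyError(f"unknown tool: {name}")
        return self._tools[name].handler(**arguments)


server = ToolServer()


@server.register("count_open_tasks", "Return how many tasks are still open.")
def count_open_tasks(tasks: List[dict]) -> int:
    return sum(1 for task in tasks if not task.get("done", False))
```

The model picks a tool from list_tools, the host invokes call with the arguments the model proposed, and the result goes back into the conversation.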
I’ve built MCP servers for Gmail (reads and drafts across seven email accounts), for the Tyler OS task system, and for a few other specific integrations. The value is that the agent doesn’t have to be told explicitly what to do. It can reason about what tools are available and decide to use them in service of a goal. That’s the difference between giving someone a step-by-step process and giving them a set of capabilities and trusting them to figure out the steps.
The security consideration is real and worth spending time on. An MCP server that can send emails or make API calls is an agent that can do damage if it makes the wrong decision or if the tool definitions are loose enough to allow unintended actions. I design the tools to be as narrowly scoped as possible and build confirmation steps into workflows that have meaningful side effects. The autonomy is valuable but it needs guardrails.
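Both guardrails can be sketched in a few lines: a narrow scope (an allowlist of recipient domains) and a confirmation step before the side effect happens. The function make_send_tool and its parameters are hypothetical, and send and confirm are stand-ins for a real mail client and a real prompt to the user.

```python
from typing import Callable, FrozenSet


def make_send_tool(
    allowed_domains: FrozenSet[str],
    send: Callable[[str, str, str], None],
    confirm: Callable[[str], bool],
) -> Callable[[str, str, str], str]:
    """Build a send-email tool that is scoped and gated by confirmation."""

    def send_email(to: str, subject: str, body: str) -> str:
        domain = to.rsplit("@", 1)[-1].lower()
        # Scope check first: out-of-scope recipients are refused outright,
        # without even asking the user.
        if domain not in allowed_domains:
            raise PermissionError(f"recipient domain not allowed: {domain}")
        # Side effects only happen after an explicit yes.
        if not confirm(f"Send '{subject}' to {to}?"):
            return "cancelled"
        send(to, subject, body)
        return "sent"

    return send_email
```

The agent only ever receives send_email. It has no way to reach the underlying send function or widen the allowlist on its own.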
Where this goes
The honest answer is I don’t know, and anyone who claims to know with precision is overconfident. What I expect is that the capability ceiling for individual agents rises, that coordination between agents becomes easier and more reliable, and that the overlap between “things a person does in a working day” and “things that can be delegated or augmented” expands significantly.
The question I keep coming back to is not which tasks agents will take over but how the nature of the work that remains changes as that happens. The answer to that question will determine a lot about what skills and ways of working actually matter in the next ten years. I don’t think we’ve figured that out yet. I do think the people who are working with agents now, building the intuition for what they’re good at and what they’re not, will have a meaningful advantage when it becomes clearer.
Building the habit of using agents well
The biggest barrier to getting value from AI agents is not capability. It’s the consistent discipline of using them in the workflow rather than bypassing them when you’re in a hurry.
The pattern I’ve seen in myself and in other founders: you set up an agent workflow, it works well when you take the time to engage with it properly, but under time pressure you default to doing the thing manually because it’s faster in that moment. The manual path wins in the short term. The agent path compounds. Choosing the agent path under pressure is a discipline, not a preference.
The way I’ve addressed this is by making the agent path the path of least resistance wherever possible. If accessing the agent requires opening a separate application, navigating to a specific interface, and then composing a request, you will bypass it. If the agent is available wherever you already are, with minimal friction to engage it, you will use it. The interface design of your agent workflow matters as much as the capability of the agents.
Start with one workflow and make it irreplaceable before adding more. The temptation is to automate everything at once. The result is a set of half-configured agents that work well in ideal conditions and break unexpectedly enough that you stop trusting any of them. One well-configured agent that you use every day without thinking about it is worth more than ten that you use inconsistently.
The honest accounting
I want to be clear about the limits of what I’ve described, because there’s a version of the AI workflow story that implies it makes everything better. It doesn’t.
The work that has the highest value in what I do is the work that’s hardest to delegate to agents: the judgment calls about product direction, the relationships with customers and team members, the creative leaps that come from lived experience and domain knowledge accumulated over years. Agents can make me faster at the peripheral work: the scheduling, the drafting, the research synthesis, the logging. They don’t make the core work easier. The core work was never the bottleneck.
The risk of a very efficient peripheral layer is spending more time in it. When email drafting takes two minutes instead of ten, there’s a pull toward clearing the inbox rather than protecting the time for deeper work. The efficiency gains from agents are only valuable if you redirect the saved time toward the things that actually matter. That’s a personal discipline problem, not a technology problem, but it’s worth naming.
The founders and operators I know who are getting the most out of AI agents are the ones who are clearest about what they’re trying to accomplish at a high level, and who use that clarity to decide which tasks are worth automating versus which ones deserve their full attention. The technology is a multiplier. What it multiplies depends entirely on what you’re doing with the time.