Aira: The Windows AI Agent I Built to Work on My Computer
A look at Aira, a Windows-first AI agent built for realtime voice, wake listening, computer control, screen summaries, local memory, and iPhone access.
Watch the demo first, then read how the system is put together and where this type of agent can be useful in a business.
Book a call and I will map the first useful AI agent version around your business workflow.
Aira started as a Windows-first AI agent for my own computer. The goal was simple enough to describe, but hard to build well. I wanted to speak to the computer, have it understand the screen, take action through local tools, and keep a visible record of what it did.
The current build runs as an Electron desktop app with a local host server on Windows. The interface is built in React. The voice layer uses OpenAI Realtime over WebRTC. The host owns the computer actions, so the model does not need to pretend it can use the machine. It calls tools, and the Windows host performs the work.
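The split between model and host can be sketched as a small dispatch table. This is a minimal illustration, not Aira's actual code: the names ToolCall, ToolHost, and the open_app tool are hypothetical, and the handler is a stub that only returns a string where a real host would launch the app.

```typescript
// Hypothetical shapes for illustration; not Aira's actual API.
type ToolCall = { name: string; args: Record<string, unknown> };
type ToolResult = { ok: boolean; output: string };
type ToolHandler = (args: Record<string, unknown>) => ToolResult;

class ToolHost {
  private handlers = new Map<string, ToolHandler>();

  // The host decides which actions exist at all.
  register(name: string, handler: ToolHandler): void {
    this.handlers.set(name, handler);
  }

  // The model only names a tool and passes arguments; the host does the work.
  dispatch(call: ToolCall): ToolResult {
    const handler = this.handlers.get(call.name);
    if (handler === undefined) {
      return { ok: false, output: `Unknown tool: ${call.name}` };
    }
    return handler(call.args);
  }
}

const demoHost = new ToolHost();

// Stub handler: a real host would start the application through the OS here.
demoHost.register("open_app", (args) => ({
  ok: true,
  output: `Opened ${String(args["app"])}`,
}));

const openedResult = demoHost.dispatch({ name: "open_app", args: { app: "Notepad" } });
const rejectedResult = demoHost.dispatch({ name: "format_disk", args: {} });
```

Because the host owns the registry, a tool the model invents (format_disk in this sketch) fails with an explicit error instead of doing anything on the machine.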
That distinction matters. A useful desktop agent needs access to the computer, but it also needs boundaries. Aira can open apps, type text, press keys, click, scroll, inspect the foreground window, capture screenshots, summarize the screen, run trusted PowerShell commands, and store completed work as local deliverables. It still needs explicit handling for sensitive actions such as deleting data, payments, credentials, account security changes, and private information.
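One way to express that boundary is an approval gate that runs before any tool call. The sketch below is an assumption about how such a gate could look, not a description of Aira's implementation. The category names mirror the sensitive actions listed above, and gateAction is a hypothetical helper.

```typescript
// Hypothetical categories, mirroring the sensitive actions listed in the text.
type Sensitivity =
  | "delete_data"
  | "payment"
  | "credentials"
  | "account_security"
  | "private_info";

interface ActionRequest {
  tool: string;
  sensitivity?: Sensitivity; // left undefined for routine actions
}

interface GateDecision {
  allowed: boolean;
  reason: string;
}

// Routine actions pass; sensitive ones need an explicit user approval.
function gateAction(request: ActionRequest, userApproved: boolean): GateDecision {
  if (request.sensitivity === undefined) {
    return { allowed: true, reason: "routine action" };
  }
  if (userApproved) {
    return { allowed: true, reason: `approved: ${request.sensitivity}` };
  }
  return { allowed: false, reason: `needs approval: ${request.sensitivity}` };
}

const typeTextDecision = gateAction({ tool: "type_text" }, false);
const deleteBlocked = gateAction(
  { tool: "run_powershell", sensitivity: "delete_data" },
  false,
);
const deleteApproved = gateAction(
  { tool: "run_powershell", sensitivity: "delete_data" },
  true,
);
```

Keeping the gate outside the model means the approval rule holds even when the model is confident it should proceed.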
The voice workflow is built around the phrase "Hey Aira." The Windows host can run a wake listener through System.Speech, and the app also has a browser or Electron fallback path. The Realtime session can stay preconnected with the microphone gated, which reduces the delay between the wake phrase and the response.
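The preconnected, mic-gated idea can be modeled as a small state machine. This is a simplified sketch under assumptions: the state and event names are hypothetical, and it does not call the OpenAI Realtime API or any WebRTC API. In a browser, one common way to gate a microphone is to toggle the enabled flag on the audio MediaStreamTrack, but whether Aira does that is not stated in this article.

```typescript
// Hypothetical states and events; no real audio or network calls are made.
type VoiceState = "idle" | "listening" | "responding";
type VoiceEvent = "wake_phrase" | "utterance_end" | "response_done";

interface VoiceSession {
  readonly connected: boolean; // session stays up between requests
  readonly micOpen: boolean;   // audio only flows while this is true
  readonly state: VoiceState;
}

// Connect ahead of time, but keep the microphone gated.
function preconnect(): VoiceSession {
  return { connected: true, micOpen: false, state: "idle" };
}

function nextSession(session: VoiceSession, event: VoiceEvent): VoiceSession {
  if (session.state === "idle" && event === "wake_phrase") {
    return { ...session, micOpen: true, state: "listening" };
  }
  if (session.state === "listening" && event === "utterance_end") {
    return { ...session, micOpen: false, state: "responding" };
  }
  if (session.state === "responding" && event === "response_done") {
    return { ...session, state: "idle" };
  }
  return session; // events that do not apply in the current state are ignored
}

const startSession = preconnect();
const afterWake = nextSession(startSession, "wake_phrase");
const afterSpeech = nextSession(afterWake, "utterance_end");
const afterReply = nextSession(afterSpeech, "response_done");
const ignoredEvent = nextSession(startSession, "utterance_end");
```

The connection is never torn down in this model, so the wake phrase only has to open the gate rather than negotiate a new session.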
The practical test for this kind of agent is whether it can do useful work without forcing the user to babysit every click. For Windows UI control, Aira uses an inspect-then-act loop, which is safer than clicking fixed coordinates. It can inspect the foreground app, collect visible controls, search controls by name or type, wait for a control to appear, set a field value, or click a named control. Coordinate clicking still exists, but only as the fallback, because layouts move.
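The inspect-then-act idea can be sketched as a planner that prefers a named control and only falls back to coordinates when nothing matches. The Control shape, searchControls, and planClick are hypothetical names for illustration. A real host would fill the control list from the Windows UI rather than from a hard-coded array.

```typescript
// Hypothetical control model; a real host would read this from the foreground window.
interface Control {
  name: string;
  type: string;
  x: number;
  y: number;
}

interface ControlQuery {
  name?: string;
  type?: string;
}

type ClickPlan =
  | { kind: "control"; control: Control }
  | { kind: "coordinates"; x: number; y: number }
  | { kind: "none" };

// Case-insensitive search: name is a substring match, type is an exact match.
function searchControls(controls: Control[], query: ControlQuery): Control[] {
  const wantedName = query.name?.toLowerCase();
  const wantedType = query.type?.toLowerCase();
  return controls.filter((c) => {
    const nameOk = wantedName === undefined || c.name.toLowerCase().includes(wantedName);
    const typeOk = wantedType === undefined || c.type.toLowerCase() === wantedType;
    return nameOk && typeOk;
  });
}

// Prefer a named control; use coordinates only when no control matches.
function planClick(
  controls: Control[],
  query: ControlQuery,
  fallback?: { x: number; y: number },
): ClickPlan {
  const match = searchControls(controls, query)[0];
  if (match !== undefined) return { kind: "control", control: match };
  if (fallback !== undefined) return { kind: "coordinates", x: fallback.x, y: fallback.y };
  return { kind: "none" };
}

// Example data only.
const visibleControls: Control[] = [
  { name: "File name", type: "Edit", x: 320, y: 410 },
  { name: "Save", type: "Button", x: 520, y: 460 },
  { name: "Cancel", type: "Button", x: 610, y: 460 },
];

const namedPlan = planClick(visibleControls, { name: "save", type: "button" });
const fallbackPlan = planClick(visibleControls, { name: "Export" }, { x: 100, y: 200 });
const noPlan = planClick(visibleControls, { name: "Export" });
```

Returning a plan instead of clicking directly also leaves room to log or review the action before the host performs it.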
Aira also has a delivery rail. When it finishes a tool task, it can show the output as a saved delivery instead of burying the result in a chat transcript. That matters for real work. If the agent makes an image, writes a note, renders a Mermaid chart, creates a file, summarizes the screen, or runs a diagnostic, the result should be easy to review.
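A delivery rail of this kind can be modeled as an append-only list of typed records. The sketch below is an assumption about the data shape, not Aira's actual storage format. DeliveryRail and its field names are hypothetical, and it keeps records in memory where a real build would write them to disk.

```typescript
// Hypothetical record kinds, based on the outputs named in the text.
type DeliveryKind = "image" | "note" | "chart" | "file" | "screen_summary" | "diagnostic";

interface Delivery {
  id: number;
  kind: DeliveryKind;
  title: string;
  body: string;
  createdAt: string; // ISO 8601 timestamp
}

class DeliveryRail {
  private items: Delivery[] = [];

  // Save a finished tool result as a reviewable record.
  add(kind: DeliveryKind, title: string, body: string, now: Date = new Date()): Delivery {
    const delivery: Delivery = {
      id: this.items.length + 1,
      kind,
      title,
      body,
      createdAt: now.toISOString(),
    };
    this.items.push(delivery);
    return delivery;
  }

  // Return a copy ordered newest first, which is how a rail is usually read.
  newestFirst(): Delivery[] {
    return [...this.items].reverse();
  }
}

// Example data only.
const rail = new DeliveryRail();
rail.add("note", "Meeting notes", "Three action items.", new Date("2024-01-01T09:00:00Z"));
rail.add("screen_summary", "Dashboard check", "Two alerts visible.", new Date("2024-01-01T09:05:00Z"));
const railItems = rail.newestFirst();
```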
The screen summary tool is one of the most useful pieces. A normal screenshot only stores pixels. Aira can capture the screen and return readable observations. That lets the user ask what is on screen, what looks wrong, or what step appears next without manually describing the window.
The project also has local memory. Aira stores notes, records, activity, screenshots, delivery artifacts, and a memory profile under the local data folder. That gives the agent a place to keep useful context without turning every session into a blank start.
There is an iPhone path as well. The host can be served privately through Tailscale, which gives mobile Safari HTTPS access while keeping the app inside the tailnet. The app includes checks for whether Tailscale is running, whether Serve points to Aira, and whether the iPhone access URL is ready. The HTTPS part matters because browsers only allow microphone capture in a secure context, and a phone reaching the host over the network cannot rely on the localhost exemption that a browser on the same machine gets.
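The three readiness checks can be combined into one result that the interface can display. This is a sketch under assumptions: AccessChecks and evaluateAccess are hypothetical names, the booleans are supplied by the caller instead of being read from Tailscale, and the example URL is made up.

```typescript
// Hypothetical inputs; a real host would gather these from Tailscale itself.
interface AccessChecks {
  tailscaleRunning: boolean;
  servePointsToAira: boolean;
  accessUrl: string | null;
}

interface Readiness {
  ready: boolean;
  problems: string[];
}

// Collect every failing check so the user sees the full list at once.
function evaluateAccess(checks: AccessChecks): Readiness {
  const problems: string[] = [];
  if (!checks.tailscaleRunning) problems.push("Tailscale is not running");
  if (!checks.servePointsToAira) problems.push("Serve does not point to Aira");
  if (checks.accessUrl === null) {
    problems.push("No iPhone access URL");
  } else if (!checks.accessUrl.startsWith("https://")) {
    problems.push("Access URL is not HTTPS");
  }
  return { ready: problems.length === 0, problems };
}

// Example values only; the hostname is invented for this sketch.
const readyAccess = evaluateAccess({
  tailscaleRunning: true,
  servePointsToAira: true,
  accessUrl: "https://aira-pc.example.ts.net/",
});

const brokenAccess = evaluateAccess({
  tailscaleRunning: false,
  servePointsToAira: true,
  accessUrl: "http://192.168.1.20:3000/",
});
```

Reporting all problems together, instead of stopping at the first one, tends to shorten setup on a phone where each retry is slow.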
Aira is a personal build, but the pattern applies to business agents. Many companies start with a generic chatbot when the real need is an agent that can sit near the work, understand the workflow, call the right tools, produce visible outputs, and stop when human approval is required.
A sales team might need an agent that reviews inbound leads, opens the CRM, fills missing fields, drafts a follow-up, and leaves the final send for a human. An operations team might need an agent that reads the screen, checks a dashboard, creates a task, and writes a short handoff note. A support team might need an agent that summarizes a ticket, searches SOPs, drafts a reply, and escalates edge cases.
The first version should be narrow. Pick one repeated workflow. Define the inputs, tools, approval rules, output format, and failure path. Build the smallest agent that can complete that workflow reliably. Then measure whether it saves time, reduces missed follow-ups, improves record quality, or removes repeated manual work.
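Those five definitions fit in a small typed specification that can be checked before any agent code is written. The sketch below is one possible shape. WorkflowSpec, validateSpec, and the tool names such as crm.open are hypothetical, and the example borrows the sales workflow described earlier.

```typescript
// Hypothetical spec shape covering inputs, tools, approval rules, output, and failure path.
interface WorkflowSpec {
  name: string;
  inputs: string[];
  tools: string[];
  approvalRequiredFor: string[]; // tools that must wait for a human
  outputFormat: string;
  failurePath: string;
}

// Return a list of gaps; an empty list means the spec is complete enough to build.
function validateSpec(spec: WorkflowSpec): string[] {
  const problems: string[] = [];
  if (spec.inputs.length === 0) problems.push("no inputs defined");
  if (spec.tools.length === 0) problems.push("no tools defined");
  if (spec.outputFormat.trim() === "") problems.push("no output format defined");
  if (spec.failurePath.trim() === "") problems.push("no failure path defined");
  for (const tool of spec.approvalRequiredFor) {
    if (!spec.tools.includes(tool)) {
      problems.push(`approval rule names unknown tool: ${tool}`);
    }
  }
  return problems;
}

// Example spec only; tool names are invented for illustration.
const leadFollowUp: WorkflowSpec = {
  name: "Inbound lead follow-up",
  inputs: ["new lead record"],
  tools: ["crm.open", "crm.fill_fields", "email.draft", "email.send"],
  approvalRequiredFor: ["email.send"],
  outputFormat: "draft email plus updated CRM record",
  failurePath: "leave the lead untouched and write a handoff note",
};

const incompleteSpec: WorkflowSpec = {
  ...leadFollowUp,
  failurePath: "",
  approvalRequiredFor: ["email.delete"],
};

const completeProblems = validateSpec(leadFollowUp);
const incompleteProblems = validateSpec(incompleteSpec);
```

Writing the spec first makes the approval rules and the failure path explicit decisions instead of behavior discovered during testing.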
Aira shows what a custom AI agent can look like when the build is tied to a real operating environment. The useful part is the connection between voice, tools, screen context, local memory, and visible delivery. The same structure can be adapted for a business when the workflow is specific enough to build around.
For a business or personal workflow, the starting point is a short mapping session. We identify one task that repeats, decide which tools the agent needs, set the approval rules, and build a first version that can be tested against real work.