Image Analyst Agent: Nine Specialized Agents, Zero Infrastructure
David Olsson

We built Image Analyst Agent as a fully browser-resident application for AI-powered image and document analysis. Users upload files through a drag-and-drop interface, select from nine specialized agents, and receive structured extraction or domain-specific analysis. Nothing runs on a server. There is no backend, no database service, and no build toolchain.
Persistence is handled entirely by IndexedDB via Dexie.js, versioned through four schema migrations. LLM calls go directly from the browser tab through Worksona.js, the portfolio's multi-provider abstraction layer for OpenAI, Anthropic, and Google.
Deployment requires only a static file server:
```shell
python3 -m http.server 8080
```
Why nine agents over one generalist?
A generalist model asked to "analyze this document" will produce something reasonable for every document type and optimal for none. The cost of that compromise compounds at batch scale.
Each of our nine agents is a JSON configuration file specifying provider, model, temperature, and a carefully crafted system prompt for one specific task. The ocr-specialist runs at temperature 0.0 for maximum determinism on raw text extraction. The research-analyst uses GPT-4o for synthesis and insight. Adding a new agent requires no code changes, only a JSON file and a registry entry in agents/index.json. The architecture is additive by design.
| Agent | Purpose | Temperature |
|---|---|---|
| image-analyzer | Full-structure extraction with layout context | 0.1 |
| ocr-specialist | Raw text extraction, maximum determinism | 0.0 |
| research-analyst | Content analysis and insight generation | not specified |
| text-analyzer | Sentiment, themes, key information | 0.3 |
| markdown-specialist | Markdown enhancement and structure | 0.4 |
| csv-analyst | Tabular data analysis | not specified |
| json-processor | JSON parsing, transformation, description | not specified |
| docx-converter | Word document conversion | not specified |
| report-generator | Structured report generation | not specified |
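Because each agent is plain data, adding one means adding configuration rather than code. The following is a minimal sketch under stated assumptions: the field names (`id`, `provider`, `model`, `temperature`, `systemPrompt`), the placeholder provider and model strings, and the `validateAgentConfig` and `buildRegistry` helpers are hypothetical, not the application's actual schema or its agents/index.json loader. It validates each config and builds an in-memory registry keyed by agent id.

```javascript
// Fields every agent config must carry. These names are assumptions for
// illustration, not the application's actual JSON schema.
const REQUIRED_FIELDS = ["id", "provider", "model", "temperature", "systemPrompt"];

// Throws if a config is missing a field or has a non-numeric temperature.
function validateAgentConfig(config) {
  for (const field of REQUIRED_FIELDS) {
    if (!(field in config)) {
      throw new Error(`Agent config ${config.id ?? "(unnamed)"} is missing "${field}"`);
    }
  }
  if (!Number.isFinite(config.temperature) || config.temperature < 0) {
    throw new Error(`Agent ${config.id} has an invalid temperature`);
  }
  return config;
}

// Builds an id -> config map. Adding an agent means adding a config object,
// not changing this function.
function buildRegistry(configs) {
  const registry = new Map();
  for (const config of configs) {
    validateAgentConfig(config);
    if (registry.has(config.id)) {
      throw new Error(`Duplicate agent id: ${config.id}`);
    }
    // Frozen copies keep the shared definitions read-only.
    registry.set(config.id, Object.freeze({ ...config }));
  }
  return registry;
}

// Placeholder provider/model values; temperatures match the table above.
const registry = buildRegistry([
  { id: "ocr-specialist", provider: "provider-x", model: "model-x", temperature: 0.0,
    systemPrompt: "Extract all visible text verbatim." },
  { id: "text-analyzer", provider: "provider-x", model: "model-x", temperature: 0.3,
    systemPrompt: "Identify sentiment, themes, and key information." },
]);

console.log(registry.get("ocr-specialist").temperature); // 0
```

Validating at load time means a malformed JSON file fails loudly when the registry is built instead of surfacing later as a confusing LLM error.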
Runtime prompt override is handled without mutating shared state. When a user enters a custom system prompt for a batch, the API zone synthesizes an ephemeral agent:
```javascript
const ephemeralId = `${agentId}-custom-${Date.now()}`;
// Loaded into Worksona.js for exactly one LLM call
// Discarded after processing; the base agent is unchanged
```
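One way to get this behavior is to derive a new, frozen agent object from the base config with object spread, so the override never touches the shared definition. This is a hedged sketch: `createEphemeralAgent`, `processWithOverride`, the placeholder prompts, and the `runLLM` callback are hypothetical, and `runLLM` stands in for the provider call rather than reproducing the Worksona.js API.

```javascript
// Hypothetical helper: copy the base agent and override only the id and prompt.
// The base object is never mutated, so other batches keep using it unchanged.
function createEphemeralAgent(baseAgent, customPrompt, now = Date.now()) {
  return Object.freeze({
    ...baseAgent,
    id: `${baseAgent.id}-custom-${now}`,
    systemPrompt: customPrompt,
  });
}

// runLLM is a stand-in for the real provider call (an assumption, not Worksona.js).
async function processWithOverride(baseAgent, customPrompt, input, runLLM) {
  // With no custom prompt, the shared base agent is used directly.
  const agent = customPrompt ? createEphemeralAgent(baseAgent, customPrompt) : baseAgent;
  // The ephemeral agent lives only in this scope and is discarded on return.
  return runLLM(agent, input);
}

// Example with a fake LLM call that echoes the prompt it received.
const base = Object.freeze({ id: "text-analyzer", temperature: 0.3, systemPrompt: "Identify themes." });
const fakeLLM = async (agent, input) => `${agent.systemPrompt} <- ${input}`;

processWithOverride(base, "List every date mentioned.", "doc-1", fakeLLM).then((out) => {
  console.log(out);               // List every date mentioned. <- doc-1
  console.log(base.systemPrompt); // Identify themes.
});
```

Because the ephemeral agent is only a local value, it becomes unreachable once the call returns, and there is nothing to clean up in the shared registry.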
How the zero-infrastructure browser architecture works
The application is organized into seven zones communicating exclusively through a publish-subscribe observer pattern. There are no circular imports and no direct inter-module method calls outside defined public APIs.
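The zone contract can be illustrated with a minimal in-memory event bus: zones receive a shared bus and communicate by topic, never by importing one another. This is a sketch under assumptions; the `EventBus` class, the topic name `document:processed`, and the two subscriber zones are hypothetical and do not reproduce the application's actual module layout.

```javascript
// Minimal publish-subscribe bus. Zones hold a reference to the bus only,
// so no zone needs to import another zone's module.
class EventBus {
  constructor() {
    this.subscribers = new Map(); // topic -> Set of handler functions
  }

  // Registers a handler and returns a function that removes it again.
  subscribe(topic, handler) {
    if (!this.subscribers.has(topic)) this.subscribers.set(topic, new Set());
    this.subscribers.get(topic).add(handler);
    return () => this.subscribers.get(topic)?.delete(handler);
  }

  // Calls every handler for the topic and returns how many were notified.
  publish(topic, payload) {
    // Copy first so a handler can unsubscribe safely during dispatch.
    const handlers = [...(this.subscribers.get(topic) ?? [])];
    for (const handler of handlers) handler(payload);
    return handlers.length;
  }
}

// Hypothetical zones reacting to the same event without knowing each other.
const bus = new EventBus();
const log = [];
bus.subscribe("document:processed", (doc) => log.push(`storage saved ${doc.id}`));
const unsubscribeUi = bus.subscribe("document:processed", (doc) => log.push(`ui rendered ${doc.id}`));

bus.publish("document:processed", { id: "doc-1" });
unsubscribeUi();
bus.publish("document:processed", { id: "doc-2" });
console.log(log.join(", ")); // storage saved doc-1, ui rendered doc-1, storage saved doc-2
```

This sketch lets a throwing handler stop dispatch to the handlers after it; a production bus would likely isolate handler errors so one zone cannot break another.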
The processing queue serializes all LLM calls, one document at a time, which prevents rate-limit collisions and keeps browser memory usage under its 100 MB target. The queue supports pause, resume, and per-item retry from the Admin panel without re-uploading files.
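A serial queue of this kind can be sketched as a single drain loop that awaits each item before starting the next, so only one call is ever in flight. The `SerialQueue` class, its item fields, and the `processor` callback (standing in for the real LLM call) are hypothetical; the sketch illustrates pause, resume, and retry from stored input, not the Admin panel itself.

```javascript
class SerialQueue {
  constructor(processor) {
    this.processor = processor; // async (input) => result; stands in for the LLM call
    this.items = [];            // { id, input, status, result, error }
    this.paused = false;
    this.running = false;
    this.draining = Promise.resolve();
  }

  enqueue(id, input) {
    this.items.push({ id, input, status: "pending", result: null, error: null });
    return this.run();
  }

  pause() {
    // The item already in flight finishes; nothing new starts until resume().
    this.paused = true;
  }

  resume() {
    this.paused = false;
    return this.run();
  }

  retry(id) {
    const item = this.items.find((i) => i.id === id && i.status === "failed");
    if (item) {
      // The stored input is reused, so the file never has to be re-uploaded.
      item.status = "pending";
      item.error = null;
    }
    return this.run();
  }

  run() {
    // At most one drain loop exists; concurrent callers share its promise.
    if (this.running) return this.draining;
    this.running = true;
    this.draining = (async () => {
      try {
        let next;
        while (!this.paused && (next = this.items.find((i) => i.status === "pending"))) {
          next.status = "processing";
          try {
            next.result = await this.processor(next.input); // one call in flight at a time
            next.status = "done";
          } catch (err) {
            next.status = "failed";
            next.error = err instanceof Error ? err.message : String(err);
          }
        }
      } finally {
        this.running = false;
      }
    })();
    return this.draining;
  }
}

// Example: two documents processed strictly in order.
const queue = new SerialQueue(async (text) => text.toUpperCase());
queue.enqueue("doc-1", "hello");
queue.enqueue("doc-2", "world").then(() => {
  console.log(queue.items.map((i) => `${i.id}: ${i.result}`).join(", ")); // doc-1: HELLO, doc-2: WORLD
});
```

Pausing here lets the in-flight item finish rather than aborting it; cancelling a request mid-flight would need something like an `AbortController` passed through to the provider call.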
The IndexedDB schema has been through four versioned migrations, including a v4 data migration that renames a field on all existing records without requiring users to clear storage. Processed documents survive page refresh and browser restart. The complete document library serializes to a portable JSON export and re-imports into any other browser instance.
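A rename migration and a versioned export envelope can be sketched in plain JavaScript so the logic runs outside the browser. The field names `legacyField` and `renamedField`, the envelope shape, and the function names are hypothetical; the source does not say which field v4 renamed. With Dexie.js, a function like `migrateToV4` would typically be passed to `db.version(4).stores(schema).upgrade(tx => tx.table("documents").toCollection().modify(migrateToV4))`, and imported records could be written back with `bulkPut`; the table name `documents` is likewise an assumption.

```javascript
// Hypothetical names: the source does not say which field the v4 migration renamed.
const OLD_FIELD = "legacyField";
const NEW_FIELD = "renamedField";
const SCHEMA_VERSION = 4;

// Renames the field in place, the style a Dexie Collection.modify() callback uses.
// Running it twice is harmless, which also makes it safe to apply to imported records.
function migrateToV4(record) {
  if (Object.prototype.hasOwnProperty.call(record, OLD_FIELD)) {
    if (!(NEW_FIELD in record)) record[NEW_FIELD] = record[OLD_FIELD];
    delete record[OLD_FIELD];
  }
}

// Wraps the library in an envelope that records which schema produced it.
function exportLibrary(records) {
  return JSON.stringify({ schemaVersion: SCHEMA_VERSION, documents: records });
}

// Parses an export and upgrades records written under an older schema.
function importLibrary(json) {
  const parsed = JSON.parse(json);
  if (!Array.isArray(parsed.documents)) throw new Error("Not a library export");
  if (!Number.isInteger(parsed.schemaVersion) || parsed.schemaVersion > SCHEMA_VERSION) {
    throw new Error(`Unsupported schema version: ${parsed.schemaVersion}`);
  }
  if (parsed.schemaVersion < 4) parsed.documents.forEach(migrateToV4);
  return parsed.documents;
}

const v3Record = { id: "doc-1", [OLD_FIELD]: "extracted text" };
migrateToV4(v3Record);
console.log(Object.keys(v3Record)); // [ 'id', 'renamedField' ]

const restored = importLibrary(exportLibrary([v3Record]));
console.log(restored[0][NEW_FIELD]); // extracted text
```

Recording the schema version in the export is what lets an older library be re-imported into a newer browser instance: the importer knows which migrations still need to run.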
Where it applies
Image Analyst Agent is the reference implementation for zero-infrastructure AI document processing in the Worksona portfolio. It uses Worksona.js as the AI abstraction layer, which makes it the canonical example of how Worksona tools compose. The observer-pattern zone architecture, versioned IndexedDB schema, and JSON-defined agent system are all transferable patterns for browser-native AI applications.