Worksona Knowledge Graph: How Agents Remember Across 254 Projects

26 March 2026 #worksona #portfolio #knowledge-graph #agents #memory #retrieval

Worksona KG is a hybrid graph-and-vector retrieval platform that makes the entire Worksona portfolio queryable from a single chat interface. It ingests knowledge from 254+ projects and stores it as a connected graph in Neo4j Aura, a cloud-hosted graph database. On top of the graph layer sits a vector search index backed by OpenAI embeddings. Both layers are queried together on every request, and an LLM (Anthropic Claude, OpenAI GPT, or Google Gemini, selectable at runtime) synthesises the combined results into a readable answer.

What is it?

The data model is deliberately simple. There are four node types: Project, Technology, Concept, and Directory. There are six relationship types: USES, IMPLEMENTS, EVOLVED_INTO, DEPENDS_ON, FEEDS_INTO, and IN_DIRECTORY. The canonical graph artifact — portfolio-graph.json — currently encodes 230 nodes and 772 links. Every project ingested by the companion kg-builder tool adds nodes and edges to this artifact, which is then loaded into Neo4j.
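As a rough illustration of that data model, the sketch below types the four node kinds and six relationship kinds and runs two sanity checks on a node-link structure. The field names (`id`, `type`, `source`, `target`) and the sample data are assumptions made for the example; the actual layout of portfolio-graph.json is not shown here and may differ.

```typescript
// Node and relationship types, as listed in the text.
type NodeType = "Project" | "Technology" | "Concept" | "Directory";
type RelType =
  | "USES"
  | "IMPLEMENTS"
  | "EVOLVED_INTO"
  | "DEPENDS_ON"
  | "FEEDS_INTO"
  | "IN_DIRECTORY";

// Assumed shape: the real portfolio-graph.json fields may differ.
interface GraphNode {
  id: string;
  type: NodeType;
}

interface GraphLink {
  source: string; // id of the start node
  target: string; // id of the end node
  type: RelType;
}

interface PortfolioGraph {
  nodes: GraphNode[];
  links: GraphLink[];
}

// Count nodes per type, e.g. to sanity-check an artifact before loading it.
function countNodesByType(graph: PortfolioGraph): Record<NodeType, number> {
  const counts: Record<NodeType, number> = {
    Project: 0,
    Technology: 0,
    Concept: 0,
    Directory: 0,
  };
  for (const node of graph.nodes) {
    counts[node.type] += 1;
  }
  return counts;
}

// Return links whose source or target id is missing from the node list.
function findDanglingLinks(graph: PortfolioGraph): GraphLink[] {
  const ids = new Set(graph.nodes.map((n) => n.id));
  return graph.links.filter((l) => !ids.has(l.source) || !ids.has(l.target));
}

// Tiny illustrative graph (made-up data, not taken from the real artifact).
const sample: PortfolioGraph = {
  nodes: [
    { id: "worksona-kg", type: "Project" },
    { id: "neo4j", type: "Technology" },
    { id: "hybrid-retrieval", type: "Concept" },
  ],
  links: [
    { source: "worksona-kg", target: "neo4j", type: "USES" },
    { source: "worksona-kg", target: "hybrid-retrieval", type: "IMPLEMENTS" },
  ],
};
```

Checks like these are cheap to run before loading an artifact into a database, since a link that points at a missing node is easier to catch in JSON than after import.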

The application itself is a Next.js 15 App Router project with a streaming chat UI built on Vercel AI SDK 6. There is also an embedded CodeMirror 6 editor that lets power users write and execute Cypher queries directly against the graph.

Why is it useful?

The honest answer to "why not just use a document search?" is that document search cannot answer relational questions. A query like "which projects evolved from Atlas and share AI concepts with worksona-api?" has no useful document-search equivalent. The answer requires traversing typed edges across the graph, not ranking paragraphs by cosine similarity.

At the same time, graph traversal alone cannot answer semantic questions — "show me projects that deal with memory persistence, even if they don't use the word" — because those connections exist as meaning, not as explicit edges. Combining both retrieval modes produces answers that neither alone could surface.
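One way to picture the combination is a ranking that scores every candidate by embedding similarity and adds a bonus when the graph traversal also returned it. This is only a sketch under stated assumptions: the additive `graphBoost`, the `Candidate` shape, and the function names are invented for illustration and are not a description of how Worksona KG actually merges its two result sets.

```typescript
// Hypothetical candidate: a project name plus its embedding vector.
interface Candidate {
  name: string;
  embedding: number[];
}

interface HybridHit {
  name: string;
  score: number;
  viaGraph: boolean; // true if the graph traversal also returned this name
}

// Standard cosine similarity; returns 0 for zero-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  a.forEach((x, i) => {
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  });
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Rank candidates by similarity to the query, boosting graph-confirmed hits.
// The additive boost is an assumed merge strategy, chosen for simplicity.
function hybridRank(
  query: number[],
  candidates: Candidate[],
  graphHits: string[],
  graphBoost: number = 0.2,
  topK: number = 3
): HybridHit[] {
  const graphSet = new Set(graphHits);
  return candidates
    .map((c) => {
      const viaGraph = graphSet.has(c.name);
      const score =
        cosineSimilarity(query, c.embedding) + (viaGraph ? graphBoost : 0);
      return { name: c.name, score, viaGraph };
    })
    .sort((x, y) => y.score - x.score)
    .slice(0, topK);
}
```

With a small boost, a graph-confirmed project that is only moderately similar can still rank below a strong semantic match; raising the boost lets explicit edges outweigh similarity. Other merge strategies, such as reciprocal rank fusion, would fit the same interface.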

There is also a compounding effect. Every project added to the graph does not just make that project queryable; it adds edges that connect to every other project sharing technologies, concepts, or lineage. Knowledge from project 255 makes project 1 more queryable. The graph grows denser with each addition.

How and where does it apply?

The ingestion pipeline runs as a ten-stage CLI process: parse the manifest, parse markdown, chunk text into ~500-token segments, run a fast regex extractor, run Claude Haiku for entity extraction with confidence scores, normalise aliases, embed each chunk with text-embedding-3-small, write to Neo4j with confidence gating (entities scoring below 0.60 are dropped; 0.60–0.79 enter a review queue; 0.80+ are written immediately), then build the vector index and the full-text index.
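The confidence gate in the write stage can be sketched as a small pure function. The thresholds come from the description above; the type and function names are hypothetical, and treating scores between 0.79 and 0.80 as "review" is an assumption about how the boundary is handled.

```typescript
type GateDecision = "write" | "review" | "drop";

// Hypothetical shape for an entity produced by the extraction stage.
interface ExtractedEntity {
  name: string;
  confidence: number; // expected to lie in [0, 1]
}

const WRITE_THRESHOLD = 0.8; // 0.80+ is written immediately
const REVIEW_THRESHOLD = 0.6; // 0.60 up to the write threshold is queued

// Map a confidence score to a gating decision.
// Non-numeric scores (NaN) fail both comparisons and are dropped.
function gate(confidence: number): GateDecision {
  if (confidence >= WRITE_THRESHOLD) return "write";
  if (confidence >= REVIEW_THRESHOLD) return "review";
  return "drop";
}

// Split a batch of entities into the three buckets.
function partitionByConfidence(
  entities: ExtractedEntity[]
): Record<GateDecision, ExtractedEntity[]> {
  const buckets: Record<GateDecision, ExtractedEntity[]> = {
    write: [],
    review: [],
    drop: [],
  };
  for (const entity of entities) {
    buckets[gate(entity.confidence)].push(entity);
  }
  return buckets;
}
```

Keeping the gate free of database calls makes the thresholds easy to unit-test and to tune without touching the Neo4j write path.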

A concrete Cypher query for the graph layer:

cypher

// Find projects related to a given project via shared concepts
MATCH (p1:Project)-[:IMPLEMENTS]->(c:Concept)<-[:IMPLEMENTS]-(p2:Project)
WHERE p1.name = $name
RETURN p2.name, c.name AS sharedConcept
ORDER BY p2.name

The same query, issued through natural language in the chat interface, triggers the hybrid RAG layer automatically — users do not need to know Cypher to get relational answers.

One architectural detail worth noting: Worksona KG is itself one of the 254 projects indexed in the graph it queries. It appears as a node in portfolio-graph.json alongside all the other projects. The system indexes itself. This is not an accident — it means you can ask the system questions about the system, and it can answer.

Extension is straightforward: add a new Cypher query function in lib/graphrag/, or ingest a new project with pnpm ingest -- --path ~/WORKSONA/project-name. Neither requires touching the chat interface or the LLM layer.
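To make the first extension path concrete, here is a minimal sketch of what a new query function might look like. Everything about its shape is an assumption: the `CypherQuery` interface, the `lineageQuery` name, and the idea that functions in lib/graphrag/ return query text plus parameters are illustrative, not taken from the codebase. The hop limit is validated and interpolated into the pattern because this sketch writes the variable-length bound as a literal rather than passing it as a parameter.

```typescript
// Hypothetical return shape; the real lib/graphrag/ modules may differ.
interface CypherQuery {
  text: string;
  params: Record<string, string | number>;
}

// Build a query for projects reachable from `projectName`
// by following EVOLVED_INTO edges, up to `maxHops` steps.
function lineageQuery(projectName: string, maxHops: number = 3): CypherQuery {
  // Validate before interpolating so only a small integer reaches the query.
  if (!Number.isInteger(maxHops) || maxHops < 1 || maxHops > 10) {
    throw new RangeError("maxHops must be an integer between 1 and 10");
  }
  const text = [
    `MATCH (start:Project {name: $name})-[:EVOLVED_INTO*1..${maxHops}]->(next:Project)`,
    "RETURN DISTINCT next.name AS descendant",
    "ORDER BY descendant",
  ].join("\n");
  // The project name is passed as a parameter, never concatenated.
  return { text, params: { name: projectName } };
}
```

A caller would hand `text` and `params` to whatever execution helper the application already uses, for example a Neo4j driver session's `run(text, params)`.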
