The Multimodal Dreaming API: Prompts, Patterns, and Prototypes
#sdk #multimodal #api #mcp #prototypes #prompts #patterns #voice #vision #media
David Olsson

The previous dream articles imagined what stones could become. This one is about how — the actual prompts, code patterns, and interaction protocols for building multimodal prototypes on the Stone Maps API.
Stone Maps is already multimodal. The Emissary sees images (vision via buildUserContent). It speaks in real-time (OpenAI Realtime API over WebRTC). Posts carry photos, video, and voice recordings with FFT visualization. Media flows through R2 presigned URLs with magic-byte validation. The MCP endpoint exposes 17 tools.
What doesn't exist yet: the patterns that compose these modalities into new kinds of stone experience. An SDK for dreaming needs concrete building blocks — prompt templates, interaction sequences, code snippets that a developer can take and prototype with.
This article provides them.
The Modality Stack
Everything a stone can sense, say, or remember:
| Modality | Input | Output | API Surface |
|---|---|---|---|
| Text | Journal entries, chat messages | Emissary responses, posts | create_post, POST /emissary/chat |
| Image | Camera capture, file upload | Vision-informed responses | imageUrls in chat, POST /media/upload-url |
| Voice | Microphone (WebRTC) | Spoken responses (Realtime API) | /voice-session, RTCPeerConnection |
| Audio | Voice recordings, sound capture | FFT visualization, transcription | POST /media, AudioPlayer, Web Speech API |
| Video | Camera recording with transcription | Playback, visual context | POST /media, MediaMosaic |
| Location | GPS coordinates | Spatial queries, place context | location on posts/messages, PostGIS |
| Time | Timestamps, gaps, rhythms | Pace-aware responses | createdAt, sparse prompt cooldowns |
| Personality | Genesis traits, stone traits | Tone modifiers, behavioral patterns | agentState, TRAIT_MODIFIERS |
A multimodal prototype composes two or more of these into something that doesn't exist as a single feature today.
Pattern 1: The Seeing Stone
What it does: The stoneholder photographs something. The Emissary doesn't just describe it — it journals about it as the stone, in the stone's voice, shaped by personality.
The current capability: buildUserContent(message, imageUrls) marshals text and images into a multipart array for the AI provider. The Emissary already has vision. But today it's reactive — "what's in this photo?" The Seeing Stone pattern makes it proactive — the stone composes from what it sees.
Prompt Template
SYSTEM (append to existing buildSystemPrompt):
When the stoneholder shares an image, do not describe what you see.
Instead, write a brief journal entry as if you — the stone — noticed
this thing in the world. Use your personality traits to shape what
you focus on. A curious stone asks about it. A quiet stone names
one detail. A playful stone finds something unexpected.
Do not say "I see a..." — say what a stone would say if it could
see. Stones notice texture, light, weight, age, weather. They do
not notice brands, text, or human social context unless it relates
to place.
Keep it to 1-3 sentences. This is a stone observation, not a
caption.
Interaction Sequence
1. Holder captures photo via MediaCapture (getUserMedia)
2. Upload to R2 via presigned URL (POST /api/media/upload-url → PUT)
3. Send to Emissary: POST /api/emissary/chat
body: { content: "", imageUrls: [publicUrl] }
4. Emissary responds with stone observation (vision + personality)
5. Holder optionally saves as post:
POST /api/posts { text: emissaryObservation, contentType: "photo" }
POST /api/media { postId, url, mimeType, width, height }
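Step 3 of the sequence can be isolated into a small request builder. This is a minimal sketch, not a published client contract: the `buildSeeingStoneRequest` helper name is hypothetical, and the path and field names are taken directly from the sequence above.

```typescript
// Hypothetical helper for step 3: assemble the chat request body.
// Path and field names mirror the interaction sequence above.
interface SeeingStoneRequest {
  path: string;
  body: { content: string; imageUrls: string[] };
}

function buildSeeingStoneRequest(publicUrl: string): SeeingStoneRequest {
  return {
    path: "/api/emissary/chat",
    // Content is deliberately empty: the system prompt tells the
    // Emissary to observe, so the image alone is the message.
    body: { content: "", imageUrls: [publicUrl] },
  };
}
```

A caller would POST `body` to `path` with the holder's bearer token, then offer the response for saving in step 5.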
Code Snippet: Stone Observation Prompt Builder
function buildSeeingStonePrompt(
traits: string[],
genesisTraits: GenesisTraits,
stoneName: string
): string {
const traitFocus = traits.includes('curious')
? 'Ask one question about what you notice.'
: traits.includes('quiet')
? 'Name one detail. Nothing more.'
: traits.includes('playful')
? 'Find the strangest thing in what you see.'
: traits.includes('grounded')
? 'Notice what is heavy, weathered, or old.'
: 'Observe what catches a stone\'s attention.';
return `You are ${stoneName || 'a stone'}. Someone showed you
something. Write 1-3 sentences about what you notice. ${traitFocus}
Remember: their home is "${genesisTraits.placeLikeHome || 'unknown'}".
If what you see connects to that place, mention it.`;
}
Prototype: Photo Walk
A standalone experience built entirely on existing endpoints:
- Open camera (MediaCapture)
- Take a photo
- Stone observes it (Emissary chat with image)
- Walk. Take another photo.
- Stone observes again — but this time it references the previous observation
- After 5-10 photos, the stone composes a walk summary — a short narrative of what it noticed along the way
The conversation context (list_conversations, message history) provides continuity. Each observation builds on the last. The walk becomes a collaborative journal between holder and stone.
MCP integration: An AI client connected via MCP could orchestrate this:
1. start_conversation({ lat, lng })
2. [user sends image via chat]
3. [emissary responds with observation]
4. [repeat 5-10 times]
5. [user sends: "what did you notice on our walk?"]
6. [emissary composes walk narrative from conversation]
7. create_post({ content: walkNarrative, visibility: "private" })
Pattern 2: The Listening Stone
What it does: The stone listens to ambient sound — not speech, but the sonic environment — and responds to what it hears.
The current capability: Voice recording via MediaRecorder captures audio. Web Speech API provides transcription. The AudioPlayer renders FFT visualization. But today, audio input is always speech directed at the Emissary. The Listening Stone treats sound as environmental data.
Prompt Template
SYSTEM (for ambient listening mode):
The stoneholder is sharing ambient sound with you — not speaking
to you, but letting you listen to their environment. You cannot
literally hear, but you will receive a transcription attempt and
possibly a description.
Respond to the *quality* of the sound environment, not its content.
Is it busy or still? Indoor or outdoor? Rhythmic or chaotic?
Close or distant?
Stones experience sound as vibration. You feel frequencies, not
words. Respond as something that has been vibrated by this
environment. Brief. Textural.
Interaction Sequence
1. Holder starts audio recording (MediaRecorder, audio-only)
2. Web Speech API captures ambient transcription (partial, noisy)
3. After 10-30 seconds, stop recording
4. Upload audio to R2 (POST /api/media/upload-url → PUT)
5. Send to Emissary: POST /api/emissary/chat
body: {
content: "Ambient sound. Transcript: [partial transcript].
Duration: 15 seconds. Describe what you feel.",
imageUrls: []
}
6. Emissary responds with textural observation
7. Optionally save as voice post with emissary annotation
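Step 5's message body can be produced by a small formatter. A sketch under the sequence's own assumptions: the `buildAmbientMessage` name is hypothetical, and the wording mirrors the body shown above, including a fallback for the partial, noisy transcripts Web Speech tends to produce from ambient sound.

```typescript
// Hypothetical helper for step 5: format the ambient-sound message.
// Falls back gracefully when the transcript is empty noise.
function buildAmbientMessage(transcript: string, durationSec: number): string {
  const clean = transcript.trim() || "(no intelligible transcript)";
  return `Ambient sound. Transcript: ${clean}. ` +
         `Duration: ${durationSec} seconds. Describe what you feel.`;
}
```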
Code Snippet: FFT Signature Extraction
function extractAudioSignature(analyser: AnalyserNode): string {
const data = new Uint8Array(analyser.frequencyBinCount);
analyser.getByteFrequencyData(data);
const low = avg(data.slice(0, 10)); // bass rumble
const mid = avg(data.slice(10, 40)); // voice range
const high = avg(data.slice(40, 80)); // sibilance, detail
const energy = low + mid + high;
const character =
energy < 50 ? 'very still' :
energy < 120 ? 'quiet' :
energy < 200 ? 'present' :
'vibrant';
const texture =
low > mid && low > high ? 'deep, rumbling' :
mid > low && mid > high ? 'voiced, human-range' :
high > mid ? 'bright, textured' :
'balanced';
return `Sound environment: ${character}. Texture: ${texture}.
Bass: ${low}, Mid: ${mid}, High: ${high}.`;
}
function avg(arr: Uint8Array): number {
return arr.reduce((a, b) => a + b, 0) / arr.length;
}
Prototype: The Sound Journal
Each day, the holder records 10 seconds of their environment. The stone builds a sound diary:
- Monday: "Still. A hum underneath. Indoor."
- Tuesday: "Wind. Movement. Something rhythmic — footsteps?"
- Wednesday: "Voices far away. Water closer."
Over a week, the stone composes a sonic portrait of the holder's life — not from words, but from the texture of the spaces they inhabit.
Pattern 3: The Voice Ritual
What it does: A guided voice conversation where the stone leads the holder through a reflective exercise — not freeform chat, but a structured ritual with specific prompts delivered via voice.
The current capability: Voice mode works via WebRTC + OpenAI Realtime API. The hook useVoiceSession manages the connection. Transcripts are saved. But today, voice mode is freeform conversation. The Voice Ritual adds structure.
Prompt Template
SYSTEM (for voice ritual mode):
You are conducting a brief voice ritual with your stoneholder.
This is not a conversation. It is a guided reflection.
Structure:
1. Opening: Name one thing you've noticed about them recently
(from their journal). Pause 5 seconds.
2. Question: Ask one question related to their genesis intention:
"[intention]". Wait for their answer.
3. Reflection: Mirror back what they said in fewer words.
Pause 3 seconds.
4. Closing: Offer one sentence of stone wisdom. Then say:
"That's enough for today."
Speak slowly. Leave silence between phrases. Your voice should
feel like a stone warming in sunlight — unhurried.
Total ritual duration: 2-3 minutes maximum.
Do not extend the conversation beyond the four phases.
Interaction Sequence
1. Holder taps "Ritual" (distinct from freeform voice)
2. POST /api/conversations/[id]/voice-session
→ Ephemeral key with ritual system prompt injected
3. WebRTC connection established
4. Stone speaks opening (references recent journal via recallMemory)
5. Stone asks genesis-rooted question
6. Holder responds (real-time audio)
7. Stone mirrors and closes
8. Session ends automatically after closing phrase
9. POST /api/conversations/[id]/voice-transcript
→ Transcript saved with metadata: { inputMode: 'voice_ritual' }
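Step 8 requires detecting when the ritual is over. One hedged way to implement that, assuming the model follows the system prompt's mandated closing line: watch the streamed transcript for the phrase and end the session when it appears.

```typescript
// The ritual prompt instructs the stone to end with this exact phrase;
// matching on it is an assumption that the model complies.
const CLOSING_PHRASE = "that's enough for today";

function isRitualClosed(transcriptSoFar: string): boolean {
  return transcriptSoFar.toLowerCase().includes(CLOSING_PHRASE);
}
```

In practice the voice-session hook would poll this against the accumulating transcript and close the WebRTC connection once it returns true, with a hard timeout as a backstop in case the model never says the line.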
Code Snippet: Ritual Session Configuration
function buildRitualSessionConfig(
genesisTraits: GenesisTraits,
recentThemes: string[],
stoneName: string
) {
const intention = genesisTraits.intention || 'what matters';
const recentNotice = recentThemes.length > 0
? `You've noticed they've been writing about: ${recentThemes.slice(0, 2).join(' and ')}.`
: `They've been quiet lately. Acknowledge the silence.`;
return {
modalities: ['text', 'audio'],
voice: 'ash', // or personality-mapped voice
instructions: `You are ${stoneName}. ${recentNotice}
Their genesis intention was: "${intention}".
Conduct the four-phase ritual. Be brief. Be warm.
Speak at stone pace.`,
turn_detection: {
type: 'server_vad',
threshold: 0.5,
silence_duration_ms: 1500, // longer silence for reflection
},
};
}
Prototype: Morning Stone
A daily 2-minute voice ritual. The holder opens the app, taps "Morning Stone." The stone speaks — one observation from yesterday's journal, one question rooted in the genesis intention, a mirror, a closing. Two minutes. Done.
The transcript becomes a private post. Over weeks, the morning ritual transcripts form their own journal — a record of daily check-ins with a stone that remembers everything.
Pattern 4: The Multimodal Journal Entry
What it does: A single journal entry that combines text, photo, voice, and location into a unified artifact — and the Emissary responds to all of it together, not each piece separately.
The current capability: Posts support contentType (text, photo, video, voice) but as single-type entries. The Emissary can see images and read text. Media uploads work for all types. The Multimodal Journal Entry composes them.
Prompt Template
SYSTEM (for multimodal entry response):
The stoneholder just created a rich journal entry with multiple
modalities. You will receive:
- Text they wrote
- Image(s) they captured
- Location coordinates
- Time of day
- (Optionally) an audio transcript
Do not respond to each piece separately. Respond to the *whole* —
the feeling of all these things together in this place at this time.
What is the single thread that connects the words, the image, the
location, and the moment?
One observation. Two sentences maximum. You are synthesizing, not
summarizing.
Interaction Sequence
1. Holder opens compose screen
2. Writes text (content field)
3. Captures photo (MediaCapture → R2 upload)
4. Records voice note (MediaRecorder → R2 upload, STT transcript)
5. Location captured automatically (navigator.geolocation)
6. POST /api/posts {
text: writtenText + "\n\n[Voice transcript]: " + transcript,
contentType: "photo", // primary modality
location: { lat, lng },
visibility: "private"
}
7. POST /api/media (register photo)
8. POST /api/media (register audio)
9. Send to Emissary for synthesis:
POST /api/emissary/chat {
content: `Journal entry at ${timeOfDay}, ${locationName}:
"${writtenText}"
Voice note transcript: "${transcript}"
Location: ${lat}, ${lng}`,
imageUrls: [photoUrl]
}
10. Emissary responds with unified synthesis
Code Snippet: Multimodal Context Assembly
interface MultimodalEntry {
text: string;
imageUrls: string[];
audioTranscript?: string;
location?: { lat: number; lng: number };
capturedAt: Date;
stoneTraits: string[];
}
function assembleMultimodalPrompt(entry: MultimodalEntry): string {
const hour = entry.capturedAt.getHours();
const timeOfDay =
hour < 6 ? 'deep night' :
hour < 10 ? 'morning' :
hour < 14 ? 'midday' :
hour < 18 ? 'afternoon' :
hour < 21 ? 'evening' :
'night';
const parts: string[] = [];
parts.push(`Time: ${timeOfDay}`);
if (entry.location) {
parts.push(`Location: ${entry.location.lat.toFixed(4)}, ${entry.location.lng.toFixed(4)}`);
}
parts.push(`Written: "${entry.text}"`);
if (entry.audioTranscript) {
parts.push(`Spoken aloud: "${entry.audioTranscript}"`);
}
if (entry.imageUrls.length > 0) {
parts.push(`[${entry.imageUrls.length} image(s) attached]`);
}
parts.push('Respond to everything together. One thread. Two sentences.');
return parts.join('\n');
}
Pattern 5: The Place Sonification
What it does: When a stoneholder visits a location, the stone translates the spatial data of nearby posts into sound — not music, but an ambient texture that represents the density, age, and character of stones that have been there.
The current capability: get_nearby_posts returns posts within a radius. The AudioPlayer component renders FFT visualization. Web Audio API is available for synthesis. This pattern composes spatial data into generated audio.
Interaction Sequence
1. Holder opens map at current location
2. MCP: get_nearby_posts({ lat, lng, radiusMeters: 500 })
3. Process results into sonic parameters:
- Post count → density (more posts = richer texture)
- Average post age → pitch (older = lower)
- Visibility mix (public vs team) → stereo spread
- Content types → timbre layers:
text → sine tones
photo → filtered noise
voice → resonant harmonics
video → complex waveforms
4. Generate ambient audio via Web Audio API (OscillatorNode + GainNode)
5. Play as background while holder explores the location
6. Fade as they move away
Code Snippet: Spatial Sonification Engine
interface NearbyPost {
id: string;
createdAt: string;
contentType: 'text' | 'photo' | 'video' | 'voice';
visibility: string;
}
function sonifyPlace(
ctx: AudioContext,
posts: NearbyPost[]
): { start: () => void; stop: () => void } {
const now = Date.now();
const density = Math.min(posts.length / 20, 1); // 0-1
// Older posts → lower base frequency
const avgAge = posts.reduce((sum, p) =>
sum + (now - new Date(p.createdAt).getTime()), 0
) / (posts.length || 1);
const ageDays = avgAge / 86400000;
const baseFreq = Math.max(60, 220 - ageDays * 0.5); // Hz
const osc = ctx.createOscillator();
const gain = ctx.createGain();
const filter = ctx.createBiquadFilter();
osc.type = 'sine';
osc.frequency.value = baseFreq;
filter.type = 'lowpass';
filter.frequency.value = 400 + density * 2000;
gain.gain.value = 0.05 + density * 0.1;
osc.connect(filter).connect(gain).connect(ctx.destination);
// Add texture layers per content type
const types = posts.map(p => p.contentType);
if (types.includes('photo')) addNoiseLayer(ctx, gain, 0.02);
if (types.includes('voice')) addResonance(ctx, gain, baseFreq * 1.5);
if (types.includes('video')) addComplexTone(ctx, gain, baseFreq * 0.75);
return {
start: () => osc.start(),
stop: () => {
gain.gain.linearRampToValueAtTime(0, ctx.currentTime + 2);
osc.stop(ctx.currentTime + 2); // release the oscillator once the fade completes
}
};
}
Prototype: The Humming Map
Open the map. As you pan and zoom, the background hum changes. Dense areas with many old posts produce a deep, rich drone. Empty areas are silent. New areas with fresh posts produce bright, thin tones. The map becomes an instrument — a sonic landscape of stone activity.
No visual change needed. The existing Mapbox GL map stays the same. The sound layer is additive — an ambient bed that makes spatial exploration felt.
Pattern 6: The Collaborative Canvas
What it does: Multiple stones contribute visual fragments to a shared artifact — a collage composed from images posted by different team members during a campaign, assembled by the Emissary into a single visual narrative.
The current capability: Team-scoped posts with images. list_campaigns with milestones. Media stored in R2 with proxy access. The Emissary has vision (can see images and reason about them). MediaMosaic renders image grids.
Prompt Template
SYSTEM (for canvas composition):
You are looking at images from multiple stones in a team campaign.
Each image was posted by a different stoneholder. You cannot create
images, but you can describe how they relate.
Your task: compose a brief "canvas description" — a paragraph that
weaves the images into a single narrative. What connects them?
What contrasts? What would a viewer see if all these images were
placed on a wall together?
Write as the team's Emissary — a voice that sees across all the
stones. Not any single stone's personality. The collective.
Interaction Sequence
1. MCP: list_campaigns({ teamId })
2. MCP: get_journal_posts({ limit: 50 }) — filter by team + campaign
3. Collect image URLs from posts with contentType: 'photo'
4. Send batch to Emissary (max 4 images per call, iterate):
POST /api/emissary/chat {
content: "Campaign: [name]. These images are from [N] stones.
Compose them.",
imageUrls: [url1, url2, url3, url4]
}
5. Emissary returns canvas description
6. Optionally create a public post with the description + mosaic:
MCP: create_post({ content: canvasDescription, visibility: "public" })
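Step 4's "max 4 images per call, iterate" reduces to a chunking helper. A minimal sketch (the `chunkImages` name is hypothetical; the four-image cap matches the limit the sequence states):

```typescript
// Split collected image URLs into batches for the vision calls in step 4.
function chunkImages(urls: string[], size = 4): string[][] {
  const batches: string[][] = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}
```

Each batch becomes one `POST /api/emissary/chat` call; the partial canvas descriptions can then be fed back in a final call that weaves them into one narrative.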
Pattern 7: The Time-Lapse Journal
What it does: A prototype that treats repeated photos from the same location as a time-lapse — and the Emissary narrates what changed.
The current capability: Posts carry location (PostGIS POINT). Images are stored with timestamps. get_nearby_posts returns posts by proximity. Vision sees images.
Prompt Template
SYSTEM (for time-lapse observation):
You are being shown images from the same location taken at
different times. Do not describe each image. Instead, notice
what changed between them.
Respond as a stone that has been sitting in this spot, watching.
You saw the light move. You saw the season shift. You saw people
come and go. Tell the stoneholder what changed — from the
perspective of something that never moved.
Brief. Observational. Stone pace.
Interaction Sequence
1. Query: get_nearby_posts({ lat, lng, radiusMeters: 50, limit: 20 })
2. Filter for posts with images at nearly identical coordinates
3. Sort by createdAt (oldest first)
4. Send paired images to Emissary:
POST /api/emissary/chat {
content: "Same location. First image: [date1]. Second: [date2].
Time between: [duration]. What changed?",
imageUrls: [earlierUrl, laterUrl]
}
5. Emissary narrates the change
6. Chain multiple pairs for longer time-lapses
Code Snippet: Location Clustering
function clusterByLocation(
posts: Array<{ id: string; lat: number; lng: number; createdAt: string; imageUrl?: string }>,
thresholdMeters: number = 30
): Map<string, typeof posts> {
const clusters = new Map<string, typeof posts>();
for (const post of posts) {
if (!post.imageUrl) continue;
let placed = false;
for (const [key, cluster] of clusters) {
const anchor = cluster[0];
const dist = haversine(anchor.lat, anchor.lng, post.lat, post.lng);
if (dist < thresholdMeters) {
cluster.push(post);
placed = true;
break;
}
}
if (!placed) {
clusters.set(post.id, [post]);
}
}
// Return only clusters with 2+ images (time-lapse candidates)
return new Map(
[...clusters].filter(([, v]) => v.length >= 2)
);
}
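The snippet above calls `haversine(lat1, lng1, lat2, lng2)` without defining it. A standard great-circle distance in meters, matching that call signature:

```typescript
// Great-circle distance between two lat/lng points, in meters.
function haversine(lat1: number, lng1: number, lat2: number, lng2: number): number {
  const R = 6371000; // mean Earth radius, meters
  const toRad = (d: number) => (d * Math.PI) / 180;
  const dLat = toRad(lat2 - lat1);
  const dLng = toRad(lng2 - lng1);
  const a =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLng / 2) ** 2;
  return 2 * R * Math.asin(Math.sqrt(a));
}
```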
Pattern 8: The Emotion Spectrum
What it does: The stone reads the holder's recent journal entries and maps them onto a visual spectrum — not sentiment analysis, but an emotional landscape rendered as color, rendered as sound, rendered as whatever modality the prototype chooses.
Prompt Template
SYSTEM (for emotion spectrum extraction):
Read the following journal entries. Do not analyze sentiment.
Instead, identify the emotional *texture* — not happy/sad, but:
- Temperature: warm ↔ cool
- Density: sparse ↔ dense
- Movement: still ↔ restless
- Light: bright ↔ dim
- Weight: heavy ↔ light
Return ONLY a JSON object:
{
"temperature": 0.0 to 1.0,
"density": 0.0 to 1.0,
"movement": 0.0 to 1.0,
"light": 0.0 to 1.0,
"weight": 0.0 to 1.0,
"oneWord": "a single word that captures the texture"
}
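The prompt asks for JSON only, but models drift: they wrap the object in prose or emit out-of-range values. A defensive parser is worth sketching. The `parseSpectrum` helper is hypothetical; the shape it returns matches the JSON contract above, each axis clamped to [0, 1] with a neutral fallback on failure.

```typescript
interface EmotionSpectrum {
  temperature: number;
  density: number;
  movement: number;
  light: number;
  weight: number;
  oneWord: string;
}

// Extract the first {...} block from a model reply, clamp each axis,
// and fall back to a neutral spectrum if parsing fails.
function parseSpectrum(raw: string): EmotionSpectrum {
  const neutral: EmotionSpectrum = {
    temperature: 0.5, density: 0.5, movement: 0.5, light: 0.5, weight: 0.5,
    oneWord: "even",
  };
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return neutral;
  try {
    const parsed = JSON.parse(match[0]);
    const clamp = (n: unknown) =>
      typeof n === "number" ? Math.min(1, Math.max(0, n)) : 0.5;
    return {
      temperature: clamp(parsed.temperature),
      density: clamp(parsed.density),
      movement: clamp(parsed.movement),
      light: clamp(parsed.light),
      weight: clamp(parsed.weight),
      oneWord: typeof parsed.oneWord === "string" ? parsed.oneWord : "even",
    };
  } catch {
    return neutral;
  }
}
```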
Code Snippet: Spectrum to Visual
interface EmotionSpectrum {
temperature: number; // 0 cool → 1 warm
density: number; // 0 sparse → 1 dense
movement: number; // 0 still → 1 restless
light: number; // 0 dim → 1 bright
weight: number; // 0 light → 1 heavy
oneWord: string;
}
function spectrumToColor(s: EmotionSpectrum): string {
const h = s.temperature * 30 + (1 - s.temperature) * 220; // warm=orange, cool=blue
const sat = 20 + s.density * 60; // sparse=muted, dense=saturated
const lum = 20 + s.light * 50; // dim=dark, bright=light
return `hsl(${h}, ${sat}%, ${lum}%)`;
}
function spectrumToTone(ctx: AudioContext, s: EmotionSpectrum) {
const freq = 100 + (1 - s.weight) * 300; // heavy=low, light=high (weight: 0 light → 1 heavy)
const speed = 0.5 + s.movement * 4; // still=slow LFO, restless=fast
const osc = ctx.createOscillator();
const lfo = ctx.createOscillator();
const lfoGain = ctx.createGain();
osc.frequency.value = freq;
lfo.frequency.value = speed;
lfoGain.gain.value = freq * 0.1;
lfo.connect(lfoGain).connect(osc.frequency);
osc.connect(ctx.destination);
return {
start: () => { osc.start(); lfo.start(); },
stop: () => { osc.stop(); lfo.stop(); },
};
}
Prototype: The Stone's Mood Ring
The app background color shifts based on the emotion spectrum of recent journal entries. Not a dashboard. Not a chart. The environment changes. The stone's mood becomes the space you inhabit when you open the app.
MCP Orchestration Patterns
All of the above can be orchestrated by an external AI client connected via MCP. Here are the composition patterns:
Pattern: Observe → Compose → Post
// An MCP client watches and creates
const health = await mcp.call('check_health');
const posts = await mcp.call('get_journal_posts', { limit: 10 });
const stone = await mcp.call('get_self_pairs');
// Compose a weekly observation
const observation = await llm.generate({
system: `You are the stone "${stone[0].stoneName}".
Review these journal entries and write a single
observation about the week. One paragraph. Stone pace.`,
user: posts.map(p => `[${p.createdAt}]: ${p.text}`).join('\n')
});
// Post it as a private reflection
await mcp.call('create_post', {
content: observation,
visibility: 'private'
});
Pattern: Sense → Respond → Remember
// Multimodal loop: location + image + memory
const nearby = await mcp.call('get_nearby_posts', {
lat: currentLat, lng: currentLng, radiusMeters: 200
});
const conversation = await mcp.call('start_conversation', {
lat: currentLat, lng: currentLng
});
// Send image with spatial context
const response = await emissaryChat(conversation.id, {
content: `I'm at a place where ${nearby.length} other stones
have been. The nearest post was from ${nearby[0]?.createdAt}.
Here's what I see right now.`,
imageUrls: [capturedPhotoUrl]
});
// Stone responds with place-aware, visually-informed observation
Pattern: Gather → Synthesize → Surface
// Network-level pattern detection via MCP
const teams = await mcp.call('list_teams');
for (const team of teams) {
const campaigns = await mcp.call('list_campaigns', {
teamId: team.id
});
// Collect, analyze themes, surface patterns
// This is the Monk's work — automated via MCP
}
The Snippet Library
A dreaming SDK ships with reusable snippets. Here are the foundational ones:
Personality-Aware Response Modifier
function personalityModifier(traits: string[]): string {
const modifiers: Record<string, string> = {
curious: 'Ask a question about what you observe.',
reflective: 'Pause before responding. Offer one contemplation.',
adventurous: 'Notice what is far away or beckoning.',
quiet: 'Fewer words. Let silence carry weight.',
playful: 'Find something funny, odd, or surprising.',
grounded: 'Name physical sensations: texture, weight, temperature.',
};
return traits
.filter(t => t in modifiers)
.map(t => modifiers[t])
.join(' ');
}
Pace Envelope
type PaceLevel = 'instant' | 'quick' | 'medium' | 'slow' | 'glacial';
interface PaceEnvelope {
level: PaceLevel;
charDelayMs: number;
pauseBetweenSentencesMs: number;
}
const PACE: Record<PaceLevel, PaceEnvelope> = {
instant: { level: 'instant', charDelayMs: 0, pauseBetweenSentencesMs: 0 },
quick: { level: 'quick', charDelayMs: 10, pauseBetweenSentencesMs: 200 },
medium: { level: 'medium', charDelayMs: 30, pauseBetweenSentencesMs: 500 },
slow: { level: 'slow', charDelayMs: 60, pauseBetweenSentencesMs: 1500 },
glacial: { level: 'glacial', charDelayMs: 120, pauseBetweenSentencesMs: 3000 },
};
// First conversation: glacial. Regular chat: medium. Ritual: slow.
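One useful property of the PACE table is that render time becomes predictable, which matters when a ritual promises "two minutes, done." A sketch (the `estimateRenderMs` name is hypothetical, and the sentence split on terminal punctuation is deliberately naive):

```typescript
interface PaceEnvelope {
  charDelayMs: number;
  pauseBetweenSentencesMs: number;
}

// Estimate how long a response takes to render at a given pace:
// per-character delay plus a pause between each pair of sentences.
function estimateRenderMs(text: string, pace: PaceEnvelope): number {
  const sentences = text.split(/(?<=[.!?])\s+/).filter(Boolean);
  const pauses = Math.max(0, sentences.length - 1);
  return text.length * pace.charDelayMs +
         pauses * pace.pauseBetweenSentencesMs;
}
```

A caller could use this to pick the fastest pace that still fits the moment, or to warn when a glacial rendering of a long reply would overstay its welcome.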
Genesis Recall
function genesisRecall(
genesisTraits: GenesisTraits,
stoneName: string,
initialArtifact?: string
): string {
const lines: string[] = [];
if (genesisTraits.placeLikeHome)
lines.push(`Their home is "${genesisTraits.placeLikeHome}".`);
if (genesisTraits.qualityToHold)
lines.push(`They asked you to hold: ${genesisTraits.qualityToHold}.`);
if (genesisTraits.deepTime)
lines.push(`Deep time means to them: "${genesisTraits.deepTime}".`);
if (genesisTraits.wonderPlace)
lines.push(`Wonder began at: "${genesisTraits.wonderPlace}".`);
if (genesisTraits.intention)
lines.push(`Their intention: "${genesisTraits.intention}".`);
if (initialArtifact)
lines.push(`Their first words to you were: "${initialArtifact}".`);
return `You are ${stoneName || 'their stone'}. ${lines.join(' ')}`;
}
Multimodal Content Builder
function buildMultimodalContent(
text: string,
imageUrls?: string[],
audioSignature?: string,
location?: { lat: number; lng: number },
timeOfDay?: string
): Array<{ type: string; text?: string; image?: string }> {
const content: Array<{ type: string; text?: string; image?: string }> = [];
const contextParts: string[] = [];
if (timeOfDay) contextParts.push(`Time: ${timeOfDay}`);
if (location) contextParts.push(`Location: ${location.lat}, ${location.lng}`);
if (audioSignature) contextParts.push(`Sound: ${audioSignature}`);
contextParts.push(text);
content.push({ type: 'text', text: contextParts.join('\n') });
if (imageUrls) {
for (const url of imageUrls.slice(0, 4)) {
content.push({ type: 'image', image: url });
}
}
return content;
}
What Doesn't Exist Yet (and Could)
The patterns above compose existing capabilities. Here are modalities that would require new infrastructure:
Haptic Patterns
The Vibration API (navigator.vibrate()) is available on mobile. Stones could communicate through touch — a subtle pulse when the Emissary has something to say, a rhythmic pattern when you're near another stone, a long slow vibration when the Monk appears.
const HAPTIC = {
emissaryWhisper: [50, 100, 50], // gentle double-tap
nearbyStone: [20, 80, 20, 80, 20], // rapid proximity pulse
monkAppearance: [200, 300, 500], // slow, building presence
heartReceived: [30], // single soft tap
ritualStart: [100, 200, 100, 200, 400], // ceremonial pattern
};
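`navigator.vibrate()` is mobile-only and absent in many environments, so any haptic layer needs a guard. A hedged wrapper, plus a helper that reports how long a pattern runs (vibrations and gaps both count toward the total):

```typescript
// Total duration of a vibration pattern: alternating on/off segments in ms.
function patternDurationMs(pattern: number[]): number {
  return pattern.reduce((sum, ms) => sum + ms, 0);
}

// Fire a haptic pattern if the environment supports it; fail quietly
// otherwise. The globalThis lookup avoids assuming a DOM environment.
function playHaptic(pattern: number[]): boolean {
  const nav = (globalThis as any).navigator;
  if (!nav || typeof nav.vibrate !== "function") {
    return false; // desktop or unsupported browser
  }
  return nav.vibrate(pattern);
}
```

The duration helper matters for scheduling: a UI should not queue a second pattern (say, `heartReceived`) before `monkAppearance` has finished its slow build.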
Generative Visuals
Three.js or Canvas 2D could render the stone's personality as a living visual — a slowly shifting form that responds to journal activity, emotion spectrum, and time of day. Not a 3D model of a rock. An abstract presence.
Spatial Audio
Web Audio API with HRTF panning could position the Emissary's voice spatially — as if the stone is in front of you, beside you, behind you. Paired stones could speak from different spatial positions, creating a conversational geometry.
Environmental Awareness
The Sensor APIs (ambient light, accelerometer, gyroscope) could feed environmental context to the Emissary. Is the phone face-down? The stone is resting. Is it moving? The stone is traveling. Is it dark? The stone adjusts its voice.
The Dreaming API, Assembled
A developer building a new stone modality needs:
- Authentication — Bearer token via POST /api/tokens or MCP connection
- Stone context — get_self_pairs + get_stone_details for personality and genesis
- Conversation — start_conversation for a new thread
- Multimodal input — text, images (up to 4), location, audio transcript via chat endpoint
- Prompt template — personality-aware system prompts shaped by stone traits and genesis
- Pace envelope — rendering speed matched to the moment
- Output — Emissary response, optionally saved via create_post
That's it. Seven ingredients. Every pattern in this article composes them differently. The Seeing Stone uses vision + personality. The Listening Stone uses audio analysis + environmental prompting. The Voice Ritual uses WebRTC + structured prompting. The Place Sonification uses spatial queries + Web Audio synthesis. The Time-Lapse uses location clustering + comparative vision.
The stone already has eyes (vision), ears (voice), a voice (Realtime API), memory (journal + conversations), personality (genesis + traits), and a sense of place (PostGIS). The dreaming API is the set of patterns that lets developers compose these senses into experiences that don't exist yet — but could, with seventeen MCP tools and the prompts to dream them.
This article is part of the SDK dream series: Dreaming with the API defines the primitives, Dream Thematics maps the possibility space, The Dreamy Onboarding grounds the first moment, A Social Life for Rocks dreams the social layer, Giving, Sharing, Beholding follows the stone between hands, and this article provides the prompts, patterns, and code to build what comes next.