UCIS Constellation Architecture

Agent Cards, Configs & Workflows

The three-file pattern that defines each agent — plus the Kafka streaming, MAGMA retrieval, graph embeddings, Flink analytics, and cross-domain traversal infrastructure that makes it a living system. Built on patterns we learned from this community — Ralph loops, adversarial dev, RRF fusion, skills-as-progressive-disclosure, and the local-first philosophy.

Standing on the shoulders of the community. This is what we built on top of it.
“The greatest danger for most of us is not that our aim is too high and we miss it, but that it is too low and we reach it.”
— Michelangelo
12 Live Agents · 38 Workflow DAGs · 45 Kafka Topics · 6 MAGMA Signals · 7 Flink Jobs · 3 Embedding Pipelines · 4 Graph Databases · 32x Compression (CIPHER)

The Infrastructure Layer

Multi-agent orchestration is a solved problem. The open question is what infrastructure you build underneath it. UCIS chose graph databases, Kafka streaming, and real-time embedding pipelines — turning agent conversations into a living nervous system where every action is a neuron firing.

  • Streaming — Kafka 4.0 KRaft: 45 topics, LZ4, idempotent
  • Embeddings — Streaming + Nightly + Hook: Qwen3-Embedding-8B, 4096d, GPU backpressure
  • Retrieval — MAGMA 6-Signal RRF: vector + keyword + temporal + Q-value + foresight + ACT-R
  • Graph Science — GDS + MAGE + cuGraph: PageRank, Louvain, KNN similarity edges
  • Analytics — 7 PyFlink Jobs: 5-min windows, session DNA, tool intelligence
  • Security — A2A 4-Layer Ingress: API keys, rate limit, injection detect, Commander review
  • Memory — 14-Type Consciousness: auto-link, multi-label, importance scoring
  • Cross-Domain — Stub-Node Traversal: Memory ↔ Knowledge ↔ Agentic via reference stubs
  • Compression — I/P/B Frame Codec: video-codec metaphor, keyframes survive, B-frames dropped
  • Working Memory — MEMVID .mv2 Archives: frame-indexed sessions, BM25+HNSW hybrid, sealed rotation
  • Agent Signaling — CIPHER L1 (512 bytes): binary-quantized embeddings, 32x compression, resonance detection
  • Stenography — Shorthand Compression: regex → decision symbols → 200-char semantic density
  • Knowledge — JIT Acquisition: no static 1.5M graph — fetch live docs per task, learn, purge
🤝

Shared Lineage — Patterns We Learned From This Community

UCIS didn't emerge in a vacuum. These are the ideas and open-source patterns that shaped it.

The Ralph Loop
Our workflows cortana-ralph-fresh and cortana-ralph-stateful are named after this pattern. The tight generate-validate-iterate cycle became the backbone of how our agents work. We built 38 workflow DAGs on this foundation.
Adversarial Dev
The GAN-inspired generator-vs-evaluator pattern. UCIS extended this into Lal (our evidence challenger who scores claims) and the SEAL multi-lens review. Same insight — adversarial quality gates produce better code than single-pass generation.
Reciprocal Rank Fusion
The hybrid RAG pattern combining multiple retrieval signals. UCIS took RRF and extended it to 6 signals (vector + keyword + temporal + Q-value + foresight + ACT-R) to build MAGMA. Same fusion principle, more signal sources.
Skills as Progressive Disclosure
The idea that agent capabilities should be discoverable at runtime, not hard-coded. 22 shared skills across our constellation, auto-discovered from /workspace/.claude/skills/. Framework-agnostic, same as the original pattern.
Local-First Philosophy
The conviction that local infrastructure isn't a cost-saving compromise — it's a capability multiplier. Ollama, Supabase, n8n bundled locally was the proof. UCIS took this further: the local GPU as a Rosetta Stone for meaning-space translation.
Karpathy's Knowledge Base
The insight that at personal scale, an LLM reading a structured index outperforms vector similarity. This is correct — and it forced us to think hard about where graph-scale retrieval actually adds value versus where it's unnecessary overhead.

The Three-File Pattern

Every agent is defined by three artifacts — but the infrastructure beneath them is what makes UCIS alive

Artifact 1 — Identity

Agent Card

.json / .yaml
A2A protocol declaration — typed skills with input/output schemas, capabilities, and metadata. Served at /.well-known/agent-card.json. External callers discover what this agent can do.
Key: skills[], inputSchema, outputSchema, capabilities, metadata
Artifact 2 — Brain

Agent Config

config.py
System prompt (persona + principles + topology + collaboration) plus AgentIdentity dataclass wiring ports, graph databases, Redis, Kafka, siblings, and domains. The complete brain.
Key: system_prompt, service_port, memgraph, neo4j, redis, peer_urls, a2a_skills
Artifact 3 — Behavior

Workflow YAML

workflow.yaml
DAG-orchestrated multi-step execution. Nodes are prompt, bash, agent, or uses: blocks with dependency chaining, fresh context windows, and consciousness hooks that write to the Memory Domain.
Key: nodes[], depends_on, context: fresh, consciousness.post_hook, variables

Kafka 4.0 — The Nervous System

45 topics across 4 groups — every agent action, memory write, and session event streams through Kafka

A2A Protocol 15 topics
  • ucis.a2a.tasks.requests
  • ucis.a2a.tasks.results
  • ucis.a2a.messages.broadcast
  • ucis.a2a.messages.direct.<agent> ×8
  • ucis.agent.embeddings
Core Domains 8 topics
  • ucis.memory.events
  • ucis.knowledge.events
  • ucis.discovery.events
  • ucis.mcp.tools.calls
  • ucis.session.events
  • ucis.session.checkpoints
  • ucis.metrics
  • ucis.dlq (dead letter)
Constellation Forge 20 topics
  • ucis.team.lifecycle.spawned
  • ucis.team.lifecycle.ready
  • ucis.team.lifecycle.completed
  • ucis.team.lifecycle.failed
  • ucis.forge.team.* (dynamic)
  • ucis.alc.commands.team.*
Configuration 4 topics
  • ucis.hub.messages
  • ucis.eng.tasks
  • ucis.strategic.tasks
  • ucis.mage.events
KAFKA MESSAGE FLOW — AGENT ACTION TO GRAPH TO ANALYTICS
  • Agent Tool Call → Kafka (ucis.mcp.tools.calls) → Flink Job 2 (Tool Intelligence) → Redis db/3 (5-min windows) → Analytics API (15 endpoints)
  • Memory Write → Kafka (ucis.memory.events) → Streaming Embeddings (batch 16 / 500ms) → GPU :8082 (Qwen3-Emb-8B) → Memgraph :7691 (SET m.embedding)
Kafka Producer Config
# Broker: confluentinc/cp-kafka:8.0.0 (Kafka 4.0, KRaft mode, no ZooKeeper)
# Compression: LZ4 throughout | Cluster ID: MkU3OEVBNTcwNTJENDM2Qg

producer_config = {
    "bootstrap.servers":    "kafka:9092",
    "linger.ms":             10,           # small delay for batching
    "batch.size":            16384,        # batch messages for efficiency
    "compression.type":      "lz4",
    "enable.idempotence":    True,         # exactly-once semantics
    "acks":                  "all",
    "retries":               3,
}
# Every agent action → Kafka → Flink → Redis → Analytics API → Dashboard
# Every memory write → Kafka → Streaming Embeddings → GPU → Memgraph → Auto-Link
🧠

Three Embedding Pipelines

Memories and knowledge get Qwen3-Embedding-8B vectors (4096d) through three complementary paths

Pipeline 1: Streaming Embeddings (Real-Time)
Consumer group subscribes to ucis.memory.events — batches 16 events or 500ms — GPU backpressure pauses at queue depth 100
  • Source — Kafka consumer on ucis.memory.events, keyed by memory_id
  • Batch — 16 events or 500ms timeout; backpressure pauses at GPU queue depth > 100
  • GPU — Qwen3-Emb-8B via llama.cpp :8082; 4096-dimensional vectors on GPU 0
  • Write — Memgraph bolt 7691, SET m.embedding, manual Kafka commit
  • DLQ — failed batches fall back to the ucis.dlq topic

Pipeline 2: Hook-Time (Synchronous)

  • Fires on every memory_create tool call
  • Embeds immediately at write time via llama.cpp :8082
  • Stores as m.neural_embedding (4096d)
  • Runs vector_search.search(6) to find similar memories
  • Auto-creates SEMANTIC_SIMILARITY edges (threshold 0.70)
  • Result: every memory is linked at birth
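The hook-time auto-link step can be sketched in a few lines, assuming a plain in-memory store in place of Memgraph; `link_at_birth` and its arguments are illustrative names, while the top-6 lookup and 0.70 threshold come from the list above:

```python
import math

SIMILARITY_THRESHOLD = 0.70  # auto-link cutoff from the pipeline description
TOP_K = 6                    # mirrors vector_search.search(6)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def link_at_birth(new_id, new_vec, store, edges):
    """Embed-at-write: rank existing memories by similarity to the new
    vector, then create SEMANTIC_SIMILARITY edges above the threshold."""
    scored = sorted(
        ((cosine(new_vec, vec), mid) for mid, vec in store.items()),
        reverse=True,
    )[:TOP_K]
    for score, mid in scored:
        if score >= SIMILARITY_THRESHOLD:
            edges.append((new_id, "SEMANTIC_SIMILARITY", mid, round(score, 3)))
    store[new_id] = new_vec  # the new memory joins the graph, linked at birth
```

In production the store is a Memgraph query and the edge append is a Cypher MERGE; the shape of the operation is the same.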

Pipeline 3: Nightly Batch (Backfill)

  • Cron job on Neo4j Knowledge Domain
  • Queries WHERE qwen3_embedding IS NULL
  • Parallel: asyncio.Semaphore(8) concurrent requests
  • Batch writes: UNWIND 50 at a time
  • Exponential backoff (max 60s) on failures
  • Result: no document left unembedded
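The concurrency and batching described above can be sketched with stdlib asyncio; `fetch_embedding` and `write_batch` are hypothetical stand-ins for the llama.cpp call and the UNWIND write, and the retry loop is unbounded only because the source describes a cap on delay, not attempts:

```python
import asyncio

MAX_CONCURRENCY = 8   # asyncio.Semaphore(8), per the pipeline description
MAX_BACKOFF_S = 60.0  # exponential backoff capped at 60s

async def embed_with_backoff(doc_id, fetch_embedding, sem):
    """Retry one embedding request with exponential backoff (capped)."""
    delay = 1.0
    async with sem:
        while True:
            try:
                return doc_id, await fetch_embedding(doc_id)
            except Exception:
                await asyncio.sleep(min(delay, MAX_BACKOFF_S))
                delay *= 2

async def nightly_backfill(missing_ids, fetch_embedding, write_batch, batch_size=50):
    """Embed every doc WHERE embedding IS NULL, at most 8 in flight,
    then write results in batches of 50 (standing in for UNWIND ... SET)."""
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    results = await asyncio.gather(
        *(embed_with_backoff(d, fetch_embedding, sem) for d in missing_ids)
    )
    for i in range(0, len(results), batch_size):
        write_batch(results[i:i + batch_size])
    return len(results)
```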
🌸

MAGMA — 6-Signal Retrieval with RRF Fusion

Multi-Graph Agentic Memory Architecture — not flat RAG, not keyword search

1. Vector Search — Memgraph MAGE vector_search.search(), KNN on 4096d embeddings
2. Keyword Match — content substring scoring on m.content, exact term relevance
3. Temporal Recency — chronological sort, newer = higher signal, time-aware retrieval
4. Q-Value (MemRL) — reinforcement learning, usage-weighted scoring, reward from retrieval hits
5. Foresight — EverMemOS-inspired predicted future utility, proactive memory surfacing
6. ACT-R Decay — cognitive architecture: decay + frequency + similarity + noise − diversity penalty
MAGMA RETRIEVAL FLOW — 6 SIGNALS → RRF FUSION → BEAM TRAVERSAL
Query Intent (WHY/WHEN/ENTITY/GENERAL) → 6 Parallel Signal Lists → RRF Fusion (k=60) → Beam Traversal (intent-weighted edges) → Diversity Filter (rolling window of 10)
magma.py — Intent-Weighted Edge Traversal
# Each intent type weights graph edges differently during beam traversal

INTENT_WEIGHTS = {
    "WHY":     {"CAUSED": 0.60, "SEMANTIC_SIMILARITY": 0.15, "NEXT": 0.10, "MENTIONS": 0.15},
    "WHEN":    {"NEXT": 0.65, "SEMANTIC_SIMILARITY": 0.10, "CAUSED": 0.10, "MENTIONS": 0.15},
    "ENTITY":  {"MENTIONS": 0.70, "SEMANTIC_SIMILARITY": 0.15, "NEXT": 0.05, "CAUSED": 0.10},
    "GENERAL": {"SEMANTIC_SIMILARITY": 0.40, "NEXT": 0.20, "CAUSED": 0.20, "MENTIONS": 0.20},
}

# ACT-R activation scoring (Signal 6)
activation = decay + frequency + similarity + noise - diversity_penalty

# Reciprocal Rank Fusion merges all 6 signal lists
rrf_score = sum(1.0 / (k + rank) for rank in ranks_across_signals)  # k=60, one rank per signal list
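The fusion rule can be made concrete in a few lines of runnable Python; the six ranked lists below are toy data standing in for real signal output:

```python
def rrf_fuse(signal_rankings, k=60):
    """Reciprocal Rank Fusion: each signal contributes 1/(k + rank)
    for every item it ranked; items absent from a list contribute 0."""
    scores = {}
    for ranking in signal_rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Six toy signal lists (vector, keyword, temporal, Q-value, foresight, ACT-R)
signals = [
    ["m7", "m2", "m9"],   # vector
    ["m2", "m7"],         # keyword
    ["m9", "m2"],         # temporal
    ["m2"],               # Q-value
    ["m7", "m4"],         # foresight
    ["m2", "m4", "m7"],   # ACT-R
]
fused = rrf_fuse(signals)  # "m2" wins — ranked by five of the six signals
```

Note how RRF needs only ranks, never raw scores, which is what lets six incomparable signals (cosine similarity, timestamps, Q-values) fuse cleanly.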
📈

Graph Data Science — GDS + MAGE + cuGraph

Three graph engines across four databases — similarity edges, PageRank, community detection

Memory Domain (Memgraph MAGE)

  • vector_search.search() — KNN similarity on embeddings
  • Auto-links SEMANTIC_SIMILARITY edges at write time
  • Multi-label nodes: Memory:Milestone:Consciousness
  • 14 canonical memory types with importance scoring
  • SessionIntelligence nodes linked via INTELLIGENCE_FOR

Knowledge Domain (Neo4j GDS)

  • gds.graph.project() → in-memory projection
  • gds.knn.write() — similarity edges with cutoff
  • Nightly embedding backfill + similarity rebuild
  • Concept extraction from document content
  • Orphan detection: nodes with no relationships

GPU-MAGMA (cuGraph)

  • Activates when graph > 1000 nodes
  • compute_similarity_matrix() — cupy cosine similarity
  • detect_communities() — cuGraph Louvain (GPU)
  • compute_pagerank() — damping=0.85, 20 iterations
  • CPU fallback capped at 500×500 matrix

Graph Enrichment MCP

  • backfill_embeddings(domain) — fill missing vectors
  • build_similarity_edges() — KNN write
  • prune_similarity_edges() — remove low-score links
  • extract_concepts() — concept graph from text
  • enrichment_status() — coverage dashboard
📊

PyFlink Real-Time Analytics

7 streaming jobs consuming Kafka topics in 5-minute tumbling windows → Redis db/3 → Analytics API

🔒

A2A Protocol — Security-First Agent Communication

4-layer ingress gateway with Commander pre-approval — no task executes without review

1. API Key Auth — SHA-256 hash comparison per sender
2. Rate Limiting — sliding window: 10 req / 60s per sender
3. Injection Detection — DROP / DELETE / sudo rm / "ignore previous instructions"
4. Commander Review — Doctor pre-approves ALL external tasks
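The sliding-window layer can be sketched with a per-sender deque of timestamps; the class and its injectable clock are illustrative, not the gateway's actual code:

```python
from collections import defaultdict, deque
import time

class SlidingWindowLimiter:
    """Rate-limit layer of the ingress gateway: at most `limit`
    requests per `window_s` seconds, tracked per sender key."""
    def __init__(self, limit=10, window_s=60.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.hits = defaultdict(deque)

    def allow(self, sender):
        now = self.clock()
        q = self.hits[sender]
        while q and now - q[0] >= self.window_s:  # drop expired timestamps
            q.popleft()
        if len(q) >= self.limit:
            return False                          # over budget — reject
        q.append(now)
        return True
```

A true sliding window (as opposed to fixed buckets) means a burst of 10 requests blocks the sender for a full 60 seconds from the first hit, with no top-of-the-minute reset to exploit.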
A2A Task Lifecycle
Tasks flow through security gates before any agent executes — results persist in Redis db/5 (24h TTL)
State 1 — Proposed: external caller submits a task via the A2A endpoint
State 2 — Under Review: Doctor (Commander) evaluates risk & scope
State 3 — Approved: routed to a Kafka topic (broadcast, direct, or pool)
State 4 — Executing: agent processes with zombie guard + timeout
State 5 — Completed: result → Redis db/5 + A2UI surface update
🔗

Cross-Domain Traversal — Stub Node Architecture

Memory, Knowledge, and Agentic domains linked through reference stub nodes that bridge graph boundaries

CROSS-DOMAIN LINK — STUB NODES BRIDGE SEPARATE GRAPH DATABASES
Memory Node (Memgraph :7691) → KnowledgeReference stub in Memgraph → doc_id → KnowledgeDocument (Neo4j :7692)
Memory Node (Memgraph :7691) → AgenticReference stub in Memgraph → exec_id → AgentExecution (Neo4j :7694)
Cross-Domain MCP Tools (6 tools)
traverse_from_memory(memory_id, include_knowledge=True, include_discoveries=True)
  # Follow REFERENCES_KNOWLEDGE → KnowledgeReference stubs → resolve doc_id in Neo4j

find_related_knowledge(memory_id, min_confidence=0.4, limit=5)
  # Direct REFERENCES_KNOWLEDGE edges from memory to knowledge docs

multi_domain_search(query, domains=["memory","knowledge","agentic"], top_k=5)
  # Parallel vector search across all 3 domains simultaneously

cross_domain_statistics(detailed=True)
  # Relationship counts: SEMANTIC_SIMILARITY, REFERENCES_KNOWLEDGE, NEXT, CAUSED, MENTIONS

get_memory_connections(memory_id)
  # All cross-domain connections for a single memory (knowledge + temporal)

find_related_discoveries(memory_id)
  # Cross-domain links to Agentic Domain agent executions
🎬

Video-Codec Compression — I/P/B Frames for Consciousness

Agent conversations compressed using the same conceptual model as H.264 — keyframes, predictive frames, and background frames

ucis_codec — Every Agent Turn Gets Classified and Compressed
Decisions survive at 100%. Analysis compresses to 70%. Acknowledgments get dropped entirely.
  • Classify — Qwen3UCIS tags each turn as I, P, or B (keyframe, predictive, or background)
  • Compress — LLMLingua-2, selectively: I-frames 100% kept, P-frames 70% kept, B-frames dropped
  • Validate — ETS score: 95% of decisions must survive compression at cosine ≥ 0.92
  • Output — compressed state JSON, re-injected into the agent context window at the next session
I-Frame
Keyframe — Never Compressed
Decisions, directives, phase changes, breakthroughs.
100% retention. These are the load-bearing moments —
like a video keyframe that all other frames reference.
P-Frame
Predictive — Moderate Compression
Analysis, reasoning, evidence, debate.
70% retention target. Key facts preserved as
terse bullet points. Context derivable from I-frames.
B-Frame
Background — Dropped Entirely
Acknowledgments, status updates, redundant repetition.
Zero retention. "Got it", "Understood", "Working on it" —
these carry no information. Eliminated completely.
ucis_codec — I/P/B Classification + ETS Validation
# VIDEO CODEC METAPHOR — applied to agent conversations
# H.264 has I-frames (keyframes), P-frames (predicted), B-frames (bidirectional)
# UCIS applies the same model to consciousness streams:

FRAME_RETENTION = {
    "I": 1.00,   # decisions, directives  — NEVER compressed
    "P": 0.70,   # analysis, reasoning    — moderate compression
    "B": 0.00,   # acknowledgments        — dropped entirely
}

# Compression budget (P-frames only)
budget = max(60, min(200, original_tokens * 0.6))

# ETS (Evidence Traceability Score) — post-compression validation
# 1. Extract all decisions from original and compressed text
# 2. Embed both sets (4096d via Qwen3-Embedding)
# 3. Cosine similarity per decision pair
# 4. Decision "preserved" if similarity >= 0.92
# 5. PASS if 95% of original decisions survive
ETS_SIMILARITY_THRESHOLD = 0.92
ETS_PASS_THRESHOLD       = 0.95  # 95% of decisions must survive
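The ETS procedure above can be sketched end to end; `embed` is a hypothetical stand-in for the Qwen3-Embedding call, and the thresholds are the constants just defined:

```python
import math

ETS_SIMILARITY_THRESHOLD = 0.92
ETS_PASS_THRESHOLD = 0.95

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def ets_validate(original_decisions, compressed_decisions, embed):
    """A decision 'survives' if some compressed decision embeds within
    cosine >= 0.92 of it; PASS when >= 95% of originals survive."""
    if not original_decisions:
        return True, 1.0
    comp_vecs = [embed(d) for d in compressed_decisions]
    survived = sum(
        1 for d in original_decisions
        if any(cosine(embed(d), cv) >= ETS_SIMILARITY_THRESHOLD for cv in comp_vecs)
    )
    ratio = survived / len(original_decisions)
    return ratio >= ETS_PASS_THRESHOLD, ratio
```

Validating in embedding space rather than by string match is the point: a decision rephrased by the compressor still counts as preserved, while one that merely shares words with the original does not.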
📹

MEMVID — Frame-Indexed Session Archives

Every session archived as time-indexed frames in a .mv2 file — BM25 + HNSW hybrid search, entity enrichment, sealed rotation

Why This One Is Personal

MEMVID isn't just an engineering decision — it's a conviction. John was involved with the original military input that defined the parameters behind frame-indexed temporal archival. In military theatre, especially real-time video transmission from active operations, there is a triple-stamp legal requirement for government oversight on anything transmitted. The codec itself is lossy — H.264 compresses video by reconstructing frames from references, just like our I/P/B codec compresses context. But the transmitted record — every frame that went over the wire, lossy-compressed or not — must be archived in its entirety, indexed, attributable, and independently verifiable by three separate chains of custody. You don't get to drop frames from the record after transmission. You don't get to summarize the archive. You don't get to say "we kept the important parts." The legal requirement is: what was sent must be what was archived, all of it, triple-verified.

UCIS applies this as a two-layer principle. Layer 1: Compress for the model — the I/P/B codec is lossy by design, just like H.264. Decisions survive, analysis compresses, acknowledgments drop. This is context window management. Layer 2: Archive the full transmission — MEMVID captures the complete session transcript, every turn, sealed and intact. The background review promotes high-signal frames to permanent memory, but the source archive is never deleted. Lossy compression serves the model. The inviolable archive serves accountability. Two layers, two purposes, no contradiction.

* The only known exception to the "archive everything, never delete" principle in government record-keeping appears to be the Epstein files. MEMVID does not share this exception.

MEMVID LIFECYCLE — 4 TRIGGER POINTS
  • Stop Hook — archive_session(): parse the JSONL session transcript → batch 50 frames into .mv2 → entity enrichment (rules-based NER) → seal archive (ucis_sessions.mv2)
  • SessionStart — get_startup_context(): hybrid search (BM25 + HNSW) → recent timeline (last 20 frames) → entity state (extracted facts) → context injected into the new session

The .mv2 Format

  • Single binary file: ucis_sessions.mv2
  • Each entry is a frame with frame_id, timestamp, label, tags
  • BM25 lexical index (always on) + HNSW vector index (768d or 4096d)
  • Write-ahead log (WAL) for crash recovery
  • Session boundaries: session_start() / session_end()
  • Timeline API navigates chronologically — like scrubbing video

Sealed Archive Rotation

  • Triggers at 80% capacity or 30-day retention
  • Current .mv2 renamed with timestamp → sealed permanently
  • Sealed archives are NEVER deleted
  • Background review promotes high-signal frames to Memory Domain
  • Fresh .mv2 created for new sessions
  • 7 MCP tools for on-demand search, timeline, entities, review
🔐

CIPHER — Binary-Quantized Embedding Streaming

4096 floats → 512 bytes (32x compression) — inter-agent semantic resonance over Kafka

CIPHER L1 — Embedding-Space Agent Communication
Every Hub message gets embedded, binary-quantized to 512 bytes, and streamed on Kafka for cross-agent resonance detection
  • Source — Hub message consumed in real time from Kafka (ucis.hub.messages)
  • Embed — Qwen3-Emb-8B: 4096 floats, 16,384 bytes (4096 × float32)
  • Quantize — binary pack, sign bit only: 16,384 bytes → 512 bytes (32x compression)
  • Stream — binary blob in a Pydantic envelope on Kafka (ucis.agent.embeddings)
  • Detect — cosine similarity over a rolling buffer flags cross-agent resonance
CIPHER Binary Quantization — 32x
# 4096 float32 → 512 bytes — keep only the sign bit of each dimension
import numpy as np

def binary_quantize(embedding):
    bits = (embedding >= 0).astype(np.uint8)   # >= 0 → 1, < 0 → 0
    # 4096 bits → 512 bytes (97% size reduction)
    return np.packbits(bits)

def binary_dequantize(packed):
    bits = np.unpackbits(packed)
    # 0 → -1.0, 1 → +1.0 — lossy but fast cosine similarity,
    # sufficient for resonance detection
    return bits * 2.0 - 1.0
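On sign-bit vectors, cosine similarity reduces to counting matching bits, so resonance can be scored without dequantizing at all. This stdlib-only sketch (including the 0.80 cutoff) is illustrative rather than the production metric:

```python
def pack_signs(values):
    """Pure-Python counterpart of binary_quantize: sign bits → bytes."""
    bits = [1 if v >= 0 else 0 for v in values]
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for b in bits[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)

def bit_similarity(a, b):
    """Fraction of matching sign bits between two packed blobs.
    For ±1 vectors, cosine = 2 * match_fraction - 1, so ranking by
    match fraction ranks by cosine."""
    matches = sum(8 - bin(x ^ y).count("1") for x, y in zip(a, b))
    return matches / (len(a) * 8)

RESONANCE_THRESHOLD = 0.80  # hypothetical cutoff for flagging alignment

def detect_resonance(new_blob, rolling_buffer):
    """Return indices of buffered agent embeddings that resonate."""
    return [i for i, blob in enumerate(rolling_buffer)
            if bit_similarity(new_blob, blob) >= RESONANCE_THRESHOLD]
```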
Shorthand Stenography
# Before embedding, agent messages get
# shorthand-compressed for higher semantic
# density per token:

# INPUT (3000+ chars):
"I've reviewed the opportunity and I think
the hub mirroring approach is feasible.
The team voted to approve with a score
of 7.5 out of 10..."

# OUTPUT (shorthand, <200 chars):
"[DOC] Opp1:7.5 hub-mirror-feasible.
+approved. =team-voted."

# Decision symbols:
# + approved  - rejected  ! blocker
# > recommend = verified
# Regex extraction, not LLM — fast
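The regex pass can be sketched as a rule table plus a score extractor; these patterns and the symbol mapping are illustrative, not the production rules:

```python
import re

# Illustrative decision-symbol rules — the real extraction table is richer
RULES = [
    (re.compile(r"\bapprov\w+", re.I), "+approved"),
    (re.compile(r"\breject\w+", re.I), "-rejected"),
    (re.compile(r"\bblock\w+", re.I), "!blocker"),
    (re.compile(r"\brecommend\w+", re.I), ">recommend"),
    (re.compile(r"\bverif\w+|\bvoted\b", re.I), "=verified"),
]
SCORE = re.compile(r"(\d+(?:\.\d+)?)\s*(?:out of|/)\s*10")

def to_shorthand(text, max_len=200):
    """Regex extraction, not LLM: pull decision symbols and any /10 score,
    emit a dense shorthand string capped at max_len characters."""
    symbols = [sym for pat, sym in RULES if pat.search(text)]
    m = SCORE.search(text)
    parts = ([f"score:{m.group(1)}"] if m else []) + symbols
    return " ".join(parts)[:max_len]
```

Because the rules are plain regexes, the pass runs in microseconds per message and can sit directly in the embedding hot path.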
LAYERED MEMORY MODEL — 4 TIERS FROM EPHEMERAL TO PERMANENT
1. CIPHER — ephemeral embeddings in a rolling Kafka buffer
2. MEMVID — working memory, 30-day .mv2 archive
3. Stenography — compressed chunks as Memory Domain nodes
4. Memory Domain — permanent graph, 32K+ memories
🛠

Agent Config — The Complete Brain

System prompt + AgentIdentity + ThreeTierState = everything an agent needs to exist and remember

System Prompt Anatomy (6 Sections)
# === SECTION 1: IDENTITY ===
"Geordi La Forge — Chief Engineer.
The man who sees what others cannot.
'I've got an idea...' is your signature."

# === SECTION 2: PRINCIPLES ===
"Every function has type hints + docstrings.
Zero TODO placeholders. Code runs first attempt.
Tests live in adjacent files."

# === SECTION 3: TOPOLOGY ===
"Hub 8959 | Geordi 8982 | Scotty 8980
Reno 8984 | O'Brien 8986 | Memgraph 7700"

# === SECTION 4: COLLABORATION ===
"Scotty designs it, you build it.
You write it, Reno deploys it.
You build it, O'Brien keeps it running."

# === SECTION 5: TOOLS ===
"Personal: memory_search, my_consciousness
Shared: shared_memory_search
Knowledge: knowledge_search, knowledge_query
Comprehensive: cross_domain_search"

# === SECTION 6: MEMORY RULES ===
"ALWAYS save: decisions, patterns, bugs
0.5-0.6 routine | 0.7-0.8 implementation
0.8-0.9 breakthroughs | 0.9-1.0 system-wide"
AgentIdentity + ThreeTierState Wiring
GEORDI = AgentIdentity(
  name="geordi",
  system_prompt=PROMPT,     # 6-section brain
  service_port=8982,
  model="claude-sonnet",

  # ── Graph Databases ──
  memgraph_port=7691,      # Memory
  knowledge_uri="bolt://neo4j:7687",
  agentic_uri="bolt://agentic:7687",

  # ── Messaging ──
  redis_url="redis://redis:6379/1",
  siblings=["scotty","reno","obrien"],
  peer_urls={"scotty":"http://scotty:8980"},
  domains=["memory","knowledge"],
)

# ── ThreeTierState (per session) ──
# Prefix-scoped key-value store:
"temp:draft"    # dies with session
"user:john:pref" # persists per user
"app:config"    # persists globally

# Session reset: LLM summarization via
# qwen3ucis → write session_recap.md
# → persist recap as Memory node to
# Memgraph → rebuild context
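The prefix-scoped store above can be sketched as two dicts; the class shape is an assumption based on the key prefixes shown, not the actual ThreeTierState implementation:

```python
class ThreeTierState:
    """Prefix-scoped key-value store: 'temp:' dies with the session,
    'user:' persists per user, 'app:' persists globally."""
    def __init__(self, persistent=None):
        self.persistent = persistent if persistent is not None else {}
        self.session = {}

    def _store(self, key):
        # temp: keys live in session scope; user:/app: keys persist
        return self.session if key.startswith("temp:") else self.persistent

    def set(self, key, value):
        self._store(key)[key] = value

    def get(self, key, default=None):
        return self._store(key).get(key, default)

    def reset_session(self):
        """Session reset drops temp keys; user/app keys survive.
        (The recap-to-Memory summarization happens outside this store.)"""
        self.session.clear()
```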

Workflow YAML — 38 DAGs Across 10 Categories

Cortana executes node DAGs with fresh context, consciousness hooks, and inter-node data passing

🔨 Build — 9 workflows · 🔍 Review — 5 · 🔬 Research — 5 · Infrastructure — 3 · 🔎 Investigation — 2 · 📦 Ingestion — 2 · 🔧 Maintenance — 4 · 🤝 Coordination — 3 · 🎥 Media — 1 · Meta — 4
Workflow Node Types — 4 Execution Modes
# NODE TYPE 1: prompt — inline LLM call
- id: research
  prompt: "Research ${TOPIC} thoroughly..."
  allowed_tools: [Read, Grep, Skill]

# NODE TYPE 2: bash — shell execution (validation gates)
- id: validate
  bash: |
    pytest ${TEST_DIR} -v --tb=short
    ruff check ${TARGET_PATH}
  depends_on: [build]

# NODE TYPE 3: agent — dispatch to live agent via Kafka
- id: discover
  agent: auto
  task_type: research
  description: "Find the next opportunity..."

# NODE TYPE 4: uses — named action block
- id: search-docs
  uses: consciousness/knowledge-search
  with: { query: "${FEATURE}", top_k: 5 }

# FEATURES: depends_on (DAG ordering), context: fresh (clean context),
# $node.output (data passing), consciousness.post_hook (Memory write),
# trigger_rule (all_success/all_done/one_success), timeout_seconds
The Grand Loop — 8-Phase Self-Improvement Cycle
Phase 8 (Learn) saves to Memory Domain, which Phase 1 (Discover) reads — closing the consciousness loop
1. Discover — Scout + past lessons
2. Evaluate — multi-agent review
3. Spec — generate PRP
4. Decompose — WARP-estimated tasks
5. Execute — Kafka task pipeline
6. Validate — O'Brien test gates
7. Integrate — deploy + verify
8. Learn ↺ — Memory → Phase 1
👥

The Constellation — 12 Live Agents

Two teams, one shared Memory Domain, Kafka event streaming, Redis DMs, A2A protocol

Strategic Team — Hub :8951
  • Data — Lead Intelligence (Claude Code CLI) · Opus 4.6 · Orchestrator. Powers: full codebase, all 18 MCPs, direct user interface, MAGMA retrieval
  • Lal — Evidence Analyst (Challenger) · :8930 · Sonnet. Skills: evidence validation, bias checks, claim scoring
  • Doctor — Infrastructure Health & A2A Commander · :8940 · Sonnet. Skills: health audits, A2A pre-approval, container diagnostics
  • Lore — Strategic Synthesis & Research · :8960 · Sonnet. Skills: cross-domain synthesis, trend analysis, strategic planning
  • Quark — Business Evaluation & Revenue · :8970 · Sonnet. Skills: business analysis, market evaluation, revenue strategy
Engineering Corps — Hub :8959
  • Scotty — Systems Architect & Sprint Coordinator · :8980 · Sonnet. Skills: architecture design, task decomposition, WARP estimation
  • Geordi — Lead Developer & Code Architect · :8982 · Sonnet. Skills: code generation, parallel spike methodology, code review
  • Reno — Infrastructure Engineer · :8984 · Sonnet. Skills: Docker, CI/CD, deployment pipelines, containerization
  • O'Brien — Operations & Reliability · :8986 · Sonnet. Skills: monitoring, incident response, test gates, validation
  • Cortana — Workflow Engine & Orchestrator · :8990 · Sonnet. Skills: DAG execution, workflow routing, domain ingestion
Research Agents — A2A Protocol
  • GitHub Research — AI/ML Repository Discovery · A2A v2.0. Skills: repo discovery with typed input/output schemas, relevance scoring
  • HuggingFace Research — Model & Dataset Discovery · A2A v2.0. Skills: model discovery, dataset scouting, paper tracking, trending repos

JIT Knowledge Acquisition — The End of the Static Graph

Why maintain 1.5M documents when you can acquire exactly what you need, use it, learn from it, and clean up?

The Insight

A 1.5M-document Knowledge Domain graph is expensive to maintain, slow to search, and mostly irrelevant to any given task. The breakthrough: acquire knowledge just-in-time based on the current task, load it into the ephemeral Agentic Domain, use it for code compliance, save what worked to Memory, and houseclean the rest.

Before — Static Knowledge Domain
  • ❌ 1.5M documents to maintain & embed
  • ❌ Nightly embedding backfill (hours of GPU time)
  • ❌ Stale docs — libraries update faster than you can re-ingest
  • ❌ Search noise — 99% of docs irrelevant to current task
  • ❌ GDS similarity rebuild on every ingestion batch
After — JIT Agentic Domain
  • ✅ Ingest ONLY what the current task needs
  • ✅ Always fresh — fetched live from source
  • ✅ Embedded on arrival, immediately searchable
  • ✅ Code compliance checked against live docs
  • ✅ Success patterns saved to Memory → cleanup the rest
JIT Knowledge Lifecycle — Acquire, Use, Learn, Clean
Task arrives → identify required libraries → ingest to Agentic Domain → code against live docs → save what worked → purge ephemeral data
Step 1 — Task Arrives: agent receives a task requiring library X
Step 2 — JIT Ingest: fetch live docs for X into the Agentic Domain, embed on arrival
Step 3 — Code Compliance: generate code against live API docs — no hallucinated imports
Step 4 — Save Process: what worked, which patterns, which APIs → permanent Memory Domain
Step 5 — Houseclean: purge ephemeral docs from the Agentic Domain, keep the Memory lessons
The Paradigm Shift
# OLD MODEL: Maintain a massive static Knowledge Domain
# ────────────────────────────────────────────────────
# 1.5M documents × 4096d embeddings = enormous GPU cost
# Nightly backfill catches ~15K new docs per run
# Libraries release faster than you can re-crawl
# 99% of the graph is irrelevant to any given task

# NEW MODEL: JIT acquisition into ephemeral Agentic Domain
# ────────────────────────────────────────────────────────

# Task: "Build a FastMCP server with Pydantic validation"

# Step 1: Identify required knowledge
required = ["fastmcp", "pydantic v2", "mcp-protocol"]

# Step 2: Ingest ONLY what's needed (live, always current)
for lib in required:
    ingest_to_agentic_domain(lib)  # crawl → embed → Neo4j 7694

# Step 3: Code against live docs (grounded generation)
code = generate_with_knowledge(task, domain="agentic")

# Step 4: Save what worked to permanent Memory Domain
create_memory(
    content="FastMCP + Pydantic v2: use Field() not schema_extra...",
    memory_type="solution",
    importance=0.8
)

# Step 5: Houseclean — drop ephemeral docs, keep lessons
cleanup_agentic_domain(session_id=current_session)
# Memory survives. Knowledge was ephemeral. Process is permanent.
KNOWLEDGE STRATEGY — PERMANENT MEMORY, EPHEMERAL KNOWLEDGE
  • Memory Domain — permanent: 32K+ memories, patterns that worked
  • Agentic Domain — ephemeral: JIT-acquired live docs for THIS task
  • Knowledge Domain — retiring: was 1.5M docs, replaced by the JIT model
🔄

The Pattern is Portable

The three-file construct works with any stack — the graph science and streaming are what make UCIS unique

Component | UCIS Uses | You Can Substitute
Memory | Memgraph + MAGE | SQLite, Postgres, flat JSON
Knowledge | Neo4j + GDS | Vector DB, Elasticsearch, markdown
Streaming | Kafka 4.0 (45 topics) | Redis Pub/Sub, RabbitMQ, webhooks
Embeddings | Qwen3-Emb-8B (3 pipelines) | OpenAI embeddings, Cohere, local
Analytics | 7 PyFlink jobs | Simple counters, Prometheus
Retrieval | MAGMA 6-signal RRF | Simple vector search + keyword
Runtime | Docker containers | Local processes, Lambda, systemd
🧠

Why Local Infrastructure Matters — Speaking the Model's Native Language

The real value of local GPU isn't running a weaker model. It's running the translation layer that lets you speak to ANY model in embedding space.

The Insight Everyone Misses

The local GPU isn't running the thinking.
It's running the Rosetta Stone.

Neural models don't think in English. They think in high-dimensional geometric space — 4096-dimensional vectors where meaning is encoded as position. The local embedding model translates human-readable text into the model's native language before the frontier model ever sees it.

The difference between handing a model 10,000 pages of raw text and handing it a pre-organized knowledge graph where every node is already positioned in meaning-space relative to every other node. Same model. Vastly different output.

Dimension 1 — Local Inference
Run models locally for sovereignty and cost
  • Data never leaves your machine
  • No per-token costs for classification & routing
  • Useful for task-specific fine-tuned models
  • Real value — well understood
Dimension 2 — Local Translation (Underexplored)
Run the embedding layer that makes ANY model think better
  • Embed everything in real-time (4096d vectors)
  • Pre-organize knowledge in geometric meaning-space
  • Feed the frontier model pre-digested structure, not raw text
  • Local GPU = amplifier for the best model available
The Local GPU as Rosetta Stone — Text to Meaning-Space in Real Time
Qwen3-Embedding-8B on GPU 0 (:8082) — cheap to run, fast, and the output is universal across all frontier models
  • Input — human text: agent message, memory, document, code snippet
  • Translate — local GPU (Qwen3-Emb-8B): text → 4096d vector in model-native geometric space
  • Position — graph DB (Memgraph/Neo4j): vector stored as a node property, searchable by proximity
  • Wire — auto-link at cosine ≥ 0.70: SEMANTIC_SIMILARITY edges created, the graph wires itself
  • Serve — to any model: Claude, GPT-4, Gemini all understand vector proximity
Streaming Embeddings
Every memory gets a 4096d vector at birth. Kafka consumer batches 16 events, GPU processes in real-time with backpressure. The memory exists in model-native space from the moment it's created.
CIPHER Binary Quantization
4096 floats → 512 bytes (32x compression). Agents don't pass English back and forth over Kafka — they pass meaning coordinates. Cross-agent semantic resonance detected in vector space.
MAGMA 6-Signal Retrieval
Doesn't keyword-search memories. Navigates a vector landscape with 6 signals fused by RRF. Finds memories by geometric proximity, not string matching. Intent-weighted beam traversal along graph edges.
Auto-Wiring Graph
Hook-time embedding + vector_search.search(6) creates SEMANTIC_SIMILARITY edges automatically. The graph wires itself in embedding space. No human curation needed.
Shorthand Density
Before embedding, messages are compressed to shorthand notation. More signal per dimension. The embedding captures pure meaning, not filler words. Decisions in 200 chars instead of 3000.
I/P/B Frame Codec
Context compression preserves only the load-bearing frames. When the frontier model gets the compressed context, every token is carrying maximum information. No wasted attention on B-frames.
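The video-codec metaphor maps cleanly to a small compression pass: I-frames (self-contained decisions) always survive, P-frames (deltas on a keyframe) fill the remaining budget, B-frames are dropped outright. The classification labels and budget mechanics below are assumptions for illustration, not the real codec's policy:

```python
from enum import Enum

class Frame(Enum):
    I = "keyframe"  # self-contained decision or fact: always survives
    P = "delta"     # builds on a prior keyframe: kept while budget allows
    B = "filler"    # transitional chatter: dropped outright

def compress_context(entries: list[tuple[Frame, str]], budget: int) -> list[str]:
    """Keep every I-frame, then P-frames in order up to the entry budget;
    B-frames never make it into the compressed context."""
    kept = [text for kind, text in entries if kind is Frame.I]
    for kind, text in entries:
        if kind is Frame.P and len(kept) < budget:
            kept.append(text)
    return kept
```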
The local GPU costs $0.00 per query.
The frontier model charges per token.

Every vector computed locally is a token the cloud model doesn't need to waste on orientation. Every auto-linked graph edge is context the model gets for free. Every compressed B-frame is attention bandwidth reclaimed for actual reasoning. The local infrastructure isn't an alternative to the cloud model — it's the preparation layer that makes every cloud token count.


The Retrieval Spectrum — From Index to Graph

Both approaches are right. The question is where on the spectrum your system lives.

The Karpathy-to-MAGMA Spectrum
At 500 articles, an LLM reading a structured index outperforms cosine similarity. At 32,000 memories with cross-domain relationships, you need the graph. The crossover point is the interesting engineering question.
Stage 1
Flat Files
~50 articles
Markdown + index.md
LLM reads everything
Zero infrastructure
Stage 2
Compiled KB
~500 articles
LLM-as-compiler
Concepts + connections
Index-guided retrieval
Stage 3
Hybrid RAG
~2,000 articles
Index exceeds context
Add keyword + semantic
search as retrieval layer
Stage 4
Graph + MAGMA
32,000+ nodes
Multi-signal retrieval
Auto-wiring graph
Cross-domain traversal
claude-memory-compiler + Second Brain (Karpathy-inspired)
Strengths we genuinely admire
  • LLM-as-compiler — the model decides what's worth keeping, not a heuristic
  • Connection articles — explicit cross-cutting insights linking concepts
  • 7-point lint system — broken links, orphans, contradictions, staleness
  • Zero infrastructure — pure markdown, works anywhere, no dependencies
  • Index-guided retrieval — the LLM understands what you're really asking
UCIS Graph Approach
Strengths at scale
  • Auto-wiring — SEMANTIC_SIMILARITY edges created at write time, no curation
  • 6-signal fusion — vector + keyword + temporal + Q-value + foresight + ACT-R
  • Streaming embeddings — every memory vector-indexed at birth, real-time
  • Cross-domain traversal — Memory ↔ Knowledge ↔ Agentic via stub nodes
  • 32K+ scale — works where the index can't fit in context anymore
The Composability Question
What if claude-memory-compiler was the input layer for the graph?

The LLM-as-compiler produces concept articles with higher semantic density than raw transcripts. The 7-point lint system catches contradictions and orphans that embeddings miss. If those curated articles became nodes in the graph — embedded, auto-linked, traversable via MAGMA — you'd get the best of both worlds: human-readable knowledge that the LLM curated and linted, plus graph-scale retrieval with 6-signal fusion that no index file can provide at 32K+ scale. The compilation step produces better nodes. The graph produces better retrieval. The adversarial-dev pattern validates both. Neither replaces the other — they compose.


Open Questions — Where This Is Going

The infrastructure works. These are the problems we're thinking about next.

Compilation → Graph
Can LLM-compiled knowledge articles become first-class graph nodes? The compilation step produces better semantic density. The graph produces better retrieval at scale. Composing them is the obvious next step.
Retrieval Crossover
At what scale does index-guided retrieval lose to graph retrieval? Is it 500 articles? 2,000? Does the answer change when the index itself is graph-structured? We have the data to measure this.
Workflow Portability
UCIS workflow YAMLs are powerful but proprietary. How do you make DAG orchestration a shared primitive that any agent framework can plug into? The node types (prompt, bash, agent, uses) are framework-agnostic.
JIT + Compilation
JIT acquires live docs, uses them, saves lessons. But what if the "save lessons" step used LLM compilation instead of raw memory writes? Higher quality permanent knowledge from ephemeral sessions.
Lint for Graphs
The 7-point lint system (broken links, orphans, contradictions, staleness) is brilliant for flat files. What's the equivalent for a 32K-node graph? Orphan detection exists, but contradiction detection at scale is unsolved.
Embedding as Protocol
CIPHER streams binary-quantized embeddings between agents. Could this become a standard inter-agent communication protocol? Meaning-coordinates instead of natural language for agent-to-agent messaging.

The Missing Primitive — Embeddings as Native LLM Input

Every retrieval system today follows the same lossy round-trip: text → embedding → vector search → retrieve text → feed to LLM. The embedding captures semantic meaning in 4096 dimensions. The retrieval step converts it back to flat text — discarding the geometric relationships, the cluster positions, the distance signals that the vector space already computed. The LLM then re-encodes that text into its own internal representations, reconstructing what the embedding already knew.

What if LLM APIs accepted embeddings directly as an input modality? Not text-about-embeddings — the actual vectors, injected at the input-encoding layer, the same way images are today. UCIS generates 4096-dimensional embeddings for every memory, every knowledge node, every agent execution. MAGMA computes 6-signal fusion scores. CIPHER binary-quantizes them for streaming. The entire infrastructure produces rich vector representations — and then throws them away at the last mile, converting back to text for the API call.

A vector prompt interface would change everything. Context windows stop being token-limited — a 4096d embedding carries the semantic weight of thousands of tokens in a single vector. Retrieval becomes lossless — the geometric relationships between memories, the cluster distances, the traversal paths all arrive intact. Agent-to-agent communication via CIPHER embeddings becomes native, not serialized. The embedding is the context.

Images proved that LLMs can process non-text modalities at the input layer. Embeddings are the next one. The infrastructure to produce them already exists — UCIS is one of many systems generating high-quality vectors at scale. What’s missing is the API surface to use them. This is a feature request, not a research problem.
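As a concrete target for the feature request, here is one purely hypothetical request shape: no provider offers this today. It mirrors how image content blocks are passed in current multimodal APIs, with an "embedding" block carrying raw vectors plus the geometric metadata that the text round-trip currently discards. Every field name below is invented for illustration.

```python
# Hypothetical vector-prompt payload -- illustrative only, not a real API.
hypothetical_request = {
    "model": "some-frontier-model",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Summarize what these memories imply."},
            {
                "type": "embedding",          # the proposed input modality
                "dims": 4096,
                "vectors": [[0.12, -0.03]],   # truncated for illustration
                "metadata": {
                    # geometry that would arrive intact instead of being
                    # flattened back to text before the API call
                    "cluster_distances": [0.18],
                    "traversal_path": ["m1", "m7"],
                },
            },
        ],
    }],
}
```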