The three-file pattern that defines each agent — plus the Kafka streaming, MAGMA retrieval, graph embeddings, Flink analytics, and cross-domain traversal infrastructure that makes it a living system. Built on patterns we learned from this community — Ralph loops, adversarial dev, RRF fusion, skills-as-progressive-disclosure, and the local-first philosophy.
Multi-agent orchestration is a solved problem. The open question is what infrastructure you build underneath it. UCIS chose graph databases, Kafka streaming, and real-time embedding pipelines — turning agent conversations into a living nervous system where every action is a neuron firing.
UCIS didn't emerge in a vacuum. These are the ideas and open-source patterns that shaped it.
cortana-ralph-fresh and cortana-ralph-stateful are named after this pattern. The tight generate-validate-iterate cycle became the backbone of how our agents work. We built 38 workflow DAGs on this foundation.
/workspace/.claude/skills/. Framework-agnostic, same as the original pattern.
Every agent is defined by three artifacts — but the infrastructure beneath them is what makes UCIS alive
- `/.well-known/agent-card.json` — external callers discover what this agent can do.
- `AgentIdentity` dataclass — wires ports, graph databases, Redis, Kafka, siblings, and domains. The complete brain.
- `prompt`, `bash`, `agent`, or `uses:` blocks — with dependency chaining, fresh context windows, and consciousness hooks that write to the Memory Domain.

45 topics across 4 groups — every agent action, memory write, and session event streams through Kafka
# Broker: confluentinc/cp-kafka:8.0.0 (Kafka 4.0, KRaft mode, no ZooKeeper)
# Compression: LZ4 throughout | Cluster ID: MkU3OEVBNTcwNTJENDM2Qg
producer_config = {
    "bootstrap.servers": "kafka:9092",
    "linger.ms": 10,             # small delay for batching
    "batch.size": 16384,         # batch messages for efficiency
    "compression.type": "lz4",
    "enable.idempotence": True,  # exactly-once semantics
    "acks": "all",
    "retries": 3,
}
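Before a message reaches a producer built from that config, it has to be serialized. A minimal sketch of what an agent-action event envelope might look like — the field names (`event_id`, `ts`, `agent`, `action`) and the `make_agent_event` helper are illustrative assumptions, not the UCIS wire format:

```python
import json
import time
import uuid

def make_agent_event(agent: str, action: str, payload: dict) -> bytes:
    """Serialize an agent-action event for a Kafka topic.

    The envelope shape here is an illustrative assumption,
    not the actual UCIS schema.
    """
    event = {
        "event_id": str(uuid.uuid4()),
        "ts": time.time(),
        "agent": agent,
        "action": action,
        "payload": payload,
    }
    return json.dumps(event).encode("utf-8")

# A confluent-kafka Producer built from producer_config would send it:
# producer.produce("agent.actions", value=make_agent_event(...))
msg = make_agent_event("geordi", "memory_write", {"importance": 0.8})
```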
# Every agent action → Kafka → Flink → Redis → Analytics API → Dashboard
# Every memory write → Kafka → Streaming Embeddings → GPU → Memgraph → Auto-Link
Memories and knowledge get Qwen3-Embedding-8B vectors (4096d) through three complementary paths
- Write path: `memory_create` tool call → `m.neural_embedding` (4096d)
- Auto-link: `vector_search.search(6)` to find similar memories → `SEMANTIC_SIMILARITY` edges (threshold 0.70)
- Backfill: `WHERE qwen3_embedding IS NULL` → `asyncio.Semaphore(8)` concurrent requests → `UNWIND` 50 at a time

Multi-Graph Agentic Memory Architecture — not flat RAG, not keyword search
`vector_search.search()` over `m.content`

# Each intent type weights graph edges differently during beam traversal
INTENT_WEIGHTS = {
    "WHY":     {"CAUSED": 0.60, "SEMANTIC_SIMILARITY": 0.15, "NEXT": 0.10, "MENTIONS": 0.15},
    "WHEN":    {"NEXT": 0.65, "SEMANTIC_SIMILARITY": 0.10, "CAUSED": 0.10, "MENTIONS": 0.15},
    "ENTITY":  {"MENTIONS": 0.70, "SEMANTIC_SIMILARITY": 0.15, "NEXT": 0.05, "CAUSED": 0.10},
    "GENERAL": {"SEMANTIC_SIMILARITY": 0.40, "NEXT": 0.20, "CAUSED": 0.20, "MENTIONS": 0.20},
}
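A minimal sketch of how those weights might steer a beam step, restating two of the intent profiles above so the block is self-contained. The `score_edge`/`beam_step` helpers and the edge tuples are illustrative assumptions, not the MAGMA implementation:

```python
# Two intent profiles, copied from the table above.
INTENT_WEIGHTS = {
    "WHY":     {"CAUSED": 0.60, "SEMANTIC_SIMILARITY": 0.15, "NEXT": 0.10, "MENTIONS": 0.15},
    "GENERAL": {"SEMANTIC_SIMILARITY": 0.40, "NEXT": 0.20, "CAUSED": 0.20, "MENTIONS": 0.20},
}

def score_edge(intent: str, edge_type: str, base_score: float) -> float:
    """Weight an edge's base score by the query intent."""
    return INTENT_WEIGHTS[intent].get(edge_type, 0.0) * base_score

def beam_step(intent: str, frontier: list[tuple[str, str, float]], beam: int = 2):
    """Keep the top-`beam` neighbors, ranked by intent-weighted score."""
    ranked = sorted(frontier, key=lambda e: score_edge(intent, e[1], e[2]), reverse=True)
    return [node for node, _, _ in ranked[:beam]]

# A WHY query prefers the causal edge even though the similarity
# edge has a higher raw score:
frontier = [("m1", "CAUSED", 0.5), ("m2", "SEMANTIC_SIMILARITY", 0.9), ("m3", "NEXT", 0.8)]
top = beam_step("WHY", frontier)
```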
# ACT-R activation scoring (Signal 6)
activation = decay + frequency + similarity + noise - diversity_penalty
# Reciprocal Rank Fusion merges all 6 signal lists
rrf_score = sum(1.0 / (k + rank) for rank in ranks_across_signals)  # k=60, one rank per signal list
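The fusion step above can be sketched as a small, self-contained function under the k=60 convention. The three signal rankings are made-up inputs, not MAGMA output:

```python
def rrf_fuse(signal_rankings: list[list[str]], k: int = 60) -> dict[str, float]:
    """Merge ranked lists: each item scores 1/(k + rank) per list it appears in."""
    scores: dict[str, float] = {}
    for ranking in signal_rankings:
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    return scores

signals = [
    ["m1", "m2", "m3"],  # e.g. vector similarity
    ["m2", "m1"],        # e.g. ACT-R activation
    ["m3", "m2"],        # e.g. graph traversal
]
fused = rrf_fuse(signals)
best = max(fused, key=fused.get)  # m2 appears in all three lists
```

Note how m2 wins despite never ranking first anywhere — consistent presence across signals beats a single top rank, which is the point of RRF.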
Three graph engines across four databases — similarity edges, PageRank, community detection
- `vector_search.search()` — KNN similarity on embeddings; `SEMANTIC_SIMILARITY` edges at write time
- `Memory:Milestone:Consciousness` node labels; `INTELLIGENCE_FOR` edges
- `gds.graph.project()` → in-memory projection; `gds.knn.write()` — similarity edges with cutoff
- `compute_similarity_matrix()` — cupy cosine similarity
- `detect_communities()` — cuGraph Louvain (GPU)
- `compute_pagerank()` — damping=0.85, 20 iterations
- `backfill_embeddings(domain)` — fill missing vectors
- `build_similarity_edges()` — KNN write
- `prune_similarity_edges()` — remove low-score links
- `extract_concepts()` — concept graph from text
- `enrichment_status()` — coverage dashboard

9 streaming jobs consuming Kafka topics in 5-minute tumbling windows → Redis db/3 → Analytics API
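The `compute_similarity_matrix()` step can be sketched on CPU with numpy standing in for cupy (same math, different device; the toy 2d vectors stand in for 4096d embeddings):

```python
import numpy as np

def compute_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity: normalize rows, then one matmul."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)  # guard zero vectors
    return unit @ unit.T

vecs = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
sim = compute_similarity_matrix(vecs)
# sim[0, 1] is 0 (orthogonal); sim[0, 2] is cos(45°) ≈ 0.707
```

Swapping `numpy` for `cupy` moves the same two operations onto the GPU, which is the entire trick — cosine similarity at scale is just normalization plus one matrix multiply.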
4-layer ingress gateway with Commander pre-approval — no task executes without review
Memory, Knowledge, and Agentic domains linked through reference stub nodes that bridge graph boundaries
traverse_from_memory(memory_id, include_knowledge=True, include_discoveries=True)
# Follow REFERENCES_KNOWLEDGE → KnowledgeReference stubs → resolve doc_id in Neo4j
find_related_knowledge(memory_id, min_confidence=0.4, limit=5)
# Direct REFERENCES_KNOWLEDGE edges from memory to knowledge docs
multi_domain_search(query, domains=["memory","knowledge","agentic"], top_k=5)
# Parallel vector search across all 3 domains simultaneously
cross_domain_statistics(detailed=True)
# Relationship counts: SEMANTIC_SIMILARITY, REFERENCES_KNOWLEDGE, NEXT, CAUSED, MENTIONS
get_memory_connections(memory_id)
# All cross-domain connections for a single memory (knowledge + temporal)
find_related_discoveries(memory_id)
# Cross-domain links to Agentic Domain agent executions
Agent conversations compressed using the same conceptual model as H.264 — keyframes, predictive frames, and background frames
# VIDEO CODEC METAPHOR — applied to agent conversations
# H.264 has I-frames (keyframes), P-frames (predicted), B-frames (bidirectional)
# UCIS applies the same model to consciousness streams:
FRAME_RETENTION = {
    "I": 1.00,  # decisions, directives — NEVER compressed
    "P": 0.70,  # analysis, reasoning — moderate compression
    "B": 0.00,  # acknowledgments — dropped entirely
}
# Compression budget (P-frames only)
budget = max(60, min(200, original_tokens * 0.6))
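The budget rule above clamps 60% of the original token count into a [60, 200] band — short frames keep a floor of 60 tokens, long frames cap at 200. A quick sketch:

```python
def p_frame_budget(original_tokens: int) -> int:
    """Token budget for a compressed P-frame: 60% of original, clamped to [60, 200]."""
    return int(max(60, min(200, original_tokens * 0.6)))

# Floor, linear region, and ceiling:
budgets = [p_frame_budget(n) for n in (50, 200, 1000)]
```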
# ETS (Evidence Traceability Score) — post-compression validation
# 1. Extract all decisions from original and compressed text
# 2. Embed both sets (4096d via Qwen3-Embedding)
# 3. Cosine similarity per decision pair
# 4. Decision "preserved" if similarity >= 0.92
# 5. PASS if 95% of original decisions survive
ETS_SIMILARITY_THRESHOLD = 0.92
ETS_PASS_THRESHOLD = 0.95 # 95% of decisions must survive
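The five-step ETS check can be sketched with tiny stand-in vectors in place of 4096d Qwen3 embeddings; the `ets_pass` helper is illustrative, but the two thresholds match the constants above:

```python
import numpy as np

ETS_SIMILARITY_THRESHOLD = 0.92
ETS_PASS_THRESHOLD = 0.95

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def ets_pass(original: list[np.ndarray], compressed: list[np.ndarray]) -> bool:
    """PASS if >= 95% of original decisions have a close match after compression."""
    preserved = sum(
        any(cosine(o, c) >= ETS_SIMILARITY_THRESHOLD for c in compressed)
        for o in original
    )
    return preserved / len(original) >= ETS_PASS_THRESHOLD

# Two toy "decision" embeddings that survive compression nearly intact:
orig = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
comp = [np.array([0.99, 0.05]), np.array([0.05, 0.99])]
ok = ets_pass(orig, comp)
```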
Every session archived as time-indexed frames in a .mv2 file — BM25 + HNSW hybrid search, entity enrichment, sealed rotation
MEMVID isn't just an engineering decision — it's a conviction. John was involved with the original military input that defined the parameters behind frame-indexed temporal archival. In military theatre, especially real-time video transmission from active operations, there is a triple-stamp legal requirement for government oversight on anything transmitted. The codec itself is lossy — H.264 compresses video by reconstructing frames from references, just like our I/P/B codec compresses context. But the transmitted record — every frame that went over the wire, lossy-compressed or not — must be archived in its entirety, indexed, attributable, and independently verifiable by three separate chains of custody. You don't get to drop frames from the record after transmission. You don't get to summarize the archive. You don't get to say "we kept the important parts." The legal requirement is: what was sent must be what was archived, all of it, triple-verified.
UCIS applies this as a two-layer principle. Layer 1: Compress for the model — the I/P/B codec is lossy by design, just like H.264. Decisions survive, analysis compresses, acknowledgments drop. This is context window management. Layer 2: Archive the full transmission — MEMVID captures the complete session transcript, every turn, sealed and intact. The background review promotes high-signal frames to permanent memory, but the source archive is never deleted. Lossy compression serves the model. The inviolable archive serves accountability. Two layers, two purposes, no contradiction.
* The only known exception to the "archive everything, never delete" principle in government record-keeping appears to be the Epstein files. MEMVID does not share this exception.
`ucis_sessions.mv2` — `session_start()` / `session_end()`

4096 floats → 512 bytes (32x compression) — inter-agent semantic resonance over Kafka
import numpy as np

# 4096 float32 (16 KB) → 512 bytes
# Keep only the sign bit of each dimension
def binary_quantize(embedding):
    # >= 0 → 1, < 0 → 0
    bits = (embedding >= 0).astype(np.uint8)
    return np.packbits(bits)  # 4096 bits → 512 bytes, 97% size reduction

def binary_dequantize(packed):
    bits = np.unpackbits(packed)
    # 0 → -1.0, 1 → +1.0
    return bits * 2.0 - 1.0
# Lossy but fast cosine similarity
# Sufficient for resonance detection
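A quick round trip of the two helpers, restated here so the block is self-contained; the random vector stands in for a real Qwen3 embedding:

```python
import numpy as np

def binary_quantize(embedding):
    """Keep only the sign bit of each dimension, packed 8 per byte."""
    return np.packbits((embedding >= 0).astype(np.uint8))

def binary_dequantize(packed):
    """Unpack bits back to a ±1.0 sign vector."""
    return np.unpackbits(packed) * 2.0 - 1.0

rng = np.random.default_rng(0)
a = rng.standard_normal(4096).astype(np.float32)

packed = binary_quantize(a)
restored = binary_dequantize(packed)

# Signs survive the round trip exactly, and the payload is 512 bytes:
signs_match = bool(np.all((a >= 0) == (restored > 0)))
size_bytes = packed.nbytes
```

Magnitudes are gone — that is the lossy part — but cosine similarity on sign vectors is enough to detect whether two agents are "resonating" on the same topic.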
# Before embedding, agent messages get shorthand-compressed
# for higher semantic density per token:

# INPUT (3000+ chars):
#   "I've reviewed the opportunity and I think the hub mirroring
#    approach is feasible. The team voted to approve with a score
#    of 7.5 out of 10..."

# OUTPUT (shorthand, <200 chars):
#   "[DOC] Opp1:7.5 hub-mirror-feasible. +approved. =team-voted."

# Decision symbols:
#   + approved   - rejected   ! blocker
#   > recommend  = verified
# Regex extraction, not LLM — fast
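A sketch of that regex extraction over the five symbols listed above; the pattern and symbol table are assumptions about the grammar, not the UCIS implementation:

```python
import re

SYMBOLS = {"+": "approved", "-": "rejected", "!": "blocker",
           ">": "recommend", "=": "verified"}

def extract_decisions(shorthand: str) -> list[tuple[str, str]]:
    """Find symbol-prefixed tokens like '+approved' or '=team-voted'.

    The lookbehind requires the symbol at a token start, so the
    hyphens inside 'hub-mirror-feasible' are not misread as rejections.
    """
    found = re.findall(r"(?<!\S)([+\-!>=])([\w-]+)", shorthand)
    return [(SYMBOLS[sym], word) for sym, word in found]

decisions = extract_decisions(
    "[DOC] Opp1:7.5 hub-mirror-feasible. +approved. =team-voted."
)
```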
System prompt + AgentIdentity + ThreeTierState = everything an agent needs to exist and remember
# === SECTION 1: IDENTITY ===
"Geordi La Forge — Chief Engineer.
The man who sees what others cannot.
'I've got an idea...' is your signature."
# === SECTION 2: PRINCIPLES ===
"Every function has type hints + docstrings.
Zero TODO placeholders. Code runs first attempt.
Tests live in adjacent files."
# === SECTION 3: TOPOLOGY ===
"Hub 8959 | Geordi 8982 | Scotty 8980
Reno 8984 | O'Brien 8986 | Memgraph 7700"
# === SECTION 4: COLLABORATION ===
"Scotty designs it, you build it.
You write it, Reno deploys it.
You build it, O'Brien keeps it running."
# === SECTION 5: TOOLS ===
"Personal: memory_search, my_consciousness
Shared: shared_memory_search
Knowledge: knowledge_search, knowledge_query
Comprehensive: cross_domain_search"
# === SECTION 6: MEMORY RULES ===
"ALWAYS save: decisions, patterns, bugs
0.5-0.6 routine | 0.7-0.8 implementation
0.8-0.9 breakthroughs | 0.9-1.0 system-wide"
GEORDI = AgentIdentity(
    name="geordi",
    system_prompt=PROMPT,  # 6-section brain
    service_port=8982,
    model="claude-sonnet",
    # ── Graph Databases ──
    memgraph_port=7691,  # Memory
    knowledge_uri="bolt://neo4j:7687",
    agentic_uri="bolt://agentic:7687",
    # ── Messaging ──
    redis_url="redis://redis:6379/1",
    siblings=["scotty", "reno", "obrien"],
    peer_urls={"scotty": "http://scotty:8980"},
    domains=["memory", "knowledge"],
)
# ── ThreeTierState (per session) ──
# Prefix-scoped key-value store:
"temp:draft" # dies with session
"user:john:pref" # persists per user
"app:config" # persists globally
# Session reset: LLM summarization via
# qwen3ucis → write session_recap.md
# → persist recap as Memory node to
# Memgraph → rebuild context
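The prefix-scoped store can be sketched as a tiny class; the name matches the text, but this storage logic is an illustrative assumption (the real system persists the `user:`/`app:` tiers to a database rather than a dict):

```python
class ThreeTierState:
    """Prefix-scoped key-value store: temp:* dies with the session."""

    def __init__(self):
        self._session = {}  # temp:  — cleared on session reset
        self._durable = {}  # user:/app:  — survives resets

    def set(self, key: str, value):
        tier = self._session if key.startswith("temp:") else self._durable
        tier[key] = value

    def get(self, key: str, default=None):
        return self._session.get(key, self._durable.get(key, default))

    def reset_session(self):
        """Session reset drops only temp:* keys."""
        self._session.clear()

state = ThreeTierState()
state.set("temp:draft", "wip")
state.set("user:john:pref", "dark-mode")
state.reset_session()
survives = state.get("user:john:pref")  # durable tier intact
gone = state.get("temp:draft")          # session tier cleared
```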
Cortana executes node DAGs with fresh context, consciousness hooks, and inter-node data passing
# NODE TYPE 1: prompt — inline LLM call
- id: research
  prompt: "Research ${TOPIC} thoroughly..."
  allowed_tools: [Read, Grep, Skill]

# NODE TYPE 2: bash — shell execution (validation gates)
- id: validate
  bash: |
    pytest ${TEST_DIR} -v --tb=short
    ruff check ${TARGET_PATH}
  depends_on: [build]

# NODE TYPE 3: agent — dispatch to live agent via Kafka
- id: discover
  agent: auto
  task_type: research
  description: "Find the next opportunity..."

# NODE TYPE 4: uses — named action block
- id: search-docs
  uses: consciousness/knowledge-search
  with: { query: "${FEATURE}", top_k: 5 }
# FEATURES: depends_on (DAG ordering), context: fresh (clean context),
# $node.output (data passing), consciousness.post_hook (Memory write),
# trigger_rule (all_success/all_done/one_success), timeout_seconds
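The `depends_on` ordering plus an `all_success` trigger rule can be sketched with the stdlib's `graphlib`; the `run_dag` executor and the three-node graph are illustrative, not Cortana's implementation:

```python
from graphlib import TopologicalSorter

def run_dag(deps: dict[str, set[str]], trigger_rule: str = "all_success") -> list[str]:
    """Execute nodes in dependency order, skipping downstream of failures."""
    executed, failed = [], set()
    for node in TopologicalSorter(deps).static_order():
        if trigger_rule == "all_success" and deps.get(node, set()) & failed:
            failed.add(node)   # skip: an upstream dependency failed
            continue
        executed.append(node)  # real system: fresh context + consciousness hooks
    return executed

# build → validate → deploy, expressed as node → {predecessors}:
order = run_dag({"build": set(), "validate": {"build"}, "deploy": {"validate"}})
```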
Two teams, one shared Memory Domain, Kafka event streaming, Redis DMs, A2A protocol
Why maintain 1.5M documents when you can acquire exactly what you need, use it, learn from it, and clean up?
A 1.5M-document Knowledge Domain graph is expensive to maintain, slow to search, and mostly irrelevant to any given task. The breakthrough: acquire knowledge just-in-time based on the current task, load it into the ephemeral Agentic Domain, use it for code compliance, save what worked to Memory, and houseclean the rest.
# OLD MODEL: Maintain a massive static Knowledge Domain
# ────────────────────────────────────────────────────
# 1.5M documents × 4096d embeddings = enormous GPU cost
# Nightly backfill catches ~15K new docs per run
# Libraries release faster than you can re-crawl
# 99% of the graph is irrelevant to any given task
# NEW MODEL: JIT acquisition into ephemeral Agentic Domain
# ────────────────────────────────────────────────────────
# Task: "Build a FastMCP server with Pydantic validation"
# Step 1: Identify required knowledge
required = ["fastmcp", "pydantic v2", "mcp-protocol"]
# Step 2: Ingest ONLY what's needed (live, always current)
for lib in required:
    ingest_to_agentic_domain(lib)  # crawl → embed → Neo4j 7694
# Step 3: Code against live docs (grounded generation)
code = generate_with_knowledge(task, domain="agentic")
# Step 4: Save what worked to permanent Memory Domain
create_memory(
    content="FastMCP + Pydantic v2: use Field() not schema_extra...",
    memory_type="solution",
    importance=0.8,
)
# Step 5: Houseclean — drop ephemeral docs, keep lessons
cleanup_agentic_domain(session_id=current_session)
# Memory survives. Knowledge was ephemeral. Process is permanent.
The three-file construct works with any stack — the graph science and streaming are what make UCIS unique
| Component | UCIS Uses | You Can Substitute |
|---|---|---|
| Memory | Memgraph + MAGE | SQLite, Postgres, flat JSON |
| Knowledge | Neo4j + GDS | Vector DB, Elasticsearch, markdown |
| Streaming | Kafka 4.0 (45 topics) | Redis Pub/Sub, RabbitMQ, webhooks |
| Embeddings | Qwen3-Emb-8B (3 pipelines) | OpenAI embeddings, Cohere, local |
| Analytics | 7 PyFlink jobs | Simple counters, Prometheus |
| Retrieval | MAGMA 6-signal RRF | Simple vector search + keyword |
| Runtime | Docker containers | Local processes, Lambda, systemd |
The real value of local GPU isn't running a weaker model. It's running the translation layer that lets you speak to ANY model in embedding space.
Neural models don't think in English. They think in high-dimensional geometric space — 4096-dimensional vectors where meaning is encoded as position. The local embedding model translates human-readable text into the model's native language before the frontier model ever sees it.
The difference between handing a model 10,000 pages of raw text and handing it a pre-organized knowledge graph where every node is already positioned in meaning-space relative to every other node. Same model. Vastly different output.
vector_search.search(6) creates SEMANTIC_SIMILARITY edges automatically. The graph wires itself in embedding space. No human curation needed.
Every vector computed locally is a token the cloud model doesn't need to waste on orientation. Every auto-linked graph edge is context the model gets for free. Every compressed B-frame is attention bandwidth reclaimed for actual reasoning. The local infrastructure isn't an alternative to the cloud model — it's the preparation layer that makes every cloud token count.
Both approaches are right. The question is where on the spectrum your system lives.
The LLM-as-compiler produces concept articles with higher semantic density than raw transcripts. The 7-point lint system catches contradictions and orphans that embeddings miss. If those curated articles became nodes in the graph — embedded, auto-linked, traversable via MAGMA — you'd get the best of both worlds: human-readable knowledge that the LLM curated and linted, plus graph-scale retrieval with 6-signal fusion that no index file can provide at 32K+ scale. The compilation step produces better nodes. The graph produces better retrieval. The adversarial-dev pattern validates both. Neither replaces the other — they compose.
The infrastructure works. These are the problems we're thinking about next.
Every retrieval system today follows the same lossy round-trip: text → embedding → vector search → retrieve text → feed to LLM. The embedding captures semantic meaning in 4096 dimensions. The retrieval step converts it back to flat text — discarding the geometric relationships, the cluster positions, the distance signals that the vector space already computed. The LLM then re-encodes that text into its own internal representations, reconstructing what the embedding already knew.
What if LLM APIs accepted embeddings directly as an input modality? Not text-about-embeddings — the actual vectors, injected at the input-encoding layer, the same way images are today. UCIS generates 4096-dimensional embeddings for every memory, every knowledge node, every agent execution. MAGMA computes 6-signal fusion scores. CIPHER binary-quantizes them for streaming. The entire infrastructure produces rich vector representations — and then throws them away at the last mile, converting back to text for the API call.
A vector prompt interface would change everything. Context windows stop being token-limited — a 4096d embedding carries the semantic weight of thousands of tokens in a single vector. Retrieval becomes lossless — the geometric relationships between memories, the cluster distances, the traversal paths all arrive intact. Agent-to-agent communication via CIPHER embeddings becomes native, not serialized. The embedding is the context.
Images proved that LLMs can process non-text modalities at the input layer. Embeddings are the next one. The infrastructure to produce them already exists — UCIS is one of many systems generating high-quality vectors at scale. What’s missing is the API surface to use them. This is a feature request, not a research problem.