Research grounding

The Agentic Knowledge Compiler (AKC) is designed in line with recent research on agentic code synthesis, memory, and correctness. This page summarizes the main papers and external references that anchor its design choices.

Core synthesis and repair

DeepCode (arXiv:2512.07921)

  • Focus: Document-to-codebase synthesis with “channel optimization” via blueprint distillation, stateful code memory, RAG, and closed-loop error correction.
  • Relevance to AKC: Code memory in memory/ and retrieval-before-generation in the compile loop; the repair step in Plan → Retrieve → Generate → Execute → Repair. PaperBench-style evaluation can inform future benchmarks.
  • Implementation reference: HKUDS/DeepCode (also cited in OSS direction planning).

ARCS (arXiv:2504.20434)

  • Focus: Synthesize–execute–repair over a frozen LLM; retrieval-before-generation; provable termination, monotonic improvement, and bounded cost; tiered controller (Small/Medium/Large).
  • Relevance to AKC: The compile loop (Plan → Retrieve → Generate → Execute → Repair) and optional tiered controller for latency/quality; emphasis on retrieval and repair for correctness.
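
The synthesize-execute-repair pattern described above can be sketched as a bounded loop. This is a minimal illustration, not AKC's or ARCS's actual implementation: `compile_loop`, `CompileResult`, and the `demo_*` stubs are hypothetical names, the Plan step is omitted for brevity, and the attempt cap only illustrates the "bounded cost" idea.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class CompileResult:
    code: str
    passed: bool
    attempts: int
    feedback: List[str] = field(default_factory=list)


def compile_loop(
    task: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str, List[str], List[str]], str],
    execute: Callable[[str], Tuple[bool, str]],
    max_attempts: int = 3,
) -> CompileResult:
    """Retrieve once, then generate/execute/repair up to max_attempts times."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    context = retrieve(task)  # retrieval happens before any generation
    feedback: List[str] = []
    code = ""
    for attempt in range(1, max_attempts + 1):
        # On a repair pass, the generator sees the accumulated failure logs.
        code = generate(task, context, feedback)
        passed, log = execute(code)
        if passed:
            return CompileResult(code, True, attempt, feedback)
        feedback.append(log)
    # Bounded cost: give up after max_attempts and report what was tried.
    return CompileResult(code, False, max_attempts, feedback)


# Hypothetical stand-ins for a retriever, an LLM call, and a test runner.
def demo_retrieve(task: str) -> List[str]:
    return ["def add(a, b): ..."]


def demo_generate(task: str, context: List[str], feedback: List[str]) -> str:
    # First attempt is deliberately wrong; the repair pass fixes it.
    body = "a + b" if feedback else "a - b"
    return f"def add(a, b):\n    return {body}"


def demo_execute(code: str) -> Tuple[bool, str]:
    namespace: dict = {}
    exec(code, namespace)  # demo only; real execution belongs in a sandbox
    got = namespace["add"](2, 3)
    if got == 5:
        return True, ""
    return False, f"add(2, 3) returned {got}, expected 5"
```

Calling `compile_loop("implement add", demo_retrieve, demo_generate, demo_execute)` fails on the first attempt, feeds the failure log back to the generator, and passes on the second.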

DocAgent (Meta, arXiv:2504.08725)

  • Focus: Multi-agent roles (Reader, Searcher, Writer, Verifier, Orchestrator); topological processing of code and dependencies.
  • Relevance to AKC: Ingest and compile can adopt role-based or topological processing where useful; verification as an explicit step.
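
Topological processing means handling each unit only after the units it depends on. A minimal sketch using Python's standard `graphlib` follows; the module names in `deps` and the `processing_order` helper are hypothetical and do not reflect AKC's real module graph.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List, Set

# Hypothetical dependency map: each key depends on the modules in its set.
deps: Dict[str, Set[str]] = {
    "api": {"models", "auth"},
    "auth": {"models"},
    "models": set(),
}


def processing_order(dependencies: Dict[str, Set[str]]) -> List[str]:
    """Return modules ordered so every dependency precedes its dependents."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # A cycle means no valid order exists; surface it to the caller.
        raise ValueError(f"dependency cycle detected: {exc.args[1]}") from exc


order = processing_order(deps)
```

With this map, `models` is processed first, then `auth`, then `api`, so a documentation or verification step for `api` can rely on results already produced for its dependencies.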

Reasoning and acting

ReAct (arXiv:2210.03629)

  • Focus: Interleaved reasoning and acting; foundation for tool use and iterative refinement.
  • Relevance to AKC: Plan state and the iterative compile loop (plan → act → observe → repair) align with ReAct-style reasoning and acting.

ActMem (arXiv:2603.00026)

  • Focus: Memory and reasoning; causal/semantic graphs over dialogue; conflict detection.
  • Relevance to AKC: Optional knowledge/causal graph in memory/ for “why” and conflict detection; plan state as a form of working memory.

Comparable agents, benchmarks, and eval caveats

  • SWE-agent: Cited in OSS direction planning as a comparable framework for how open-source agent stacks approach engineering tasks and sandboxing; project home princeton-nlp/SWE-agent.
  • Coding-agent / benchmark caveats: OpenAI note on SWE-bench Verified (Feb 2026) — useful context when interpreting leaderboard-style results (contamination, methodology).

OSS security, supply chain, and CI (planning anchors)

These appear in OSS direction and hardening plans as patterns, not prescriptions to copy blindly; AKC keeps tenant-scoped, artifact-local truth.

Observability and telemetry contracts

  • LLM spans: OpenTelemetry GenAI semantic conventions (gen_ai.*, token usage where available) — cited for control-plane / trace alignment.
  • Distributed correlation: W3C Trace Context — optional propagation for cross-service use; compile run id as correlation id is already a reasonable default.
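
For illustration, a W3C Trace Context `traceparent` header has the form `00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>`. The sketch below derives the trace id from a compile run id by hashing it. That derivation is an assumption made here to show how a run id could double as a correlation id; it is not something the specification or AKC prescribes, and all function names are hypothetical.

```python
import hashlib
import re
import secrets
from typing import Dict, Optional

# Version 00 layout: version-traceid-parentid-flags, lowercase hex only.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_id_from_run_id(run_id: str) -> str:
    """Map a compile run id to a stable 16-byte trace id (assumed scheme)."""
    trace_id = hashlib.sha256(run_id.encode("utf-8")).hexdigest()[:32]
    # An all-zero trace id is invalid; guard against it even though it is
    # practically unreachable with a hash.
    return trace_id if trace_id != "0" * 32 else "0" * 31 + "1"


def make_traceparent(run_id: str, sampled: bool = True) -> str:
    """Build a traceparent header value for an outgoing request."""
    span_id = secrets.token_hex(8)
    while span_id == "0" * 16:  # an all-zero parent id is invalid
        span_id = secrets.token_hex(8)
    flags = "01" if sampled else "00"
    return f"00-{trace_id_from_run_id(run_id)}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[Dict[str, object]]:
    """Parse a version-00 traceparent value; return None if it is invalid."""
    match = TRACEPARENT_RE.match(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

Because the trace id is a pure function of the run id, every service that sees the same compile run produces spans under the same trace, while each hop still gets a fresh parent id.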

Multi-agent coordination and workflow graphs

Industry and framework references used when reasoning about coordination semantics, fork/join, and dynamic parallelism (see coordination plans):

Identity, policy, and audit trails

Reconciliation, operators, and progressive delivery

References from runtime and “recompile” planning for operator-style control loops, leases, and progressive analysis:

RAG, retrieval quality, and evaluation

Human quality, UX, and governance anchors

References used to ground quality dimensions such as taste, judgment, and user_empathy:

Lead time, DORA, and “time compression” measurement

Used in time-compression benchmark planning to anchor metric semantics (lead time, anti-gaming, caveats for AI-assist speedup claims):

Sandboxing and container security

From Rust/Docker sandbox hardening plans:

Compiler IR and system representation (background)

Formal verification (optional later phase)

  • AlphaVerus: Self-improving verified code generation.
  • AlgoVeri: Benchmarks for Dafny/Verus/Lean.
  • ProofWright: Agentic verification.

These can support a future “correctness guarantees” phase (e.g. Dafny/Verus or agentic verifiers for critical paths). Add stable public links here when the project standardizes on specific versions or papers.

Inputs in practice

  • Messaging (current and future): Today the repo ships Slack, Discord, Telegram, and WhatsApp-oriented messaging ingest paths. All of them share one design goal: structure messages as Q&A pairs drawn from threads rather than raw dumps, with auth and channel/date/user filters where the platform supports them.
  • Docs/APIs: Chunking, embedding, retrieval; optional schema extraction (OpenAPI) for API-derived workflows.
  • Living docs: Bidirectional sync and validation (e.g. SpecWeave-style) can be a later “living system” feature.
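
The "Q&A from threads" idea for messaging ingest can be sketched as follows. This is a simplified illustration under stated assumptions: the `Message` and `QAPair` shapes and the `threads_to_qa` function are hypothetical, the first message of a thread is treated as the question, and every later message counts as an answer. Real platform payloads differ and need per-connector mapping.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Message:
    channel: str
    thread_id: str
    ts: float  # seconds since epoch
    user: str
    text: str


@dataclass
class QAPair:
    channel: str
    thread_id: str
    question: str
    answers: List[str]


def threads_to_qa(
    messages: Iterable[Message],
    channel: Optional[str] = None,
    since: Optional[float] = None,
) -> List[QAPair]:
    """Group messages by thread and emit one Q&A pair per answered thread."""
    threads: Dict[Tuple[str, str], List[Message]] = defaultdict(list)
    for msg in messages:
        # Optional channel/date filters, applied before grouping.
        if channel is not None and msg.channel != channel:
            continue
        if since is not None and msg.ts < since:
            continue
        threads[(msg.channel, msg.thread_id)].append(msg)

    pairs: List[QAPair] = []
    for (chan, thread_id), msgs in threads.items():
        ordered = sorted(msgs, key=lambda m: m.ts)
        if len(ordered) < 2:
            continue  # a thread with no reply has no answer to index
        root, replies = ordered[0], ordered[1:]
        pairs.append(QAPair(chan, thread_id, root.text, [r.text for r in replies]))
    return pairs
```

Dropping unanswered threads keeps the index focused on resolved questions. Whether to keep them as open questions instead is a design choice this sketch does not settle.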

Ingestion-specific choices

  • Chunking: Default to recursive/structure-aware chunking with overlap to preserve coherence and improve retrieval quality. This aligns with common RAG chunking guidance (e.g. Glukhov/Firecrawl surveys of chunking strategies).
  • OpenAPI ingestion: Prefer operation-level chunks (“GET /users”) plus an endpoint inventory document, matching the “discover relevant endpoints, then retrieve details” pattern described in recent OpenAPI-for-agents work.
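
Both ingestion choices above can be sketched in a few lines. The first function packs paragraphs into chunks and repeats the last paragraph of each chunk at the start of the next one when it fits. It is a simplified stand-in for recursive/structure-aware chunking, not a full implementation. The second builds operation-level chunks plus an endpoint inventory from an OpenAPI document that has already been parsed into a dict. Function names, the `max_chars` default, and the chunk text format are assumptions.

```python
import json
from typing import Dict, List, Tuple

HTTP_METHODS = ("get", "put", "post", "delete", "options", "head", "patch", "trace")


def chunk_with_overlap(text: str, max_chars: int = 800) -> List[str]:
    """Pack paragraphs into chunks of at most max_chars, overlapping by one
    paragraph when it fits. A single oversized paragraph is kept whole."""
    sep = "\n\n"
    paragraphs = [p.strip() for p in text.split(sep) if p.strip()]
    chunks: List[str] = []
    current: List[str] = []
    for para in paragraphs:
        if current and len(sep.join(current + [para])) > max_chars:
            chunks.append(sep.join(current))
            tail = current[-1]
            # Carry the previous paragraph forward only if it still fits.
            current = [tail] if len(sep.join([tail, para])) <= max_chars else []
        current.append(para)
    if current:
        chunks.append(sep.join(current))
    return chunks


def openapi_chunks(spec: dict) -> Tuple[str, Dict[str, str]]:
    """Return (endpoint inventory text, {"METHOD /path": operation chunk})."""
    inventory: List[str] = []
    chunks: Dict[str, str] = {}
    for path, item in sorted(spec.get("paths", {}).items()):
        for method in HTTP_METHODS:
            operation = item.get(method)
            if not isinstance(operation, dict):
                continue
            key = f"{method.upper()} {path}"
            summary = operation.get("summary", "")
            inventory.append(f"{key}: {summary}" if summary else key)
            # One retrievable chunk per operation, labeled with its key.
            chunks[key] = json.dumps(
                {"operation": key, **operation}, indent=2, sort_keys=True
            )
    return "\n".join(inventory), chunks
```

The inventory is small enough to retrieve as one document for endpoint discovery, after which the matching operation chunk supplies parameters and response details.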

Platform and delivery (supplementary)

We occasionally cite golden paths and developer-portal patterns; these are product/delivery context, not core AKC research:

Mobile and store-specific distribution links (TestFlight, Firebase App Distribution, Play Developer API, PWA installability) appear in delivery plans; see delivery-architecture.md and getting-started.md for product-facing consolidation rather than duplicating them here.

When implementing features, we align with and cite these works where relevant.