Research grounding
The Agentic Knowledge Compiler (AKC) is designed in line with recent research on agentic code synthesis, memory, and correctness. This page summarizes the main papers and external references that anchor its design choices.
Core synthesis and repair
DeepCode (arXiv:2512.07921)
- Focus: Document-to-codebase synthesis with “channel optimization” via blueprint distillation, stateful code memory, RAG, and closed-loop error correction.
- Relevance to AKC: Code memory in `memory/` and retrieval-before-generation in the compile loop; repair step in Plan → Generate → Execute → Repair. PaperBench-style evaluation can inform future benchmarks.
- Implementation reference: HKUDS/DeepCode (also cited in OSS direction planning).
ARCS (arXiv:2504.20434)
- Focus: Synthesize–execute–repair over a frozen LLM; retrieval-before-generation; provable termination, monotonic improvement, and bounded cost; tiered controller (Small/Medium/Large).
- Relevance to AKC: The compile loop (Plan → Retrieve → Generate → Execute → Repair) and optional tiered controller for latency/quality; emphasis on retrieval and repair for correctness.
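The retrieve, generate, execute, repair cycle with a hard iteration cap can be sketched as follows. This is a minimal illustration and not AKC's or ARCS's actual implementation: `compile_loop`, `Attempt`, and the three callables are hypothetical names, the Plan step is omitted for brevity, and the only bounded-cost guarantee shown is the fixed `max_repairs` limit.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str
    passed: bool
    feedback: str


def compile_loop(
    goal: str,
    retrieve: Callable[[str], list[str]],
    generate: Callable[[str, list[str], Optional[str]], str],
    execute: Callable[[str], tuple[bool, str]],
    max_repairs: int = 3,
) -> list[Attempt]:
    """Run Retrieve -> Generate -> Execute -> Repair with a hard iteration cap."""
    context = retrieve(goal)  # retrieval happens once, before any generation
    attempts: list[Attempt] = []
    feedback: Optional[str] = None
    # Bounded cost: at most max_repairs + 1 generations, so the loop always terminates.
    for _ in range(max_repairs + 1):
        code = generate(goal, context, feedback)
        passed, feedback = execute(code)
        attempts.append(Attempt(code, passed, feedback))
        if passed:
            break  # stop as soon as execution succeeds
    return attempts
```

The returned list keeps every attempt, so a caller can inspect the repair history even when the final attempt still fails.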
DocAgent (Meta, arXiv:2504.08725)
- Focus: Multi-agent roles (Reader, Searcher, Writer, Verifier, Orchestrator); topological processing of code and dependencies.
- Relevance to AKC: Ingest and compile can adopt role-based or topological processing where useful; verification as an explicit step.
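Topological processing means handling each component only after everything it depends on. A minimal sketch using the standard-library `graphlib` module is shown below; the function name and the module names in the usage note are hypothetical, and this is not DocAgent's own code.

```python
from graphlib import TopologicalSorter


def processing_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return components so that every dependency precedes its dependents.

    `dependencies` maps a component to the set of components it depends on.
    graphlib raises CycleError if the dependency graph contains a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For example, `processing_order({"api": {"models"}, "models": set()})` places `models` before `api`.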
Reasoning and acting
ReAct (arXiv:2210.03629)
- Focus: Interleaved reasoning and acting; foundation for tool use and iterative refinement.
- Relevance to AKC: Plan state and the iterative compile loop (plan → act → observe → repair) align with ReAct-style reasoning and acting.
ActMem (arXiv:2603.00026)
- Focus: Memory and reasoning; causal/semantic graphs over dialogue; conflict detection.
- Relevance to AKC: Optional knowledge/causal graph in `memory/` for “why” and conflict detection; plan state as a form of working memory.
Comparable agents, benchmarks, and eval caveats
- SWE-agent: Cited in OSS direction planning as a comparable framework for how open-source agent stacks approach engineering tasks and sandboxing; project home princeton-nlp/SWE-agent.
- Coding-agent / benchmark caveats: OpenAI note on SWE-bench Verified (Feb 2026) — useful context when interpreting leaderboard-style results (contamination, methodology).
OSS security, supply chain, and CI (planning anchors)
These appear in OSS direction and hardening plans as patterns, not prescriptions to copy blindly; AKC keeps tenant-scoped, artifact-local truth.
- OpenSSF OSPS Baseline and maintainer guidance.
- OpenSSF Scorecard — automated checks and remediation patterns.
- SLSA build provenance (v1.2) and slsa-github-generator.
- GitHub Actions security hardening.
- OpenSSF Best Practices badge (formerly the CII Best Practices badge): criteria alignment for dependencies, versioning, and docs.
Observability and telemetry contracts
- LLM spans: OpenTelemetry GenAI semantic conventions (`gen_ai.*`, token usage where available), cited for control-plane / trace alignment.
- Distributed correlation: W3C Trace Context, for optional propagation across services; the compile run id is already a reasonable default correlation id.
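One way to carry a compile run id across services is to derive a W3C `traceparent` header from it. The sketch below is an assumption about how that mapping could look, not AKC's implementation: the function name is hypothetical, and hashing the run id to obtain a stable trace id is a design choice of this example rather than something the W3C specification prescribes.

```python
import hashlib
import secrets


def traceparent_for_run(run_id: str, sampled: bool = True) -> str:
    """Build a W3C Trace Context `traceparent` value for a compile run.

    Layout: version "00", 32-hex trace id, 16-hex parent (span) id, 2-hex flags.
    """
    # Assumption: a truncated SHA-256 of the run id gives a stable 16-byte trace id,
    # so every service that knows the run id derives the same trace id.
    trace_id = hashlib.sha256(run_id.encode("utf-8")).hexdigest()[:32]
    span_id = secrets.token_hex(8)  # fresh random 8-byte span id per call
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"
```

The specification treats all-zero trace and span ids as invalid; with a hash and a random source that case is vanishingly unlikely, but a production version would check for it.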
Multi-agent coordination and workflow graphs
Industry and framework references used when reasoning about coordination semantics, fork/join, and dynamic parallelism (see coordination plans):
- Microsoft AutoGen — GraphFlow.
- LangGraph persistence — checkpointing / state persistence patterns.
- Fork/join: Orkes Conductor fork/join operator — classical parallel-branch aggregation pattern.
- Dynamic parallelism: LangGraph `Send()` / map-reduce style branches, Send API overview (variable fan-out at plan time).
- Advanced (optional): HyperFlow process-network model (hypergraph-style workflows); only if coordination needs outgrow DAGs.
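The fork/join and variable fan-out patterns referenced above can be illustrated with plain Python. This sketch deliberately uses only the standard library and does not reproduce the LangGraph, AutoGen, or Conductor APIs; `fork_join` and its parameters are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")
J = TypeVar("J")


def fork_join(
    items: Sequence[T],
    branch: Callable[[T], R],
    join: Callable[[list[R]], J],
    max_workers: int = 4,
) -> J:
    """Fork one branch per item, wait for all branches, then aggregate.

    The fan-out width is len(items), so it is decided by the data at run time
    rather than fixed in the workflow definition.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, which keeps the join deterministic.
        results = list(pool.map(branch, items))
    return join(results)
```

Because the join runs only after every branch has returned, an exception in any branch propagates to the caller instead of producing a partial aggregate.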
Identity, policy, and audit trails
- Workload identity: SPIFFE workload endpoint; SPIFFE + OPA with Envoy — common patterns for mutual authentication and authorization between autonomous workloads; AKC maps this to cryptographic binding of role identity to bundle hash + tenant scope in phased designs.
- Policy engines: Open Policy Agent — reference for policy-as-code patterns.
- Audit narrative: Multi-agent systems and synchronized audit trails — correlating decisions across handoffs; AKC emphasizes immutable artifact hashes + replay.
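The idea of an audit trail built on immutable hashes and replay can be sketched as a hash chain in which each record commits to the tenant, role, bundle hash, and the previous record. This is an illustrative assumption, not AKC's audit format: all function and field names are hypothetical, and the chain provides tamper evidence only. Binding a role identity cryptographically would additionally require signatures (for example from a SPIFFE-issued identity), which are not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder previous-hash for the first record


def _digest(body: dict) -> str:
    # Canonical JSON (sorted keys) so the same body always hashes identically.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_record(chain: list[dict], tenant: str, role: str, bundle_hash: str, decision: str) -> dict:
    """Append a decision record that commits to the previous record's hash."""
    body = {
        "tenant": tenant,
        "role": role,
        "bundle_hash": bundle_hash,
        "decision": decision,
        "prev_hash": chain[-1]["record_hash"] if chain else GENESIS,
    }
    record = {**body, "record_hash": _digest(body)}
    chain.append(record)
    return record


def verify_chain(chain: list[dict]) -> bool:
    """Replay the chain and check every link and every record hash."""
    prev = GENESIS
    for record in chain:
        body = {k: v for k, v in record.items() if k != "record_hash"}
        if record["prev_hash"] != prev or record["record_hash"] != _digest(body):
            return False
        prev = record["record_hash"]
    return True
```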
Reconciliation, operators, and progressive delivery
References from runtime and “recompile” planning for operator-style control loops, leases, and progressive analysis:
- Kubernetes: Controller, Operator pattern, Leases, Liveness/readiness/startup probes, Deployments, Pod Security Standards, Recommended labels.
- Controller-runtime: FAQ (reconciliation semantics).
- GitOps / progressive: Flux Kustomization reconciliation, Argo Rollouts — analysis, Argo Rollouts — canary, Argo CD application spec (declarative desired state).
- SRE: Canarying releases, Error budgets.
- SLO vocabulary (patterns): OpenSLO, Harness SLO-as-code — declarative SLO targets; AKC evaluates exported evidence bundles, not a hosted TSDB.
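As a worked illustration of the error-budget vocabulary used in the SRE and SLO references above, the remaining budget for a window can be computed from event counts alone. The function name and the event-count input shape are assumptions for this example; they do not describe AKC's evidence-bundle schema.

```python
def error_budget_remaining(slo_target: float, total: int, bad: int) -> float:
    """Fraction of the error budget still unspent (negative once overspent).

    slo_target is the required success ratio, e.g. 0.999 for "three nines".
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total <= 0:
        return 1.0  # no traffic in the window, so nothing has been spent
    allowed_bad = (1.0 - slo_target) * total  # failures the SLO tolerates
    return 1.0 - bad / allowed_bad
```

With a 99% target over 1,000 events, 10 failures are allowed, so 5 failures leave about half of the budget.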
RAG, retrieval quality, and evaluation
- RAGAS evaluation framework — grounding-focused RAG evaluation dimensions.
- Google Cloud — optimizing RAG retrieval — retrieval quality patterns.
- Evidently AI — RAG evaluation guide — offline eval dimensions for RAG systems.
Human quality, UX, and governance anchors
References used to ground quality dimensions such as taste, judgment, and user_empathy:
- ISO 9241-210:2019 — human-centered design principles.
- NN/g — 10 Usability Heuristics — user control, error prevention, and recovery heuristics.
- NN/g — Aesthetic-Usability Effect — perceived usability impact from visual quality.
- NIST AI RMF 1.0 — AI risk/governance framing for judgment and controls.
- Data Mesh Principles — domain ownership and bounded-context framing.
- ACM Software Engineering Code of Ethics — engineering-discipline and professional obligations.
Lead time, DORA, and “time compression” measurement
Used in time-compression benchmark planning to anchor metric semantics (lead time, anti-gaming, caveats for AI-assist speedup claims):
- DORA metrics guide and history.
- Apache DevLake — lead time for changes.
- METR — exploratory transcript analysis for coding-agent time savings — methodology caveats for upper-bound factors.
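Lead time for changes is commonly measured as the time from commit to deployment. The sketch below computes a median over a set of changes. The function name and input shape are hypothetical, and using the median (rather than a mean or a percentile) is a choice made for this example, not a prescription from the references.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median commit-to-deploy duration over (committed_at, deployed_at) pairs."""
    if not changes:
        raise ValueError("at least one change is required")
    seconds = []
    for committed_at, deployed_at in changes:
        if deployed_at < committed_at:
            # Reject impossible intervals instead of letting them shrink the metric.
            raise ValueError("deployment precedes commit")
        seconds.append((deployed_at - committed_at).total_seconds())
    return timedelta(seconds=median(seconds))
```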
Sandboxing and container security
From Rust/Docker sandbox hardening plans:
- Docker: seccomp, [`docker run` reference](https://docs.docker.com/reference/cli/docker/container/run/).
- OWASP Docker Security Cheat Sheet.
Compiler IR and system representation (background)
- Intermediate representation (IR) for compilers — general background for IR-as-spine discussions.
Formal verification (optional later phase)
- AlphaVerus: Self-improving verified code generation.
- AlgoVeri: Benchmarks for Dafny/Verus/Lean.
- ProofWright: Agentic verification.
These can support a future “correctness guarantees” phase (e.g. Dafny/Verus or agentic verifiers for critical paths). Add stable public links here when the project standardizes on specific versions or papers.
Inputs in practice
- Messaging (current and future): Today the repo ships Slack, Discord, Telegram, and WhatsApp-oriented messaging ingest paths. The common design goal is still the same: structure messages as Q&A from threads rather than raw dumps, with auth and channel/date/user filters where the platform supports them.
- Docs/APIs: Chunking, embedding, retrieval; optional schema extraction (OpenAPI) for API-derived workflows.
- Living docs: Bidirectional sync and validation (e.g. SpecWeave-style) can be a later “living system” feature.
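Structuring messages as Q&A from threads, rather than ingesting raw dumps, might look like the following sketch. The `Message` shape and the rule that the earliest message in a thread is the question are simplifying assumptions for illustration; they are not the repo's actual ingest schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Message:
    thread_id: str
    ts: float  # platform timestamp, used only for ordering
    user: str
    text: str


def threads_to_qa(messages: list[Message]) -> list[dict]:
    """Group messages by thread and emit one Q&A record per answered thread."""
    threads: dict[str, list[Message]] = defaultdict(list)
    for message in messages:
        threads[message.thread_id].append(message)

    records = []
    for thread_id, thread in threads.items():
        ordered = sorted(thread, key=lambda m: m.ts)
        if len(ordered) < 2:
            continue  # a message with no replies carries no answer
        question, replies = ordered[0], ordered[1:]
        records.append({
            "thread_id": thread_id,
            "asked_by": question.user,
            "question": question.text,
            "answers": [reply.text for reply in replies],
        })
    return records
```

Channel, date, and user filters would be applied to `messages` before this step, where the platform supports them.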
Ingestion-specific choices
- Chunking: Default to recursive/structure-aware chunking with overlap to preserve coherence and improve retrieval quality. This aligns with common RAG chunking guidance (e.g. Glukhov/Firecrawl surveys of chunking strategies).
- OpenAPI ingestion: Prefer operation-level chunks (“`GET /users`”) plus an endpoint inventory document, matching the “discover relevant endpoints, then retrieve details” pattern described in recent OpenAPI-for-agents work.
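Operation-level chunking of an OpenAPI document, together with an endpoint inventory, can be sketched as below. It assumes the spec has already been parsed into a dictionary; the function name and chunk fields are hypothetical, and `$ref` resolution is intentionally left out.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def openapi_chunks(spec: dict) -> tuple[str, list[dict]]:
    """Return (inventory_text, chunks) with one chunk per operation.

    The inventory lists every operation on one line so a retriever can first
    discover relevant endpoints and then fetch the detailed chunk by id.
    """
    chunks = []
    for path, path_item in spec.get("paths", {}).items():
        for method, operation in path_item.items():
            if method.lower() not in HTTP_METHODS:
                continue  # skip path-level keys such as "parameters" or "summary"
            chunks.append({
                "id": f"{method.upper()} {path}",
                "summary": operation.get("summary", ""),
                "operation": operation,  # full operation object for detailed retrieval
            })
    lines = [
        f"{chunk['id']}: {chunk['summary']}" if chunk["summary"] else chunk["id"]
        for chunk in chunks
    ]
    return "\n".join(lines), chunks
```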
Platform and delivery (supplementary)
We occasionally cite golden paths and developer-portal patterns; these are product/delivery context, not core AKC research:
- Backstage — software templates, what is Backstage.
- Glen Thomas — golden paths / opinionated platforms, Jellyfish — golden paths.
Mobile and store-specific distribution links (TestFlight, Firebase App Distribution, Play Developer API, PWA installability) appear in delivery plans; see delivery-architecture.md and getting-started.md for product-facing consolidation rather than duplicating them here.
When implementing features, we align with and cite these works where relevant.