
Meta's AI Agent Swarm Revealed a Simple Knowledge Mapping Pattern Any Team Can Use


In early April, engineers at Meta published a detailed account of how they deployed a swarm of over 50 specialized AI agents to map tribal knowledge across a sprawling data pipeline. The pipeline spanned four repositories, three programming languages, and more than 4,100 files. While the complexity of their solution grabbed headlines in platform engineering circles, a deeper reading reveals a quieter, more portable insight: the most durable piece of their stack is an architectural pattern that any organization, regardless of size, can adopt today.

The Challenge: Tribal Knowledge in Polyrepo Systems

Meta's pipeline relies on config-as-code, combining Python configurations, C++ services, and Hack automation scripts across multiple repositories. Onboarding a single data field touches six subsystems that must stay synchronized. Previously, engineers aligned these subsystems manually, relying on undocumented knowledge that existed only in their heads. When Meta pointed AI coding agents at the codebase, the results were predictable: the agents produced changes that compiled but behaved subtly wrong. They didn't know that certain deprecated enum values must never be removed because of serialization compatibility, or that two configuration modes use different field names for the same operation, so swapping them causes silent errors. None of this tribal knowledge was written down anywhere.


Meta's Large-Scale Solution: A Pre-Compute Engine

Meta's response was a multi-stage AI pipeline they call a "pre-compute engine." It consists of:

  • Two explorer agents that map the codebase structure
  • Eleven module analysts that read every file and answer five specific questions per module
  • Two writers that produce concise context files (25–35 lines each)
  • Ten or more critic passes that perform three rounds of independent quality review
  • Four fixer agents that apply corrections automatically

The output is 59 context files covering 100% of code modules, up from just 5% before. Every few weeks, the system self-refreshes: it validates file paths, detects coverage gaps, re-runs critics, and auto-fixes stale references. On a six-task evaluation, the results were significant: roughly 40% fewer tool calls per task, and workflow guidance that once took two days of asking engineers now takes 30 minutes.

Beyond the AI Swarm: The Durable Architectural Pattern

Buried near the end of Meta's post is a paragraph that reveals the true architectural insight. It describes how they generated cross-repo dependency summaries that capture the relationships between modules. This is not a complex agent swarm; it's a structured approach to extracting and codifying knowledge. The key pattern is the pre-compute engine: a system that runs on a regular cadence to produce and maintain a knowledge base from the codebase itself. This pattern is independent of the specific AI agents used. Any team can build a simpler version today using basic scripts, linters, and documentation generators.

Key Components of the Pattern

  1. Automated exploration: Crawl your codebase to identify modules, dependencies, and configuration files.
  2. Structured extraction: For each module, answer a fixed set of questions (e.g., purpose, inputs, outputs, side effects, required context).
  3. Context packaging: Generate concise, human- and machine-readable summaries (e.g., markdown or JSON files).
  4. Validation and refresh: Periodically check for stale entries, missing coverage, and broken references, then update automatically.
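
As a rough sketch, the four components can be wired together in a few dozen lines of Python. Everything here is illustrative, not Meta's implementation: the `docs/context` output path is an arbitrary choice, module discovery is naive (any directory containing Python files), and extraction is stubbed where a real script or LLM call would go.

```python
"""Pre-compute engine skeleton: the four stages wired together (illustrative)."""
from pathlib import Path

CONTEXT_DIR = Path("docs/context")  # hypothetical output location

def explore(root: Path) -> list[Path]:
    """1. Automated exploration: here, every directory containing Python files."""
    return sorted({p.parent for p in root.rglob("*.py")})

def extract(module: Path) -> dict:
    """2. Structured extraction: answer a fixed question set (stubbed here)."""
    return {"module": module,
            "purpose": "TODO",  # filled by a script, a human, or an LLM call
            "files": sorted(f.name for f in module.glob("*.py"))}

def package(record: dict) -> None:
    """3. Context packaging: one small, diffable Markdown file per module."""
    CONTEXT_DIR.mkdir(parents=True, exist_ok=True)
    name = str(record["module"]).strip("./").replace("/", "__") or "root"
    body = [f"# {record['module']}", f"Purpose: {record['purpose']}", "Files:"]
    body += [f"- {f}" for f in record["files"]]
    (CONTEXT_DIR / f"{name}.md").write_text("\n".join(body) + "\n")

def validate() -> list[str]:
    """4. Validation and refresh: flag context files whose module vanished."""
    return [c.name for c in CONTEXT_DIR.glob("*.md")
            if not Path(c.stem.replace("__", "/") if c.stem != "root" else ".").is_dir()]

if __name__ == "__main__":
    for mod in explore(Path(".")):
        package(extract(mod))
    print("stale:", validate() or "none")
```

Each stage can later be swapped for something smarter, say an LLM call in `extract` or a real dependency parser in `explore`, without touching the others. That is exactly the decoupling described next.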

This pattern doesn't require a massive platform team. A single engineer can implement it with existing CI/CD tools. The durability comes from the separation of concerns: the knowledge extraction logic is decoupled from the AI agents, so improvements to one don't break the other.


How to Implement the Pre-Compute Engine Approach

Step 1: Define Your Knowledge Questions

Start by asking: what knowledge do your engineers most often need? Common questions include "What does this module do?", "What are its external dependencies?", and "Are there any hidden constraints?" Meta settled on five questions per module. Keep your list small and actionable.
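
One way to keep the list small and reviewable is to version it alongside the engine itself, so the extraction step iterates over a fixed set. The questions below are illustrative examples, not Meta's actual five:

```python
# questions.py - a fixed, versioned question set the extraction step iterates
# over. These are illustrative examples, not Meta's actual questions.
KNOWLEDGE_QUESTIONS = [
    "What is this module's purpose?",
    "What are its inputs and outputs?",
    "What external dependencies does it have?",
    "What side effects does it produce (I/O, network, global state)?",
    "What hidden constraints or invariants must callers respect?",
]
```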

Step 2: Build a Simple Crawler

Write a script that walks your repository structure. Use existing tools: find all files with a given extension, parse import statements, read configuration files. This can be a shell script, a Python script, or even a GitHub Action.
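
For a Python codebase, a minimal crawler needs nothing beyond the standard library. The sketch below maps each file to the modules it imports using the `ast` parser:

```python
import ast
from pathlib import Path

def crawl(root: str = ".") -> dict[str, list[str]]:
    """Map each Python file in the repo to the modules it imports."""
    imports_by_file: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.py"):
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files that don't parse
        names = []
        for node in ast.walk(tree):
            if isinstance(node, ast.Import):
                names.extend(alias.name for alias in node.names)
            elif isinstance(node, ast.ImportFrom) and node.module:
                names.append(node.module)
        imports_by_file[str(path)] = sorted(set(names))
    return imports_by_file

if __name__ == "__main__":
    for file, deps in crawl().items():
        print(file, "->", ", ".join(deps) or "(no imports)")
```

The same idea extends to other languages with grep-level heuristics or a proper parser; the point is to start with whatever inventory you can produce cheaply.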

Step 3: Generate Context Files

For each module, output a structured context file. Use a format that's easy to version control and diff, such as YAML or Markdown. Include the answers to your knowledge questions, plus a list of files and dependencies.
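
A sketch of the packaging step, using the Markdown layout from the skeleton above; the module name and answers in the usage example are hypothetical:

```python
from pathlib import Path

def write_context_file(module: str, answers: dict[str, str], files: list[str],
                       out_dir: str = "docs/context") -> Path:
    """Render one module's answers as a small, diffable Markdown file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    lines = [f"# {module}", ""]
    lines += [f"- **{q}** {a}" for q, a in answers.items()]
    lines += ["", "## Files", *[f"- {f}" for f in files]]
    path = out / f"{module.replace('/', '__')}.md"
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return path

# Example usage (hypothetical module, answers, and file paths):
print(write_context_file(
    "ingest/parsers",
    {"Purpose?": "Normalizes raw event payloads.",
     "External dependencies?": "protobuf, internal schema registry."},
    ["ingest/parsers/events.py", "ingest/parsers/schema.py"],
))
```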

Step 4: Automate Refresh

Schedule your crawl to run on a regular basis—daily, weekly, or triggered by pull requests. Include a validation step that fails if any context file becomes incomplete or stale. This ensures your knowledge base stays current.
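
The validation half can be a short script whose exit code fails the CI job. This sketch assumes the Step 3 layout, where source file paths are listed as bullets under a `## Files` heading:

```python
import sys
from pathlib import Path

def validate(context_dir: str = "docs/context", repo_root: str = ".") -> list[str]:
    """Flag context files that reference source files which no longer exist."""
    errors = []
    for ctx in Path(context_dir).glob("*.md"):
        in_files_section = False
        for line in ctx.read_text(encoding="utf-8").splitlines():
            if line.startswith("## "):
                in_files_section = line.strip() == "## Files"
            elif in_files_section and line.startswith("- "):
                referenced = line[2:].strip()
                if not (Path(repo_root) / referenced).exists():
                    errors.append(f"{ctx.name}: missing {referenced}")
    return errors

if __name__ == "__main__":
    problems = validate()
    for p in problems:
        print("STALE:", p)
    sys.exit(1 if problems else 0)  # non-zero exit fails the CI job
```

Run it from a scheduled workflow or a pull-request hook; any stale reference then blocks the merge until the context file is regenerated.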

Meta's results show the payoff at enterprise scale, but nothing about the pattern requires that scale; a small team can benefit just as much. The AI agents are only one possible implementation. The durable stack is the structured knowledge repository itself.

Conclusion: The Durable Insight

While Meta's deployment of 50+ AI agents is impressive, the lasting lesson is simpler: build a pre-compute engine that continuously extracts and validates tribal knowledge from your codebase. This pattern scales from startups to enterprises. It reduces onboarding time, prevents subtle bugs, and makes AI coding agents far more effective. The best part? You can start building it today, with tools you already have.

For more on knowledge management in platform engineering, see our guide on tackling tribal knowledge and structured extraction.