The first 90 days at a new engineering job are defined by a particular kind of confusion. Not the confusion of incompetence — the confusion of insufficient context. A senior engineer joins your team with a decade of experience and spends the first month asking questions like "why did we build it this way?" and "where does this service talk to that one?" and "what should I actually work on first?"
These questions have answers. They live in old PRs, post-mortem documents, ADR files, Slack threads from 18 months ago, and the heads of three engineers who have been here since the beginning. The new engineer onboarding agent extracts this context programmatically and delivers it in layers — each layer addressing a different depth of understanding that the new hire needs at different stages of their ramp.
The Three-Layer Onboarding Architecture
Each layer targets a different question new engineers ask at different stages.
The onboarding agent is not a single monolithic system. It is three distinct agents that activate sequentially, each building on the context established by the previous one. Layer 1 provides structural orientation — the map. Layer 2 provides historical reasoning — the story behind the map. Layer 3 provides personalized action — the first steps on the map that match this specific engineer's skills and the team's current priorities.
Layer 1: Codebase Orientation Agent — The Service Brief
Generating a structural guide to every service the new engineer will touch.
The codebase orientation agent runs against the repositories the new engineer will work with and produces a service brief for each one. This is not auto-generated documentation — it is a structured summary designed for someone encountering the codebase for the first time.
For each service, the agent generates:
What it does: A two-paragraph plain-language description of the service's purpose, derived from README files, API documentation, and route/endpoint analysis. If the README is stale (last updated >6 months ago), the agent flags this and supplements with analysis of the actual code structure[1].
How it connects: An automatically generated dependency map showing what this service calls, what calls it, and what shared infrastructure (databases, queues, caches) it touches. This is extracted from import statements, API client configurations, and infrastructure-as-code definitions.
Where the important parts are: A guided tour of the key directories and files — the entry points, the core business logic, the configuration, and the test suites. Weighted by commit frequency: files that change often are more important to understand than stable utility code. A short sketch of this weighting appears after this list.
Who owns it: Current CODEOWNERS mapping, most active contributors in the last 90 days, and the on-call rotation. This tells the new engineer who to ask when the documentation falls short.
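To make the commit-frequency weighting concrete, the sketch below shells out to git log and ranks files by how often they changed in a recent window. It is a minimal illustration rather than the agent's prescribed implementation; the file name, the 180-day window, and the 25-file cutoff are assumptions.

```typescript
// weight-key-files.ts (hypothetical) -- assumes the git CLI is installed and
// the repository is checked out locally.
import { execFileSync } from 'node:child_process';

interface FileWeight {
  path: string;
  commitCount: number; // commits touching this file within the window
}

function rankFilesByChurn(repoDir: string, sinceDays = 180, limit = 25): FileWeight[] {
  // Emit one changed-file path per line for every commit in the window.
  const output = execFileSync(
    'git',
    ['log', `--since=${sinceDays} days ago`, '--name-only', '--pretty=format:'],
    { cwd: repoDir, encoding: 'utf8' },
  );

  const counts = new Map<string, number>();
  for (const line of output.split('\n')) {
    const filePath = line.trim();
    if (!filePath) continue;
    counts.set(filePath, (counts.get(filePath) ?? 0) + 1);
  }

  // Most frequently changed files first: these are the ones worth explaining.
  return [...counts.entries()]
    .map(([path, commitCount]) => ({ path, commitCount }))
    .sort((a, b) => b.commitCount - a.commitCount)
    .slice(0, limit);
}
```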
Example Service Brief Output Structure
```
onboarding-briefs/
├── payments-service/
│   ├── overview.md
│   ├── dependency-map.json
│   ├── key-files-guide.md
│   ├── ownership.md
│   └── recent-changes.md
├── user-service/
│   ├── overview.md
│   ├── dependency-map.json
│   ├── key-files-guide.md
│   ├── ownership.md
│   └── recent-changes.md
└── notification-service/
    ├── overview.md
    ├── dependency-map.json
    ├── key-files-guide.md
    ├── ownership.md
    └── recent-changes.md
```

Layer 2: The 'Why Did We Build It This Way' Agent
Querying ADRs, PR discussions, and post-mortems to surface historical reasoning.
The most frustrating experience for a new engineer is encountering a questionable architectural choice and being told "there is a reason for that, but nobody remembers exactly what it was." The historical reasoning exists — it was just never indexed for retrieval.
Layer 2 builds a context index from three sources:
Architecture Decision Records (ADRs): The agent parses all ADRs and links each decision to the specific services and code paths it affects[4]. When the new engineer asks "why does the notification service use polling instead of webhooks?", the agent can surface the ADR from 2024 that explains the vendor's webhook reliability issues.
PR Discussion Threads: The richest source of "why" context. Significant PRs — those with extensive review comments, multiple revision cycles, or large-scale changes — contain inline discussions about tradeoffs, rejected alternatives, and future concerns. The agent indexes these discussions and links them to the files and directories they concern.
Post-Mortem Documents: Past incidents reveal why defensive coding patterns exist. The retry logic that looks excessive might be there because a cascading failure in 2024 took down the payment pipeline for three hours. The agent links post-mortem recommendations to the code changes that implemented them.
```typescript
// context-index-builder.ts
interface ContextEntry {
  source: 'adr' | 'pr-discussion' | 'post-mortem' | 'slack-thread';
  sourceUrl: string;
  date: string;
  relevantPaths: string[]; // files/directories this context applies to
  summary: string; // 2-3 sentence summary
  keyDecision: string | null; // the decision made, if applicable
  rejectedAlternatives: string[]; // options that were considered and rejected
  participants: string[]; // who was involved in the discussion
  tags: string[]; // service names, technology names
}
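
// Helper functions such as parseADRDirectory, summarizeDocument,
// getSignificantPRs, summarizePRDiscussion, extractDecision,
// extractRejectedApproaches, inferServiceTags, and parsePostMortems are
// assumed to be implemented elsewhere in the org's tooling.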
async function buildContextIndex(
  repos: string[],
  adrPath: string,
  postMortemPath: string
): Promise<ContextEntry[]> {
  const entries: ContextEntry[] = [];

  // Phase 1: Parse ADRs
  const adrs = await parseADRDirectory(adrPath);
  for (const adr of adrs) {
    entries.push({
      source: 'adr',
      sourceUrl: adr.filePath,
      date: adr.date,
      relevantPaths: adr.affectedPaths,
      summary: await summarizeDocument(adr.content),
      keyDecision: adr.decision,
      rejectedAlternatives: adr.alternatives ?? [],
      participants: adr.authors,
      tags: adr.tags,
    });
  }

  // Phase 2: Index significant PR discussions
  for (const repo of repos) {
    const prs = await getSignificantPRs(repo, {
      minComments: 10,
      minReviewCycles: 3,
      lookbackDays: 365,
    });
    for (const pr of prs) {
      entries.push({
        source: 'pr-discussion',
        sourceUrl: pr.url,
        date: pr.mergedAt,
        relevantPaths: pr.changedFiles,
        summary: await summarizePRDiscussion(pr.comments),
        keyDecision: extractDecision(pr.comments),
        rejectedAlternatives: extractRejectedApproaches(pr.comments),
        participants: pr.reviewers,
        tags: inferServiceTags(pr.changedFiles),
      });
    }
  }

  // Phase 3: Index post-mortems
  const postMortems = await parsePostMortems(postMortemPath);
  for (const pm of postMortems) {
    entries.push({
      source: 'post-mortem',
      sourceUrl: pm.url,
      date: pm.date,
      relevantPaths: pm.affectedServices.flatMap(s => s.paths),
      summary: pm.summary,
      keyDecision: pm.rootCause,
      rejectedAlternatives: [],
      participants: pm.responders,
      tags: pm.affectedServices.map(s => s.name),
    });
  }

  return entries;
}
```

Layer 3: The 'What Should I Work On First' Agent
Matching sprint context and skill profile to ideal starter tickets.
The third layer solves one of the most common onboarding failures: giving new engineers work that is either too trivial (fixing typos in docs) or too ambitious (redesigning a core service they do not yet understand). The ideal starter ticket hits a specific sweet spot — meaningful enough to teach the codebase, scoped enough to complete in 2-3 days, and connected to something the team actually needs.
The starter ticket agent takes three inputs:
Sprint Context: Current sprint backlog, team velocity, and upcoming deadlines. The agent identifies tickets that are important but not on the critical path — work the team needs done but that will not block the sprint if it takes longer than expected.
Skill Profile: The new engineer's stated experience with languages, frameworks, and domain areas from their interview process or self-assessment. A frontend specialist should not get a Kubernetes networking ticket as their first assignment.
Codebase Accessibility: From Layer 1, the agent knows which services have the best documentation, highest test coverage, and most active ownership. It prioritizes tickets in well-documented, well-tested areas where the new engineer will have the strongest safety net.
The output is a ranked list of 3-5 recommended starter tickets, each with an explanation of why it was chosen: what the engineer will learn, which services they will touch, and who they should pair with. The rubric below shows the factors and weights used to rank candidates; a minimal scoring sketch follows the table.
| Factor | Weight | Score 1 (Poor) | Score 5 (Ideal) |
|---|---|---|---|
| Scope clarity | 0.25 | Vague requirements, undefined done criteria | Clear acceptance criteria, bounded scope |
| Learning value | 0.25 | Trivial change, no codebase exposure | Touches 2-3 services, teaches key patterns |
| Safety net | 0.20 | No tests, no docs, sole owner on PTO | Strong test suite, active reviewers available |
| Sprint relevance | 0.15 | Nice-to-have, deprioritized backlog item | In current sprint, team needs it done |
| Skill match | 0.15 | Requires unknown tech stack | Aligns with stated experience |
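A minimal sketch of how those weights might combine into a single score is below. The weights mirror the table above, but the type names, ticket shape, and top-five cutoff are illustrative assumptions rather than a prescribed API.

```typescript
// Hypothetical scoring sketch -- weights mirror the rubric above.
interface TicketScores {
  scopeClarity: number;    // each factor scored 1 (poor) to 5 (ideal)
  learningValue: number;
  safetyNet: number;
  sprintRelevance: number;
  skillMatch: number;
}

const WEIGHTS: Record<keyof TicketScores, number> = {
  scopeClarity: 0.25,
  learningValue: 0.25,
  safetyNet: 0.20,
  sprintRelevance: 0.15,
  skillMatch: 0.15,
};

function scoreTicket(scores: TicketScores): number {
  // Weighted average on the 1-5 scale; higher means a better starter ticket.
  return (Object.keys(WEIGHTS) as (keyof TicketScores)[])
    .reduce((sum, factor) => sum + WEIGHTS[factor] * scores[factor], 0);
}

function rankStarterTickets<T extends { scores: TicketScores }>(candidates: T[]): T[] {
  // Return the top five candidates, best score first.
  return [...candidates]
    .sort((a, b) => scoreTicket(b.scores) - scoreTicket(a.scores))
    .slice(0, 5);
}
```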
Passive Context Extraction: Mining PR Comments, Slack, and Incidents
The most valuable onboarding context is not written in documentation — it is embedded in the daily communication artifacts that teams produce without thinking about onboarding at all. Extracting this context passively, without requiring anyone to write onboarding materials, is what makes the agent approach sustainable.
PR Comment Mining: Review comments are a goldmine of implicit design rationale. When a reviewer writes "we should use the existing retry wrapper here because the last time someone rolled their own, we got the cascading timeout issue from INC-2847," that comment contains a direct link between a code pattern and an incident. The agent indexes these connections.
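As an illustration, a sketch like the one below could pull incident references out of review comments and attach them to the files under review. The INC-#### naming pattern and the comment shape are assumptions about how the tracker and code host are configured.

```typescript
// Hypothetical sketch: link PR review comments that mention incidents
// (e.g. "INC-2847") to the files they were left on.
interface ReviewComment {
  prUrl: string;
  path: string; // file the comment was attached to
  body: string;
}

interface IncidentLink {
  incidentId: string;
  path: string;
  prUrl: string;
  excerpt: string;
}

const INCIDENT_ID = /\bINC-\d+\b/g; // assumes incidents are named INC-<number>

function extractIncidentLinks(comments: ReviewComment[]): IncidentLink[] {
  const links: IncidentLink[] = [];
  for (const comment of comments) {
    for (const id of comment.body.match(INCIDENT_ID) ?? []) {
      links.push({
        incidentId: id,
        path: comment.path,
        prUrl: comment.prUrl,
        excerpt: comment.body.slice(0, 200), // short excerpt for the context index
      });
    }
  }
  return links;
}
```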
Slack Thread Analysis: Project channels contain decision narratives that never make it into formal documentation. The agent scans threads with high engagement (many participants, many messages) in channels tagged to the new engineer's team, extracts decision summaries, and links them to relevant code or tickets.
Incident Context: Post-mortems document what broke and why. But the richer context often lives in the incident channel itself — the real-time debugging conversations, the hypotheses that were tested and rejected, the workarounds that became permanent. The agent indexes incident channel transcripts alongside the formal post-mortem.
All of this happens continuously in the background. By the time a new engineer joins, the context index is already populated with months or years of institutional knowledge[3].
The traditional path:
- Read the README (last updated 14 months ago)
- Shadow a senior engineer for a week
- Ask 'why' questions in Slack, wait hours for answers
- Get assigned a 'starter ticket' that teaches nothing
- Discover tribal knowledge through trial and error
- Productive after 60-90 days

With the onboarding agent:
- Receive a current, auto-generated service brief
- Query the context index: 'why does this use polling?'
- Historical reasoning surfaces in seconds, with sources
- Starter tickets matched to skill profile and sprint needs
- Institutional knowledge indexed and searchable from day one
- Meaningful contributions typically within 2–4 weeks (varies by complexity)
A note on the numbers
The improvement ranges cited here are drawn from self-reported outcomes at teams using structured onboarding systems, not from controlled studies. Codebases with strong existing documentation and active ADR practices tend to see the largest gains. Teams with sparse documentation will see more modest improvements until the context index matures. Run a pilot with one team before setting org-wide expectations.
Onboarding Agent Implementation Checklist
- Identify the 3-5 core repositories new engineers will touch
- Set up repository scanning for dependency mapping
- Configure CODEOWNERS and git log access for ownership data
- Index existing ADRs with service and path tagging
- Build PR discussion indexer for comments with >10 replies
- Parse post-mortem repository and link to affected services
- Create Slack channel scanner for project channels (public only)
- Build skill profile intake form for new hires
- Connect to Jira/Linear for sprint backlog and ticket metadata
- Define starter ticket scoring criteria and thresholds
- Package outputs into a single onboarding portal or document
- Test with a recent hire and collect feedback on accuracy
How do you keep the service briefs from becoming stale?
Run the Layer 1 agent on a weekly cron schedule, not just when a new hire joins. Store diffs between runs and flag services where the brief changed significantly. This also surfaces services with rapid churn, which is itself useful metadata for the fragility tracking.
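A minimal sketch of the diff-and-flag step follows, assuming briefs from the previous and current runs are kept side by side on disk; the directory layout and the 15% change threshold are illustrative assumptions.

```typescript
// Hypothetical sketch: flag services whose overview changed substantially
// between weekly runs of the Layer 1 agent.
import { existsSync, readFileSync } from 'node:fs';
import { join } from 'node:path';

function changedRatio(previous: string, current: string): number {
  // Crude line-level churn estimate: fraction of current lines not present before.
  const previousLines = new Set(previous.split('\n'));
  const currentLines = current.split('\n');
  const changed = currentLines.filter((line) => !previousLines.has(line)).length;
  return currentLines.length === 0 ? 0 : changed / currentLines.length;
}

function flagChurningBriefs(briefDir: string, services: string[], threshold = 0.15): string[] {
  return services.filter((service) => {
    const prevPath = join(briefDir, 'previous', service, 'overview.md');
    const currPath = join(briefDir, 'current', service, 'overview.md');
    if (!existsSync(prevPath) || !existsSync(currPath)) return false;
    const ratio = changedRatio(
      readFileSync(prevPath, 'utf8'),
      readFileSync(currPath, 'utf8'),
    );
    return ratio >= threshold; // significant drift since the last run
  });
}
```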
What if the team does not have ADRs?
Many teams do not maintain formal ADRs. The agent compensates by increasing the weight of PR discussion mining. Significant PRs with extensive review comments serve as informal ADRs. The agent can also retroactively generate draft ADRs from PR discussions for the team to review and formalize.
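For example, an indexed PR discussion (a ContextEntry from the Layer 2 index above) could be rendered into a draft ADR for the team to confirm. The markdown layout below is an illustrative sketch, not a required template.

```typescript
// Hypothetical sketch: render an indexed PR discussion as a draft ADR.
function draftADRFromEntry(entry: ContextEntry): string {
  return [
    `# Draft ADR: ${entry.keyDecision ?? 'Decision pending confirmation'}`,
    '',
    `Status: Draft (generated from ${entry.sourceUrl}, ${entry.date})`,
    '',
    '## Context',
    entry.summary,
    '',
    '## Decision',
    entry.keyDecision ?? 'TODO: confirm with the participants listed below.',
    '',
    '## Alternatives considered',
    ...entry.rejectedAlternatives.map((alt) => `- ${alt}`),
    '',
    `Participants: ${entry.participants.join(', ')}`,
  ].join('\n');
}
```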
How do you handle sensitive information in Slack threads?
Only scan designated project and engineering channels. Never scan DMs or private channels. Implement keyword filtering to exclude threads containing sensitive terms (salary, performance review, HR). Make the scanned channel list explicitly visible and editable by engineering leadership.
Does this replace having a buddy or mentor?
No. The agent handles information transfer — the structured, factual context that takes hours to communicate verbally. The buddy relationship handles cultural integration, team dynamics, and the judgment calls that cannot be indexed. The agent frees the buddy to focus on the human aspects of onboarding instead of spending their time explaining code architecture.
Our last three hires used the onboarding agent. All three submitted their first production PR within 10 days. Before the agent, our average was 28 days. The biggest difference was not speed — it was confidence. They understood why the code was structured the way it was before they touched it.
The new engineer onboarding agent works because it addresses the real bottleneck: not access to code, but access to context. Every engineering organization has months of accumulated decisions, tradeoffs, and historical reasoning embedded in communication artifacts that nobody curates for onboarding. The agent extracts and structures this context automatically, transforming the foggy first 90 days into a clear-eyed first three weeks[2].
Start with Layer 1 — the service brief generator. It requires the least integration (just repository access) and delivers immediate value. Add Layer 2 context indexing once you confirm the service briefs are accurate. Layer 3 starter ticket matching comes last, once you have sprint integration and a skill profile intake process. The full system compounds: each new hire who uses it provides feedback that improves the context index for the next one.
- [1] Cortex — Developer Onboarding Guide (cortex.io)
- [2] Enboarder — AI Onboarding Tool Guide 2026 (enboarder.com)
- [3] Instruqt — Navigating the Codebase: Seamless Engineer Onboarding Plan (instruqt.com)
- [4] AWS Prescriptive Guidance — Architectural Decision Records Process (docs.aws.amazon.com)