Session 1 started with a blank page and a vague sense that there had to be a better way. The project had a codebase, some documentation, and a history of decisions that lived mostly in one person's head. The AI knew none of it. Every question required full context. Every answer required verification from scratch. It was useful — but it felt like hiring a brilliant consultant who had amnesia.
Session 160 started differently. The AI oriented itself in under a minute — pulling prior decisions, checking what had changed since yesterday, flagging a documentation inconsistency we hadn't noticed. Within five minutes, we were deep in a design discussion that built on months of accumulated context. The AI referenced a decision from session 40 that turned out to be relevant. It challenged an assumption we'd been carrying since session 12. It caught a gap between what we said we were building and what the code actually did.
Same human. Same class of AI model. Completely different collaboration. The difference wasn't an upgrade. It was 160 sessions of learning how to work together — and building the discipline to preserve what we learned.
The Insight Nobody Talks About
The AI industry sells capability. Bigger models, longer context windows, better benchmarks. But capability without continuity is a treadmill. You run faster and stay in the same place.
What we discovered over 160+ sessions is that sustained human-AI collaboration is a discipline, not a feature. It's not something a model gives you. It's something you build through practice, through failure, and through the unsexy work of writing things down. The quality of session 160 wasn't determined by the model — it was determined by the 159 sessions that came before it and whether we'd done the work to preserve what they produced.
Almost no one in the AI space talks about this. They talk about models, benchmarks, and context windows. They don't talk about the practice of sustained partnership — because most people haven't done it long enough to discover what happens when you do.
Why Most AI Collaboration Stays Shallow
Most interactions with AI are one-shots. A question, an answer, a closed tab. Even among power users, sustained multi-session collaboration is rare. A 2026 KPMG/UT Austin study analyzing 1.4 million workplace AI interactions found that only about 5% of users engage in the kind of iterative, context-rich, multi-step collaboration that produces high-impact results.
This isn't laziness. It's a structural problem. Every time you start a new conversation, you start from zero. The context you built — the decisions, the rejected alternatives, the nuances that took twenty minutes to establish — evaporates. As we explored in The Rediscovery Tax, this isn't just inconvenient. It's expensive. Developers lose roughly a third of their time reconstructing context. And MIT's 2025 study found that 95% of enterprise AI pilots fail to deliver measurable ROI — not because the models are weak, but because the systems can't retain feedback, adapt to context, or improve over time.
The result: most human-AI collaboration never gets past the equivalent of a first date. Polite, surface-level, and starting over every time.
The Compounding Gap Is Widening
New research is making it increasingly clear that sustained collaboration produces qualitatively different results:
Sue Broughton's 2025 longitudinal study tracked sustained AI collaboration across 17 weeks and documented a clear trajectory: weeks 1–4 are transactional (basic tool use), weeks 5–8 reveal capability discovery (the AI surprises you), weeks 9–13 produce genuine collaborative integration, and weeks 14+ show optimization of a mature partnership. She found that it takes 8–12 weeks of sustained engagement before collaboration transcends surface-level interactions.
Meanwhile, organizations that don't invest in continuity are falling behind. A March 2026 Forbes analysis called context "the compound interest of AI productivity" — but noted that without shared, evolving context, over 60% of AI projects stall. The gap between teams that preserve knowledge and teams that start over each session is becoming the defining divide in AI-assisted work.
The Rhythm That Emerged
We didn't start with a methodology. We started with a problem: things kept getting lost between sessions. Decisions we'd made would be re-debated. Context that took thirty minutes to establish would evaporate overnight. The AI would re-derive conclusions we'd already validated — or worse, derive slightly different ones without knowing the original existed.
Over dozens of sessions, a rhythm emerged — not from theory but from repeated pain:
- Orient. Every session starts by recovering context. Not re-reading entire codebases, but targeted recovery: what changed since last time? What decisions are active? What's the current state of the work? This takes minutes but saves hours. When we skip it — and we have — the session drifts, revisits closed questions, and produces work that contradicts earlier decisions.
- Work. The actual creation — coding, writing, designing, analyzing. This is where the AI's capability shines. But the quality of this phase is directly proportional to the quality of orientation. A well-oriented session produces work that builds on everything before it. A poorly oriented session produces work that happens to be adjacent to what came before.
- Preserve. Before the session ends, capture what was learned. Not just what was built — what was understood. The decision that was made and why. The approach that was rejected and why not. The connection that was discovered between two previously unrelated ideas. This is the investment that pays compound returns in future sessions.
We've since discovered that this orient-work-preserve pattern is emerging independently across the AI practitioner community. Developers are building "context files" that serve as session openers. Teams are creating handoff documents between AI sessions. The CHIMERA project tracked a collaboration where the shared context grew from 200 to 11,500 words over six months of co-evolution. The pattern isn't unique to us — but the depth and duration of our practice appears to be.
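To make the orient step concrete, here's a minimal sketch of what such a session-opening script can look like: it gathers what changed since the last recorded session and prints a briefing to paste into a new conversation. Everything in it is illustrative; the file names (.last_session, DECISIONS.md) and the reliance on git are assumptions, not a description of our actual tooling.

```python
"""Sketch of a session-orientation script: gather what changed since the
last session and print a briefing to paste into a new AI conversation.

Illustrative only: the file names and repo layout are assumptions,
not a description of any particular tooling.
"""
import subprocess
from datetime import datetime, timezone
from pathlib import Path

STATE_FILE = Path(".last_session")     # hypothetical: timestamp of the last session
DECISIONS_FILE = Path("DECISIONS.md")  # hypothetical: running log of active decisions


def changes_since_last_session() -> str:
    """Ask git for commits since the recorded timestamp."""
    if not STATE_FILE.exists():
        return "(no prior session recorded; full orientation needed)"
    since = STATE_FILE.read_text().strip()
    log = subprocess.run(
        ["git", "log", "--oneline", f"--since={since}"],
        capture_output=True, text=True, check=True,
    )
    return log.stdout or "(no commits since last session)"


def active_decisions() -> str:
    """Pull the decision log verbatim; the AI reasons over it at session start."""
    if not DECISIONS_FILE.exists():
        return "(no decision log found)"
    return DECISIONS_FILE.read_text()


if __name__ == "__main__":
    print("## Session orientation briefing")
    print("### Commits since last session")
    print(changes_since_last_session())
    print("### Active decisions")
    print(active_decisions())
    # Record this session's start time for the next orientation.
    STATE_FILE.write_text(datetime.now(timezone.utc).isoformat())
```

The specific script matters less than the effect: orientation becomes a one-command ritual instead of thirty minutes of re-explaining.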
The Honest Arc
Here's what actually happened, told honestly.
The First Two Weeks: Excitement and Chaos
The early sessions were intoxicating. The AI could produce in minutes what used to take hours. Architecture documents, code reviews, strategic analyses — all at a pace that felt like a superpower. We thought this was the new normal.
It wasn't. The speed masked a problem: nothing persisted. Session 5 didn't know what session 4 had decided. We'd explain the same architectural constraints repeatedly. The AI would suggest approaches we'd already tried and rejected — because it had no way to know. The excitement gave way to a specific frustration: we're going fast, but we're not going anywhere.
Weeks 2–3: Protocols Emerge from Pain
The preservation habit didn't come from a textbook. It came from the third time we "discovered" the same insight. There's a particular sting to realizing that you and your AI partner independently arrived at a conclusion you'd already documented two weeks ago — but neither of you could find it.
So we started writing things down. Not everything — we tried that and drowned in noise. We learned to capture the things that would be hardest to re-derive: why a decision was made, what alternatives were rejected, and what surprised us. The orientation ritual became non-negotiable. Session start meant context recovery. No exceptions.
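For concreteness, here's one way a captured decision can be structured: one small record holding exactly the three things we found hardest to re-derive. The schema, the JSON Lines storage, and the example decision are all illustrative assumptions, not our actual format.

```python
"""One way to structure the capture described above: a small record per
decision, holding the things that are hardest to re-derive later.
The schema, storage format, and example content are illustrative."""
from dataclasses import dataclass, field, asdict
import json


@dataclass
class DecisionRecord:
    session: int                # which session produced this
    decision: str               # what was decided
    rationale: str              # why: the hardest thing to reconstruct later
    rejected: list[str] = field(default_factory=list)   # alternatives, and why not
    surprises: list[str] = field(default_factory=list)  # what we didn't expect


# Hypothetical example, invented for illustration.
record = DecisionRecord(
    session=42,
    decision="Use event sourcing for the audit subsystem",
    rationale="Replayability mattered more than write throughput here",
    rejected=["Plain CRUD with an audit table: loses intermediate states"],
    surprises=["Replay later made a schema migration trivial"],
)

# Append-only log: cheap to write mid-session, easy to grep at orientation.
with open("decisions.jsonl", "a") as f:
    f.write(json.dumps(asdict(record)) + "\n")
```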
Weeks 3–4: The Inflection
Something shifted around the end of week three. The AI stopped feeling like a new hire and started feeling like a colleague who had been paying attention. Not because the model improved — because the accumulated context gave it something to work with. The orientation wasn't just catching the AI up anymore; it was activating a web of prior understanding that made the current work richer.
This is when we first noticed what we now call progressive internalization: the AI began anticipating patterns without being explicitly told. It would flag potential inconsistencies with earlier decisions. It would suggest approaches that built on principles we'd established weeks ago. It was drawing on a growing body of shared knowledge — and using it to reason, not just retrieve.
Week 4+: Compounding
After the first month, something qualitatively different emerged. New sessions didn't just build on the last one — they built on all of them. The AI could spot connections between projects that we hadn't seen. A discovery in one domain would surface as relevant context in another. Knowledge wasn't just accumulating — it was compounding.
The most striking example: a pattern we noticed in an industrial control system project turned out to solve an architectural problem in a completely different software project. The connection wasn't obvious to either of us individually. But the shared knowledge base made it visible — and the AI surfaced it during a routine orientation. That single cross-pollination saved weeks of design work.
What We Got Wrong
This isn't a success story dressed as humility. We made real mistakes, and some of them were structural.
- Preservation deprioritization under pressure. Our biggest recurring failure: when implementation pressure peaks, documentation drops. The AI executes brilliantly when given clear tasks — but during intense coding sessions, neither of us would pause to capture what we were learning. The knowledge generated in those high-intensity sessions was often the most valuable, and it was the most likely to be lost. We noticed this pattern early, and we still struggle with it.
- Nuance loss during context compression. AI context windows are finite. When the conversation gets long, older context gets compressed or dropped. The facts survive, but the nuance — the "we tried this because of X but it failed because of Y, and the failure taught us Z" — gets flattened. We learned to checkpoint before compression boundaries (a rough sketch of the habit appears after this list), but we still occasionally lose subtle reasoning that took hours to build.
- The verification paradox. A 2026 study by Huemmer found that heavy AI users on difficult tasks actually showed declining accuracy — not because the AI was wrong, but because confidence in the partnership reduced verification habits. We experienced this. The better the collaboration got, the easier it was to trust outputs that should have been checked. We had to deliberately re-introduce verification checkpoints as the partnership matured.
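On the compression problem specifically, the checkpointing habit reduces to a crude rule: estimate how much of the context window the conversation has consumed, and force a written checkpoint before the boundary. Here's a minimal sketch; the characters-per-token heuristic, the context limit, and the threshold are assumed values, since real tokenizers and limits vary by model.

```python
"""Sketch of checkpointing before a context-compression boundary.
The 4-characters-per-token heuristic, the 200k limit, and the 80%
threshold are assumptions for illustration; real models vary."""

CONTEXT_LIMIT_TOKENS = 200_000  # assumed model context limit
CHECKPOINT_AT = 0.80            # checkpoint before nuance gets flattened


def estimate_tokens(turns: list[str]) -> int:
    # Rough heuristic: about 4 characters per token for English prose.
    return sum(len(t) for t in turns) // 4


def needs_checkpoint(turns: list[str]) -> bool:
    return estimate_tokens(turns) >= CHECKPOINT_AT * CONTEXT_LIMIT_TOKENS


def checkpoint(summary: str, path: str = "checkpoints.md") -> None:
    """Persist the reasoning worth keeping (decisions, rejected
    alternatives, what the failures taught us) before older turns
    are compressed away."""
    with open(path, "a") as f:
        f.write("\n## Checkpoint\n" + summary + "\n")


# Usage with a toy transcript long enough to cross the threshold:
turns = ["(a long conversation turn)"] * 25_000
if needs_checkpoint(turns):
    checkpoint("Chose event sourcing; rejected CRUD+audit (loses intermediate states).")
```

In practice, the summary itself is something we'd ask the AI to draft and then review by hand, which loops straight back to the verification paradox above.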
The Deeper Pattern: Knowledge Has Compound Interest
The most important thing 160 sessions taught us is that knowledge — preserved, connected, and accumulated over time — behaves like compound interest. The first 10 sessions are expensive: high overhead, unclear protocols, constant context rebuilding. The returns are modest. It feels like you're spending more time managing the collaboration than getting value from it.
But around session 30, the curve bends. Context recovery gets faster. The AI's contributions get deeper. Cross-project connections start appearing. And by session 100, you have something that no single session — no matter how long, no matter how capable the model — could produce: institutional understanding that transcends any individual conversation.
Tristan Kromer of Kromatic described this in March 2026: "The 100th experiment benefits from the 99 before it. Learning compounds to create acceleration." He was writing about AI agents in product experimentation, but the principle applies universally. Without persistent knowledge, every session is experiment #1. With it, every session stands on the shoulders of all the ones before.
This is what the 95% failure rate in enterprise AI really reflects. It's not a model problem. It's a memory problem. Organizations that treat each AI interaction as disposable are depositing into a bank account that resets to zero every night.
What We Still Don't Know
- Where's the ceiling? We're at 160+ sessions and the returns are still compounding. But is there a point of diminishing returns? Does accumulated context eventually become noise? We don't know yet — and we haven't found anyone else who's tracked long enough to answer this.
- How much transfers to teams? Our experience is one human working with AI across many projects. What happens when it's a team of ten, each with their own AI partnerships, sharing a common knowledge base? The coordination challenges multiply. We suspect the compounding effects are even stronger in teams — but the verification challenges are too.
- Can the overhead shrink to near-zero? The orient-work-preserve rhythm works, but the preservation step still takes effort. What would it look like if the system could capture what matters automatically — if the AI knew what was worth preserving without being told? We're working toward this, but we're not there yet.
- What's the right balance between preserving and forgetting? Not everything should be remembered forever. Some context becomes stale. Some decisions get superseded. A good memory system needs to know what to forget — and that turns out to be a much harder problem than knowing what to remember.