The Paper That Changes Everything: Why ICL Was Already Fine-Tuning

October 11, 2025

Google just showed that in-context learning is effectively fine-tuning your model. They proved, mathematically, that when you give an LLM examples in its context, the attention output acts as a temporary weight update: an actual rank-1 modification to the model's MLP layers. The effective weights change during inference, then snap back when generation completes.

Read that again. Your model is already training itself. It just forgets immediately.

This discovery should have been earthshaking. Instead, most people read it as "interesting mechanism explanation" and moved on. They missed the obvious next question: If ICL is already doing fine-tuning temporarily, why not make it permanent?

That question led us down a rabbit hole that fundamentally changes what an AI agent can be.

The Hidden Problem Everyone Accepts

You know the drill. You carefully craft few-shot examples. You optimize your prompts. You build RAG pipelines. And every single time your model runs, it has to re-learn the same patterns from the same examples, burning the same context tokens, taking the same time.

It's insane when you think about it. Imagine if humans had to re-read a textbook at the start of every math problem. That's what we're doing with ICL—forcing models to re-learn patterns they've seen thousands of times.

The industry just accepted this as "how things work." Context windows got bigger. We got better at prompt engineering. But nobody questioned the fundamental absurdity of temporary learning.

The Google Discovery That Changes the Game

Then Google Research drops this paper. They prove that when you put examples in context, the self-attention mechanism literally modifies the MLP weights. Not metaphorically. Literally. The mathematical operation is:

W_effective = W_base + rank-1 update from context

This is the same operation as a LoRA fine-tune with rank r = 1: a low-rank additive update applied on top of frozen base weights. The only difference? One is temporary, one is permanent.
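To make the equivalence concrete, here is a minimal numerical sketch (our own illustration, not the paper's derivation). A rank-1 update is just an outer product added to the frozen base weights, and that is exactly what a LoRA adapter with r = 1 computes:

```python
import numpy as np

# Illustrative only: variable names and shapes are ours, not the paper's.
d_out, d_in = 8, 16
rng = np.random.default_rng(0)

W_base = rng.normal(size=(d_out, d_in))   # frozen MLP weight
u = rng.normal(size=(d_out, 1))           # stands in for the context-derived vector
v = rng.normal(size=(d_in, 1))

# "ICL view": the context acts like a temporary rank-1 modification of the weights.
W_effective = W_base + u @ v.T

# "LoRA view": a rank-1 adapter B @ A (B: d_out x 1, A: 1 x d_in) merged into the base.
B, A = u, v.T
W_merged = W_base + B @ A

assert np.allclose(W_effective, W_merged)  # same operation; only the persistence differs
```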

Think about what this means: Every time you do few-shot prompting, you're already fine-tuning your model. You're just throwing away the result.

The Obvious Question Nobody Asked

If ICL is already creating weight updates, and those updates are improving performance, and we can see exactly what those updates are... why not just keep them?

Why force the model to re-derive the same weight modifications every single inference? Why not just make those updates permanent?

This isn't a small optimization. This is a fundamental rethinking of how AI systems learn.

From Temporary to Permanent: The SynDE Approach

This is where our workflow orchestration engine, SynDE, becomes crucial. See, you can't just save random weight updates—you need clean, structured execution logs to convert into training data. You need to know what worked, what failed, and why.

SynDE was originally built to solve a different problem: preventing human contamination in AI execution. No "emotional journeys," no face-saving cope, no momentum preservation. Just deterministic execution with clean success/failure signals.

Turns out, those clean execution logs are exactly what you need to implement permanent learning (a conversion sketch follows this list):

  1. Every workflow execution generates structured logs (state → action → outcome)

  2. These logs contain the same information as few-shot examples (but richer, with reasoning traces)

  3. We can convert them into LoRA fine-tuning data (rank-1, just like ICL)

  4. The model permanently learns from experience (no context needed for learned patterns)
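Here is a hypothetical sketch of step 3, the log-to-training-data conversion. The field names (state, action, outcome, reasoning) and the prompt format are illustrative placeholders, not SynDE's actual schema:

```python
import json

def logs_to_training_rows(execution_logs):
    """Turn structured workflow logs into prompt/completion pairs for LoRA fine-tuning."""
    rows = []
    for step in execution_logs:
        if step["outcome"] != "success":   # failed steps can feed a separate preference dataset
            continue
        rows.append({
            "prompt": f"State:\n{step['state']}\n\nDecide the next action.",
            "completion": f"Reasoning: {step['reasoning']}\nAction: {step['action']}",
        })
    return rows

def write_jsonl(rows, path="lora_train.jsonl"):
    with open(path, "w") as f:
        for row in rows:
            f.write(json.dumps(row) + "\n")

# Usage (load_workflow_logs is whatever your orchestrator exposes):
# write_jsonl(logs_to_training_rows(load_workflow_logs(run_id)))
```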

The Dual-Layer Architecture

We're not replacing context with fine-tuning. We're doing both, because they serve different purposes:

Context Layer (Fast):

  • Immediate adaptation to novel patterns

  • Task-specific customization

  • Easy to inspect and modify

  • Handles the "new" and "unusual"

Weight Layer (Slow):

  • Permanent capability improvement

  • Zero inference overhead

  • Compounds over time

  • Handles the "common" and "learned"

Together, they create something unprecedented: an AI that gets permanently better at its job.
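A toy sketch of what that split can look like at inference time. The pattern registry, example store, and prompt format are hypothetical stand-ins, not SynDE internals:

```python
# Skills the permanent adapter has already absorbed: no few-shot examples needed.
LEARNED_PATTERNS = {"invoice_extraction", "log_triage"}

# Few-shot examples kept around for work the adapter has not learned yet.
EXAMPLE_STORE = {
    "contract_review": ["Example 1: ...", "Example 2: ..."],
}

def build_prompt(task_type: str, instruction: str) -> str:
    if task_type in LEARNED_PATTERNS:
        # Weight layer: the skill lives in the adapter, so the prompt stays bare.
        return instruction
    # Context layer: novel or unusual tasks still get few-shot examples.
    examples = EXAMPLE_STORE.get(task_type, [])
    return "\n\n".join(examples + [instruction])
```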

Why This Actually Works (The Math Checks Out)

The beautiful thing is we're not speculating. The Google paper provides mathematical proof that context creates rank-1 weight updates. We're just making those same updates permanent through LoRA fine-tuning.

  • Google proved: Context → Temporary rank-1 update

  • We implement: Execution logs → Permanent rank-1 update

  • Same mechanism: Different persistence

The information content is equivalent (actually superior in logs due to reasoning traces). The mathematical operation is identical. We're not inventing new learning—we're just choosing to remember it.
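For the permanent half, a minimal sketch using the Hugging Face PEFT library, assuming a standard decoder-only model. The checkpoint name and target module names are placeholders that vary by architecture; none of this comes from the Google paper or from SynDE:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-base-model")  # placeholder checkpoint

config = LoraConfig(
    r=1,                                       # rank-1, mirroring the update ICL induces transiently
    lora_alpha=8,
    target_modules=["up_proj", "down_proj"],   # MLP projections; names differ across model families
    lora_dropout=0.0,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
# Train on the execution-log pairs, then keep the adapter separate or merge it into the base weights.
```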

The Implications Are Staggering

Imagine agents that:

  • Learn from every interaction without human supervision

  • Permanently improve rather than just accumulating context

  • Transfer skills across tasks through composed adapters

  • Reduce costs by eliminating repetitive context

This isn't incremental improvement. This is the difference between a student who takes notes and a student who actually learns. Between a system that pretends to adapt and one that genuinely evolves.

Why Everyone Else Will Miss This

Making this leap requires connecting dots across multiple domains:

  1. Understanding the deep implications of the rank-1 discovery

  2. Having clean execution logs (not chat transcripts)

  3. Recognizing logs as training data

  4. Implementing proper dual-layer learning

Most teams have chat logs full of human contamination. They have RAG pipelines committed to the "context only" approach. They've accepted temporary learning as permanent reality.

We haven't. And that changes everything.

The Paper

Below, we present our complete technical framework for unifying in-context learning and fine-tuning through workflow execution logs. It's dense, it's technical, and it's probably the most important thing we've published.

Because once you understand that your model is already training itself—just forgetting immediately—the path forward becomes obvious: Make it remember.

Welcome to the era of permanently learning AI.

Read the full paper: https://www.data-monger.com/ephemeral-to-permanent
