The Search That Built the World: Why Evolution is a Climber, Not a Wanderer

Introduction: The Misunderstood Algorithm

Evolution is often mischaracterized as a purely "random" process. We invoke the metaphor of a blind watchmaker, fumbling with parts until a functional timepiece emerges. While the source of variation—genetic mutation—is indeed random, the process of natural selection is the furthest thing from it. It is a powerful, deterministic force of curation.

To truly understand the nature of search, whether in biology, in scientific inquiry, or in artificial systems, we must discard the image of a random walk in a flat wasteland and replace it with a more accurate one: a determined climber navigating a vast and shifting mountain range.

Evolutionary search is not about wandering until you get lucky. It's about establishing a foothold on a stable platform and intelligently exploring the terrain from there. This principle—search requires stable platforms—is not metaphorical. It is a structural necessity that emerges from the mathematics of optimization in high-dimensional spaces.

Understanding this changes everything about how we approach complex search problems. And misunderstanding it guarantees failure.

The Fitness Landscape: Where Search Actually Happens

The conceptual key to understanding search is the "fitness landscape," introduced by Sewall Wright in 1932. This is more than a metaphor—it's a mathematical representation of how fitness (reproductive success, system performance, problem-solving ability) varies across the space of possible configurations.

Imagine a terrain where elevation represents fitness. Peaks are successful adaptations. Valleys represent non-viable forms. Death lives in the valleys.

In high-dimensional spaces—and genomes are very high-dimensional—the landscape is not smoothly rolling hills. It is jagged, fractal, full of cliffs and plateaus. Most random steps lead nowhere or off cliffs. The space of "doesn't work" vastly exceeds the space of "works."

A purely random search in such a space has essentially zero probability of finding fitness peaks. You would teleport to random coordinates, landing in lethal valleys with overwhelming probability. This is not mere inefficiency; at the scale of biological search spaces, it is a statistical impossibility.

Evolution doesn't work this way. Evolution works by building platforms.
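
To make the contrast concrete, here is a minimal Python sketch comparing the two strategies on a rugged, high-dimensional toy landscape. Everything here (the landscape, the dimensionality, the step sizes) is an illustrative choice, not a biological model:

    import random
    import math

    DIM = 50          # dimensionality of the search space (illustrative)
    STEPS = 20_000    # identical search budget for both strategies

    def fitness(x):
        """A rugged landscape: one broad peak at the origin, overlaid with
        high-frequency bumps so that most random moves lead downhill."""
        smooth = -sum(v * v for v in x)             # the broad peak
        rugged = sum(math.sin(7.0 * v) for v in x)  # local ruggedness
        return smooth + rugged

    def random_teleport_search():
        """Pure random search: sample independent points anywhere."""
        best = -float("inf")
        for _ in range(STEPS):
            x = [random.uniform(-10, 10) for _ in range(DIM)]
            best = max(best, fitness(x))
        return best

    def hill_climb():
        """Platform-based search: keep a viable foothold, take small local
        steps, and only accept moves that don't go off a cliff."""
        x = [random.uniform(-10, 10) for _ in range(DIM)]  # initial foothold
        current = fitness(x)
        for _ in range(STEPS):
            candidate = x[:]
            candidate[random.randrange(DIM)] += random.gauss(0, 0.1)
            f = fitness(candidate)
            if f >= current:        # selection: keep only what works
                x, current = candidate, f
        return current

    random.seed(0)
    print("best random teleport:", round(random_teleport_search(), 1))
    print("hill climber finish: ", round(hill_climb(), 1))

On typical runs the climber finishes far above the best point random teleportation ever samples, despite the identical budget: holding a foothold and exploring locally beats sampling the whole map.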

The Platform Principle: Standing on Giants' Shoulders

Consider the most successful platform on Earth: the eukaryotic cell. A human and a blade of grass share roughly 60% of their genes. Not because we share traits like hands or leaves, but because we share the highly conserved, non-negotiable instruction set for building and operating a complex cell.

This ancient solution to cellular machinery—the double-membraned nuclear envelope sequestering the DNA, the endomembrane system in which lipids are synthesized and degraded, the mitochondria producing ATP, the intricate protein synthesis apparatus—was so effective, such a high peak on the fitness landscape, that it became a stable continent from which all complex life could subsequently evolve.

Evolution doesn't reinvent the cell with each new species. It builds upon that platform, exploring what theoretical biologist Stuart Kauffman termed the "adjacent possible"—the space of variations that remain viable from the current position.

All the beautiful diversity of life—mammals, birds, plants, fungi—is a search happening not from the bottom of the map, but from the high ground of a solution found billions of years ago.

This is the crucial insight: the constraint is the feature. That 60% of shared genes isn't inefficiency that could be "optimized away." It's the load-bearing structure that makes complex life possible. When you see high conservation across species, you're looking at the fitness cliff edges. Everything that tried to mutate those sequences died.

The platform defines the possibility space.

The Genomic Record: Reading the Map Evolution Drew

We no longer have to guess at the shape of this landscape. Modern molecular biology can read the map directly.

The genetic code is redundant. Multiple three-letter sequences ("codons") can specify the same amino acid. For example, the RNA codons GCU, GCC, GCA, and GCG all encode alanine. This "wobble" in the third position means that a mutation from GCU to GCC is silent—the underlying DNA changes, but the resulting protein is identical.
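
A toy illustration of this redundancy, using a small fragment of the standard codon table (pure lookup, nothing simulated):

    # A fragment of the standard RNA codon table: the four alanine codons,
    # plus two aspartate codons for contrast.
    CODON_TABLE = {"GCU": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
                   "GAU": "Asp", "GAC": "Asp"}

    def is_silent(before, after):
        """A mutation is silent if both codons encode the same amino acid."""
        return CODON_TABLE[before] == CODON_TABLE[after]

    print(is_silent("GCU", "GCC"))  # True: third-position wobble, same protein
    print(is_silent("GCU", "GAU"))  # False: second position changed, Ala -> Asp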

When we sequence genomes across species, we see a striking pattern: in essential, core genes that form our cellular platform, silent mutations dominate. The mutations that survive are precisely those that change nothing functionally important.

Why? Because any mutation that changes a critical protein in a core pathway is almost always lethal—a step off a cliff into a valley. The organisms carrying such mutations die before reproducing. Over billions of years, we see the accumulated result: highly conserved core sequences, protected from harmful change, that form the inviolable platform of life.

This is evolution writing in its own hand: "These sequences cannot be changed. I've tried every mutation here millions of times. Everything that deviates dies. This is the platform."

The conservation pattern reveals the topology of the fitness landscape. High conservation = steep cliffs nearby. Variability = gentle slopes allowing exploration.
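
This relationship can be computed directly from sequence data. A minimal sketch, using invented aligned sequences (real analyses use large multi-species alignments): score each alignment column by its Shannon entropy, so that fully conserved columns reach the 2-bit maximum for DNA's 4-letter alphabet:

    from math import log2

    def conservation(column):
        """Conservation as 2 minus the Shannon entropy (in bits) of the
        residues in one alignment column; 2.0 means fully conserved."""
        counts = {}
        for residue in column:
            counts[residue] = counts.get(residue, 0) + 1
        n = len(column)
        entropy = -sum(c / n * log2(c / n) for c in counts.values())
        return 2.0 - entropy

    # Hypothetical aligned DNA from four species: positions 0-3 form a
    # conserved "core", positions 4-7 are free to vary.
    alignment = ["ATGCAGTC",
                 "ATGCTTAA",
                 "ATGCGCGT",
                 "ATGCAACG"]

    for i in range(len(alignment[0])):
        col = [seq[i] for seq in alignment]
        print(i, "".join(col), round(conservation(col), 2))

The conserved columns (score 2.0) mark the cliff edges; the variable columns mark the slopes where exploration is tolerated.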

Platform-Based Search: The Only Kind That Works

This principle of platform-based search appears everywhere successful search happens, not just in biology.

Consider José Delgado's famous experiments in the 1960s mapping the basal ganglia, a brain region involved in motor control. The brain was a black box. He needed a functional map anyway.

To create one, he implanted electrodes into the brain of a bull. His methodology was not blind, random prodding. It was systematic search for co-occurrence.

He collected two streams of data simultaneously: electrical recordings from neurons and video footage of the bull's movements. Critically, both data streams had timestamps. In isolation, neither was sufficient. The breakthrough came from lining them up.

By correlating the two, Delgado could demonstrate that when the bull turned left, electrodes 3, 8, and 11 showed activity. He didn't understand why or how the motor system worked. He simply had a reliable correlation: when behavior X happens, sensors Y light up. When I stimulate sensors Y, behavior X happens.

This is not causal understanding. It's functional mapping. But it's reproducible, falsifiable, and predictive. Wire a button to electrodes 3, 8, and 11. Press the button. The bull turns left. Every time. That's valid knowledge extracted from a black box.

This is what principled search in a black box system looks like. You don't know the causal mechanism, so you can't rely on theory to guide you. But you can still be systematic. You measure correlations. You verify reproducibility. You build a functional map: input A reliably produces output B.
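
In code, Delgado's method reduces to aligning two timestamped streams and counting co-occurrences. A minimal sketch with invented data (the timestamps and electrode numbers below are illustrative, not Delgado's records):

    from collections import Counter

    electrode_events = [  # (timestamp in seconds, electrode id)
        (1.0, 3), (1.1, 8), (1.1, 11), (4.2, 5),
        (7.0, 3), (7.1, 8), (7.2, 11), (9.5, 2),
    ]
    behavior_events = [   # (timestamp in seconds, observed behavior)
        (1.2, "turn_left"), (4.3, "head_down"), (7.2, "turn_left"),
    ]

    WINDOW = 0.5  # events within this window count as co-occurring

    cooccurrence = Counter()
    for t_b, behavior in behavior_events:
        for t_e, electrode in electrode_events:
            if abs(t_e - t_b) <= WINDOW:
                cooccurrence[(behavior, electrode)] += 1

    # A reliable correlation is one that repeats across trials.
    for (behavior, electrode), n in sorted(cooccurrence.items()):
        print(f"{behavior}: electrode {electrode} co-occurred {n} time(s)")

Electrodes 3, 8, and 11 co-occur with "turn_left" on every trial; that repeatability, not any theory of the motor system, is the discovery.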

The components of valid search when facing a black box:

  • You have a stable system (the platform)

  • You probe it systematically (not randomly)

  • You measure correlations between inputs and outputs

  • You verify reproducibility before claiming discovery

  • You build functional maps even without causal understanding

Delgado didn't understand the neural mechanisms of motor control. But he didn't need to in order to create useful, reproducible knowledge. He built infrastructure for reliable correlation and repeatable measurement.

This methodology—systematic correlation in black box systems—is precisely what modern mechanistic interpretability research applies to neural networks. Anthropic's scientists mapping feature circuits in transformers are doing the same thing Delgado did: they observe that when the model processes certain inputs, specific activation patterns emerge. When they artificially stimulate those same activation patterns, specific behaviors result. They don't fully understand the causal mechanism yet, but they have reproducible functional mappings: feature X correlates with behavior Y.

The parallel to prompt engineering is direct: You don't need to understand the causal mechanism of how transformers work to do valid search. But you do need systematic measurement and reproducible correlations. When structure X in your prompt reliably produces behavior Y, and removing X reliably removes Y, you have functional knowledge. That's valid even without mechanistic understanding.

What you cannot do—what has never worked in any domain—is randomly perturb everything and select based on noisy measurements. That's not "making the best of a black box situation." That's abandoning methodology when you need it most.

The black box nature of a system is not a license for random exploration. It's a mandate for systematic measurement and reproducible findings.

Exaptation: How True Novelty Emerges

But what about the great, surprising leaps in evolution? Surely the emergence of a radical new feature—like flight—is a lucky jump from one hill to another distant peak?

Here, evolution reveals its most elegant trick: exaptation. It is not a jump, but the discovery of a new path from an existing peak.

Feathers did not evolve for flight. They first appeared in theropod dinosaurs as filamentous structures providing thermal insulation. For millions of years, natural selection refined feathers to be better and better at this one job, climbing the "insulation peak." Birds inherited this technology from their dinosaur ancestors.

This stable, useful platform maintained the feature in the gene pool. The trait was already fit enough to survive. Only later, as the landscape of pressures and opportunities shifted, could this existing technology be co-opted—exapted—for a new purpose: flight.

True novelty arises not from a lucky, long-shot guess, but from the repurposing of a robust system that already works.

This reveals the structure of successful search for novelty:

  1. Establish a stable, functional platform (insulation)

  2. Optimize and stabilize that platform (better insulation)

  3. Explore adjacent possible uses from that stable base (display, gliding, flight)

What doesn't work:

  1. Random mutation of everything simultaneously at high rates

  2. Hope something accidentally flies

  3. Claim optimization when noise briefly trends upward before falling back

The mathematical reason is clear: the probability of a single mutation creating flight from scratch is essentially zero. But the probability of repurposing an already-functional system for a related use is orders of magnitude higher. Evolution finds novelty by building on stability, not by abandoning it.

This is why transfer learning works in AI systems—you're not training flight from scratch; you're repurposing a robust platform (image recognition) for a new use (medical diagnosis). The pre-trained weights are the stable platform. Fine-tuning explores the adjacent possible from that foundation.
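
A minimal sketch of that pattern in PyTorch (the two-class task is hypothetical, and the first run downloads the pretrained weights): conserve the platform by freezing it, then retrain only the exploratory head:

    import torch
    import torch.nn as nn
    from torchvision import models

    # The stable platform: a backbone pretrained on ImageNet.
    backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # Conserve the platform: freeze every pretrained weight.
    for param in backbone.parameters():
        param.requires_grad = False

    # Explore the adjacent possible: replace only the task-specific head.
    NUM_CLASSES = 2  # e.g. a hypothetical two-class diagnostic task
    backbone.fc = nn.Linear(backbone.fc.in_features, NUM_CLASSES)

    # Only the new head's parameters are trainable.
    optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)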

The Error Catastrophe: When Mutation Rates Exceed Selection

There's a fundamental limit to how fast evolution can search, discovered by Manfred Eigen in 1971. It's called the error catastrophe.

If mutation rates are too high relative to selection pressure, beneficial mutations cannot accumulate faster than they're destroyed by new random mutations. The population falls off fitness peaks as fast as—or faster than—it can climb them.

This sets a hard limit on sustainable mutation rates. Evolution maintains mutation rates in a narrow window: high enough to explore, low enough to preserve what works. Step outside that window and selection becomes ineffective. The population degrades rather than adapts.
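
The threshold is easy to demonstrate. Below is a minimal simulation of selection on a single sharp fitness peak (population size, genome length, and the fitness advantage are illustrative choices, not measured values). Below the threshold mutation rate the population holds the peak; above it, the peak population collapses:

    import random

    L = 20          # genome length in bits
    POP = 500       # population size
    GENS = 200      # generations to simulate
    ADVANTAGE = 5.0 # fitness of the master sequence vs. 1.0 for all others

    MASTER = tuple([1] * L)

    def fraction_on_peak(mutation_rate):
        """Selection-mutation dynamics on a single sharp peak."""
        pop = [MASTER] * POP  # start the whole population on the peak
        for _ in range(GENS):
            weights = [ADVANTAGE if g == MASTER else 1.0 for g in pop]
            parents = random.choices(pop, weights=weights, k=POP)
            pop = [tuple(bit ^ (random.random() < mutation_rate) for bit in g)
                   for g in parents]
        return sum(g == MASTER for g in pop) / POP

    random.seed(1)
    # Eigen's threshold for this model sits roughly where the chance of
    # copying a genome intact drops below 1/ADVANTAGE: about 0.08 per bit.
    for mu in (0.01, 0.04, 0.08, 0.15):
        print(f"mutation rate {mu:.2f} -> fraction on peak {fraction_on_peak(mu):.2f}")

At low mutation rates most of the population stays on the peak; past the threshold, selection can no longer hold it there, and the master sequence all but vanishes.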

This isn't a quirk of biology. It's a mathematical property of search in rugged fitness landscapes. You cannot optimize by maximizing variation. Beyond a certain mutation rate, you're just destroying information faster than selection can accumulate it.

Random walks in high-dimensional spaces don't converge on optimal solutions. They converge on maximum entropy—random noise.

Conservation as Information: What the Genome Teaches Us

When you sequence a genome, you're not just reading a blueprint. You're reading a historical record of what evolution tried and what killed it.

In essential genes—those encoding core cellular machinery, DNA replication, protein synthesis—you see extreme conservation. The genes encoding ribosomal RNA are among the most conserved sequences known, recognizably similar across all domains of life. Bacterial ribosomes and human ribosomes share fundamental structure because they descend from the same ancient platform, and every attempt to significantly alter that platform died.

This conservation is information. It tells you:

  • Where the fitness peaks are (conserved sequences)

  • Where the cliffs are (regions that cannot tolerate change)

  • What the platform is (the load-bearing code)

  • What's exploratory (variable regions)

The genome is a map of the fitness landscape, written in the ink of death. Every conserved sequence represents billions of lethal experiments that evolution already ran.

Ignoring conservation is ignoring this accumulated wisdom. It's like treating all parts of a program as equally safe to modify, ignoring the difference between a comment and the memory allocator.

Scientific Search: Hypothesis-Driven Exploration

The scientific method embodies these same principles. Science doesn't progress through random hypothesis generation. It progresses through building platforms of understanding and systematically exploring the adjacent possible.

Newton didn't randomly guess at gravity. He built on Kepler's laws (a platform of observational regularities), Galileo's mechanics (a platform of terrestrial motion), and new mathematical tools (the calculus he himself helped develop). From these stable platforms, he explored the adjacent question: what if celestial and terrestrial motion follow the same laws?

Darwin didn't randomly speculate about evolution. He built on Lyell's geology (showing Earth's age), artificial selection (a known mechanism of change), Malthus's population mathematics (showing selection pressure), and biogeography (patterns demanding explanation). From these platforms, he explored the adjacent possibility: natural selection.

Science works by:

  1. Establishing reliable observations (platforms)

  2. Building models that explain those observations

  3. Making predictions from those models

  4. Testing predictions systematically

  5. Refining or replacing models based on results

It does not work by:

  1. Making random guesses

  2. Keeping whatever accidentally seems to fit

  3. Claiming discovery when noise happens to trend upward

The difference is the presence of causal models guiding the search. You're not poking randomly—you're testing specific hypotheses derived from theoretical understanding.

The Brain in the Black Box: What We've Recently Learned

This brings us to modern Large Language Models. For years, these systems seemed like inscrutable black boxes. We could poke them with different prompts and observe outputs, but we had no clear theory of what was happening inside.

Recent work from Google researchers (Dherin et al., 2025, "Learning without training: The implicit dynamics of in-context learning") has revealed something remarkable about how in-context learning actually works:

When you provide examples in a prompt—a technique called In-Context Learning (ICL)—the transformer architecture is implicitly modifying the weights of its MLP layers according to the context. The stacking of self-attention with MLPs allows the model to transform context into a low-rank weight update.

This is not metaphorical. The paper demonstrates through theory and experimentation that the transformer block implicitly performs what amounts to a weight update in the MLP layer, building an ephemeral task-specific model from the examples provided.
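
The core identity is simple enough to check numerically. A schematic sketch, deliberately reduced to a single linear layer (the paper's analysis covers the full transformer block): shifting the layer's input by the context's attention contribution is exactly equivalent to applying a rank-one weight update to the bare query:

    import numpy as np

    rng = np.random.default_rng(0)
    d = 8
    W = rng.normal(size=(d, d))  # an MLP layer of the model (linear here)
    x = rng.normal(size=d)       # the query token's representation
    a = rng.normal(size=d)       # attention output attributable to the context

    # With context, the layer sees the query shifted by the context's
    # attention contribution:
    with_context = W @ (x + a)

    # The same effect, written as a context-dependent rank-1 weight update
    # applied to the bare query:
    delta_W = np.outer(W @ a, x) / (x @ x)
    as_weight_update = (W + delta_W) @ x

    print(np.allclose(with_context, as_weight_update))  # True

The context never touches the stored weights, yet its effect is indistinguishable from a temporary, task-specific weight update.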

This reframes everything. Consider the parallel: evolution didn't evolve a blueprint for "the eukaryotic cell" as a fixed solution. It evolved a problem-solving system, a robust, adaptable platform that could respond to diverse challenges while maintaining core functionality.

As developmental biologist Michael Levin emphasizes in his work on bioelectricity and morphogenesis, what nature builds are not rigid specifications but competent, goal-directed systems. The pre-trained weights of an LLM are the same kind of artifact: not a blueprint for specific outputs, but a problem-solving architecture that can be dynamically reconfigured.

The in-context examples don't just provide "context"; they reconfigure the problem-solver itself. Through implicit weight modification, they temporarily specialize a general-purpose problem-solving system into a task-specific one. The model isn't retrieving a pre-existing solution; it's dynamically constructing a new problem-solving configuration from the platform's capabilities.

This is precisely what evolved systems do. A cell doesn't have separate hardcoded programs for every possible stress condition. It has robust regulatory networks that respond to conditions, reconfiguring their problem-solving approach on the fly while preserving core functionality. The eukaryotic platform enables this adaptive problem-solving; it doesn't dictate specific solutions.

Understanding prompts as inputs that reconfigure a problem-solving system, rather than as queries to a fixed database, reveals why structure matters so profoundly. You're not searching for the right keywords. You're providing the configuration data that determines what kind of problem-solver the system temporarily becomes.

What This Means for Search in LLM Space

Understanding ICL as implicit weight modification reveals why some prompts work and others don't. It also reveals what constitutes the "platform" and what constitutes "exploration."

The platform:

  • The pre-trained model weights (conserved, load-bearing)

  • The fundamental structure of the task (what examples demonstrate)

  • The format and ordering that allows weight modification to proceed

The exploratory space:

  • Which specific examples to use (content variation)

  • How many examples (within functional limits)

  • Additional task-specific instructions (guiding the weight update)

This is directly analogous to evolution:

  • The eukaryotic cell is the platform

  • Specific proteins and pathways are the exploratory space

  • Random mutation of core cellular machinery kills you

  • Variation in peripheral features allows adaptation

Randomly perturbing the structure of in-context examples is not "exploring the space of prompts." It's randomly mutating the input to an implicit weight update process. It's like randomly shuffling pixels in training images for a neural network and claiming you're "optimizing" when accuracy happens to go up due to noise.

You're not exploring from the platform. You're randomly mutating the platform itself.

The Predictions We Can Now Make

Armed with this understanding—that ICL works through implicit weight modification, that search requires stable platforms, that conservation indicates load-bearing code—we can make precise predictions about what approaches to prompt modification will and won't work.

Prediction 1: Random semantic perturbation will fail

If you take a working prompt and randomly vary its semantic content (rephrasing, reordering examples, changing structure), you will mostly break things. Like random mutations to conserved genes, most changes to a working prompt's structure will degrade performance.

Any apparent improvements will be indistinguishable from measurement noise, because you're sampling a high-dimensional space where most directions lead downward.

Prediction 2: High-variance approaches will show unstable results

Methods that introduce high randomness (like setting generation temperature to maximum) to "explore" will produce highly variable results across runs. This is error catastrophe—variation rates exceeding what selection can stabilize.

You won't find stable optima. You'll find noise.

Prediction 3: Optimization without causal models will optimize noise

If you measure prompt "performance" on noisy evaluations and select for higher scores without understanding why they're higher, you're doing what's called "hill climbing on a noisy function." You will climb the noise, not the signal.

This produces "optimized" prompts that perform worse on new data, because you've overfit to the noise in your evaluation.

Prediction 4: Ignoring conservation will break essential functions

If you treat all components of a prompt as equally mutable—not distinguishing between load-bearing structure and exploratory content—you will frequently break essential functionality.

Like mutating ribosomal RNA genes, mutating the core structure that enables the weight modification process will kill the learning capability.

Prediction 5: True improvement requires understanding

The only way to reliably improve prompts is to:

  • Understand what the platform is (what structure enables weight modification)

  • Preserve that platform (don't randomly mutate it)

  • Explore systematically from that platform (vary content, not structure)

  • Build causal models (understand why changes work or fail)

  • Test predictions (verify understanding, don't just select for noise)

This is platform-based search, not random walk.

The Broader Lesson: Most "Optimization" Isn't

The deep lesson here extends far beyond prompts or evolution. It's about the nature of search itself in complex spaces.

Most high-dimensional search spaces have these properties:

  • They're rugged (many local optima and valleys)

  • They're sparse (most configurations don't work)

  • They have structure (some regions are platforms, others are cliffs)

  • They have conservation (some components are load-bearing)

In such spaces, random search doesn't work. Maximizing variation doesn't work. Selecting for noise doesn't work.

What works is:

  • Find or build a stable platform

  • Understand what makes it stable (conservation)

  • Explore systematically from that platform (adjacent possible)

  • Build causal models (theory guiding search)

  • Preserve what works while varying what's exploratory

This is true whether you're searching the space of genomes, scientific theories, engineering designs, business strategies, or language model prompts.

Evolution has been running this experiment for 3.8 billion years. It's been writing the textbook on how to search complex fitness landscapes. The lessons are written in the genome of every living thing: conserve the platform, explore the adjacent possible, build on what works.

We should probably read that textbook before claiming to have invented "optimization."

Conclusion: When Cargo Cult Meets Fitness Landscape

Now, consider what happens when someone builds a framework that:

  • Randomly mutates in-context examples (destroys the platform)

  • Sets model temperature to maximum (error catastrophe)

  • Selects based on noisy evaluations (optimizes noise)

  • Treats all variation as equally valid (ignores conservation)

  • Applies Bayesian optimization to discrete semantic variations (uses precision tools on coin flips)

  • Calls all of this "systematic optimization" (cargo cult terminology)

From our understanding of search—from evolution, from scientific methodology, from the mathematics of optimization, from the mechanics of in-context learning—we can predict exactly what such a framework will do:

It will produce highly variable results across runs. It will frequently make things worse. Any apparent improvements will be indistinguishable from noise. It will fail on tasks requiring structured reasoning. Users will report that basic functionality doesn't work, that "optimization" degrades performance, and that simpler approaches work better.

These aren't empirical observations about some specific framework. These are a priori predictions from theory—from understanding what search actually is and what its requirements are.

If such a framework exists, and if these predictions match its observed behavior exactly, that's not a coincidence. That's the theory being confirmed by observation.

That's what happens when you treat a fitness landscape like a flat plain, when you try to optimize by walking off cliffs, when you mistake noise for signal and call random mutation "systematic search."

The fitness landscape is real. The platforms are real. The cliffs are real.

Evolution has been teaching us this for billions of years. Science has been applying these lessons for centuries. The genome is a map written in the ink of death, showing where the platforms end and the valleys begin.

When we ignore these lessons—when we treat complex search as random exploration, when we maximize variation and call it optimization, when we select for noise and claim discovery—we're not inventing new approaches. We're rediscovering failure modes that evolution already explored and rejected billions of years ago.

The search that built the world teaches us something fundamental: you cannot optimize by destroying the platform you're standing on.

Some lessons are universal. This is one of them.

This essay is part of a trilogy examining the nature of search and optimization.
