The Most Damning DSPy Code: Actual Source Code Exposed

The Smoking Gun #1: They Literally Named It "Random Search"

From /dspy/teleprompt/random_search.py:

python

class BootstrapFewShotWithRandomSearch(Teleprompter):
    def compile(self, student, *, teacher=None, trainset, valset=None, restrict=None, labeled_sample=True):
        scores = []
        
        for seed in range(-3, self.num_candidate_sets):
            # ... create random variations ...
            
            if seed == -3:
                # zero-shot
                program = student.reset_copy()
            elif seed == -2:
                # labels only
                teleprompter = LabeledFewShot(k=self.max_labeled_demos)
                program = teleprompter.compile(student, trainset=trainset_copy, sample=labeled_sample)
            else:
                # RANDOM SHUFFLE AND RANDOM SIZE SELECTION
                random.Random(seed).shuffle(trainset_copy)
                size = random.Random(seed).randint(self.min_num_samples, self.max_num_samples)
                # ... (elided: compile a candidate program from this shuffled, resized demo set) ...
            
            # Evaluate the random variation
            result = evaluate(program)
            score = result.score
            
            # This is their "optimization" - just keep the best random attempt
            if len(scores) == 0 or score > max(scores):
                print("New best score:", score, "for seed", seed)
                best_program = program
            
            scores.append(score)

They're literally:

  1. Trying random seeds (-3 to num_candidate_sets)

  2. Randomly shuffling training data

  3. Randomly selecting sample sizes

  4. Keeping whatever randomly scores highest

  5. Calling this "optimization"
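
Strip away the class name and that loop reduces to a few lines. The sketch below is illustrative, not DSPy's actual API: make_candidate and evaluate are hypothetical stand-ins for "shuffle the demos and build a program" and "score it on the validation set".

python

import random

def random_search_compile(make_candidate, evaluate, num_candidates, base_seed=0):
    """Generate random candidates, score each one, keep the argmax."""
    best_score, best_candidate = float("-inf"), None
    for seed in range(num_candidates):
        rng = random.Random(base_seed + seed)
        candidate = make_candidate(rng)   # e.g. shuffle demos, pick a random subset size
        score = evaluate(candidate)       # one full evaluation pass per candidate
        if score > best_score:            # the "optimization": keep whatever scored highest
            best_score, best_candidate = score, candidate
    return best_candidate, best_score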

The Smoking Gun #2: The Bootstrap "Optimization" Is Just Filtering Accidents

From /dspy/teleprompt/bootstrap.py:

python

def _bootstrap_one_example(self, example, round_idx=0):
    try:
        with dspy.settings.context(trace=[], **self.teacher_settings):
            lm = dspy.settings.lm
            # Use a fresh rollout with temperature=1.0 to bypass caches
            lm = lm.copy(rollout_id=round_idx, temperature=1.0) if round_idx > 0 else lm
            
            # Run the program and see what happens
            prediction = teacher(**example.inputs())
            # (elided: `trace`, used below, is collected via the trace=[] set up in the context above)
            
            # Check if it accidentally worked
            if self.metric:
                metric_val = self.metric(example, prediction, trace)
                if self.metric_threshold:
                    success = metric_val >= self.metric_threshold
                else:
                    success = metric_val
            else:
                success = True  # If no metric, everything "works"!
    except Exception as e:
        success = False
    
    if success:
        # It accidentally worked! Keep it as a "good" example
        for step in trace:
            # ... save the trace that accidentally worked ...

Translation:

  • Run the teacher at temperature=1.0 (full sampling randomness)

  • If it accidentally scores well, assume it's "good"

  • No understanding of WHY it worked
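
The filter-and-keep pattern is easy to reproduce. Here is a minimal sketch of it under assumed names; run_teacher and metric are hypothetical callables standing in for the teacher program and the user's metric, not DSPy functions.

python

def bootstrap_demos(examples, run_teacher, metric, threshold=None):
    """Sample one completion per example and keep the ones the metric happens to accept."""
    kept = []
    for example in examples:
        try:
            prediction = run_teacher(example, temperature=1.0)  # stochastic sampling
        except Exception:
            continue  # failed runs are silently discarded
        score = metric(example, prediction)
        passed = score >= threshold if threshold is not None else bool(score)
        if passed:
            kept.append((example, prediction))  # a run that happened to pass becomes a "demo"
    return kept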

The Smoking Gun #3: The "Bayesian Optimization" That Isn't

From /dspy/teleprompt/mipro_optimizer_v2.py:

python

def _optimize_prompt_parameters(self, ...):
    import optuna
    
    # They claim it's "Bayesian Optimization"
    logger.info("finding the optimal combination using Bayesian Optimization.\n")
    
    # But it's just trying random combinations
    def objective(trial):
        # (elided: loop over the program's predictors, indexed by i)
        # Choose random instructions
        instruction_idx = trial.suggest_categorical(
            f"{i}_predictor_instruction", range(len(instruction_candidates[i]))
        )
        
        # Choose random demos
        if demo_candidates:
            demos_idx = trial.suggest_categorical(
                f"{i}_predictor_demos", range(len(demo_candidates[i]))
            )
        
        # Evaluate this random combination
        score = eval_candidate_program(batch_size, valset, candidate_program, evaluate, self.rng).score
        
        # This is it. That's the "optimization"
        if score > best_score:
            best_score = score
            best_program = candidate_program.deepcopy()
        
        return score
    
    # Use Optuna (which is legit) on a nonsense search space
    sampler = optuna.samplers.TPESampler(seed=seed, multivariate=True)
    study = optuna.create_study(direction="maximize", sampler=sampler)
    study.optimize(objective, n_trials=num_trials)

The Fraud:

  • They use Optuna (a real Bayesian optimization library)

  • But apply it to a handful of categorical index choices, with no continuous search space

  • A surrogate model has almost no structure to exploit in such a tiny grid of options

  • Just trying random combinations of prompt variations
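
To see how small the search space actually is, here is a toy reproduction of the same structure. Everything in it is illustrative: the candidate lists are made up and fake_eval stands in for a full evaluation pass over a validation set.

python

import random

import optuna

instruction_candidates = ["instr_v1", "instr_v2", "instr_v3"]
demo_candidates = ["demo_set_a", "demo_set_b"]

def fake_eval(instruction, demos):
    # stand-in for running the candidate program over a validation set
    return random.random()

def objective(trial):
    i = trial.suggest_categorical("instruction_idx", list(range(len(instruction_candidates))))
    d = trial.suggest_categorical("demos_idx", list(range(len(demo_candidates))))
    return fake_eval(instruction_candidates[i], demo_candidates[d])

sampler = optuna.samplers.TPESampler(seed=0, multivariate=True)
study = optuna.create_study(direction="maximize", sampler=sampler)
study.optimize(objective, n_trials=10)
print(study.best_params, study.best_value)

With three instructions and two demo sets there are only six possible programs; running ten TPE trials over that grid is barely distinguishable from enumerating it.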

The Smoking Gun #4: The Comments Reveal They Know It's BS

From the same files, look at their TODO comments:

python

# TODO: metrics should return an object with __bool__ basically, but fine if they're more complex.

# TODO: the max_rounds via branch_idx to get past the cache, not just temperature.

# TODO: Deal with the (pretty common) case of having a metric for filtering and a separate metric for eval.

# TODO: This function should take a max_budget and max_teacher_budget.
# Progressive elimination sounds about right: after 50 examples, drop bottom third...

They know they're just:

  • Using temperature to get different random outputs

  • Planning "progressive elimination" (i.e., dropping the lowest-scoring random candidates)

  • Not actually optimizing anything

The Most Damning Function: "Getting Past the Cache"

python

# Use a fresh rollout with temperature=1.0 to bypass caches
lm = lm.copy(rollout_id=round_idx, temperature=1.0) if round_idx > 0 else lm

They're literally just:

  1. Setting temperature to 1.0 (full sampling randomness)

  2. Using different "rollout_ids" to avoid caching

  3. Running the same prompt multiple times

  4. Keeping whatever randomly performs better

  5. Calling this "bootstrapping"
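
Why does changing rollout_id and the temperature "bypass caches"? Presumably because the cache key includes the sampling parameters, so any new value forces a fresh call. A minimal sketch of that mechanism, assuming a simple keyed cache (illustrative only, not DSPy's cache code):

python

import random

_cache = {}

def cached_generate(prompt, temperature=0.0, rollout_id=0):
    key = (prompt, temperature, rollout_id)  # rollout_id exists only to vary the key
    if key not in _cache:
        # stand-in for a real LM call; at temperature=1.0 each call is a fresh sample
        _cache[key] = f"{prompt} -> sample {random.random():.3f}"
    return _cache[key]

# Same prompt, different rollout_id: the cache is skipped and a new random sample comes back.
print(cached_generate("Same prompt", temperature=1.0, rollout_id=1))
print(cached_generate("Same prompt", temperature=1.0, rollout_id=2))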

The Cost of This Nonsense

From their actual code comments:

python

# TODO: FIXME: The max number of demos should be determined in part by the LM's tokenizer + max_length.
# As another option, we can just try a wide range and handle failures as penalties on the score.

Translation: "We don't even know how many examples to use, so we'll just try random amounts and see what happens."
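
The fallback that comment describes ("try a wide range and handle failures as penalties on the score") would look something like the sketch below; build_prompt, fits_context, and evaluate are hypothetical helpers, not DSPy code.

python

def pick_num_demos(candidate_counts, build_prompt, fits_context, evaluate, penalty=-1.0):
    """Try each demo count; score it, or penalize it if the prompt doesn't fit."""
    scored = []
    for k in candidate_counts:
        prompt = build_prompt(num_demos=k)
        score = evaluate(prompt) if fits_context(prompt) else penalty
        scored.append((score, k))
    return max(scored)[1]  # the demo count with the best (possibly penalized) score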

The Final Proof: Their "Best Score" Logic

python

if len(scores) == 0 or score > max(scores):
    print("New best score:", score, "for seed", seed)
    best_program = program

scores.append(score)
print(f"Scores so far: {scores}")
print(f"Best score so far: {max(scores)}")

This is literally:

python

if random_variation_scored_higher():
    claim_optimization_success()

What DSPy Actually Is

After examining the actual source code, DSPy is:

  1. Random search (they literally call it that in the class name)

  2. Through semantic noise (temperature=1.0, random shuffling)

  3. Evaluated by accidents (whatever happens to score higher)

  4. Wrapped in academic terminology ("Bayesian optimization", "bootstrapping")

  5. That costs real money (thousands of API calls at temperature=1.0)
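
As a back-of-the-envelope sketch of that cost (every number below is an assumption for illustration, not a measurement of DSPy):

python

candidate_programs = 16   # random-search candidates to evaluate
valset_size = 300         # validation examples scored per candidate
calls_per_example = 2     # LM calls (predictors) per example
compile_calls = candidate_programs * valset_size * calls_per_example
print(compile_calls)      # 9600 API calls for one compile, before any bootstrap sampling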

The code proves what we suspected: they're running if random() > previous_random(): print("optimization working!") and calling it a "framework for programming—not prompting—language models."

It's not optimization. It's expensive random number generation dressed up as computer science.
