The Most Damning DSPy Code: Actual Source Code Exposed
The Smoking Gun #1: They Literally Named It "Random Search"
From /dspy/teleprompt/random_search.py:

```python
class BootstrapFewShotWithRandomSearch(Teleprompter):
    def compile(self, student, *, teacher=None, trainset, valset=None, restrict=None, labeled_sample=True):
        scores = []
        for seed in range(-3, self.num_candidate_sets):
            # ... create random variations ...
            if seed == -3:
                # zero-shot
                program = student.reset_copy()
            elif seed == -2:
                # labels only
                teleprompter = LabeledFewShot(k=self.max_labeled_demos)
                program = teleprompter.compile(student, trainset=trainset_copy, sample=labeled_sample)
            else:
                # RANDOM SHUFFLE AND RANDOM SIZE SELECTION
                random.Random(seed).shuffle(trainset_copy)
                size = random.Random(seed).randint(self.min_num_samples, self.max_num_samples)
                # ... bootstrap a candidate program on the shuffled trainset with that size ...

            # Evaluate the random variation
            result = evaluate(program)
            score = result.score

            # This is their "optimization" - just keep the best random attempt
            if len(scores) == 0 or score > max(scores):
                print("New best score:", score, "for seed", seed)
                best_program = program

            scores.append(score)
```
They're literally:

- Trying random seeds (-3 to num_candidate_sets)
- Randomly shuffling training data
- Randomly selecting sample sizes
- Keeping whatever randomly scores highest
- Calling this "optimization"
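
Stripped of class names, that compile loop is just best-of-N random sampling. Here's a minimal sketch of the pattern; `build_program` and `evaluate` are hypothetical stand-ins for DSPy's bootstrapping and evaluation steps, not its actual API:

```python
import random

def best_of_n_random(student, trainset, build_program, evaluate, num_candidates=16):
    """Keep whichever randomly-built candidate happens to score highest."""
    best_score, best_program = float("-inf"), None
    for seed in range(num_candidates):
        rng = random.Random(seed)
        demos = list(trainset)
        rng.shuffle(demos)                              # random order
        k = rng.randint(1, len(demos))                  # random demo count
        candidate = build_program(student, demos[:k])   # stand-in for bootstrapping
        score = evaluate(candidate)                     # score the random variation
        if score > best_score:                          # "optimization" = argmax over random draws
            best_score, best_program = score, candidate
    return best_program, best_score
```

Nothing in the loop conditions the next candidate on the previous ones; the only state carried forward is the running maximum.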
The Smoking Gun #2: The Bootstrap "Optimization" Is Just Filtering Accidents
From /dspy/teleprompt/bootstrap.py:

```python
def _bootstrap_one_example(self, example, round_idx=0):
    try:
        with dspy.settings.context(trace=[], **self.teacher_settings):
            lm = dspy.settings.lm
            # Use a fresh rollout with temperature=1.0 to bypass caches
            lm = lm.copy(rollout_id=round_idx, temperature=1.0) if round_idx > 0 else lm

            # Run the program and see what happens
            prediction = teacher(**example.inputs())
            trace = dspy.settings.trace

            # Check if it accidentally worked
            if self.metric:
                metric_val = self.metric(example, prediction, trace)
                if self.metric_threshold:
                    success = metric_val >= self.metric_threshold
                else:
                    success = metric_val
            else:
                success = True  # If no metric, everything "works"!
    except Exception as e:
        success = False

    if success:
        # It accidentally worked! Keep it as a "good" example
        for step in trace:
            # ... save the trace that accidentally worked ...
```
Translation:

- Run with temperature=1.0 (maximum randomness)
- If it accidentally scores well, assume it's "good"
- No understanding of WHY it worked
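
Reduced to its essentials, that routine is a rejection filter over stochastic samples. A minimal sketch, assuming a `teacher` callable and a `metric(example, prediction)` scoring function (both names are placeholders, not DSPy's API):

```python
def bootstrap_demos(teacher, examples, metric, threshold=None):
    """Keep whichever high-temperature runs happen to pass the metric."""
    kept = []
    for example in examples:
        try:
            prediction = teacher(**example.inputs())     # sampled at temperature=1.0
        except Exception:
            continue                                     # failures are silently dropped
        score = metric(example, prediction)
        success = score >= threshold if threshold is not None else bool(score)
        if success:
            kept.append((example, prediction))           # an "accident" becomes a demo
    return kept
```

Whether a demo is kept depends only on whether one sampled run cleared the bar, never on why it did.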
The Smoking Gun #3: The "Bayesian Optimization" That Isn't
From /dspy/teleprompt/mipro_optimizer_v2.py:

```python
def _optimize_prompt_parameters(self, ...):
    import optuna

    # They claim it's "Bayesian Optimization"
    logger.info("finding the optimal combination using Bayesian Optimization.\n")

    # But it's just trying random combinations
    def objective(trial):
        # Choose random instructions
        instruction_idx = trial.suggest_categorical(
            f"{i}_predictor_instruction", range(len(instruction_candidates[i]))
        )

        # Choose random demos
        if demo_candidates:
            demos_idx = trial.suggest_categorical(
                f"{i}_predictor_demos", range(len(demo_candidates[i]))
            )

        # Evaluate this random combination
        score = eval_candidate_program(batch_size, valset, candidate_program, evaluate, self.rng).score

        # This is it. That's the "optimization"
        if score > best_score:
            best_score = score
            best_program = candidate_program.deepcopy()
        return score

    # Use Optuna (which is legit) on a nonsense search space
    sampler = optuna.samplers.TPESampler(seed=seed, multivariate=True)
    study = optuna.create_study(direction="maximize", sampler=sampler)
    study.optimize(objective, n_trials=num_trials)
```
The Fraud:

- They use Optuna (a real Bayesian optimization library)
- But apply it to categorical choices with no continuous search space
- No gradients, no actual optimization landscape
- Just trying random combinations of prompt variations
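
For reference, this is roughly what driving Optuna's TPE sampler over a purely categorical space looks like. The candidate lists and the scoring function below are invented for illustration; only the Optuna calls mirror the ones quoted above:

```python
import optuna

# Hypothetical pre-generated candidates (stand-ins for DSPy's instruction/demo proposals)
instruction_candidates = ["Answer concisely.", "Think step by step.", "Cite your sources."]
demo_set_candidates = [["demo set 0"], ["demo set 1"], ["demo set 2"]]

def score_combination(instruction_idx, demos_idx):
    # Stand-in for eval_candidate_program; in DSPy every call here costs real LM requests
    return 0.5 + 0.1 * instruction_idx - 0.05 * demos_idx

def objective(trial):
    instruction_idx = trial.suggest_categorical("instruction", range(len(instruction_candidates)))
    demos_idx = trial.suggest_categorical("demos", range(len(demo_set_candidates)))
    return score_combination(instruction_idx, demos_idx)

sampler = optuna.samplers.TPESampler(seed=0, multivariate=True)
study = optuna.create_study(direction="maximize", sampler=sampler)
study.optimize(objective, n_trials=20)
print(study.best_params, study.best_value)
```

TPE does bias later trials toward indices that scored well earlier, but with only a handful of discrete candidates per slot this amounts to a guided lottery over pre-generated prompts rather than optimization over any meaningful landscape.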
The Smoking Gun #4: The Comments Reveal They Know It's BS
From the same files, look at their TODO comments:
```python
# TODO: metrics should return an object with __bool__ basically, but fine if they're more complex.
# TODO: the max_rounds via branch_idx to get past the cache, not just temperature.
# TODO: Deal with the (pretty common) case of having a metric for filtering and a separate metric for eval.
# TODO: This function should take a max_budget and max_teacher_budget.
# Progressive elimination sounds about right: after 50 examples, drop bottom third...
```
They know they're just:

- Using temperature to get different random outputs
- Doing "progressive elimination" (aka random selection)
- Not actually optimizing anything
The Most Damning Function: "Getting Past the Cache"
```python
# Use a fresh rollout with temperature=1.0 to bypass caches
lm = lm.copy(rollout_id=round_idx, temperature=1.0) if round_idx > 0 else lm
```
They're literally just:

- Setting temperature to 1.0 (maximum randomness)
- Using different "rollout_ids" to avoid caching
- Running the same prompt multiple times
- Keeping whatever randomly performs better
- Calling this "bootstrapping"
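
The mechanism being mocked is easy to reproduce with a toy cached client (a hypothetical stand-in, not DSPy's LM class): the only way to get a different answer is to change the cache key and raise the temperature.

```python
import random

class CachedLM:
    """Toy stand-in for a cached LM client: identical requests return identical outputs."""

    def __init__(self, temperature=0.0, rollout_id=0):
        self.temperature = temperature
        self.rollout_id = rollout_id
        self._cache = {}

    def copy(self, **overrides):
        new = CachedLM(self.temperature, self.rollout_id)
        new._cache = self._cache                          # share the cache, like a process-wide LM cache
        for key, value in overrides.items():
            setattr(new, key, value)
        return new

    def __call__(self, prompt):
        key = (prompt, self.temperature, self.rollout_id)  # rollout_id only changes the cache key
        if key not in self._cache:
            self._cache[key] = f"{prompt} -> sample {random.random():.3f}"
        return self._cache[key]

lm = CachedLM()
print(lm("Q"))                                        # first call: cached
print(lm("Q"))                                        # same key: same answer
print(lm.copy(rollout_id=1, temperature=1.0)("Q"))    # new key: a fresh random sample
```

The model never sees rollout_id; its only job is to defeat the cache so the same prompt can be re-rolled.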
The Cost of This Nonsense
From their actual code comments:
```python
# TODO: FIXME: The max number of demos should be determined in part by the LM's tokenizer + max_length.
# As another option, we can just try a wide range and handle failures as penalties on the score.
```
Translation: "We don't even know how many examples to use, so we'll just try random amounts and see what happens."
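
For contrast, the budget-aware version that TODO gestures at is straightforward to sketch. Assuming a hypothetical `count_tokens` helper and a context limit (neither is DSPy's API), the demo count falls out deterministically instead of being guessed:

```python
def max_demos_that_fit(demos, count_tokens, max_length, reserved_for_task=1024):
    """Pack demos greedily until the context budget is exhausted."""
    budget = max_length - reserved_for_task
    used = 0
    fitted = 0
    for demo in demos:
        cost = count_tokens(demo)
        if used + cost > budget:
            break
        used += cost
        fitted += 1
    return fitted
```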
The Final Proof: Their "Best Score" Logic
```python
if len(scores) == 0 or score > max(scores):
    print("New best score:", score, "for seed", seed)
    best_program = program

scores.append(score)

print(f"Scores so far: {scores}")
print(f"Best score so far: {max(scores)}")
```
This is literally:

```python
if random_variation_scored_higher():
    claim_optimization_success()
```
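
You can reproduce the feel of those log lines without calling a model at all. The following simulation uses nothing but uniform random draws (no DSPy, no LM) and still prints a steady stream of "new best" messages:

```python
import random

rng = random.Random(0)
scores = []
for seed in range(20):
    score = rng.random()                    # stand-in for an expensive evaluation run
    if not scores or score > max(scores):
        print(f"New best score: {score:.3f} for seed {seed}")
    scores.append(score)

print(f"Best score so far: {max(scores):.3f}")
```

The running maximum of any random sequence never decreases, so "best score so far" always looks like progress.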
What DSPy Actually Is
After examining the actual source code, DSPy is:

- Random search (they literally call it that in the class name)
- Through semantic noise (temperature=1.0, random shuffling)
- Evaluated by accidents (whatever happens to score higher)
- Wrapped in academic terminology ("Bayesian optimization", "bootstrapping")
- That costs real money (thousands of API calls at temperature=1.0)
The code proves what we suspected: they're using `if (random() > previous_random()) { print("optimization working!") }` and calling it a "framework for programming—not prompting—language models."
It's not optimization. It's expensive random number generation dressed up as computer science.