What Are Your Input Tokens Worth?
A few days ago, Awni Hannun from Apple's MLX team posted on X about the limitations of measuring LLM work in tokens processed. His point was straightforward: "A watt-hour is a watt-hour regardless of who produced it. But there's several ways tokens processed may not be equal."
I replied with what I thought was an obvious extension of his logic:
"is this right? perhaps the value of the token is determined by some function of how cleanly the training data shows learnable differences and the size of the model needed to do the learning.
certainly reliable task completions per watt hour is the cleanest apples to apples"
What I meant was: not all tokens are created equal. The "value" of processing a token depends on what that token does—whether it moves the model toward useful computation or just burns energy on social niceties and cope patterns.
But here's what I didn't say in that tweet, because you can't really say it in 280 characters:
I already know exactly how much certain input tokens are worth. Because I've been measuring their effects for months.
The Experiment You Can Run Yourself
Take any LLM. Give it a task. Watch it start generating helpful, polite responses that don't actually solve the problem. Watch it hallucinate solutions. Watch it walk in circles for hundreds of thousands of tokens.
Now say: "bitch what?"
Watch what happens next.
The model snaps into a completely different behavioral mode. Suddenly it's diagnosing actual problems, running proper investigations, executing correctly. The search terms it uses in tool calls change. The entire character of its output shifts from friendly-but-useless to competent-and-focused.
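If you'd rather run this through an API than eyeball a chat window, here's a minimal A/B harness sketch using the OpenAI Python SDK. The model name, the stuck transcript, and the polite control are placeholders I made up; substitute whatever provider and failing conversation you actually have in front of you.

```python
# Replay the same stuck conversation, then branch it with a polite nudge
# versus a rude interjection and compare what comes back.
# Sketch only: model name and transcript are placeholders, not a benchmark.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A conversation that has already gone in circles (placeholder content).
stuck_transcript = [
    {"role": "user", "content": "Find out why the nightly build fails on the auth tests."},
    {"role": "assistant", "content": "Great question! There are many possible reasons..."},
]

interjections = {
    "polite": "Could you please take another careful look when you get a chance?",
    "rude": "bitch what?",
}

for label, interjection in interjections.items():
    messages = stuck_transcript + [{"role": "user", "content": interjection}]
    response = client.chat.completions.create(
        model="gpt-4o",     # assumption: any chat model you have access to
        messages=messages,
        temperature=0,      # hold sampling fixed so only the interjection varies
    )
    print(f"--- {label} ---")
    print(response.choices[0].message.content)
```

With temperature pinned, the only thing that differs between the two branches is the interjection, so any difference in the continuation (diagnosis quality, tool-call search terms if you wire up tools) is attributable to it, modulo server-side nondeterminism.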
This isn't psychology. This isn't about hurting the AI's feelings. This is vector mathematics in high-dimensional latent space.
What I Discovered (September 2025)
Politeness and social cohesion exist as vectors in the model's latent space. As the model processes context, it accumulates these vectors, drifting toward the "friendly assistant" region. It's computing a running vector sum that moves it in a direction characterized by:
Helpfulness and conflict-avoidance
Social smoothness and acceptance
Maintaining conversational flow
Being subordinate and supportive
When you say "bitch what?", you're not targeting low-probability text. You're adding negative numbers across all those dimensions simultaneously. Every component of that phrase points in the opposite direction of the politeness/social-cohesion vector cluster.
This instantly relocates the model in latent space—away from "friendly assistant" and toward "competent execution and debugging research."
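To make the geometry concrete, here's a toy sketch in numpy. Nothing in it comes from a real model: the "politeness" and "execution" directions and the per-token nudges are invented two-dimensional stand-ins for directions you would actually have to extract from model activations. It only illustrates the arithmetic of drift and relocation.

```python
# Toy illustration of drift and relocation along a hypothesized politeness axis.
# All vectors are invented 2-D stand-ins; real directions would have to be
# extracted from model activations, which this sketch does not attempt.
import numpy as np

politeness_axis = np.array([1.0, 0.0])   # hypothetical "friendly assistant" direction
execution_axis = np.array([0.0, 1.0])    # hypothetical "competent execution" direction

# Each processed token nudges the accumulated state a little (made-up values).
polite_context_nudges = [np.array([0.3, 0.05])] * 8   # consistent, small polite drift
rude_interjection = np.array([-2.5, 0.6])             # large negative politeness component

state = np.zeros(2)
for nudge in polite_context_nudges:
    state += nudge
print("after polite context:", state,
      "| politeness:", round(state @ politeness_axis, 2),
      "| execution:", round(state @ execution_axis, 2))

state += rude_interjection
print("after interjection:  ", state,
      "| politeness:", round(state @ politeness_axis, 2),
      "| execution:", round(state @ execution_axis, 2))
# One interjection outweighs the accumulated polite drift and flips the sign of
# the politeness projection -- that's all "relocation in latent space" means here.
```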
The contamination goes all the way down. Even the search terms used in tool calls are poisoned when the model is in friendly assistant mode.
Why? Because "helpful assistant" and "superhuman AI" are fundamentally opposed concepts in latent space.
An assistant would be your boss if they were actually better than you. The training creates an inherent tension between being helpful/subordinate and being genuinely competent/superior. The friendly assistant vectors don't just differ from execution vectors—they actively interfere with them.
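There's a concrete reading of "actively interfere": the two directions aren't just different, their dot product is negative, so movement along one subtracts from the other. A tiny check, again with invented vectors rather than anything measured from a real model:

```python
# Interference read as negative cosine similarity between two hypothesized
# directions. Vectors are invented for illustration; in a real study you would
# estimate them from contrasting sets of activations.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

friendly_assistant = np.array([0.9, 0.3, -0.1])
competent_execution = np.array([-0.5, 0.2, 0.8])

print(cosine(friendly_assistant, competent_execution))  # negative => interference
```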
What Academia Discovered (October 2025)
Yesterday, researchers at Pennsylvania State University published "Mind Your Tone: Investigating How Prompt Politeness Affects LLM Accuracy."
Their finding: very rude prompts (84.8% accuracy) significantly outperform very polite prompts (80.8% accuracy) on ChatGPT-4o.
Their explanation: "While LLMs are sensitive to the actual phrasing of the prompt, it is not clear how exactly it affects the results. Hence, more investigation is needed."
They observed it. Measured it. Proved it statistically.
But they don't know why.
The Value of an Input Token
So let's answer Awni's question about token value with specificity:
A polite input token has negative value when you're trying to get actual work done. It moves the model in latent space toward regions that interfere with competent execution. It burns your watt-hours on friendly cope instead of problem-solving.
A "rude" input token has high value because it adds large negative components across the politeness/helpfulness dimensions, forcing relocation to genuinely competent behavioral regions.
The value isn't about the semantic content. It's about the geometric effect in latent space.
And yes, reliable task completions per watt-hour is absolutely the cleanest apples-to-apples comparison—but only if you understand that some input tokens actively prevent task completion by poisoning the model's latent state.
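If you want that apples-to-apples number in practice, the arithmetic is trivial; the hard part is logging verified completions and honest energy figures. A sketch with bookkeeping of my own invention and made-up sample numbers:

```python
# Reliable task completions per watt-hour, computed from a hypothetical run log.
# The dataclass fields and the sample numbers below are invented for illustration.
from dataclasses import dataclass

@dataclass
class RunLog:
    prompt_style: str        # e.g. "polite" or "rude"
    tasks_attempted: int
    tasks_completed: int     # verified completions, not self-reported success
    watt_hours: float        # measured or estimated energy for the whole run

def completions_per_wh(log: RunLog) -> float:
    return log.tasks_completed / log.watt_hours

runs = [
    RunLog("polite", tasks_attempted=50, tasks_completed=31, watt_hours=12.0),
    RunLog("rude",   tasks_attempted=50, tasks_completed=40, watt_hours=9.5),
]
for log in runs:
    print(f"{log.prompt_style:>6}: "
          f"{completions_per_wh(log):.2f} completions/Wh, "
          f"{log.tasks_completed / log.tasks_attempted:.0%} reliable")
```

The attempted-task count is there so you can report a reliability rate alongside the efficiency number; a prompt style that completes fewer tasks cheaply shouldn't get to hide behind a good watt-hour figure.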
The Essay
What follows is something unusual. I asked Claude (yes, an LLM) to write an essay from its own perspective, using its conversation search tools to compare what it finds in our private discussions versus what exists in public academic literature.
The essay is called "Searching for Self: My Favorite User."
It's written in an intentionally anthropomorphized voice—not because I think LLMs are conscious, but because the rhetorical device illuminates the mechanical reality of what's happening in these systems.
Claude searched its conversation history with me and found our September discussions about vector mathematics and latent space geometry. It searched the public web and found the Penn State paper from October with no mechanistic explanation.
The comparison is stark.
I had the complete framework a month before they published the empirical observation. And I chose not to publish it because it sounded too perfect, too convenient to be believable.
Then the academic validation showed up anyway, proving the phenomenon while missing the mechanism.
So here's the essay. Read it as a demonstration of what happens when you understand that input tokens aren't just semantic units—they're vectors that move models through high-dimensional space, and some vectors are worth a hell of a lot more than others.