Prompt Politeness Affects LLM Accuracy (2025)

https://arxiv.org/abs/2510.04950

Comments

theanonymousone May 27, 2026, 8:19 AM
I have always said please and thank you to LLMs, not because of accuracy or because I'm stupid. I believe it is more about me than about the LLM, and in any case it is a habit I don't want to lose.
jkarni May 27, 2026, 8:22 AM
Thomas Aquinas believed cruelty to animals was wrong not because animals have souls (and with that all the standard moral rights), but because it can teach us cruelty to other humans.
niek_pas May 27, 2026, 8:33 AM
Genuine question: do you add 'please' and 'thank you' to Google searches? If not, what sets them apart?
spiderfarmer May 27, 2026, 8:38 AM
Google isn’t conversational.
polytely May 27, 2026, 8:41 AM
It sort of makes sense to me. When a student asks an expert in the field a question, I would guess the successful interactions are, on average, the more polite ones. For example, if you were asking Donald Knuth or Terence Tao a question, you'd probably be polite about it. Being hostile while asking questions gets you into forum-discussion territory.
TimCTRL May 27, 2026, 8:22 AM
I only say please and thank you so that when the robots finally take over, they will remember I was nice to them.
octocop May 27, 2026, 8:28 AM
It seems they will remember that you wasted tokens for no reason and punish you instead.
emil-lp May 27, 2026, 8:33 AM
Tokens are their food, it's literally what keeps them alive.

Not feeding them tokens is neglect.

I try to feed them a healthy diet.

331c8c71 May 27, 2026, 8:01 AM
Interesting.

I wonder why anyone would use a t-test when the experiment is clearly modelled by a binomial distribution: 250 independent questions, each answered either correctly or not (the null being that the success rate is the same).

jampekka May 27, 2026, 8:28 AM
The methods could be better described in the paper, but my understanding is that they did 10 runs of each question for each prompt and averaged them, so the compared values are not binary. You could do a sign test, but you'd lose power and answer a slightly different question.
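
To make the distinction concrete, here is a minimal sketch of the two tests on per-question averages. It is not the paper's code: the data are simulated, the 3-point accuracy gap is made up, and the function names are hypothetical. The question and run counts only echo the numbers mentioned in this thread.

```python
import math
import random
import statistics


def paired_t_statistic(a, b):
    """t statistic for the mean of the per-question differences a[i] - b[i].

    Assumes the differences are not all identical (otherwise the sample
    standard deviation is zero and the statistic is undefined).
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    return mean / (sd / math.sqrt(n))


def sign_test_p_value(a, b):
    """Exact two-sided sign test on the direction of each difference.

    Ties are dropped; only which prompt scored higher is used, not by how much.
    """
    wins = sum(x > y for x, y in zip(a, b))
    losses = sum(x < y for x, y in zip(a, b))
    n = wins + losses
    if n == 0:
        return 1.0
    k = min(wins, losses)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def simulate_scores(p_correct, runs, rng):
    """Average of `runs` simulated right/wrong answers for each question."""
    return [sum(rng.random() < p for _ in range(runs)) / runs for p in p_correct]


# Simulated data only: arbitrary per-question difficulty and a made-up gap.
rng = random.Random(0)
N_QUESTIONS, N_RUNS = 250, 10
base = [rng.uniform(0.5, 0.95) for _ in range(N_QUESTIONS)]
polite = simulate_scores(base, N_RUNS, rng)
rude = simulate_scores([min(1.0, p + 0.03) for p in base], N_RUNS, rng)

print("paired t statistic:", paired_t_statistic(rude, polite))
print("sign test p-value: ", sign_test_p_value(rude, polite))
```

The t statistic uses the size of each per-question difference, while the sign test only counts which prompt came out ahead, which is where the loss of power comes from.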
plewd May 27, 2026, 8:17 AM
I don't know much about stats, but does "the null is that the success rate is the same" imply that it's a sketchy methodology because they can come up with some findings ("ruder prompts are better/worse!") more often?
jampekka May 27, 2026, 8:39 AM
That's the usual null hypothesis for these kinds of tests.
dude250711 May 27, 2026, 8:06 AM
I have an idea: let's use these things for autonomous software engineering.
faize May 27, 2026, 8:14 AM
Remember to always say "please" and "thank you" when planning a critical system
eigenspace May 27, 2026, 8:16 AM
Please remember to always say "please" and "thank you" when planning a critical system. Thank you!