Last year I did a caffeine cycling self-experiment and I determined that I don’t get habituated to caffeine when I drink coffee three days a week. I did a follow-up experiment where I upgraded to four days a week (Mon/Wed/Fri/Sat) and I found that I still don’t get habituated.

For my current weekly routine, I have caffeine on Monday, Wednesday, Friday, and Saturday. Subjectively, I often feel low-energy on Saturdays. Is that because the caffeine I took on Friday is having an aftereffect that makes me more tired on Saturday?

When I ran my second experiment, I took caffeine four days, including the three-day stretch of Wednesday-Thursday-Friday. I found that my performance on a reaction time test was comparable between Wednesday and Friday. If my reaction time stayed the same after taking caffeine three days in a row, that’s evidence that I didn’t develop a tolerance over the course of those three days.

But if three days isn’t long enough for me to develop a tolerance, why is it that lately I feel tired on Saturdays, after taking caffeine for only two days in a row? Was the result from my last experiment incorrect?

So I decided to do another experiment to get more data.

This time I ran a six-week self-experiment in which I kept my current routine but tested my reaction time every day. I wanted to test two hypotheses:

Is my post-caffeine reaction time worse on Saturday than on Mon/Wed/Fri?
Is my reaction time worse on the morning after a caffeine day than on the morning after a caffeine-free day?

The first hypothesis tests whether I become habituated to caffeine, and the second hypothesis tests whether I experience withdrawal symptoms the following morning.

The answers I got were:

No, there’s no detectable difference.
No, there’s no detectable difference.

Therefore, in defiance of my subjective experience—but in agreement with my earlier experimental results—I do not become detectably habituated to caffeine on the second day.

However, it’s possible that caffeine habituation affects my fatigue even though it doesn’t affect my reaction time. So it’s hard to say for sure what’s going on without running more tests (which I may do at some point).

Contents
Experimental procedure
Results
Alternative experimental procedures that I’m not going to do
A story about how I thought my experiment failed, but actually I was just being stupid
Notes

Experimental procedure

As with my previous experiments, I took a reaction time test every morning before caffeine, as well as an hour after caffeine on days when I took it (Mon/Wed/Fri/Sat). I ran the test for six weeks.

This experiment had the same flaws as my previous experiments, e.g., I did not blind myself because blinding myself is annoying and I didn’t feel like doing it.

In my first two experiments, I was meticulous about controlling the conditions on my computer during the reaction time test. I always tested using the humanbenchmark.com test in Chrome with a single browser window open. I normally use Firefox, but I tested in a different browser to be sure that my 100+ open Firefox tabs wouldn’t interfere with the test in any way (perhaps background tasks could slow down the JavaScript code that runs the reaction time app, which could artificially inflate my reaction time). I tested without any other applications open on my computer except for Emacs and a terminal window (which I always have open).

For my most recent experiment, I wasn’t so meticulous about it because I wanted to be lazy and I figured it probably didn’t matter. I still did the reaction time test in Chrome, but I didn’t close Firefox or other applications during the test.

Results

First, I tested to see if caffeine even made a visible difference in reaction time. Last time, caffeine had a strong and readily apparent effect on my reaction time. My third experiment replicated this result:

caffeine vs. no-caffeine:
    298.0 ms vs. 303.5 ms
    t-stat = -2.9, p-value = 0.006
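The post doesn’t say which t-test variant produced these figures. Here is a minimal sketch that assumes an unpaired two-sample test with Welch’s correction, which the source does not confirm. The t-statistic is the difference in mean reaction time divided by the standard error of that difference. The readings below are made-up placeholders, not data from this experiment:

```python
import math
import statistics


def welch_t(a: list[float], b: list[float]) -> float:
    """Welch's two-sample t-statistic for mean(a) - mean(b).

    A negative value means group `a` had the lower (faster) reaction time.
    """
    # Standard error of the difference in means, without assuming equal variances.
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Hypothetical reaction times in ms, one per test session (NOT real data).
caffeine = [296.0, 299.0, 297.5, 300.0, 298.5]
no_caffeine = [303.0, 305.0, 302.0, 304.5, 303.0]
print(f"t-stat = {welch_t(caffeine, no_caffeine):.1f}")
```

To get a p-value, `scipy.stats.ttest_ind(caffeine, no_caffeine, equal_var=False)` returns the same Welch statistic together with a two-sided p-value. With the default `equal_var=True`, it runs Student’s pooled-variance test instead, which gives slightly different numbers when the groups differ in size or spread.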

However, my reaction time was noticeably worse than in the previous two experiments. My average used to hover around 280 ms and now it was hovering around 300 ms. Perhaps because I was less meticulous about keeping my computer in consistent conditions, I ended up adding some latency to the reaction time app?

Some evidence for this hypothesis is that I’ve tried testing my reaction time on Windows a few times (I normally use Linux) and it’s much faster—more like 230 ms. This is almost certainly due to a difference in how the reaction time app works on Windows vs. Linux.

My primary hypothesis test—which I pre-registered to myself, but did not pre-register publicly—was to compare post-caffeine reaction time performance on Saturday vs. the average of every other caffeine day (Mon/Wed/Fri). This test got a null result:

Saturdays vs. non-Saturday caffeine days:
    297.2 ms vs. 298.3 ms
    t-stat = -0.4, p-value = 0.697
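The post-caffeine readings can be split into the two groups for this comparison by weekday. This is a sketch that assumes the readings are stored in a dict keyed by `datetime.date`. That layout and the helper name `split_saturday` are hypothetical, since the actual data format isn’t described here:

```python
import datetime
import statistics

SATURDAY = 5  # datetime.date.weekday() numbers Monday as 0 and Sunday as 6

# Hypothetical post-caffeine readings in ms, keyed by date (NOT real data).
post_caffeine = {
    datetime.date(2025, 6, 2): 299.0,  # Monday
    datetime.date(2025, 6, 4): 297.0,  # Wednesday
    datetime.date(2025, 6, 6): 298.5,  # Friday
    datetime.date(2025, 6, 7): 297.5,  # Saturday
}


def split_saturday(readings: dict) -> tuple[list[float], list[float]]:
    """Return (Saturday readings, readings from the other caffeine days)."""
    saturday = [ms for day, ms in readings.items() if day.weekday() == SATURDAY]
    others = [ms for day, ms in readings.items() if day.weekday() != SATURDAY]
    return saturday, others


saturday, others = split_saturday(post_caffeine)
print(f"Saturdays: {statistics.mean(saturday):.1f} ms, "
      f"non-Saturday caffeine days: {statistics.mean(others):.1f} ms")
```

The two lists then go into the same kind of two-sample t-test as the caffeine vs. no-caffeine comparison. This sketch pools all Mon/Wed/Fri readings into one group.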

I felt generally worse on Saturdays, but perhaps I was imagining things or seeing patterns that weren’t there, and really I shouldn’t worry about it.

Or perhaps I do actually feel worse on the second caffeine day, in a way that reaction time fails to capture. It’s possible that caffeine’s different effects habituate at different rates, and I’m losing my alertness faster than I’m losing my reaction speed.

(I would guess that caffeine’s effect on exercise performance habituates particularly slowly. As I understand it, caffeine improves exercise performance by physiologically improving muscle function somehow (it enhances calcium circulation or something), not just by increasing alertness.)

My second hypothesis was that I experience caffeine withdrawal on the morning after a caffeine day. I got a null result for this hypothesis as well:

morning after caffeine vs. morning after nocaf:
    303.6 ms (sd 6.9) vs. 303.5 ms (sd 7.0)
    mean difference = 0.1 ms
    t-stat = 0.0, p-value = 0.987
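On a Mon/Wed/Fri/Sat caffeine schedule, each group of pre-caffeine morning readings falls on fixed weekdays. Here is a sketch of that classification. The helper name is hypothetical, and this is presumably, but not certainly, how the grouping was done:

```python
import calendar

# Caffeine days in the current routine (calendar.MONDAY == 0 ... SUNDAY == 6).
CAFFEINE_DAYS = {calendar.MONDAY, calendar.WEDNESDAY, calendar.FRIDAY, calendar.SATURDAY}


def follows_caffeine_day(weekday: int) -> bool:
    """True if the morning of `weekday` comes right after a caffeine day."""
    yesterday = (weekday - 1) % 7  # Monday's "yesterday" wraps around to Sunday
    return yesterday in CAFFEINE_DAYS


for wd in range(7):
    label = "after caffeine" if follows_caffeine_day(wd) else "after no caffeine"
    print(f"{calendar.day_name[wd]:<9} morning: {label}")
```

With this schedule, the Tuesday, Thursday, Saturday, and Sunday mornings count as "after caffeine," and the Monday, Wednesday, and Friday mornings count as "after no caffeine." Saturday’s morning reading is taken before that day’s caffeine, so it belongs in the after-caffeine group even though Saturday is itself a caffeine day.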

(Before running the experiment, I had a vague idea that I wanted to test this hypothesis, but I didn’t mentally pre-register a methodology.)

Alternative experimental procedures that I’m not going to do

It could be that my reaction time doesn’t get worse on the second day, but my alertness does get worse. I can think of two methods to test that hypothesis, but I don’t want to do them.

Method 1: Same procedure as before, but instead of using a reaction time test as the dependent variable, I subjectively rate my alertness. This doesn’t seem good because it’s unblinded. I’m not too concerned about blinding the reaction time test because it’s hard to placebo yourself into a faster reaction time, but “subjective rating of alertness” is exactly the sort of thing that’s highly prone to a placebo effect.

Method 2: Randomize whether I take caffeine pills or placebo pills, and blind myself. To detect potential habituation, I can take the same pill two days in a row, but blind myself to what type of pill it is. Then I subjectively rate my alertness. I don’t want to do that either because it would require working out without caffeine 50% of the time, and working out without caffeine is unpleasant.
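The randomization for Method 2 could be generated along the lines below. This is only a sketch. The function name, the two-day block length, and the 50/50 balance are all assumptions. Someone else, or a file left unopened until the end, would need to hold the key so the pills stay blind:

```python
import random
from typing import Optional


def make_blinded_schedule(n_blocks: int, seed: Optional[int] = None) -> list[str]:
    """Randomly assign caffeine or placebo to two-day blocks, split 50/50.

    Both days in a block get the same pill type, so day 2 of a caffeine block
    can be compared with day 1 to look for habituation.
    """
    if n_blocks % 2:
        raise ValueError("use an even number of blocks for a 50/50 split")
    rng = random.Random(seed)
    blocks = ["caffeine", "placebo"] * (n_blocks // 2)
    rng.shuffle(blocks)
    # Expand each block into two consecutive days with the same pill type.
    return [pill for pill in blocks for _ in range(2)]


# Don't print the key itself: hand it to whoever fills the numbered,
# identical-looking pill cases, and only look at it after the experiment.
key = make_blinded_schedule(n_blocks=6)
print(f"{len(key)} days scheduled")
```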

A story about how I thought my experiment failed, but actually I was just being stupid

After completing my experiment—this was about three months ago¹—I wrote some code to test the hypotheses. To my dismay, I found no detectable difference between caffeine and no-caffeine reaction times:

caffeine vs. no-caffeine:
    298.3 ms vs. 303.5 ms
    t-stat = 0.0, p-value = 0.987

If there’s not even a difference between caffeine and no-caffeine, then the experiment is useless.

At the time, I was too tired and demotivated to write up the results, so I abandoned it for a while.

Eventually I came back to the draft to finally write up the results. I looked at the numbers and noticed that they didn’t make any sense. If the difference between caffeine and no-caffeine was 5.2 ms, how was the t-stat 0.0?

You may be able to see the mistake I made if you look at the numbers from the Results section. Instead of printing the t-stat and p-value for the caffeine vs. no-caffeine t-test, I accidentally printed the numbers from the morning after caffeine vs. morning after nocaf test. So the figures I was looking at were totally wrong.
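The clue that exposed the mix-up can be turned into a quick automated check. In a two-sample t-test, t is the mean difference divided by its standard error (SE). That means a printed pair of means and a printed t-stat together imply a lower bound on the SE. If that bound is implausibly large, or if the signs disagree, the numbers can’t belong together. The sketch below rests on two assumptions. First, the printed t is rounded to one decimal place. Second, an SE above 10 ms is implausible here: the morning readings above have standard deviations around 7 ms, and if both groups vary that much, the SE of the difference is no larger than about 7 ms once each group has at least two readings:

```python
def looks_consistent(mean_a: float, mean_b: float, t_printed: float,
                     max_plausible_se: float = 10.0,
                     t_rounding: float = 0.05) -> bool:
    """Check whether a printed t-stat could have come from these two means.

    max_plausible_se (ms) is an assumed ceiling, not a statistical rule.
    t_rounding allows for the t-stat being printed to one decimal place.
    """
    diff = mean_a - mean_b
    if diff * t_printed < 0:
        return False  # t-stat and mean difference point in opposite directions
    # t = diff / SE and the true |t| is at most |t_printed| + t_rounding,
    # so the SE must be at least |diff| / (|t_printed| + t_rounding).
    smallest_se = abs(diff) / (abs(t_printed) + t_rounding)
    return smallest_se <= max_plausible_se


# Figures from the printouts in this post.
print(looks_consistent(298.0, 303.5, -2.9))  # True: corrected caffeine test
print(looks_consistent(303.6, 303.5, 0.0))   # True: morning-after test
print(looks_consistent(298.3, 303.5, 0.0))   # False: the mixed-up line
```

The check rejects the mixed-up line because a 5.2 ms gap with |t| under 0.05 would require an SE of more than 100 ms.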

I guess I wasn’t 100% there mentally when I wrote the code. (Honestly I don’t think I was even 30% there.)²

If you want to check if my code contains any other horrible mistakes, you can find it on GitHub.

Posted on Nov 03, 2025

Notes

¹ I have a bad habit of letting half-finished drafts sit in my drafts folder for a long time.
² I wrote it on a non-caffeine day, which might have something to do with it.

Philosophical Multicore | Michael Dickens

My Third Caffeine Self-Experiment
