Confidence: Likely.

I conducted an experiment on myself to see if I would develop a tolerance to caffeine from taking it three days a week. The results suggest that I didn’t. Caffeine had just as big an effect at the end of my four-week trial as it did at the beginning.

This outcome is statistically significant (p = 0.016), but the data show a weird pattern: caffeine’s effectiveness went up over time instead of staying flat. I don’t know how to explain that, which makes me suspicious of the experiment’s findings.

Contents
Experimental procedure
Calibration phase
Abstinence phase
Experimental phase
What explains these results?
An offer to readers
Notes

Experimental procedure

(I described this procedure in a pre-registration in a previous post.)

I test my reaction time by taking the humanbenchmark.com test twice in a row. One test consists of 5 reaction events, so two tests give a total of 10 reaction events. I take the average of those 10 reaction times as my reaction time at that moment. I take the test using the same computer, monitor, and mouse so that latency is consistent.
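To make the arithmetic concrete, here is a minimal sketch of how one measurement could be computed from two back-to-back tests. The function name and the sample readings are made up for illustration; they are not taken from the experiment's source code or data.

```python
def session_reaction_time(first_test, second_test):
    """Average all reaction events (in ms) from two back-to-back tests."""
    events = list(first_test) + list(second_test)
    if len(events) != 10:
        raise ValueError("expected 2 tests x 5 reaction events")
    return sum(events) / len(events)


# Hypothetical readings in ms, NOT data from the experiment.
first_test = [262, 255, 270, 258, 265]
second_test = [251, 268, 259, 263, 249]

print(session_reaction_time(first_test, second_test))  # 260.0
```

Averaging over 10 events rather than 5 reduces the influence of a single slow or lucky reaction on the recorded value.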

I measure the effect of caffeine using a reaction time test because (a) caffeine is known to improve reaction time, (b) reaction time is easy to test,¹ (c) it’s unlikely to improve with practice so it makes for a good consistent test variable, and (d) it’s hard to placebo-effect myself into improving my reaction time (which is important because I am not blinding myself²).

I conduct three phases as specified below:

Phase 1. Calibration phase. Continue drinking coffee three days a week as I have been for the past several years: two cups of coffee (~24 ounces) with three scoops of grounds,³ always the same brand,⁴ drunk on Monday, Wednesday, and Friday morning.

Take a reaction time test twice a day (following the schedule described in phase 3 below). Continue for four weeks. Plot a regression of my reaction time across the four weeks. The purpose of the calibration phase is to ensure that my reaction time does not improve from practicing every day—the regression line should be flat.

Phase 2. Abstinence phase. Abstain from caffeine for one week (9 days total, in between the last Friday of the calibration phase and the first Monday of the experimental phase). Test reaction time every day and measure the slope of reaction times across the 9 days. If I was habituated to caffeine in phase 1 then my reaction time should improve over the course of phase 2 as my tolerance wears off.

Phase 3. Experimental phase. Resume drinking coffee three days a week and continue for four weeks. Take a reaction time test twice a day, at (say) 8am and then 10am. The exact time doesn’t matter, but take the first test before having coffee and the second 30+ minutes after coffee. (Or on days when I don’t have coffee, take the first test after I wake up and the second test an hour or two later.)

Calibration phase

I wrote the first draft of this section after completing the calibration phase, and I wrote the first draft of Abstinence phase after completing the abstinence phase. So when I wrote them, I didn’t know the full results of the experiment yet.

I ran a four-week calibration phase to check some assumptions of the experiment:

1. Caffeine should improve reaction time.
2. If my post-caffeine test outperforms my pre-caffeine test, it should be because of the caffeine, not because my reaction time gets better later in the day.
3. Practicing shouldn’t improve my reaction time.

The calibration phase confirmed the first two assumptions:

1. The post-caffeine tests outperformed the pre-caffeine tests by an average of –13 ms (p = 0.025).⁵
2. On no-caffeine days, the second test did not outperform the first test (difference 0.4 ms, p = 0.9).
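A comparison like this pairs each day's second test with its first test and asks whether the average difference could plausibly be zero. The sketch below uses an exact sign-flip permutation test as a stand-in; the post does not say which test produced its p-values, so treat the method, the function name, and the readings as assumptions.

```python
from itertools import product


def paired_sign_flip_test(pre, post):
    """Return (mean of post - pre, two-sided p-value) for paired samples.

    The p-value is the share of all +/- sign assignments whose absolute
    mean difference is at least as large as the observed one.
    """
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    observed = sum(diffs) / n
    as_extreme = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        flipped = sum(s * d for s, d in zip(signs, diffs)) / n
        if abs(flipped) >= abs(observed) - 1e-12:
            as_extreme += 1
        total += 1
    return observed, as_extreme / total


# Hypothetical reaction times in ms, NOT data from the experiment.
pre_caffeine = [265, 270, 258, 262, 275, 260, 268, 259]
post_caffeine = [250, 259, 249, 251, 258, 252, 251, 248]

mean_diff, p_value = paired_sign_flip_test(pre_caffeine, post_caffeine)
print(mean_diff, p_value)
```

Enumerating every sign assignment is only practical for a few dozen pairs at most; a four-week phase with three caffeine days a week is small enough.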

The third assumption was sort-of-confirmed: my reaction time did not improve over the course of the calibration phase. In fact, it got worse at a rate of 0.87 ms/day (p = 0.014) for caffeine tests and 1.04 ms/day (p = 0.006) for no-caffeine tests.

(Remember that higher reaction times are worse.)
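A slope in ms/day like the ones above can be estimated with an ordinary least-squares fit of reaction time against day number. This is a sketch that assumes a plain linear regression; the helper name and the readings are hypothetical and are not the experiment's data.

```python
def ols_slope(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical day numbers and reaction times in ms, NOT data from the experiment.
days = [0, 2, 4, 7, 9, 11, 14, 16, 18, 21, 23, 25]
reaction_times = [252, 256, 251, 259, 262, 258, 266, 263, 270, 268, 275, 272]

slope, intercept = ols_slope(days, reaction_times)
print(f"slope = {slope:.2f} ms/day")  # positive slope = getting slower
```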

Why did my reaction times get worse? It’s not because I was getting habituated to caffeine. I had already been taking caffeine 3 days a week for years, so I would have been fully habituated long before starting the calibration phase.

Could it be because I started sleeping worse? That’s part of the reason. I regressed reaction time (on no-caffeine tests) against time spent in bed the previous night. Over the full experiment (not just the calibration phase⁶), each additional hour of sleep improved my reaction time by 4.9 ms⁷ (p < 0.0002, r² = 0.24). Controlling for time-in-bed flattened the slope of reaction time across non-caffeine tests from 1.04 ms/day to 0.77 ms/day. But that still leaves almost 3/4 of the slope unexplained.⁸
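One common way to "control for" a variable such as time in bed is to fit reaction time against both day number and hours in bed, then read off the coefficient on day. The sketch below does this by residualizing both reaction time and day on hours in bed, which gives the same day coefficient as a two-predictor regression. Whether the post's analysis was done this way is an assumption, and all names and numbers here are hypothetical. For reference, the figures in the text imply that 0.77 / 1.04 ≈ 74% of the original slope remains after the adjustment.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def residuals(xs, ys):
    """What is left of ys after removing its linear relationship with xs."""
    slope, intercept = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


def day_slope_controlling_for(days, reaction_times, hours_in_bed):
    """Coefficient on day in the regression reaction_time ~ day + hours_in_bed."""
    rt_resid = residuals(hours_in_bed, reaction_times)
    day_resid = residuals(hours_in_bed, days)
    slope, _ = fit_line(day_resid, rt_resid)
    return slope


# Synthetic example (NOT the experiment's data): reaction time worsens by
# 0.5 ms/day and improves by 5 ms per extra hour in bed, while time in bed
# drifts downward over the period.
days = [0, 1, 2, 3, 4, 5, 6, 7]
hours_in_bed = [9.0, 8.5, 9.0, 8.0, 8.5, 7.5, 8.0, 7.5]
reaction_times = [260.0 + 0.5 * d - 5.0 * (h - 8.0) for d, h in zip(days, hours_in_bed)]

raw_slope, _ = fit_line(days, reaction_times)
adjusted_slope = day_slope_controlling_for(days, reaction_times, hours_in_bed)
print(f"raw = {raw_slope:.2f} ms/day, adjusted = {adjusted_slope:.2f} ms/day")
```

In the synthetic data the raw slope overstates the day-to-day trend because shrinking time in bed is also making reaction times worse; the adjusted slope recovers the 0.5 ms/day that was built in.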

My best explanation: as the test became part of my routine, I subconsciously started taking it less seriously and started having a harder time staying focused. On most trials I get reaction times around 250–270 ms, but occasionally I lose focus and end up taking 330 ms or longer to react. As I recall, that didn’t happen at all during the first week or two of the calibration phase; it only started happening later.

My reaction time can’t continue getting worse forever. But this does raise a concern about the results from the experimental phase: if my performance gets worse during the experimental phase, it might be because I’m getting habituated to caffeine, or it might be a continuation of the trend that happened during the calibration phase.

Abstinence phase

I abstained from caffeine for 9 days. If I had previously been habituated to caffeine, you’d expect my reaction time to improve over the course of the week as my caffeine withdrawal subsides. Specifically, if caffeine improves reaction time by 13 ms, you’d expect my reaction time to get better by 13 ms over the course of the 9 days (= 1.44 ms/day). Instead, my reaction time got worse at a rate of 0.77 ms/day. This is not significantly different from 0 (p = 0.4), but it is significantly different from –1.44 ms/day (p = 0.04).

This plot shows the likelihood function for caffeine retention as indicated by the slope of reaction time over the abstinence phase:⁹

The maximum-likelihood estimate is 1.53—that is, caffeine becomes 53% more effective after my body adapts to it. If my reaction time got worse during abstinence, that implies caffeine tolerance was making my reaction time better. I’m pretty sure that’s wrong—my reaction time must have gotten worse for some other reason.
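The 1.53 figure can be reproduced from the numbers above using the slope-to-retention conversion described in the notes. The sketch below is one reading of that conversion. The function names are hypothetical, and the sign flip for the abstinence phase is inferred from the reported result rather than spelled out in the notes.

```python
BASELINE_BENEFIT_MS = 13.0  # caffeine's measured benefit from the calibration phase


def habituation_from_slope(slope_ms_per_day, n_days, baseline_ms=BASELINE_BENEFIT_MS):
    """Total change in reaction time over the phase, as a fraction of the baseline benefit."""
    return slope_ms_per_day * n_days / baseline_ms


def retention_on_caffeine(slope_ms_per_day, n_days, baseline_ms=BASELINE_BENEFIT_MS):
    """Retention from the slope of post-caffeine tests (positive slope = getting slower)."""
    return 1.0 - habituation_from_slope(slope_ms_per_day, n_days, baseline_ms)


def retention_from_abstinence(slope_ms_per_day, n_days, baseline_ms=BASELINE_BENEFIT_MS):
    """Retention implied by the slope during abstinence.

    Assumption: the sign is flipped relative to on-caffeine tests, because
    getting FASTER during abstinence is what a worn-off tolerance looks like.
    """
    return 1.0 + habituation_from_slope(slope_ms_per_day, n_days, baseline_ms)


# Abstinence phase from the text: +0.77 ms/day over 9 days.
abstinence_retention = retention_from_abstinence(0.77, 9)
print(round(abstinence_retention, 2))  # 1.53

# Full habituation would predict improving by 13 ms over 9 days, i.e. retention 0.
fully_habituated = retention_from_abstinence(-13.0 / 9, 9)
```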

Controlling for time-in-bed flattens the slope to nearly 0:

Experimental phase

For the experimental phase, I resumed taking caffeine 3 days a week.

Over the course of the four-week phase, I did not become habituated to caffeine. In fact, I became sensitized: caffeine got more effective, not less. My post-caffeine reaction time changed at a rate of –0.39 ms/day (p = 0.016); remember, a negative number means faster reaction time. My reaction time without caffeine also improved, to a lesser extent (slope = –0.23 ms/day, p = 0.4), and the difference in slopes was not statistically significant (p = 0.5). So either I did not develop a caffeine tolerance, or any caffeine tolerance I developed was outweighed by some force working in the opposite direction.

According to these regression lines, reaction time improved by a total of 9.8 ms with caffeine and 5.7 ms without caffeine over the four weeks.

You will notice a very low point on day 0. That happened because I accidentally reacted too early on one of the trials, but by pure coincidence I reacted at just the right moment to score a ~20ms reaction time.¹⁰ If I run a regression starting one day later to exclude this anomaly, the slopes for caffeine tests and no-caffeine tests look comparable (caffeine slope = –0.47 ms/day, p = 0.014; no-caffeine slope = –0.54 ms/day, p = 0.027).

This plot shows the likelihood function of caffeine retention according to the slope of performance on caffeine tests (excluding day 0):⁹

The likelihood function has a mean and a maximum of 1.86, which says caffeine becomes 86% more effective after my body adjusts to it. This likelihood function has only 0.6% of its mass below retention = 1 (i.e., retention = 1 has a p-value of 2 * 0.006 = 0.012). This likelihood function strongly suggests that caffeine gets more effective over time, not less.

I don’t believe this result. It’s more likely that some confounding factor caused my reaction time to improve, but I can’t think of what that confounding factor might be.

What explains these results?

Could my reaction time have improved because I was getting more sleep? If I control for time spent in bed the previous night, the slope of reaction times vs. days does flatten from –0.54 to –0.40, but this only explains about 1/4 of the slope.¹¹

Maybe my performance improved due to the cumulative effect of sleeping well for many nights in a row? But I spent less time in bed during the experimental phase (average 8.73 hours) than during the calibration phase (8.99 hours), so if anything I should have gotten worse, not better.

Could this be a genuine result? Could caffeine actually become more effective when I take it for longer? Three experiments on rats¹² found something similar: rats who took caffeine daily developed a tolerance, but rats who took caffeine on alternating days became sensitized (its effect got larger). (Plus one study¹³ found neither tolerance nor sensitization.)

This hints that caffeine sensitization is a real thing. But the results from the rat experiments don’t look the same as my results. They found that rats’ performance on caffeine days increased over the course of the experiments while performance on placebo days stayed flat. In contrast, my own performance improved both on caffeine days and on “placebo” days (I didn’t take a placebo, I just took nothing).

But perhaps caffeine sensitization works differently in humans than in rats. If a habituated caffeine user experiences withdrawal symptoms, then maybe a sensitized user experiences “anti-withdrawal”, making them perform better even when they don’t take caffeine. Maybe my brain thinks, “I don’t know what’s going on, the caffeine levels inside me keep fluctuating, I’d better delete some neurotransmitter receptors just to be safe,” and this ends up making me more alert with or without caffeine. But why didn’t it happen that way in the rat studies?

Earlier, when I talked about the calibration phase, I hypothesized that my performance got worse because I subconsciously stopped taking the tests as seriously. Could the opposite have happened in the experimental phase?

I don’t think so. I noticed my performance getting worse when I looked at the results just after finishing the calibration phase. So I might have mentally resolved to focus harder. But if so, you’d expect my performance to jump up and stay persistently high, or perhaps to jump up and then decline again, but not to start low and then steadily improve.

I thought the results might have something to do with my computer’s latency, but my experiment already controlled most of the parameters that might change the latency (I always tested on the same computer with the same hardware in a browser with a single tab open and with no other applications open except Emacs and Terminal). It occurred to me that perhaps whether my second monitor was on or off might affect the latency, but I tested this and saw no difference.

The results of my experiment suggest that I did not become habituated to caffeine. I can’t figure out what they do suggest, but at least I can say that I probably don’t develop a tolerance from taking caffeine 3 days a week.

An offer to readers

If you conduct a caffeine experiment on yourself with similar methodology to mine, you can send your data to web@mdickens.me and I’ll analyze it and make some graphs for you. At a minimum, each data point should include the date, your reaction time, and whether you had caffeine.

Source code and data for this experiment are available on GitHub.

Notes

¹ And, I didn’t know this before running the experiment, but it turns out that it’s easy to get statistically significant results with reaction time. My reaction time had a day-to-day standard deviation of only 11 ms, so I can detect pretty small effect sizes with just a few days of samples.
² It’s possible to conduct a self-blinded caffeine experiment as follows:
1. Label caffeine pills and placebo pills as pill A and pill B in a random order.
2. On Monday/Wednesday/Friday, take pill A. On Tuesday/Thursday/Saturday, take pill B. (Skip Sunday.)
3. Each week, re-randomize the ordering of pill A and pill B so you can’t figure out which one is which.
I didn’t want to do that for two reasons:
- I already suspect that caffeine makes me feel much better while lifting weights, so I don’t want to spend potentially several weeks lifting weights without caffeine.
- I prefer to drink coffee rather than take pills, and you can’t really blind coffee because decaf tastes different.
³ A standard serving is two scoops per six ounces, which would require me to use eight scoops, but I don’t like my coffee that strong. If you take higher doses of caffeine, you’ll probably get habituated faster. (I have no evidence that that’s true, but it sounds right to me.)
⁴ Signature Select Classic Roast, because it’s the cheapest and it tastes as good as the best brands I’ve tried.
⁵ For this calculation I compared post-caffeine and pre-caffeine tests on the same day, ignoring the test results for days where I didn’t take caffeine. If instead I compare post-caffeine tests vs. all no-caffeine tests (including on days when I don’t take caffeine), the difference between averages is 8 ms. However, the difference in performance without caffeine on caffeine days vs. no-caffeine days is not statistically significant (difference = 3 ms, p = 0.5). I performed slightly worse on caffeine days, which is the opposite of what I’d predict: subjectively, I feel more energetic on caffeine days even when I haven’t taken caffeine yet.
⁶ A regression over just the calibration phase gives a slope of –5.50 ms/hour (p = 0.01337 (nice)).
⁷ That means a dose of coffee is worth 2 hours of sleep in terms of its immediate effect on reaction time.
⁸ I tried regressing reaction time against time-in-bed the previous two nights, but the second night did not add any predictive power.

I also looked at time spent asleep according to my sleep tracking app, which I suspected wouldn’t work as well because I’ve noticed it’s pretty bad at identifying when I’m asleep. And indeed, regressing reaction time against time “asleep” gave a similar slope as regressing against time-in-bed, but with a worse p-value.
⁹ I converted slope into retention as follows:
1. multiply slope by the number of days in the phase to get the total reaction time change
2. divide by the baseline benefit of caffeine (13 ms) to get the degree of habituation (0 = no habituation, 1 = full habituation, -1 = reverse habituation i.e. caffeine got more effective)
3. subtract from 1 to get retention (retention is basically the inverse of habituation)
Unlike in my caffeine literature review, I treated the baseline benefit as a fixed parameter instead of a distribution because that makes the math easier (but the real reason is that I wrote this part of the code before I wrote the code for the literature review).
¹⁰ Perhaps I should have re-run the trial, but I was following a strict rule not to re-run trials under any circumstances, to make sure I had no wiggle room to bias the results.
¹¹ The displayed graph shows no-caffeine trials. Controlling for sleep on caffeine trials has a similar effect, flattening the slope from –0.47 to –0.41.
¹² C. J. Meliska, R. E. Landrum & T. A. Landrum (1990). Tolerance and sensitization to chronic and subchronic oral caffeine: effects on wheelrunning in rats.

Omar Cauli, Annalisa Pinna, Valentina Valentini & Micaela Morelli (2003). Subchronic Caffeine Exposure Induces Sensitization to Caffeine and Cross-Sensitization to Amphetamine Ipsilateral Turning Behavior Independent from Dopamine Release.

N. Simola, E. Tronci, A. Pinna & M. Morelli (2006). Subchronic-intermittent caffeine amplifies the motor effects of amphetamine in rats.
¹³ Omar Cauli & Micaela Morelli (2002). Subchronic caffeine administration sensitizes rats to the motor-activating effects of dopamine D(1) and D(2) receptor agonists.
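
The self-blinding procedure described in the notes (label the pills A and B at random, then re-randomize every week) could be scripted along these lines. This is only a sketch: the function name, the dictionary layout, and the seed are hypothetical, and in practice the printed key would have to stay hidden from the person taking the pills until the experiment is over.

```python
import random


def weekly_pill_key(n_weeks, seed=None):
    """Return one random A/B labeling per week.

    Each entry says which pill ("caffeine" or "placebo") is hidden behind
    label A and which behind label B for that week.
    """
    rng = random.Random(seed)
    key = []
    for week in range(1, n_weeks + 1):
        contents = ["caffeine", "placebo"]
        rng.shuffle(contents)  # fresh random order every week
        key.append({"week": week, "A": contents[0], "B": contents[1]})
    return key


# Hypothetical usage: a four-week key. Pill A is taken Mon/Wed/Fri and
# pill B on Tue/Thu/Sat, as in the procedure described in the notes.
key = weekly_pill_key(4, seed=2024)
for entry in key:
    print(entry)
```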

Philosophical Multicore

Caffeine Cycling Self-Experiment
