How Much Does It Cost to Offset an LLM Subscription?

Is moral offsetting a good idea? Is it ethical to spend money on something harmful, and then donate to a charity that works to counteract those harms?

I’m not going to answer that question. Instead I’m going to ask a different question: if you use an LLM, how much do you have to donate to AI safety to offset the harm of that use?

I can’t give a definitive answer, of course. But I can make an educated guess, and my educated guess is that for every $1 spent on an LLM subscription, you need to donate $0.87 to AI safety charities.
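
To make the arithmetic concrete, here is a minimal sketch that applies the $0.87-per-$1 estimate to a subscription bill. The $20 monthly price is a hypothetical figure chosen for illustration, and the function name is my own, not something from the post.

```python
# Estimated dollars to donate per dollar of LLM subscription spending.
# 0.87 is the post's educated guess, not a definitive figure.
OFFSET_RATE = 0.87


def offset_donation(subscription_spend: float, rate: float = OFFSET_RATE) -> float:
    """Return the donation (in dollars) implied by the given subscription spend."""
    if subscription_spend < 0:
        raise ValueError("subscription_spend must be non-negative")
    return round(subscription_spend * rate, 2)


# Hypothetical subscription price, used only for illustration.
monthly_price = 20.00

print(f"Monthly offset: ${offset_donation(monthly_price):.2f}")
print(f"Yearly offset:  ${offset_donation(12 * monthly_price):.2f}")
```

The rate is a parameter so that a different estimate can be swapped in without changing the rest of the calculation.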

AI Safety Landscape and Strategic Gaps

I wrote a report giving a high-level review of what work people are doing in AI safety. The report specifically focused on two areas: AI policy/advocacy and non-human welfare (including animals and digital minds).

You can read the report below. I was commissioned to write it by Rethink Priorities, but the views in it are my own.

Can You Maintain Lean Mass in a Calorie Deficit?

If you’re losing weight, does lifting weights reduce how much muscle you lose? Is it possible to entirely prevent muscle loss (or even gain muscle)?

Murphy & Koehler (2021) did a meta-analysis on this question. They collected experiments where the experimental groups did resistance training while eating at an energy deficit (RT+ED), and the control groups did resistance training while eating a normal amount of food (RT+CON).

They found a strong association between change in lean mass and the magnitude of the energy deficit (slope = –0.325, p = 0.001). The meta-analysis predicts that you can eat at a deficit of 500 calories per day without losing any lean mass, but you will lose mass at a larger deficit.

(The meta-analysis also reported that participants gained strength in almost every study, even with larger calorie deficits. That’s useful to know, but I will focus on lean mass for this post.)

I should mention that what we actually care about is muscle loss, not lean mass loss. Lean mass includes anything that isn’t fat—muscle fibers, organs, glycogen, etc. Muscle mass is harder to measure. We don’t know what happened to study participants’ muscle, only their total lean mass.

Let’s set that aside and assume lean mass is a useful proxy for muscle mass.

The authors showed a plot of every individual study’s experimental group (RT+ED) and control group (RT+CON), along with a regression line predicting lean mass change as a function of energy deficit.

But…does this regression line look a little odd to you?

Do Protests Work? A Critical Review

James Özden and Sam Glover at Social Change Lab wrote a literature review on protest outcomes as part of a broader investigation into protest effectiveness. The report covers multiple lines of evidence and addresses many relevant questions, but does not say much about the methodological quality of the research. So that’s what I’m going to assess today.

I reviewed the evidence on protest outcomes, focusing only on the highest-quality research, to answer two questions:

  1. Do protests work?
  2. Are Social Change Lab’s conclusions consistent with the highest-quality evidence?

Here’s what I found:

Do protests work? Highly likely (credence: 90%) in certain contexts, although it’s unclear how well the results generalize.

Are Social Change Lab’s conclusions consistent with the highest-quality evidence? Yes: the report’s core claims are well-supported, although it overstates the strength of some of the evidence.

Cross-posted to the Effective Altruism Forum.

I Was Probably Wrong About HIIT and VO2max

This research piece is not as rigorous or polished as usual. I wrote it quickly in a stream-of-consciousness style, which means it’s more reflective of my actual reasoning process.

My understanding of HIIT (high-intensity interval training) as of a week ago:

  1. VO2max is the best fitness indicator for predicting health and longevity.
  2. HIIT, especially long-duration intervals (4+ minutes), is the best way to improve VO2max.
  3. Intervals should be done at the maximum sustainable intensity.

I now believe those are all probably wrong.

Retroactive If-Then Commitments

An if-then commitment is a framework for responding to AI risk: “If an AI model has capability X, then AI development/deployment must be halted until mitigations Y are put in place.”

As an extension of this approach, we should consider retroactive if-then commitments. We should behave as if we wrote if-then commitments a few years ago, and we should commit to implementing whatever mitigations we would have committed to back then.

Imagine how an if-then commitment might have been written in 2020:

Pause AI development and figure out mitigations if:

Well, AI models have now done or nearly done all of those things.

We don’t know what mitigations are appropriate, so AI companies should pause development until (at a minimum) AI safety researchers agree on what mitigations are warranted, and those mitigations are then fully implemented.

(You could argue about whether AI really hit those capability milestones, but that doesn’t particularly matter. You need to pause and/or restrict development of an AI system when it looks potentially dangerous, not definitely dangerous.)

Notes

  1. Okay, technically it did not score well enough to qualify, but it scored well enough that there was some ambiguity about whether it qualified, which is only a little bit less concerning. 

Charity Cost-Effectiveness Really Does Follow a Power Law

Conventional wisdom says charity cost-effectiveness obeys a power law. To my knowledge, this hypothesis has never been properly tested. So I tested it, and it turns out to be true.

(Maybe. Cost-effectiveness might also be log-normally distributed.)

  • Cost-effectiveness estimates for global health interventions (from DCP3) fit a power law (a.k.a. Pareto distribution) with \(\alpha = 1.11\).
  • Simulations indicate that the true underlying distribution has a thinner tail than the empirically observed distribution.
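
To illustrate what fitting a Pareto distribution involves, here is a minimal sketch that uses the standard maximum-likelihood estimator for the shape parameter, \(\hat{\alpha} = n / \sum_i \ln(x_i / x_{\min})\). It runs on synthetic samples drawn with \(\alpha = 1.11\), not on the DCP3 estimates, and it is not necessarily the fitting procedure used in the post. The function name and sample size are my own choices.

```python
import math
import random


def fit_pareto_alpha(samples, x_min):
    """Maximum-likelihood estimate of the Pareto shape parameter alpha.

    Only samples at or above x_min are treated as part of the power-law tail.
    """
    tail = [x for x in samples if x >= x_min]
    if not tail:
        raise ValueError("no samples at or above x_min")
    log_sum = sum(math.log(x / x_min) for x in tail)
    if log_sum == 0:
        raise ValueError("all tail samples equal x_min; alpha is undefined")
    return len(tail) / log_sum


# Synthetic data only: draw from a Pareto distribution with the alpha
# reported in the text, then check that the estimator recovers it.
rng = random.Random(0)  # fixed seed so the run is reproducible
true_alpha = 1.11
synthetic = [rng.paretovariate(true_alpha) for _ in range(100_000)]

alpha_hat = fit_pareto_alpha(synthetic, x_min=1.0)
print(f"true alpha = {true_alpha}, estimated alpha = {alpha_hat:.3f}")
```

With real data, choosing `x_min` (the point where the power-law tail begins) is a judgment call, and a good fit to a Pareto distribution does not by itself rule out a log-normal one.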

Where I Am Donating in 2024

Summary

Last updated 2025-04-25.

It’s been a while since I last put serious thought into where to donate. Well, I’m putting thought into it this year, and I’m changing my mind on some things.

I now put more priority on existential risk (especially AI risk), and less on animal welfare and global priorities research. I believe I previously gave too little consideration to x-risk for emotional reasons, and I’ve managed to reason myself out of those emotions.

Within x-risk:

  • AI is the most important source of risk.
  • There is a disturbingly high probability that alignment research won’t solve alignment by the time superintelligent AI arrives. Policy work seems more promising.
  • Specifically, I am most optimistic about policy advocacy for government regulation to pause/slow down AI development.

In the rest of this post, I will explain:

  1. Why I prioritize x-risk over animal-focused longtermist work and global priorities research.
  2. Why I prioritize AI policy over AI alignment research.
  3. My beliefs about what kinds of policy work are best.

Then I provide a list of organizations working on AI policy and my evaluation of each of them, and where I plan to donate.

Cross-posted to the Effective Altruism Forum.

Outlive: A Critical Review

Last updated 2025-07-04.

Outlive: The Science & Art of Longevity by Peter Attia (with Bill Gifford) gives Attia’s prescription for how to live longer and stay healthy into old age. In this post, I critically review some of the book’s scientific claims that stood out to me.

This is not a comprehensive review. I didn’t review assertions that I was pretty sure were true (ex: VO2 max improves longevity), or that were hard for me to evaluate (ex: the mechanics of how LDL cholesterol functions in the body), or that I didn’t care about (ex: sleep deprivation impairs one’s ability to identify facial expressions).

First, some general notes:

  • I have no expertise on any of the subjects in this post. I evaluated claims by doing shallow readings of relevant scientific literature, especially meta-analyses.
  • There is a spectrum of ways of being wrong, running from “pop science book pushes a flashy attention-grabbing thesis with little regard for truth” to “careful truth-seeking author isn’t infallible”. Outlive makes it 75% of the way to the latter.
  • If I wrote a book that covered this many entirely different scientific fields, I would get a lot more things wrong than Outlive did. (I probably get a lot of things wrong in this post.)
  • When making my assessments, I give numeric credences and also use terms such as “true” and “likely true”. The numbers give my all-things-considered subjective credences, and the qualitative terms give my interpretation of the strength of the empirical evidence. For example, if the scientific evidence suggests that a claim is 75% likely and I understand the evidence well, then I rate the claim as “likely true”. If I only read the abstract of a single meta-analysis, and the abstract unequivocally supports the claim but I’m only 75% sure that the meta-analysis can be trusted, then I rate it as “true”. Both claims receive a 75% credence.

Now let’s have a look at some claims from Outlive, broken down into four categories: disease, exercise, nutrition, and sleep.

Protein Quality (DIAAS) Calculator

Update 2025-01-17: I discovered another protein quality calculator that’s much more comprehensive than mine: https://www.diaas-calculator.com/

You may know that complete proteins are good because they contain every essential amino acid. But you might not know that that’s not the full story.

Take wheat. Wheat is a complete protein—it contains all nine essential amino acids. But it has a problem. Wheat only contains 27 mg of lysine (an essential amino acid) per gram of protein, whereas the Food and Agriculture Organization recommends 48 mg of lysine per gram. To make full use of a gram of protein, your body needs to get those 48 mg. It doesn’t matter that wheat has lots of other essential amino acids. Once your body uses up all the lysine, it can’t make good use of the other amino acids in wheat protein.

You can evaluate the protein quality of a food using the Digestible Indispensable Amino Acid Score (DIAAS). This score determines the quality of a source of protein based on which essential amino acid will run out first, adjusted for digestibility. A score of 100 means the protein has plenty of every essential amino acid.

Sometimes you can improve the protein quality of your food by mixing different ingredients. Wheat has a DIAAS of 57 because it only has 57% as much lysine per gram as your body needs. Peas have a score of 82 because they don’t have enough methionine + cysteine. But peas have 131% of the lysine requirement, and wheat has 149% of methionine + cysteine, so mix them together and they cover for each other’s weaknesses. A 50/50 mixture of wheat and pea protein has a DIAAS of 94.

With this calculator, you can determine the DIAAS for mixtures of different protein sources.
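
The wheat and pea example can be sketched in a few lines of code. This is a simplified illustration, not the calculator itself: it tracks only the two amino acid groups mentioned above, assumes the percentages quoted in the text are already adjusted for digestibility, and assumes the 50/50 split is measured by protein content. The function and variable names are my own.

```python
# Percent of the requirement met per gram of protein, taken from the text.
# Only the amino acids mentioned in the text are included; a full DIAAS
# calculation would cover every essential amino acid.
WHEAT = {"lysine": 57, "methionine + cysteine": 149}
PEAS = {"lysine": 131, "methionine + cysteine": 82}


def mixture_score(sources):
    """Return (limiting amino acid, score) for a blend of protein sources.

    `sources` is a list of (profile, protein_share) pairs whose shares sum to 1.
    The score is the lowest blended percentage, i.e. the amino acid that
    runs out first.
    """
    total_share = sum(share for _, share in sources)
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("protein shares must sum to 1")
    # Only score amino acids that every profile reports.
    amino_acids = set.intersection(*(set(profile) for profile, _ in sources))
    blended = {
        aa: sum(profile[aa] * share for profile, share in sources)
        for aa in amino_acids
    }
    limiting = min(blended, key=blended.get)
    return limiting, blended[limiting]


limiting, score = mixture_score([(WHEAT, 0.5), (PEAS, 0.5)])
print(f"limiting amino acid: {limiting}, score: {score:.0f}")
```

For the 50/50 blend, lysine averages to (57 + 131) / 2 = 94 and methionine + cysteine to (149 + 82) / 2 = 115.5, so lysine remains the limiting amino acid and the score is 94, matching the figure in the text.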
