We won't solve post-alignment problems by doing research

Introduction

Even if we solve the AI alignment problem, we still face post-alignment problems, which are all the other existential problems that AI may bring.

People have written research agendas on various imposing problems that we are nowhere close to solving, and that we may need to solve before developing ASI. An incomplete list of topics: misuse; animal-inclusive AI; AI welfare; S-risks from conflict; gradual disempowerment; permanent mass unemployment; risks from malevolent actors; moral error.

The standard answer to these problems, the one that most research agendas take for granted, is “do research”. Specifically, do research in the conventional way where you create a research agenda, explore some research questions, and fund other people to work on those questions.

If transformative AI arrives within the next decade, then we won’t solve post-alignment problems by doing research on how to solve them.


Do Disruptive or Violent Protests Work?

Previously, I reviewed the five strongest studies on protest outcomes and concluded that peaceful protests probably work (credence: 90%).

But what about disruptive or violent protests?

Peaceful protests use nonviolent, non-disruptive tactics such as picketing and marches.

Disruptive protests use nonviolent, in-your-face tactics such as civil disobedience, sit-ins, and blocking roads.

Violent protests use violence.

There isn’t much evidence on the other two categories of protest. My best guesses are:

  • Violent protests probably don’t work. (credence: 80%)
  • Violent protests may reduce support for a cause, but it’s unclear. (credence: 40%)
  • For disruptive protests, it’s hard to say whether they have a positive or negative impact on balance. I’m about evenly split on whether a randomly-chosen disruptive protest is net helpful, neutral, or harmful.
  • A typical disruptive protest doesn’t work as well as a typical peaceful protest. (credence: 80%)
  • Peaceful protests are a better idea than disruptive protests. (credence: 90%)

Epistemic Spot Check: Expected Value of Donating to Alex Bores's Congressional Campaign

Political advocacy is an important lever for reducing existential risk. One way to make political change happen is to support candidates for Congress.

In October, Eric Neyman wrote “Consider donating to Alex Bores, author of the RAISE Act”. He created a cost-effectiveness analysis to estimate how donations to Bores’s campaign change his probability of winning the election. It’s excellent that he did that; it’s exactly the sort of thing that we need people to be doing.

We also need more people to check other people’s cost-effectiveness estimates. To that end, in this post I will check Eric’s work.

I’m not going to talk about who Alex Bores is, why you might want to donate to his campaign, or who might not want to donate. For that, see Eric’s post.


Writing Your Representatives: A Cost-Effective and Neglected Intervention

Is it a good use of time to call or write your representatives to advocate for issues you care about? I did some research, and my current (weakly-to-moderately-held) belief is that messaging campaigns are very cost-effective.


Do Small Protests Work?

TLDR: The available evidence is weak. It looks like small protests may be effective at garnering support among the general public. Policy-makers appear to be more sensitive to protest size, and it’s not clear whether small protests have a positive or negative effect on their perception.

Previously, I reviewed evidence from natural experiments and concluded that protests work (credence: 90%).

My biggest outstanding concern is that all the protests I reviewed were nationwide, whereas the causes I care most about (AI safety, animal welfare) can only put together small protests. Based on the evidence, I’m pretty confident that large protests work. But what about small ones?

I can see arguments in both directions.

On the one hand, people are scope insensitive. I’m pretty sure that a 20,000-person protest is much less than twice as impactful as a 10,000-person protest. And this principle may extend down to protests that only include 10–20 people.

On the other hand, a large protest and a small protest may send different messages. People might see a small protest and think, “Why aren’t there more people here? This cause must not be very important.” So even if large protests work, it’s conceivable that small protests could backfire.

What does the scientific literature say about which of those ideas is correct?


My Third Caffeine Self-Experiment

Last year I did a caffeine cycling self-experiment and I determined that I don’t get habituated to caffeine when I drink coffee three days a week. I did a follow-up experiment where I upgraded to four days a week (Mon/Wed/Thu/Fri) and I found that I still don’t get habituated.

For my current weekly routine, I have caffeine on Monday, Wednesday, Friday, and Saturday. Subjectively, I often feel low-energy on Saturdays. Is that because the caffeine I took on Friday is having an aftereffect that makes me more tired on Saturday?

When I ran my second experiment, I took caffeine four days, including the three-day stretch of Wednesday-Thursday-Friday. I found that my performance on a reaction time test was comparable between Wednesday and Friday. If my reaction time stayed the same after taking caffeine three days in a row, that’s evidence that I didn’t develop a tolerance over the course of those three days.

But if three days isn’t long enough for me to develop a tolerance, why is it that lately I feel tired on Saturdays, after taking caffeine for only two days in a row? Was the result from my last experiment incorrect?

So I decided to do another experiment to get more data.

This time I ran a six-week self-experiment in which I kept my current routine but tested my reaction time every day. I wanted to test two hypotheses:

  1. Is my post-caffeine reaction time worse on Saturday than on Mon/Wed/Fri?
  2. Is my reaction time worse on the morning after a caffeine day than on the morning after a caffeine-free day?

The first hypothesis tests whether I become habituated to caffeine, and the second hypothesis tests whether I experience withdrawal symptoms the following morning.
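Both hypotheses come down to comparing mean reaction time between two groups of days. Here is a minimal sketch of one way to run that comparison, assuming a two-sided permutation test on the difference in means. The post doesn't say which statistical test was actually used, and the function name and reaction-time values below are made-up placeholders, not data from the experiment.

```python
import random


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns (observed_difference, p_value). This is an illustrative
    choice of test, not necessarily the one used in the post.
    """
    rng = random.Random(seed)
    n_a = len(group_a)

    def mean(xs):
        return sum(xs) / len(xs)

    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)

    extreme = 0
    for _ in range(n_resamples):
        # Randomly reassign the day labels, then recompute the difference.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one correction so the estimated p-value is never exactly zero.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical reaction times in milliseconds (placeholders only).
saturday_ms = [291, 305, 298, 310, 287, 301]
mon_wed_fri_ms = [295, 289, 302, 297, 308, 293, 299, 304, 290, 296]

diff, p = permutation_test(saturday_ms, mon_wed_fri_ms)
print(f"difference in means: {diff:.1f} ms, p = {p:.3f}")
```

The same function covers the second hypothesis by passing in morning reaction times after caffeine days and after caffeine-free days. A permutation test is a convenient choice here because it makes no assumption about how reaction times are distributed.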

The answers I got were:

  1. No, there’s no detectable difference.
  2. No, there’s no detectable difference.

Therefore, in defiance of my subjective experience—but in agreement with my earlier experimental results—I do not become detectably habituated to caffeine on the second day.

However, it’s possible that caffeine habituation affects my fatigue even though it doesn’t affect my reaction time. So it’s hard to say for sure what’s going on without running more tests (which I may do at some point).


How Much Does It Cost to Offset an LLM Subscription?

Is moral offsetting a good idea? Is it ethical to spend money on something harmful, and then donate to a charity that works to counteract those harms?

I’m not going to answer that question. Instead I’m going to ask a different question: if you use an LLM, how much do you have to donate to AI safety to offset the harm of using an LLM?

I can’t give a definitive answer, of course. But I can make an educated guess, and my educated guess is that for every $1 spent on an LLM subscription, you need to donate $0.87 to AI safety charities.
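To make that ratio concrete, here is a small sketch that applies the $0.87-per-$1 estimate to a subscription price. The $20/month price and the function name are hypothetical, chosen only for illustration. The 0.87 ratio is the post's educated guess, not a precise figure.

```python
OFFSET_RATIO = 0.87  # from the post: donate $0.87 per $1 of subscription spending


def offset_donation(subscription_spend, ratio=OFFSET_RATIO):
    """Donation needed to offset a given amount of LLM subscription spending."""
    return subscription_spend * ratio


monthly_price = 20.00  # hypothetical subscription price, not from the post
monthly_offset = offset_donation(monthly_price)
yearly_offset = offset_donation(12 * monthly_price)

print(f"monthly offset: ${monthly_offset:.2f}")
print(f"yearly offset:  ${yearly_offset:.2f}")
```

Because the estimate is a simple ratio, the offset scales linearly with whatever you actually spend.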


AI Safety Landscape and Strategic Gaps

I wrote a report giving a high-level review of what work people are doing in AI safety. The report specifically focused on two areas: AI policy/advocacy and non-human welfare (including animals and digital minds).

You can read the report below. I was commissioned to write it by Rethink Priorities, but beliefs are my own.


Can you maintain lean mass in a calorie deficit?

TLDR: A meta-analysis allegedly showed that a 500-calorie deficit is the sweet spot to avoid losing lean mass, but it interpreted the data incorrectly, and the data doesn’t actually show that. When interpreted correctly, the data provides weak (statistically insignificant) evidence that any deficit will result in a loss of lean mass.

If you’re losing weight, does lifting weights reduce how much muscle you lose? Is it possible to entirely prevent muscle loss (or even gain muscle)?

Murphy & Koehler (2021) did a meta-analysis on this question. They collected experiments where the experimental groups did resistance training while eating at an energy deficit (RT+ED), and the control groups did resistance training while eating a normal amount of food (RT+CON).

They found a strong association between change in lean mass and the magnitude of the energy deficit (slope = –0.325, p = 0.001). The meta-analysis predicts that you can eat at a deficit of 500 calories per day without losing any lean mass, but you will lose mass at a larger deficit.

(The meta-analysis also reported that participants gained strength in almost every study, even with larger calorie deficits. That’s useful to know, but I will focus on lean mass for this post.)

I should mention that what we actually care about is muscle loss, not lean mass loss. Lean mass includes anything that isn’t fat—muscle fibers, organs, glycogen, etc. Muscle mass is harder to measure. We don’t know what happened to study participants’ muscle, only their total lean mass.

Let’s set that aside and assume lean mass is a useful proxy for muscle mass.

The authors showed a plot of every individual study’s experimental group (RT+ED) and control group (RT+CON), along with a regression line predicting lean mass change as a function of energy deficit.

But…does this regression line look a little odd to you?

