I recently wrote about values spreading, and came out weakly in favor of focusing on global catastrophic risks over values spreading. However, I neglected an important consideration in favor of values spreading: feedback loops.
When we take actions meant to benefit the long-term future but don't get immediate feedback on them, it's easy to end up doing things that do nothing to achieve our goals. For instance, it is surprisingly difficult to predict in advance how effective a social intervention will be. This gives reason to be skeptical about the effectiveness of interventions with long feedback loops.
Interventions on global catastrophic risks have really, really bad feedback loops. It’s nearly impossible to tell if anything we do reduces the risk of a global pandemic or unfriendly AI. An intervention focused on spreading good values is substantially easier to test. An organization like Animal Ethics can produce immediate, measurable changes in people’s values. Measuring these changes is difficult, and evidence for the effectiveness of advocacy is a lot weaker than the evidence for, say, insecticide-treated bednets to prevent malaria. But short-term values spreading still has an advantage over GCR reduction in that it’s measurable in principle.
Still, will measurable short-term changes in values result in sustainable long-term changes? That’s a harder question to answer. It certainly seems plausible that values shifts today will lead to shifts in the long term; but, as mentioned above, interventions that sound plausible frequently turn out not to work. Values spreading may not actually have a stronger case here than GCR reduction.
We can find feedback loops on GCR reduction that measure proxy variables. This is particularly easy in the case of climate change, where we can measure whether an intervention reduces greenhouse gas levels in the atmosphere. But we can also find feedback loops for something like AI safety research: we might say MIRI is more successful if it publishes more technical papers. This is not a particularly direct metric of whether MIRI is reducing AI risk, but it’s still a place where we can get quick feedback.
Given that short-term value shifts don't necessarily predict long-term shifts, and that we can measure proxy variables for global catastrophic risk reduction, it's non-obvious that values spreading has better feedback loops than GCR reduction. There does seem to be some sense in which value shifts today and value shifts in a thousand years are more strongly linked than, say, the number of AI risk papers published and a reduction in AI risk. But this might just be because both involve value shifts; they may not actually be that strongly tied, or tied at all.
Values spreading appears to have the advantage of short-term feedback loops. But it’s not clear that these changes have long-term effects, and this claim isn’t any easier to test than the claim that GCR work today reduces global catastrophic risk.
Changing behavior of far-future humans matters more than alleviating immediate animal suffering.
Helping humans has better flow-through effects than helping non-human animals.
The analysis effectively concludes that helping humans is more important than helping non-human animals, but I believe it misses a few important considerations.
(These are fairly quick thoughts about which I have a lot of uncertainty; I’m publishing them here for the sake of making the conversation public.)
Here are some research topics on cause prioritization that look important and neglected, in no particular order.
Look at historical examples of speculative causes (especially ones that were meant to affect the long-ish-term future) that succeeded or failed and examine why.
Try to determine how well picking winning companies translates to picking winning charities.
In line with the previous topic, consider whether there exist simple strategies analogous to value investing that can find good charities.
Find plausibly effective biosecurity charities.
Develop a rigorous model for comparing the value of existential risk reduction to values spreading.
Perform basic analyses of lots of EA-neglected or weird cause areas (e.g. depression, argument mapping, increasing savings, personal productivity) and identify which ones look most promising.
Reason about the expected value of the far future.
Investigate neglected x-risk and meta charities (FHI, CSER, GPP, etc.).
Reason about expected value estimates in general. How accurate are they? Do they tend to be overconfident? How overconfident? Do some things predictably make them more reliable?
We can divide theories about consciousness into three categories:
Consciousness is a special non-physical property (dualism).
Consciousness is the result of the physical structures of the brain (identity theory).
Conscious mental states are the result of their functional role within a process (functionalism).
In particular, I want to talk about Turing machine functionalism, a specific form of functionalism which states that consciousness is computation on a Turing machine. I focus on this form because it is probably correct.
There are a few cause areas that are plausibly highly effective, but as far as I know, no one is working on them. If there existed a charity working on one of these problems, I might consider donating to it.
Happy Animal Farm
The closest thing we can make to a hedonium shockwave with current technology is a farm of many small animals that are made as happy as possible. Presumably the animals are cared for by people who know a lot about their psychology and welfare and can make sure they’re happy. One plausible species choice is rats, because rats are small (and therefore easy to take care of and don’t consume a lot of resources), definitively sentient, and we have a reasonable idea of how to make them happy.
I am not aware of any public discussion on this subject, so I will perform a quick ad-hoc effectiveness estimate.
The cause selection blogging carnival is well under way, and we already have a few submissions. But before the blogging carnival began, some folks had already written some of their thoughts on cause selection. Here I’ve compiled a short list of links to a few such writings supporting a variety of cause areas. Maybe some of these will give you ideas or even convince you to change your mind.
Disclaimer: I haven’t researched or thought about this much, and a lot of what I’m saying is probably derivative or completely wrong. I just wanted to work through some of my thoughts.
What would happen if we implemented basic income guarantees tomorrow?
Assume we're just talking about the United States here. Assume we don't have any major technological advances between today and tomorrow, so we can't automate every single person's job. Let's say that the income guarantee is enough to live off of, maybe $30,000.
What would people do? And would the economy continue to generate enough money to be able to pay for everyone’s income guarantee?
Change in Incentives
When people automatically get $30,000, their willingness to work drops dramatically. There are a lot of jobs that people take only because they desperately need work, and they would really prefer not to do them. Once they get a basic income guarantee, the number of people willing to do these jobs will drop dramatically. If the jobs are important, wages will increase until some people once again become willing to take them.
Exactly how much people are willing to work depends on the tax rate. Let's say we have a progressive taxation scheme which starts much higher than the current tax rate, maybe 50% at the lowest bracket and 90% at the highest (I'm just making up numbers here). That means if you get $30,000 a year for doing nothing and take a job that pays $30,000, you're now making $45,000 after taxes (assuming the basic income itself isn't taxed). People have diminishing marginal utility of money, so people will be less willing to do this, but there should still be a lot of people who want to make more than the basic income and end up taking jobs.
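To make the arithmetic explicit, here is a small sketch. It assumes the basic income itself is untaxed and that all job income falls in the lowest (50%) bracket, using the made-up numbers above; the function name and defaults are illustrative only.

```python
def total_income(job_pay, basic_income=30_000, tax_rate=0.5):
    # Assumption: the basic income is untaxed, and all job pay is taxed
    # at the lowest-bracket rate (50% in the made-up scheme above).
    return basic_income + job_pay * (1 - tax_rate)


print(total_income(0))       # no job: just the basic income
print(total_income(30_000))  # $30,000 job: $15,000 after tax + $30,000
```

Under these assumptions, taking the job raises income by only half its nominal pay, which is why the tax rate matters so much for willingness to work.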
Which jobs will they take?
When people have a basic income, that dramatically changes their incentives to work. In economic terms, supply of labor drops. Which jobs continue to be prominent depends on which jobs have high or low price elasticity of demand for labor.
To get more concrete, let’s think about two jobs: garbage collector and fast food burger flipper. Probably a small minority of the people in these jobs actually enjoy them; if these people suddenly had a guaranteed $30,000 a year, how would the market respond?
People really need garbage collectors, so they have a high willingness to pay for their salaries. Or, more precisely, they have a high willingness to accept higher taxes so that the government can employ garbage collectors. In all likelihood, not enough people will be willing to work as garbage collectors for their current salaries. Demand for garbage collectors is highly inelastic, so as supply of willing workers decreases, wages will increase by a lot. The increase in wages should be enough to incentivize people to continue working as garbage collectors.
The labor supply for burger flippers would similarly decrease. Fast food companies would have to raise salaries by a lot in order to get people to keep working for them, which means they would have to increase food prices. The increased food prices would decrease quantity demanded, and fast food companies would shrink (and possibly disappear entirely). I am probably okay with this.
The Broad Market
But since people have less need to work, they should become more willing to do intrinsically enjoyable work, so we should see an increase in the supply of short films, music, and other similar goods. Interestingly, writing books seems to be so intrinsically enjoyable that the market's already oversaturated even without a basic income guarantee: publishers get way more manuscripts than they can use.
There's a spectrum between "everybody intrinsically enjoys this" and "nobody intrinsically enjoys this", and every job lands somewhere on the spectrum. Even among jobs that most people don't intrinsically enjoy, we will still see differences. A lot fewer people will work in factory farms, since I can't imagine that anybody would actually want to do that. But we probably won't see that big a reduction in the number of auto mechanics. A lot of people like working on cars; people often do it as a hobby. We'd expect these people to be willing to work as auto mechanics for relatively little pay.
Software Development
I want to talk a little extra about software development since it's my field. Generally speaking, a lot of programmers enjoy programming, but some kinds of programming are more fun than others. We'd probably see more people starting their own companies and fewer people working software jobs that involve a lot of boring repetition.
This changes the incentives for companies hiring developers. Boring routine work becomes more expensive since fewer developers are willing to do it, so companies have stronger incentives to automate as much work as possible.
There probably won't be a huge effect, since developers tend to make well over $30,000, so the extra money doesn't do as much for them; the most affected jobs will be those that pay less than or about as much as the basic income.
Does It Work?
An economy with a basic income guarantee would reduce or remove unimportant jobs while still retaining important jobs. Prices would be higher and people probably wouldn’t buy as much, but the things they’d buy less of would mostly be the things that weren’t really important to begin with. People aren’t perfectly rational; a lot of purchases people make just keep them going on the hedonic treadmill and don’t actually improve their lives.
Perhaps a world with McDonald’s is better than one without, but if it is, it’s certainly not much better, and I wouldn’t feel too bad about it if McDonald’s went out of business after all the low-level employees quit.
Please explain in the comments why I’m wrong about everything. I think the economic effects of a basic income guarantee could be really interesting and possibly surprising, and I want to hear what you think.
Haskell has all these language features that seem cool, but then you wonder, what is this actually good for? When am I ever going to need lazy evaluation? What’s the point of currying?
As it turns out, these language constructs come in handy more often than you’d expect. This article gives some real-world examples of how Haskell’s weird features can actually help you write better programs.
Lazy Evaluation
Lazy evaluation means that Haskell will only evaluate an expression if its value becomes needed. This means you can do cool things like construct infinite lists.
To take a trivial example, suppose you want to write a function that finds the first n even numbers. You could implement this in a lot of different ways, but let’s look at one possible implementation (in Python):
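A minimal sketch of such an implementation, assuming "the first n even numbers" starts from 0 (the function name is illustrative):

```python
def first_evens(n):
    # Build every number below 2n, then keep only the even ones: 0, 2, ..., 2n - 2.
    return [x for x in range(2 * n) if x % 2 == 0]


print(first_evens(5))
```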
Here we construct a list with 2n numbers and then take every even number from that list. Here’s how we could do the same in Haskell:
Instead of constructing a list with 2n elements, we construct an infinite list of even numbers and then take the first n.
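Python has no lazy lists, but iterators give a rough analogue of this approach. In this sketch, the standard library's `itertools.count(0, 2)` is an infinite iterator of even numbers, and `itertools.islice` plays the role of Haskell's `take` (the function name is illustrative):

```python
from itertools import count, islice


def first_evens_lazy(n):
    # count(0, 2) yields 0, 2, 4, ... forever; nothing is computed until asked for.
    # islice pulls only the first n values, so the infinite iterator is safe to use.
    return list(islice(count(0, 2), n))


print(first_evens_lazy(5))
```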
Okay, so that’s pretty cool, but what’s the point? When am I ever going to use this in real life?
Why It’s Useful
I recently wrote a simple spam classifier in Python. To classify a text as spam or not-spam, it counts the number of blacklisted words in the text. If the number reaches some threshold, the text is classified as spam. 1
Before reading further, think for a minute about how you could implement this.
Originally, I wanted to write something like this.
filter the list for only blacklisted words
see if the length of the list reaches the threshold
Here’s the equivalent Python code:
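A minimal sketch of that approach, assuming a hypothetical `BLACKLIST` set and `is_blacklisted` helper (neither comes from the original classifier):

```python
# Hypothetical blacklist, for illustration only.
BLACKLIST = frozenset({"lottery", "prize", "winner"})


def is_blacklisted(word):
    return word.lower() in BLACKLIST


def classify(text, threshold):
    # Split into words, keep only the blacklisted ones, then compare the count.
    # Every word is examined, even if the threshold was reached early on.
    blacklisted_words = list(filter(is_blacklisted, text.split()))
    return len(blacklisted_words) >= threshold
```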
This code is simple and concise. The problem is, it requires iterating through the entire list before returning, which wastes a huge amount of time. The text might contain tens of thousands of words, but could be identified as spam within the first hundred.
I ended up implementing it like this:
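A sketch of the loop-based version, under the same kind of assumptions (a hypothetical `BLACKLIST` and `is_blacklisted` helper):

```python
# Hypothetical blacklist, for illustration only.
BLACKLIST = frozenset({"lottery", "prize", "winner"})


def is_blacklisted(word):
    return word.lower() in BLACKLIST


def classify(text, threshold):
    if threshold <= 0:
        return True  # trivially reached, matching the filter-and-count version
    count = 0
    # Counting stops early, although text.split() still builds the full word list.
    for word in text.split():
        if is_blacklisted(word):
            count += 1
            if count >= threshold:
                return True  # no need to look at the remaining words
    return False
```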
Instead of using higher-order functions like sum, this implementation manually iterates over the list, keeping track of the number of blacklisted words, and breaks out once the number reaches the threshold. It’s faster, but much uglier.
What if we could write our code using the first approach, but with the speed of the second approach? This is where lazy evaluation comes in.
If our program is lazily evaluated, it can figure out when the count reaches the threshold and return immediately instead of waiting around to evaluate the whole list.
Here’s a Haskell implementation:
(For those unfamiliar with Haskell syntax, see note 2.)
Unfortunately, this doesn't quite work. If the count reaches the threshold within the first k elements of the list, then it will also reach it within the first k+1 elements, but Haskell has no way of knowing that: length has to walk the entire list before the comparison can happen. If you call classify on an infinite list, it will run forever.
We can get around this problem like so:
Note that the take operation takes the first k elements of the list and drops the rest. (If you call take k on a list with n elements where n < k, it will simply return the entire list.)
So this function will take the first threshold blacklisted words. If it runs through the entire list before finding threshold blacklisted words, it returns False. If it ever successfully finds threshold blacklisted words, it immediately stops and returns True.
Using lazy evaluation, we can write a concise implementation of classify that runs as efficiently as our more verbose implementation above.
(If you want, it is also possible to do this in Python using generators.)
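A sketch of the generator approach, assuming a hypothetical `BLACKLIST` set plus two hypothetical helpers: `lazy_words`, which yields words one at a time instead of building the whole list like `text.split()`, and `reaches_threshold`. Both `filter` and `itertools.islice` are lazy in Python 3, and `islice` plays the role of Haskell's `take`:

```python
import itertools
import re

# Hypothetical blacklist, for illustration only.
BLACKLIST = frozenset({"lottery", "prize", "winner"})


def is_blacklisted(word):
    return word.lower() in BLACKLIST


def lazy_words(text):
    # Yield whitespace-separated words one at a time (a lazy text.split()).
    return (m.group(0) for m in re.finditer(r"\S+", text))


def reaches_threshold(words, threshold):
    # Take at most `threshold` blacklisted words, like Haskell's take.
    # Words are consumed only until that many hits have been found.
    hits = itertools.islice(filter(is_blacklisted, words), threshold)
    return sum(1 for _ in hits) == threshold


def classify(text, threshold):
    return reaches_threshold(lazy_words(text), threshold)
```

Because nothing is evaluated until it is needed, `reaches_threshold` even terminates on an infinite stream of words, such as `itertools.cycle(["lottery"])`.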
Partial Application
In Haskell, functions are automatically curried. This means you can call a function with some but not all of the arguments and it will return a partially-applied function.
This is easier to understand if we look at an example. Let’s take a look at some Haskell code:
add is a simple function that takes two arguments and returns their sum. You can call it by writing, for example, add 2 5 which would return 7.
You can also partially apply add. If you write add 2, instead of returning a value, it returns a function that takes a single argument and returns that number plus 2. In effect, add 2 returns a function that looks like this:
You could also think of it as taking the original add function and replacing all occurrences of x with 2.
Then we can pass in 5 to this new function:
In fact, in Haskell, (add 2) 5 is equivalent to add 2 5: it calls add 2, which returns a unary function, and then passes in 5 to that function.
A similar function could be constructed in Python like so:
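One way to write it, assuming a nested function (a closure) that remembers `x`:

```python
def add(x):
    # Return a unary function that captures x and adds it to its argument.
    def add_x(y):
        return x + y
    return add_x


add_two = add(2)  # a function, not a number yet
print(add_two(5))
```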
Then you could call (add(2))(5) to get 7.
Why It’s Useful
To take a simple example, suppose you want to add 2 to every element in a list. You could map over the list using a lambda:
Or you could do this more concisely by partially applying the + function:
It might seem like this just saves you from typing a few characters once in a while, but this sort of pattern comes up all the time.
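Python doesn't curry functions automatically, but the standard library's `functools.partial` gives a rough equivalent. A sketch comparing the lambda and partial-application versions of "add 2 to every element":

```python
import functools
import operator

xs = [1, 2, 3]

# Lambda version: spell out the argument explicitly.
with_lambda = list(map(lambda x: x + 2, xs))

# Partial-application version: fix the first argument of operator.add to 2.
add_two = functools.partial(operator.add, 2)
with_partial = list(map(add_two, xs))

print(with_lambda, with_partial)
```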
This summer I was working on a program that required merging a series of rankings. I had a list of maps where each map represented a ranking along a different dimension, and I needed to find the sum ranking for each key. I could have done it like this:
(Note: unionsWith takes the union of a list of maps, using the given function to combine the values of any key that appears in more than one map.)
With partial application, we can instead write:
This new function uses partial application in two ways. First, it passes in (+) instead of creating a lambda.
Second, it partially applies unionsWith. This call to unionsWith gives a function that takes in a list of maps and returns the union of the maps.
Notice also how mergeRanks is not defined with any arguments. Because the call to unionsWith returns a function, we can simply assign mergeRanks to the value of that function.
Perhaps this example is a bit on the confusing side; I intentionally chose a complex example that has real-world value. Once you grok partial application, it shows up more often than you might think, and you can use it to perform some pretty sophisticated operations.
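For comparison, here is a rough Python analogue of the rank-merging idea using `functools.partial`. Python's standard library has no `unionsWith`, so `unions_with` below is a hypothetical helper written for this sketch:

```python
import functools
import operator
from typing import Callable, Dict, Iterable, TypeVar

K = TypeVar("K")
V = TypeVar("V")


def unions_with(combine: Callable[[V, V], V], maps: Iterable[Dict[K, V]]) -> Dict[K, V]:
    """Union several dicts, combining the values of shared keys with `combine`
    (a hypothetical analogue of Haskell's unionsWith)."""
    result: Dict[K, V] = {}
    for m in maps:
        for key, value in m.items():
            result[key] = combine(result[key], value) if key in result else value
    return result


# Partially apply unions_with: merge_ranks still expects the list of maps,
# so it is defined without naming any arguments of its own.
merge_ranks = functools.partial(unions_with, operator.add)

print(merge_ranks([{"a": 1, "b": 2}, {"a": 3, "c": 1}]))
```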
And I haven’t even mentioned function composition.
Here’s a more complicated usage of partial application combined with function composition that I wrote this summer. See if you can figure out what it does.
In one program of about 500 lines, I wrote about a dozen pieces of code similar to this one.
Pattern Matching
Pattern matching gives us a new way of writing functions. To take the canonical example, let’s look at the factorial function. Here’s a simple Python implementation.
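A minimal sketch, assuming `n` is a non-negative integer:

```python
def fac(n):
    # Base case: 0! is 1. Otherwise recurse on n - 1.
    if n == 0:
        return 1
    return n * fac(n - 1)


print(fac(5))
```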
And the same program written in Haskell:
But we could also write this using pattern matching.
Think of this as saying
the factorial of 0 is 1
the factorial of some number n is n * fac (n - 1)
So pattern matching is more declarative than imperative: a declarative program describes the way things are rather than what to do.
Why It’s Useful
Wait, isn’t this just a different way of writing the same thing? Sure, it’s interesting, but what can pattern matching do that if statements can’t?
Well, quite a lot, actually.3 Pattern matching makes it trivial to deconstruct a data structure into its component values. Haskell's pattern matching is closely tied to how Haskell handles data types.
Suppose we want to implement the map function. Recall that map takes a function and a list and returns the list obtained by applying the function to each element of the list. So map (*2) [1,2,3] == [2,4,6]. (Notice how I used partial application there?)
You may wish to take a moment to consider how you would implement map.
Without using pattern matching, we could implement map like this:
But this is a bit clunky, and we can do a lot better by using pattern matching. Think about how to define map recursively:
The map of an empty list is just an empty list.
The map of a non-empty list is the function applied to its head, prepended to the map of its tail.
So much nicer!
This sort of design pattern comes in handy when you’re operating over data structures. To take a real-world example, I recently wrote a function that operated over an intersection of three values:
I could pass in the Intersection type and pattern matching made it easy to pull out the three values into the variables x, y, and z.
Haskell has a number of language features that appear strange to someone with an imperative-programming background. But not only do these language features allow the programmer to write more concise and elegant functions, they teach lessons that you can carry with you when you use more imperative programming languages.
Many modern languages partially or fully support some of these features; Python, for example, supports lazy evaluation with generator expressions, and it’s possible to implement pattern matching in Lisp. And I’m excited to see that Rust supports sophisticated pattern matching much like Haskell.
If you want to learn more about Haskell, check out Learn You a Haskell for Great Good! Or if you’ve already dipped your toes into the Haskell ocean and want to go for a dive, Real World Haskell can teach you how to use Haskell to build real programs.
P.S. This site is relatively new, so if you see a mistake, please leave a comment and I’ll try and fix it.
I realize this is a terrible way to implement a spam classifier. ↩
Note on Haskell Syntax
The $ operator groups expressions, so
length $ filter blacklisted $ words text
is equivalent to
length (filter blacklisted (words text))
The words function splits a string into a list of words. words text is roughly equivalent to Python's text.split(). ↩
Well, technically nothing, because every Turing-complete language is computationally equivalent. Anything that can be written in Python can also be written in assembly; that doesn’t mean you want to write everything in assembly. ↩
The : operator is cons: given a value and a list, it prepends the value to the front of the list. ↩