Towards a Better Bayes Rule: Implicit Bias, Jan Karski, and Global Warming Denialism
Bayes rule has proven a rather fruitful way to model aspects of how brains compute information. But while it can capture many low-level predictive brain processes, it seems to come up short when trying to describe and capture conscious thought. In this post, I’ll discuss why I think so, as well as suggest two additions that seem like a more accurate description of how people (more or less) consciously reason.
Background: The Traditional Account
Bayes rule is a relatively brief mathematical recipe that can tell you how to incorporate new information with previously learned context. It comes in handy when modeling how brains do their thing. In a sentence, Bayes rule roughly says that the more you’ve learned that water is wet, the more evidence you’ll need to convince you otherwise.
The slightly longer story is that your prior beliefs about some specific aspect of reality can be captured via ‘degree of belief’ or what many would call a probability. Sometimes, though, you might learn about a piece of data or a fact that, if taken out of context, would lead to a DRASTICALLY DIFFERENT CONCLUSION than your prior belief would suggest. For example, the physical wavelengths of light reflected from an apple by the sun at noon are not the same as the physical wavelengths of light reflected by a fluorescent bulb at night. Why then do apples not look freakish sometimes? Because your brain is adept at combining the information from the apple with the information from the ambient light-source to do its own version of color correcting, technically known as color constancy. (PSA: Freakish, safe-to-eat apples can be tamed with a silver knife and crunchy peanut-butter. Works every time.) Scientists also use Bayes rule to describe what happens when you guess how long it might take for an out-of-context cake to bake, or how your brain learns different things.
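To make the one-sentence version of Bayes rule concrete, here is a minimal sketch for a yes/no belief. The function names and the numbers (0.9, 0.999, 0.2, 0.8) are made up for illustration; the only thing taken from the account above is the update rule itself.

```python
def posterior(prior, p_data_if_true, p_data_if_false):
    """Bayes rule for a yes/no hypothesis: returns P(hypothesis | data)."""
    evidence = prior * p_data_if_true + (1 - prior) * p_data_if_false
    return prior * p_data_if_true / evidence


def observations_needed(prior, p_data_if_true, p_data_if_false, threshold=0.5):
    """Count identical contrary observations until belief falls below threshold."""
    belief, count = prior, 0
    while belief >= threshold:
        belief = posterior(belief, p_data_if_true, p_data_if_false)
        count += 1
    return count


# Hypothetical numbers: each observation is 4x more likely if water is NOT wet.
mild = observations_needed(0.9, 0.2, 0.8)      # fairly sure water is wet
strong = observations_needed(0.999, 0.2, 0.8)  # very sure water is wet
print(mild, strong)  # 2 5
```

The stronger prior needs more than twice as many contrary observations before the belief tips, which is all the "water is wet" sentence claims.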
However, when describing conscious thought, Bayes Rule implies that the only possible way to come to non-ideal conclusions is through not having the right kind of data in front of you, possibly repeatedly. In other words, Bayes will have you believe that though the data may be bad, you’re probably doing the internal updating process just fine. Furthermore, the Bayesian account of reasoning suggests that different people looking at the same data should be able to come to identical conclusions provided they can communicate and update their prior beliefs.
Clearly, these things don’t happen. Consider people our society deems to be worthy of august positions like being on the Supreme Court. These individuals train their reasoning, communication, self-articulation and writing abilities to a considerable degree, more than most of the people living on the planet. However, they frequently disagree about the interpretation of the same facts, and can’t convince each other. (I generally take this to be a sign of how young, societally speaking, we in fact are.)
In this light, Bayes rule is clearly not the end-all be-all of our reasoning process. While it may perfectly describe how beliefs are updated after they’ve been reasoned about, there is ground it misses. So, I propose that there are (at least) two things missing from this account when it comes to making conscious conclusions and decisions.
Missing thing #1: the internal weighing of the information’s source
One way we figure out how to weigh information is to use what we think about the source of the information. Consider how you might weigh information differently if you heard it was from the following sources: Fox News, NPR, one caffeinated George Lakoff, your best friend, each of your parents, and the various bosses you've had.
(We of course also evaluate the information against our associations and conceptual structures, part of which I’ll attempt to capture in the next section).
We both knowingly and unknowingly train ourselves for years to weigh and combine info based on what we think of the source.
What I’ve read suggests that we can weigh information both reflexively and deliberatively (cf. 'Thinking, Fast and Slow', Keith Stanovich's writings, or 'Rational Choice in an Uncertain World'). I think it likely that reflexively weighing information allows people to quickly write off potential new facts without spending the energy to examine whether their prior beliefs need updating, as that can cost precious resources.
In fact, this is the best mechanistic account I can muster to describe implicit bias. Learning to reflexively write off the worth of some “others” can a) save resources, and b) work well enough for you not to die before you have kids. Gotta admit, those reasons seem pretty slim.
Theoretically, people might say the following things when performing this kind of reflexive discounting: “These scientific elites, they don’t know anything and are out of touch”; “Don’t listen to your car mechanic for life advice, what can they possibly know”; “Conservative people always missing the point, don’t trust ‘em.” If you've ever heard someone justify why they aren't hearing someone else in a way that wasn't convincing, perhaps they were reasoning towards what they reflexively concluded. (And because you have a different reflex, you can't follow them)
If we add a ‘weight’ parameter, our picture becomes more complete. For conscious reasoning, the story now goes that upon perceiving novel information:
a) a set of systems associates a weight with the piece of new information, inferred from the weight the person places on the source. (We probably learn how to weigh sources of information by Bayesian updating too.)
b) this weighed information is then (somehow) combined with our prior beliefs to produce a potential shift in our conclusions.
The external “fact” (or likelihood term) can be objective, but then not one, but two internal processes collude to get in the way of coming to the right conclusion. By this account, we can come to less-than-ideal conclusions in three ways: not having the right information, mis-valuing the source of information, and having incorrect prior beliefs.
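One way to sketch steps a) and b) above is to let the source weight scale how much the evidence counts. This is only one possible formalization, not something the account above commits to: the assumption here is that the weight acts as an exponent on the likelihood ratio (equivalently, a multiplier on the log-odds shift). The function names and numbers are hypothetical.

```python
import math


def log_odds(p):
    """Convert a probability into log-odds."""
    return math.log(p / (1 - p))


def from_log_odds(x):
    """Convert log-odds back into a probability."""
    return 1 / (1 + math.exp(-x))


def weighted_update(prior, likelihood_ratio, source_weight):
    """Update a belief, scaling the evidence by trust in its source.

    ASSUMPTION: source_weight multiplies the log likelihood ratio.
    source_weight = 1 is ordinary Bayes; source_weight = 0 ignores the source.
    """
    shift = source_weight * math.log(likelihood_ratio)
    return from_log_odds(log_odds(prior) + shift)


prior = 0.2
lr = 9.0  # hypothetical: the report is 9x more likely if the claim is true

trusted = weighted_update(prior, lr, 1.0)    # full Bayesian update
half = weighted_update(prior, lr, 0.5)       # partially discounted source
dismissed = weighted_update(prior, lr, 0.0)  # reflexively written off
print(round(trusted, 3), round(half, 3), round(dismissed, 3))
```

With a weight of zero the belief does not move at all, no matter how good the "fact" is, which is the reflexive write-off described earlier.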
The fundamental attribution error suggests that it's best to treat all sources of information as potentially equal, and shift the evaluation of how much weight to put into it away from the reflexive system and towards the deliberative system. Or to put it another way, if you're in the business of coming to correct conclusions, it's helpful to deliberatively abstract what you may think of the conclusion away from what you may think of the source. Learning to attach a lesser weight to a source of information is easier to inspect, control, and tune if it's conscious, rather than preconsciously* reflexive.
*By "preconscious" I just mean "happens so fast you don't notice it." This threshold, while it may vary between people, can be measured in milliseconds. In no way do I mean anything related to psychoanalysis' imprecise pet metaphor.
In other words, one way of interpreting implicit bias is that it is symptomatic of relying on a reflexive source-of-information weighing scheme. If you have the reflex, you don’t know where else it might come up; though it’s likely to surface when others give you feedback (whether implicit or explicit) about your behavior. (Because feedback can be threatening, and said reflex can protect you from the threat.)
This extended Bayes Rule account also reveals an answer to an interesting, age-old puzzle. Let’s say you’ve been insulted, and to deal with this, you spend time privately insulting the person who insulted you with a close friend. The “insult” has already been communicated; that information has already transferred from your insulter to you. So why should insulting them in private make you feel any better? I doubt it’s related to feelings of powerfulness. I suspect what happens is that insulting them changes the weight you’ve associated with them and thus what they communicated to you. If your friend agrees, you can have more confidence in your estimation. And lastly, if you grow up being (subtly) insulted, or (subtly) hurt, being able to reflexively re-weight what was just communicated to you will have a protective effect on your identity.
Missing Thing #2: Flexibility
Okay. Let’s say you have an externally verified likelihood, and you happen to avoid reflexively weighing the source of the information inappropriately. Will this prevent you from updating to the wrong conclusion? NO.
You then take the piece of source-weighed information and evaluate it against your experiences, associations, expectations, and conceptual structures. Either you decide the evidence is rubbish, or your own beliefs need updating.
Sounds like normal Bayes, right?
However, if you talk to anyone ever, and try to convince them to change their mind, you’ll notice that people are differentially “flexible.” Some people can turn around on a dime, others take much more evidence to be convinced. Some people even resist changing their minds regarding conclusions they aren’t that confident in.
(Side note: In general, if you are trying to come to accurate and novel conclusions, it’s most efficient to use the least amount of evidence to reach the ‘right’ conclusion. Dr. Todd Gureckis gave a talk on this once, but I can't find the associated paper at the moment.)
So the second thing I propose that’s missing from Bayes’ rule is what you might call “flexibility.” It’s the ‘stickiness’ of your prior beliefs. Theoretically, if exposed to the same information, two people could form the same priors, but if a year goes by and one person identifies with the information, they may become less flexible. Then everything else they learn will be less ‘correct’ (per an ideal Bayesian reasoner) than their counterpart. Note that this is different than being defensively skeptical about everything that you see in order to not be wrong; that seems to have more to do with either not weighing anything too highly as it “comes in” or having low confidence in your beliefs in general.
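Here is a rough sketch of what 'stickiness' could look like numerically. The modeling choice is an assumption made purely for illustration: flexibility is treated as the fraction of the way a person moves from their prior towards what an ideal Bayesian reasoner would conclude. The names `sticky_update` and `run` and all the numbers are hypothetical.

```python
def bayes_posterior(prior, likelihood_ratio):
    """Ideal Bayesian update of a yes/no belief, via odds."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)


def sticky_update(prior, likelihood_ratio, flexibility):
    """Move only part of the way towards the ideal posterior.

    ASSUMPTION: flexibility linearly interpolates between the prior and the
    ideal posterior. flexibility = 1 is an ideal reasoner; 0 never budges.
    """
    ideal = bayes_posterior(prior, likelihood_ratio)
    return prior + flexibility * (ideal - prior)


def run(flexibility, evidence, prior=0.5):
    """Apply a sequence of likelihood ratios, one after another."""
    belief = prior
    for lr in evidence:
        belief = sticky_update(belief, lr, flexibility)
    return belief


# Two people, same starting prior, same five pieces of evidence.
evidence = [3.0] * 5
flexible = run(1.0, evidence)
sticky = run(0.2, evidence)
print(round(flexible, 3), round(sticky, 3))
```

Both people see identical evidence and start from identical priors, yet the less flexible one ends up further from the ideal conclusion after every step.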
Here’s the kicker: You can think that you are flexible and weighing things evenly, when in fact you aren’t. The research on motivated reasoning and identity protective cognition suggests that you’ll reason towards some sort of self-justification of whatever you are doing, and this can include thinking that you are flexible and weighing things evenly. Those same threads of research also suggest that people can become more inflexible about whatever they identify with (including the mental model they use to think about themselves). My reading suggests that people can be inflexible if they have acquired habits that once protected their identities when they grew up.
“Flexibility of priors” is a particular kind of openness, one that I suspect is correlated with trait openness, and will likely have something to do with the curious result about binocular rivalry and openness in general: people who score as 'more open' on a personality measure are more likely than others to perceive a different stimulus presented to each eye as mixed, or to have a low switching rate.
Abstract quasi-mathematical attempts to capture the human reasoning process are all for nothing if they don’t provide a new analysis of concrete situations. I’ll provide two illustrations: the first illustrates the 'weight-of-source' term and the second illustrates the ‘flexibility’ term.
Here’s the first. Let’s say you want to dig a well in your backyard. You have two friends conduct two sets of tests over every 5ft by 5ft patch. Your first friend uses her home-made ground penetrating radar system. The second uses a dowsing rod. You’ve given each a map with a grid, and after two hours they come back with the degree to which their test indicated the presence of water over each square patch. Prior to these tests, you had absolutely no reason to think any water was anywhere. Then you take a look at both maps. You think the dowsing rod map to be less reliable than the home-made ground penetrating radar, but because neither is perfect, you figure you’ll combine the two in a 1:10 ratio. Again, it’s only after you’ve initially perceived the relevant information that you figure out how to combine them.
(This example presupposes that you have perfect flexibility over all of the ground patches, that you didn’t grow up with any ancestral beliefs about how the patches of land near the oak tree were sacred and could never have water beneath them, etc.)
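The 1:10 combination of the two maps can be sketched as a per-patch weighted average. The 2x2 grids and their readings below are invented for illustration, and treating the combination as a weighted average is itself an assumption about what "combine in a 1:10 ratio" means.

```python
def combine_maps(radar, dowsing, radar_weight=10.0, dowsing_weight=1.0):
    """Weighted average of two grids of water readings, patch by patch."""
    total = radar_weight + dowsing_weight
    return [
        [(radar_weight * r + dowsing_weight * d) / total
         for r, d in zip(radar_row, dowsing_row)]
        for radar_row, dowsing_row in zip(radar, dowsing)
    ]


# Hypothetical readings between 0 (no sign of water) and 1 (strong sign).
radar = [[0.1, 0.8],
         [0.2, 0.3]]
dowsing = [[0.9, 0.1],
           [0.9, 0.2]]

combined = combine_maps(radar, dowsing)

# Pick the (row, column) of the patch with the highest combined reading.
patches = [(row, col) for row in range(2) for col in range(2)]
best = max(patches, key=lambda rc: combined[rc[0]][rc[1]])
print(best)  # (0, 1)
```

The dowsing map alone would point at the left-hand column, but with a 1:10 weighting the radar reading dominates and the top-right patch wins.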
Let’s take the historical example of Jan Karski. Karski was an officer in the underground Polish resistance army during WWII. Though he wasn’t Jewish, he twice smuggled himself into the Warsaw Ghetto and posed as a guard at a Bełżec transit camp. He then made it out to Britain and the United States where he was able to report what he saw with his own eyes to various senior politicians and officials. At one point, he gave his report to Supreme Court justice Felix Frankfurter, who was both Jewish and considered to be one of the smartest people in FDR’s government.
Let’s assume Felix Frankfurter was a) very sharp, and b) knew nothing but vague reports, hints and whispers about the fate of the Polish Jews. Assuming this to be true, his prior beliefs may be characterized as a wide, or nearly flat distribution. He has every reason to trust Karski, who has been vetted several times, recommended by the Polish ambassador, and who even briefed FDR. For the purposes of this illustration, let’s assume Frankfurter was, being a person who viewed himself as “a judge of mankind,” relatively free from reflexive discounting when it came to Polish males. What we don’t know is Frankfurter’s flexibility. How “willing” is he to change his mind? (I put “willing” in quotes because it doesn’t seem to be a 100% explicitly consciously-willed process.)
Actually, we do know Frankfurter’s flexibility on these topics. In The Karski Report, Karski recalls that Frankfurter did not believe him, and that he was unusually articulate about why. Or as Karski recalled, Frankfurter said “Young man, [a] man like me, with a man like you, [I] must be totally honest. And I’m telling you: I DO NOT believe you…. My mind, my heart, they are made in such a way that I cannot accept it.” (Note that transcribing these words utterly fails to capture the tone that was expressed. It’s something no writer can directly capture. You can see part of Karski’s testimony for yourself if you want.)
It seems like no stretch to extend this account to global warming denialism: people are coming to objectively, verifiably wrong conclusions about what to do. With Karski, it wasn’t a matter of having “correct facts,” and the same is true about global warming. Instead, identity, flexibility, and the habits of processing our experiences that we acquired as we grew up get in the way.
The ideas in this post were presented at the 2017 California Cognitive Science Conference. This doesn’t mean that they are necessarily correct, only that I’m willing to revise them when I encounter new evidence. Clearly this model doesn’t capture “emotions” that well. And these extra parameters could be modeled with some sort of fancy hierarchical mixture model, but it seems like the model I’ve presented here would have to come first. Personally, I hope that this is all old news somewhere in neuroscience or neurology.