This is the third post in a series responding to Bentham’s Bulldog’s post, Moral realism is true. Here are the previous posts in this series:
Since I’d already addressed most of the remarks made by BB previously, most of this post is going to be fairly derivative, simply quoting previous remarks I’ve made, interspersed with commentary on how those remarks address BB’s post. This will not be true of later posts in this series.
7.0 Most philosophers are realists
Next, BB says:
Additionally, given that most philosophers are moral realists, we have good reason to find it the more intuitively plausible view. If the consensus of people who have carefully studied an issue tends to support moral realism, this gives us good reason to think that moral realism is true.
No, it doesn’t. I’ve addressed this at length in a nine-part series called The PhilPapers Fallacy, where I critique the notion that the fact that most analytic philosophers are moral realists is good evidence of moral realism. You can find that here, with the table of contents to the set of posts at the bottom. Here’s the synopsis, reproduced in full:
People often appeal to the proportion of philosophers who endorse a particular view in the PhilPapers survey as evidence that a given philosophical position is true. Such appeals are often overused or misused in ways that are epistemically suspect, e.g., to end conversations or to imply that, if you reject the majority view on the matter, you are much more likely to be mistaken, or that you’re arrogant for believing you’re correct while most experts aren’t.
That most respondents to the PhilPapers survey endorse a particular view is very weak evidence that the view is true. Almost everyone responding to the survey is an analytic philosopher, and the degree to which the convergence of their judgments provides strong evidence is contingent on, among other things, (a) the degree to which analytic philosophy confers the relevant kind of expertise and (b) the degree to which their judgments are independent of one another.
There is good reason to believe people trained in analytic philosophy represent an extremely narrow and highly unrepresentative subset of human thought, and there is little evidence that the judgments that develop as a result of studying analytic philosophy are reflective of how people from other populations, or people under different cultural, historical, and educational conditions, would think about the same issues (if they would think about those issues at all).
Since most philosophers responding to the 2020 PhilPapers survey come from WEIRD populations, most of them are psychological outliers with respect to most of the rest of humanity. Their idiosyncrasies are further reinforced by self-selection effects (those who pursue careers in philosophy are more similar to one another than two randomly selected members of the population they come from), a narrow education that focuses on a shared canon of predominantly WEIRD authors, and induction into an extremely insular academic subculture that serves to further reinforce the homogenization of the thinking of its members. As such, analytic philosophers are, psychologically speaking, outliers among outliers among outliers.
At present, there is little evidence or compelling theoretical basis for believing that human minds would converge on the same proportion of assent to particular philosophical issues as what we see in the 2020 PhilPapers survey results if they were surveyed under different counterfactual conditions.
There is also little evidence, and not much in the way of a compelling case, that analytic philosophy confers expertise at being correct about philosophical disputes. The presumption that the preponderance of analytic philosophers sharing the same view is evidence that the view is correct is predicated, at least in part, on the further presumption that the questions are legitimate and that mainstream analytic philosophical methods are a good way to resolve those questions. Both of these claims are subject to legitimate skepticism. Analytic philosophy is a subculture that inducts its members into an extremely idiosyncratic, narrow, and comparatively homogeneous way of thought that is utterly unlike how the rest of humanity thinks. It has little track record of success and little external corroborating evidence of its efficacy.
Critics are not, therefore, obliged to confer substantial evidential weight on the proportion of analytic philosophers who endorse a particular philosophical position. How much stock we should put in what most philosophers think rests, first and foremost, on resolving questions about the efficacy of their methods.
8.0 Responding to “What if the folk think differently”
In the next section, BB addresses the suggestion that most people may not be realists. BB begins by saying:
I’m supremely confident that if you asked the folk whether it would be typically wrong to torture infants for fun, even if no one thought it was, they’d tend to say yes.
I already addressed this above: this is a terrible question that’s ambiguous and not a good way to measure whether respondents are realists or not. If BB thinks otherwise, BB is welcome to do empirical research that establishes the validity of such a question as a diagnostic tool for evaluating whether nonphilosophers are realists.
BB then says:
Additionally, it turns out that The Folk Probably do Think What you Think They Think.
BB seems to think this article justifies his claim that moral realism is an intuitive, commonsense view. Yet this seems to be based almost entirely on taking the title of the paper literally. Titles of papers published in journals are often intentionally cute or provocative. Taking this one at face value is bizarre and a little embarrassing. This paper does not show that the folk probably think what you think they think. How could it? What the folk “probably think,” on a literal reading of the title, would depend on what you happen to think they think. I think most people are not realists. Does that mean they probably aren’t realists?
BB could say that if most philosophers think most nonphilosophers are realists, then this is what’s probably true: that it’s a statistical claim, i.e., that most nonphilosophers probably think what most philosophers think they think. And perhaps most philosophers think most people are moral realists. That seems plausible enough. So, does the paper establish this? That most nonphilosophers probably think what most philosophers think they think?
No, not really. Before considering this, note a few things:
First, this is only a single study. As Scott Alexander warns, one should beware the man of one study. It’s more than a little questionable for BB to trot out one study that purportedly supports his claims. I could provide at least a dozen that support my contentions, probably more than that, in large part because I myself conducted many of these studies.
Second, it at best provides extremely indirect evidence that most nonphilosophers are moral realists. Note that, in contrast, the data I’d appeal to directly addresses the question of whether the folk are moral realists (and suggests, I contend, that most of them aren’t).
Finally, there is the study itself. Does the study provide a good justification for thinking most people are moral realists, or that realism is intuitive or a commonsense view among most nonphilosophers? Not even close. Unfortunately, BB persisted in making this claim some time later, and was much more explicit about it. This occurred in a Facebook exchange on Joe Schmid’s wall, where BB stated:
But philosophers are usually write [sic] about what the folk think.
This claim is beyond wrong. Right about what the folk think in what context? Without context? Are philosophers usually right to any arbitrary level of specificity? With respect to any claim about what the folk think? This remark displays BB’s abject ignorance when it comes to making clear, precise, and appropriately well-specified claims. Suppose you gave doctors a detailed case report, told them that the patient had one of two diagnoses, (1) or (2), and found that they chose the correct diagnosis at significantly higher rates than chance. Would you conclude that “doctors are usually correct in their diagnoses of illnesses”?
Absolutely not. Because most doctors aren’t diagnosing people under such highly narrow, specific, tailored conditions. If you wanted to show that doctors generally make accurate diagnoses, you’d need enough data to generalize from the results of your studies to the actual decisions doctors make in the scenarios you’re talking about. Getting a result in some lab study that is consistent with a hypothesis is not enough to definitively establish claims that generalize from those results to whatever you want to claim outside of that study. It is extremely difficult to establish the external validity of a particular study’s results; you often need a ton of data, or to triangulate on such a conclusion by appealing to a broad body of mutually corroborating literature, or to ground your interpretation of your data in a solid and well-supported theory, or to provide external evidence that you’ve validated your measures…or, ideally, all of the above. Citing a single study this far removed from the conclusion BB wants to reach doesn’t even come close to making a strong case for BB’s claims.
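To make the arithmetic behind this worry concrete, here is a minimal sketch using purely hypothetical numbers (not drawn from any real study): it shows how easily a group can perform “significantly above chance” in a two-option format while still being far from “usually correct,” even before asking whether the task resembles unconstrained, real-world diagnosis at all.

```python
# A minimal arithmetic sketch with made-up numbers (not data from any real study):
# "significantly above chance" in a two-option forced choice is a very low bar.
from math import comb

n_doctors = 100      # hypothetical sample size
n_correct = 62       # hypothetical count choosing the right one of the two given diagnoses
chance = 0.5         # guessing between two provided options

# Exact one-sided binomial p-value: probability of doing at least this well by guessing.
p_value = sum(comb(n_doctors, k) for k in range(n_correct, n_doctors + 1)) * chance**n_doctors

print(f"Accuracy in the forced-choice task: {n_correct / n_doctors:.0%}")
print(f"One-sided binomial p-value vs. chance: {p_value:.3f}")  # roughly 0.011, i.e., "significant"
# Yet these hypothetical doctors were still wrong 38% of the time even when handed the
# correct answer as one of two options -- and real diagnosis isn't a two-option menu.
```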
The only thing the study BB cites achieves is showing that if you give a handful of philosophers a description of a handful of unrepresentative study designs, along with the response options and conditions available to participants, they can predict whether there’d be a significant difference and which direction that difference goes in. This is not very robust information. It doesn’t tell us much about the size of the difference and, more importantly, as I address below, it doesn’t tell you why there is such a difference. This is not a good way to determine whether philosophers know what the folk think, and it especially doesn’t justify generalizing from the cases featured in these studies to claims made outside the context of those studies. I reinvent the wheel a lot, but even I have my limits. I addressed these points in the Facebook exchange with BB mentioned above, so I’ll simply reproduce that comment here, in its entirety:
Matthew Adelstein Here’s one issue.
The authors canvass four studies. These studies are:
Knobe, J., & Fraser, B. (2008). Causal judgment and moral judgment: Two experiments. Moral Psychology, 2, 441-447.
Knobe, J. (2003). Intentional action and side effects in ordinary language. Analysis, 63(3), 190-194.
Livengood, J., & Machery, E. (2007). The folk probably don't think what you think they think: Experiments on causation by absence. Midwest Studies in Philosophy, 31(1), 107-127.
Nichols, S., & Knobe, J. (2007). Moral responsibility and determinism: The cognitive science of folk intuitions. Noûs, 41(4), 663-685.
The authors show that, when presented with the stimuli for these studies, the philosophers they surveyed were able to accurately predict the outcomes of the studies most of the time. There are already a number of concerns with this framing, but I’ll set those aside for now to focus on some other issues.
(1) First, let’s look at who the participants in these studies were.
(i) Knobe & Fraser (2008): Two studies. The first was n=18 intro to philosophy students at UNC. I didn’t see a sample size for the second study, nor any other demographic info. It’s plausible they were all UNC students, but I’m trying to be quick here and didn’t look to see if this info is anywhere.
(ii) Knobe (2003): Studies 1 and 2 consisted of 78 and 42 people in a Manhattan public park, respectively.
(iii) Livengood & Machery (2007): 95 students at the University of Pittsburgh.
(iv) Nichols & Knobe (2007): All studies conducted with undergraduates at the University of Utah.
Taken together, these studies all reflect the attitudes and judgments of people responding to surveys in English in the United States. Most of the participants were college students. These studies were conducted in a particular cultural context: WEIRD societies. They were conducted on what were likely mostly WEIRD populations (though all of the studies did a bad job of providing significant demographic data), all of the studies had small samples, and most of the studies (3 of 4) were conducted on college students in particular.
WEIRD is an acronym that stands for “Western, Educated, Industrialized, Rich, and Democratic.” It was a term proposed to describe a clustering pattern of demographic traits characteristic of the populations that comprise the vast majority of research participants in psychology, and the vast majority of those conducting this research (Henrich, Heine, & Norenzayan, 2010).
When it comes to making generalizations about how “the folk,” or people in general, think, it is important to gather representative data. That is, you should sample from populations that are sufficiently representative of the population you wish to generalize about that inferential statistics permit you to make judgments about that population based on the participants in your sample. If, for instance, we wanted to know whether most people in the United States were Taylor Swift fans, it would make no sense to survey attendees at a Taylor Swift concert, for the obvious reason that people attending the concert would be more likely to like Taylor Swift.
Why is this a problem for the four studies that figure in Dunaway et al.’s study? The problem is that all four studies were conducted in WEIRD populations. And WEIRD populations are psychological outliers. Along numerous measurable dimensions of human psychology, people from WEIRD populations tend to anchor one extreme or the other of these distributions. Thus, not only are people from WEIRD populations often unrepresentative of how people in general think, they are often the *least* representative population available. They are, at a population level, psychological outliers with respect to most of the world’s population. The evidence for this is strong, and only continues to grow with time. And ALL FOUR of the studies reported here were conducted in WEIRD populations (for what it’s worth, they were probably also conducted primarily by people from WEIRD populations, which could influence the way questions were framed, how results were interpreted, and so on, introducing a whole slew of additional biases I’m not even addressing directly). As such, the original studies themselves have such low generalizability that they don’t tell us about how “the folk” think. At best, they might tell us about how college students or people in public parks in Manhattan think, but it’s not at all clear that how people in these places think reflects how people everywhere think.
And if the original studies don’t even come close to telling us what “the folk,” think, how on earth is the ability for philosophers to accurately predict the results of these studies supposed to indicate that philosophers know what “the folk” think? The answer to this is very simple: it doesn’t. Even if we ignored every other methodological problem with these studies, the bottom line is that even under ideal conditions the findings reported in this study wouldn’t even come close to providing robust evidence of how nonphilosophers think about the issues in question. And there are many other methodological problems with these studies.
There are even bigger problems with generalizability when one focuses on the judgments of college students in particular. Indeed, in some cases, we have empirical evidence that people around the ages of those most likely to be undergraduates are disproportionately likely to be *unrepresentative* of people of other age groups. See, for instance, Beebe and Sackris’s (2016) data on this with respect to metaethical views, which shows that people around college age are less likely to give responses interpreted by researchers as "realist" responses and more likely to give "antirealist" responses.
In short, the studies themselves have such low generalizability that they don’t tell us what “the folk” think. At best, they might tell us what college students in the US or people in Manhattan parks think. And I do mean “at best”: I doubt they are even successful at this modest goal. Yet what college students in the US or people in Manhattan parks think is unlikely to be representative of what most of the rest of the world thinks. As a result, the studies are not a good proxy for what “the folk” think.
Given this, even if philosophers could predict the outcomes of these studies, and even if those studies had valid measures, were correctly interpreted, and so on (all highly contestable claims in their own right), the findings don’t tell us what “the folk” think for one simple reason: the original studies themselves don’t tell us what the folk think.
Note that this alone is probably sufficient to undermine any strong claims about these findings. And yet there are still more problems with these studies. If anything, the conclusion that the original study isn’t a good indication of what Matthew seems to think it indicates is likely overdetermined by a variety of additional considerations, subsets of which would likely be independently sufficient to severely limit what the study tells us.
References
Beebe, J. R., & Sackris, D. (2016). Moral objectivism across the lifespan. Philosophical Psychology, 29(6), 912-929.
Henrich, J., Heine, S. J., & Norenzayan, A. (2010). The weirdest people in the world? Behavioral and Brain Sciences, 33(2-3), 61-83.
In short, there is little reason to believe the studies themselves were representative of how “people” think, so even if philosophers could accurately predict the results of these studies, those studies lack the generalizability to tell us about how people in general think. Compound this with the fact that the four studies the authors examined are not representative of research on how nonphilosophers think, and you basically have a lack of generalizability squared. That is, you have an unrepresentative sampling of studies that are themselves unrepresentative. This alone is sufficient to cast serious doubt on BB’s claim that philosophers generally know how nonphilosophers think.
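To put the sampling worry in concrete terms, here is a toy simulation sketch. Everything in it is invented for illustration: the “true” rate, the convenience-sample rate, and the sample sizes. The point is only that a sample drawn from the wrong population does not become representative by getting bigger.

```python
# A toy simulation sketch (all numbers are invented) of the sampling worry above:
# an unrepresentative sample stays wrong no matter how large it gets.
import random

random.seed(1)

TRUE_RATE = 0.30   # assumed share of "fans" in the broader population (hypothetical)
BIASED_RATE = 0.95 # assumed share among a convenience sample, e.g., concert attendees

def survey(rate, n):
    """Simulate surveying n people, each a 'fan' with probability `rate`."""
    return sum(random.random() < rate for _ in range(n)) / n

for n in (50, 500, 5000):
    rep = survey(TRUE_RATE, n)     # representative sample of the target population
    conv = survey(BIASED_RATE, n)  # convenience sample drawn from the wrong population
    print(f"n={n:>4}  representative estimate: {rep:.2f}   convenience estimate: {conv:.2f}")

# The representative estimates hover around 0.30; the convenience estimates hover around
# 0.95. Collecting more data from the wrong population only makes you more confident in
# the wrong answer -- which is the worry about generalizing from WEIRD-only samples.
```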
There are yet more problems, however. In follow-up comments, I raised the following concerns:
Joe Schmid & Matthew Adelstein See, I told you this would take a bit of work! Unfortunately, raising objections to a claim often requires more work and words than making the claim in the first place, whether that claim ultimately turns out to be correct or not. Note that even these considerations are a truncated version of the problems one could raise with conducting studies on student and WEIRD populations.
And I didn't even get into: problems with researcher bias (which empirical studies suggest does influence Xphi studies); low statistical power; problems with interpreting the results of the studies; the severe problem of stimulus sampling with these studies (see (1) below); the fact that 3 of the 4 studies include Knobe on the research project, which even further limits the representativeness of the studies; the fact that most of the studies are about, or adjacent to, moral/normative considerations, which makes them unrepresentative of Xphi in general; or the fact that the researchers selected studies based on whether the authors of those studies claimed the findings were “surprising.” That last criterion is a strange standard to choose, since the most important factor in having a paper accepted for publication in psychology is the novelty of the findings, and there are very strong norms in place for people to report that their findings are “surprising,” or to use terms indicating that one’s findings are novel, interesting, and ultimately worth publishing.
That is, we have good reasons to think people would call their findings “surprising” regardless of how surprising they were, because there are massive incentives in place to do so. And, at any rate, perhaps we should infer that Knobe is not a good judge of which findings are surprising before we leap to the conclusion that philosophers are really good at inferring how nonphilosophers think. That seems like a far more parsimonious account of this particular set of four studies, given that Knobe was an author on three of them.
(1) Stimulus sampling: Here’s a general problem in a lot of research. Researchers will use a particular set of stimuli, such as a set of four questions, or four examples of some putative domain, and then generalize from how participants respond to those stimuli to how people think about the domain as a whole.
Suppose, for instance, I wanted to evaluate whether people “like fruit.” I need to choose four fruit to ask them about. You can imagine conducting two different studies:
(a) Ask about apples, bananas, oranges, and grapes
(b) Ask about durian, papaya, figs, and cranberries
We ask people to rate how much they like each fruit on a scale (1 = hate it, 5 = love it), then average across the four fruits to get a mean fruit preference score. Would you expect the same results if we ran these two studies? I wouldn’t. And would you expect either study to tell us what people think about fruit in general? Again, probably not. Why? Because there’s no good reason to think set (a) or (b) is representative of “fruit” as a domain. Of course, populations will vary in whether they prefer the fruits in (a) or (b) more, but setting aside this concern, suppose we wanted to know just about fruit preferences in the United States. If so, (a) is going to win by a landslide. Yet neither (a) nor (b) would tell us about “fruit” as a domain. This is because (a) and (b) are unsystematic and nonrandomly selected: they don’t *represent* fruit as a domain, but instead reflect very popular and much less popular fruit (in the US), respectively.
When researchers run studies, if they want those studies to generalize to all people, they already face the steep challenge that their *participants* are typically not representative of people in general. Yet another huge problem, which is almost totally ignored, turns on considerations like those outlined above regarding fruit: researchers often want to generalize from their *stimuli* to some broader category of phenomena. Yet, whereas they recognize and model the participants in their studies as a random factor, they almost never bother to model their stimuli as a random factor. In effect, they treat their stimuli as though they are perfectly representative of the domain they are intended to quantify over, even though (a) this is almost certainly not true and (b) there are statistical methods available for avoiding this presumption. Of course, the problem with (b) is that it’s harder to do, and will often result in your findings being far less impressive. Who is going to put in the work to produce less impressive results? Not anyone who wants to win the competition of getting more publications.
It would be absurd for me to go into much more detail than that here, so I’ll direct you to a blog post and an article that develop this problem at greater length.
https://www.r-bloggers.com/.../the-stimuli-as-a-fixed.../
https://psycnet.apa.org/doiLanding?doi=10.1037%2Fxge0000014
Judd, C. M., Westfall, J., & Kenny, D. A. (2012). Treating stimuli as a random factor in social psychology: A new and comprehensive solution to a pervasive but largely ignored problem. Journal of Personality and Social Psychology, 103(1), 54.
Why does all of this matter? For a very simple reason: the Dunaway et al. (2013) paper involves an analysis of four particular Xphi studies. Yet there is no good evidence, nor any good reason, to think that these studies are a representative sampling of Xphi studies in general. As such, not only does the paper suffer from ridiculously low generalizability with respect to the participants in the original studies, there’s also no good reason to think the studies themselves are representative of Xphi studies.
This results in a double-whammy of super low generalizability. Note, too, that (a) the criterion for study selection was explicitly nonrandom, (b) no principled methods were employed for randomly selecting representative studies, (c) Knobe is an author on three of the studies (sole author on one, first author on another, and second author on the third), further suggesting that we’re not dealing with a representative sample of Xphi studies so much as, at best, a representative sample of studies conducted by Josh Knobe (and even that’s questionable), further limiting generalizability, and (d) the studies are all on very closely related subjects: three are explicitly about causal judgments (and the other is about attributions of intentional action), and three are about moral/normative evaluation (including the one not included in the first category). This means that the studies cover an extremely narrow range of the subject matter of folk philosophical thought and of Xphi research in general. As such, we’re dealing with an extremely narrow slice of research: it’s mostly early Xphi research on causality and moral judgment conducted by Knobe and colleagues. And we’re supposed to conclude that the ability to predict the outcomes of *those* studies, years after they were published and percolated through the academy, is a good indication that philosophers know how nonphilosophers think?
It’s incredibly difficult for psychologists to figure out how people think even with high-powered, representative, carefully-designed studies conducted by experts with years of experience. Researchers face methodological problems piled on top of one another. See here for example:
Yarkoni, T. (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, e1.
…Yet we’re supposed to believe that philosophers can infer how nonphilosophers think based on their ability to predict the outcome of a tiny, unrepresentative handful of small studies?
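To make the stimulus-sampling point from the comment above concrete, here is a minimal simulation sketch. The liking scores below are made-up numbers chosen purely for illustration; the point is structural: two “fruit preference” studies built from different stimulus sets reach different domain-level conclusions about the very same population.

```python
# A minimal simulation sketch (hypothetical numbers, not real data) illustrating how
# the choice of stimuli drives the "domain-level" conclusion.
import random

random.seed(0)

# Assumed mean liking (1-5 scale) for each fruit in a hypothetical US sample.
# These values are illustrative assumptions, not empirical estimates.
true_liking = {
    "apple": 4.2, "banana": 4.0, "orange": 4.1, "grape": 4.3,      # stimulus set (a)
    "durian": 2.0, "papaya": 2.8, "fig": 2.9, "cranberry": 3.0,    # stimulus set (b)
}

def run_study(stimuli, n_participants=100, noise=0.8):
    """Simulate participants rating each fruit, then average into a single
    'fruit preference' score per study, as many papers do."""
    ratings = []
    for _ in range(n_participants):
        for fruit in stimuli:
            r = random.gauss(true_liking[fruit], noise)
            ratings.append(min(5, max(1, r)))   # clamp to the 1-5 scale
    return sum(ratings) / len(ratings)

set_a = ["apple", "banana", "orange", "grape"]
set_b = ["durian", "papaya", "fig", "cranberry"]

print("Study (a) mean 'fruit liking':", round(run_study(set_a), 2))  # typically around 4.1
print("Study (b) mean 'fruit liking':", round(run_study(set_b), 2))  # typically around 2.7
# Same population, same question, opposite-sounding conclusions about "fruit":
# the result is a fact about the stimulus sets, not about the domain.
```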
I then followed up with another comment:
Joe Schmid & Matthew Adelstein Oh, and I *also* didn't add that there's a paper criticizing the results of the Dunaway et al. (2013) paper that takes a different angle than mine, further piling on the problems with this study:
https://www.tandfonline.com/.../10.../09515089.2016.1194971
Abstract: "Some philosophers have criticized experimental philosophy for being superfluous. Jackson (1998) implies that experimental philosophy studies are unnecessary. More recently, Dunaway, Edmunds, and Manley (2013) empirically demonstrate that experimental studies do not deliver surprising results, which is a pro tanto reason for foregoing conducting such studies. This paper gives theoretical and empirical considerations against the superfluity criticism. The questions concerning the surprisingness of experimental philosophy studies have not been properly disambiguated, and their metaphilosophical significance have not been properly assessed. Once the most relevant question is identified, a re-analysis of Dunaway and colleagues’ data actually undermines the superfluity criticism."
Liao, S. Y. (2016). Are philosophers good intuition predictors? Philosophical Psychology, 29(7), 1004-1014.
One of the most important points to stress is that, even if you can predict the results of a study, that does not necessarily tell you how the respondents to that study think. Such a claim relies on the presumption that the observed response patterns in your data are the result of valid measures and that you’ve interpreted them correctly. Correctly predicting, for instance, that most people would choose a “realist response” over an “antirealist response” does not entail that most of the respondents are realists, since this would only follow if choosing what was operationalized as a realist response actually indicates that the respondent is a realist. This was a point stressed by Joe Schmid in his response to BB:
Matthew, Lance already hit most of the nails on the head. My three main problems, which largely reiterate Lance’s, are:
(1) We can only conclude that the philosophers surveyed are good at predicting some folk responses to certain survey questions; this doesn’t mean they’re good at predicting folk views or intuitions. Survey questions are very faint indicators of folk views and intuitions — they’re hugely liable to unintended interpretations, ambiguities, spontaneous theorizing, and other confounds.
(2) The results are not justifiably generalized to philosophers’ abilities to predict folk views more generally. From the fact that philosophers can predict some folk’s responses to some survey questions from 4 papers [with relatively small sample sizes], it is incredibly hasty to conclude that they’re good at predicting folk survey responses more generally on the dozens or hundreds of surveys that have and might be conducted — let alone folk views on the thousands of philosophical questions more generally (as opposed to responses to survey questions), and let alone folk intuitions.
(3) The surveys are done on WEIRD populations; even if those survey responses were good indicators of the views and intuitions of the survey respondents, and even if philosophers are generally good at predicting these responses, that doesn’t mean philosophers are generally good at predicting the views and intuitions of ‘the folk’; we need some reason to think the WEIRD survey populations are representative of ‘the folk’ more generally, including non-WEIRD populations.
Point (1) is critical: predicting the results of studies does not entail that you can predict what people think.
Finally, we now have a decade of research showing that seemingly reasonable attempts to address exactly this question have historically failed because participants do not interpret questions about metaethics as researchers intend. I cannot stress enough that I am not basing this on a superficial evaluation of the available data: I have personally spent the last decade specializing in this exact question. I quite literally specialize in the methods used to determine whether nonphilosophers are moral realists. It is ludicrously difficult to devise valid measures. Even if the studies used in Dunaway et al. relied on valid measures, and even if predicting the results of those studies entailed predicting what people think, this wouldn’t necessarily generalize to the specific case of metaethics.
It is one thing to say that a given body of data has poor generalizability: that, in the absence of data to the contrary, you aren’t justified in generalizing from it to draw inferences about some larger population. But this is a case where we actually have quite a lot of evidence to the contrary. More importantly, those studies that have done the most to distance themselves from the methodological shortcomings of earlier studies tend to find fairly high rates of antirealist responses from participants. I talk about this at length on this blog, on my channel, and in my research. Simply put, there is no good empirical case that most people are moral realists, that most people find realism intuitive, or that moral realism is a “commonsense” view widely held by nonphilosophers. Given the current state of available evidence, BB just isn’t justified in suggesting otherwise. I don’t want to rehash my case against widespread moral realism here, so I’ll direct you to some of my other posts on the matter:
J. P. Andrew insists people are moral realists, but empirical data does not support this claim
J. P. Andrew: Let's schedule a debate on whether most people are moral realists
Watkins and the persistent (and probably false) claim that most people are moral realists
Unfortunately, despite specifically stating that he’d be interested in hearing what my objections were, BB never responded to Joe’s responses or mine. I like to flatter myself that this is because we made a case too strong for BB to rebut, but threads always stop somewhere, and I sometimes forget to reply or lose track of people’s responses.
In short: the paper by Dunaway and colleagues doesn’t even come close to justifying the claim that realism is a commonsense view.
He scured