On the limits of social science
Jim Manzi explores “What Social Science Does—and Doesn’t—Know”:
Unlike physics or biology, the social sciences have not demonstrated the capacity to produce a substantial body of useful, nonobvious, and reliable predictive rules about what they study—that is, human social behavior, including the impact of proposed government programs. The missing ingredient is controlled experimentation, which is what allows science positively to settle certain kinds of debates. How do we know that our physical theories concerning the wing are true? In the end, not because of equations on blackboards or compelling speeches by famous physicists but because airplanes stay up. Social scientists may make claims as fascinating and counterintuitive as the proposition that a heavy piece of machinery can fly, but these claims are frequently untested by experiment, which means that debates like [those about the influence of the stimulus on the United States’ economy] will never be settled.
A friend of mine, Amanda, pointed me to a similar, if not “more scathing” criticism of educational and psychological intervention research. The opening salvo by Levin, O’Donnell, and Kratochwill (2003) provides a “sobering account of exactly how far the credibility of educational research is perceived to have advanced in two generations”:
The problems that are faced in experimental design in the social sciences are quite unlike those of the physical sciences. Problems of experimental design have had to be solved in the actual conduct of social-sciences research; now their solutions have to be formalized more efficiently and taught more efficiently. Looking through issues of Review of Educational Research, one is struck time and again by the complete failures of the authors to recognize the simplest points about scientific evidence in a statistical field. The fact that 85% of National Merit Scholars are first-born is quoted as if it means something, without figures for the over-all population proportion in small families and over-all population proportion that is first-born. One cannot apply anything one learns from descriptive research to the construction of theories or to the improvement of education without having some causal data with which to implement it (Scriven, 1960, p. 426).
Education research does not provide critical, trustworthy, policy-relevant information about problems of compelling interest to the education public. A recent report of the U.S. Government Accounting Office (GAO, 1997) offers a damning indictment of evaluation research. The report notes that over a 30-year period the nation has invested over $31 billion in Head Start and has served 15 million children. However, the very limited research base available does not permit one to offer compelling evidence that Head Start makes a lasting difference or to discount the view that it has conclusively established its value. There simply are too few high-quality studies available to provide sound policy direction for a hugely important national program. The GAO found only 22 studies out of hundreds conducted that met its standards, noting that many of those rejected failed the basic methodological requirements of establishing compatible comparison groups. No study using a nationally representative sample was found to exist (Sroufe, 1997, p. 27).
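Scriven’s complaint about the first-born figure is easy to make concrete. The sketch below uses an entirely made-up distribution of family sizes (the `FAMILY_SIZE_SHARES` values are assumptions for illustration, not census data) to compute what share of all children are first-born. That share is the baseline the 85% figure would have to be compared against.

```python
# Hypothetical share of families with 1, 2, 3, or 4 children.
# These numbers are invented for illustration only.
FAMILY_SIZE_SHARES = {1: 0.30, 2: 0.40, 3: 0.20, 4: 0.10}


def first_born_share(family_size_shares):
    """Fraction of all children who are first-born.

    Every family contributes exactly one first-born child, so the share
    is (number of families) / (number of children), i.e. 1 / mean family size.
    """
    total_families = sum(family_size_shares.values())
    total_children = sum(
        size * share for size, share in family_size_shares.items()
    )
    return total_families / total_children


baseline = first_born_share(FAMILY_SIZE_SHARES)
observed = 0.85  # the figure quoted for National Merit Scholars
ratio = observed / baseline  # how far the quoted figure sits above the baseline

print(f"baseline first-born share: {baseline:.3f}")
print(f"observed / baseline:       {ratio:.2f}")
```

The smaller the typical family, the closer the baseline gets to 1, and the less remarkable 85% becomes. Without the real population figures, the quoted statistic cannot be interpreted either way.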
These articles remind me of a recent profile in the New Yorker of M.I.T. development economist Esther Duflo. As the co-founder of a poverty “action lab,” Duflo and her colleagues, sometimes referred to as “the randomistas,” are making waves in the social sciences for “borrowing from medicine a very robust and simple tool: they subject social-policy ideas to randomized control trials, as one would test a drug.”
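To see why randomization matters, here is a small simulation sketch. Everything in it is an assumption made for illustration: a hypothetical program with a true effect of `TRUE_EFFECT`, and a hidden “motivation” trait that raises outcomes and, in the non-randomized case, also decides who signs up. It is not a model of any study mentioned in this post.

```python
import random

TRUE_EFFECT = 0.5  # hypothetical true effect of the program


def estimated_effect(n, randomized, rng):
    """Difference in mean outcomes between participants and non-participants."""
    treated, control = [], []
    for _ in range(n):
        motivation = rng.gauss(0.0, 1.0)  # hidden confounder
        if randomized:
            in_program = rng.random() < 0.5  # coin flip, unrelated to motivation
        else:
            in_program = motivation > 0.0  # motivated people self-select
        outcome = 2.0 * motivation + rng.gauss(0.0, 1.0)
        if in_program:
            treated.append(outcome + TRUE_EFFECT)
        else:
            control.append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)


rng = random.Random(0)  # fixed seed so the run is repeatable
naive_estimate = estimated_effect(20_000, randomized=False, rng=rng)
rct_estimate = estimated_effect(20_000, randomized=True, rng=rng)

print(f"self-selected comparison: {naive_estimate:.2f}")
print(f"randomized comparison:    {rct_estimate:.2f}")
```

In the self-selected comparison the estimate is inflated by the motivation gap between the two groups. Under random assignment the groups have the same motivation on average, so the difference in means lands close to the true effect.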
Related: In 1923, Fred Boucke concluded that “social science is a philosophy of values as much as an analysis of specific magnitudes.” [Source]
Full citation of Levin, O’Donnell, and Kratochwill:
Levin, J. R., O’Donnell, A. M., & Kratochwill, T. R. (2003). Educational/psychological intervention research. In I. B. Weiner (Series Ed.) & W. M. Reynolds & G. E. Miller (Vol. Eds.), Handbook of psychology: Vol. 7. Educational psychology (pp. 557-581). Hoboken, NJ: Wiley.
UPDATE: When I shared Manzi’s article on my Facebook wall, a stimulating conversation ensued. Here are some excerpts from the online discussion…
MJR: Linguistics spends a giant chunk of its time predicting human social behaviours, in ways that are so “nonobvious” that it’s a pain to explain them to people. I wonder if it’s simply that people don’t expect to understand the hard sciences, but when they don’t grasp the social sciences they take it to mean that the content isn’t there in the first place.
AL: One wonders what class of disciplines the author believes to be encompassed by the term “social science”. It appears that he finds it to be synonymous with “political science”, and only the applied branch thereof at that. If psychology/cognitive science, linguistics, human biology, and other related fields are part of the social sciences, then the premise that experimentation has not led to a large body of non-obvious principles that predict human behavior is simply false.
AS: To [AL], I think it depends on what you mean by “non-obvious.” One problem I noticed in social sciences, even clinical medical research, is that the researchers often devise poor research design methods. They’ve taken maybe two or three classes in statistics at best and usually it’s only applied. Sometimes when constructing hypotheses, social science research tends to do the “looking under the lamplight on a dark street for one’s missing keys because it’s the only light we have.” Also, many studies done at universities use undergraduates as guinea pigs.
On the other hand, should we be looking for a large body of non-obvious principles at all when doing social science research? Can humans really be explained down to some natural principles or laws? Is that an appropriate metaphor? I mean, not even classical Newtonian laws of mechanics hold up at the quantum level in the sense that we know them.
Jason, there was a DeCal that dealt with Duflo’s various methods for approaching randomized testing of economic policies. The thing about those methods is that they acknowledge that those findings are limited and not overly generalizable. Ted Miguel is at Berkeley still, I think, and he did a large study with Kremer on improving school attendance rates in Kenya. But more than a case study, in a generalizable sense, they further delineated how positive externalities can work in such a setting.
MJR: I see a lot of econometrics papers that try desperately and transparently to shoehorn hardcore experimental procedure into their papers. The authors know it’s tacked on and generally not important to the analysis, but they do it anyway because that’s what’s hot right now, and so it has to be there. This is probably the kind of thing that [Manzi] is talking about.
AL: AS, If the goal is to predict human behavior (as mine is), then yes, we should most certainly be searching for the natural, causal principles that regulate the human mind. And this is no metaphor–like all organisms that were designed by natural selection, humans are constellations of behavioral regulation systems; although ascertaining the nature of all of the input-output mappings entailed by these mechanisms is a formidable task that will take many decades to complete, it is one that can in principle be accomplished. Although I agree that many researchers design poor studies–experimental and otherwise–many more are highly competent in this regard.
Oh, and Newtonian principles and quantum mechanics do not need to apply to one another in order to be fully compatible, since they occur at different levels of hierarchical organization–the probabilistic presence of quantum-level material provides macro-level matter with the structural density it needs to operate as if it endures through time and space.
AS: You put it very eloquently, AL, but can human behavior even be put into input-output mappings at the large, generalizable scale without resorting to what Manzi probably would refer to as obvious conclusions? (e.g. people are motivated to avoid physical and psychic pain or that all humans search for a place and some semblance of personal order in this “blooming, buzzing confusion.”) When studying decision-making, we learn in econ that psychology differs in the belief that humans are rather contextual creatures. Economics assumes revealed preferences and doesn’t question how they are formed. How do you map all of these possible contexts and permutations in a meaningful manner? I see social science more akin to meteorology or weather forecasting than physics or chemistry. In principle, defining all these mappings could be done but what is the likelihood or probability of doing so?
But assuming that humans are designed by natural selection (given that I lack the background or framework to actually dispute the nuances and ramifications this intelligently…nevertheless a lot of questions are raised in my mind. too many for a facebook post haha), how do you . Roger Newton (also a mathematical physicist, not related to Sir Isaac) raises the interesting question of how can you explain the world as we experience it unambiguously in linguistic terms, as social sciences really must do. Mathematics is really the unambiguous language, and we can only use its shadow to really apply to the actual world.
Regarding the other Newtonian thing– first of all, your reference to correspondence limits illustrates my point if I both understand you and remember physics (this is likely not the case too lol). You have to bring in probability to use the big principles to describe everyday life as we observe it. And in introducing probabilities, how can you then claim to have causal inference? So if one were to take your argument that causality in human minds can be found because we’re also products of immutable natural selection mechanisms…let’s pretend we do find various causalities for human behavior. But how do we know the things we can observe or measure actually lose their power of causality at the underlying level? (Not sure if this is clear…) Plus, it’s really difficult to isolate single factors in humans. We’re both part of various systems and an entire system of our own, if you look at the body alone.
And I’m still not sure whether anyone has tested the competence of statistics in research. And again, doesn’t academic research largely use undergraduates as test subjects, which would be a huge problem… And then humans trying to describe humans creates problems of interpretation and focus, hence my streetlamp metaphor. One of my undergrad papers was to compare how two similar research design studies from major economists found completely different results in people’s sensitivity to price in clean drinking water tablets and microloans in Zambia and S. Africa, although they were testing the same thing. Turns out, the main difference was how they interpreted their statistical findings, kind of like how heads and tails are part of the same coin, just two different sides that never meet. Neither team was really incorrect in their interpretation either, just depended more on their background…
JA: Somewhat related — an excerpt from a commentary by Tom Siegfried about the shortcomings of statistics:
“It’s science’s dirtiest secret: The ‘scientific method’ of testing hypotheses by statistical analysis stands on a flimsy foundation. Statistical tests are supposed to guide scientists in judging whether an experimental result reflects some real effect or is merely a random fluke, but the standard methods mix mutually inconsistent philosophies and offer no meaningful basis for making such decisions. Even when performed correctly, statistical tests are widely misunderstood and frequently misinterpreted. As a result, countless conclusions in the scientific literature are erroneous, and tests of medical dangers or treatments are often contradictory and confusing.”
Full article is here: http://www.sciencenews.org/view/feature/id/57091/title/Odds_Are%2C_Its_Wrong
AL: A few points —
(1) Yes, I do believe that a large fraction of the mechanisms comprising the mind can be mapped–in principle and in practice. Just consider the visual system in humans (whose computational logic has been described in great detail), or the predictions we can make about the behavioral decisions of non-humans (with whom and when members of a species will mate, where members of a species will forage under different resource patch distributions, when they will undergo cue-triggered sex changes, under what conditions they will cooperate, etc, etc.). Why is there reason to suspect that we would be less successful in studying humans (other than the fact that we don’t have the luxury of keeping humans captive, performing invasive manipulations, cutting open their brains, etc.)? I feel like we must read different literatures if you believe that the human behavioral sciences are akin to meteorology. (For the record, the model of human nature employed by behavioral economists–which assumes rational pursuit of “self-interest” as an organizing principle, etc.–is almost certainly false in almost every way, which is why economists have always been baffled by humans’ decisions in econ games like the dictator game, prisoner’s dilemma, trust game, etc., etc.).
I agree with you that humans studying humans is a recipe for biased interpretations of observations (as is humans studying non-humans, which can lead to anthropomorphism). As such, well-formed theories that do not rely on intuition–and rather are derived from first principles–are essential in this regard. This is why I believe we should study humans with the same meta-theoretical tools as we use (successfully) to study non-humans: Evolutionary theory (which, hybridized with cognitive science = evolutionary psychology). But this is a can of worms….
(2) I agree that language has its limits in describing the computational operations performed by the nervous system, but carefully-constructed contingency statements (If… then…) actually map on rather perfectly to the logic of causality. I also agree that mathematical statements–which also systematize contingencies–may increase the precision of theoretical models. Thus, ultimately, we should be interested in forming computational theories of the psychological mechanisms comprising the mind that do in fact model the operations it’s likely to perform.
(3) All causality is probabilistic–there are distributions of causes and distributions of corresponding effects; this does not undermine the idea that effects have causes. I take this as an uncontroversial assumption we make as scientists (if it is false, we should all just go become stock brokers). In making causal inferences, the goal of the scientist is to decrease our error of prediction in these distributions of cause and effect. Will we ever predict all causes and all effects (in any domain)? Nope. But we can be less wrong than we were the day before, and more wrong than we will be tomorrow. This is true for the physicist as much as the human behavioral scientist–just ask a physicist!
(4) Statistical inference is indeed tricky, sometimes misleading, and often done incorrectly. However, the problems associated with statistical inference in science decrease as the emphasis on p < .05 decreases and the emphasis on effect size increases. Cohen, Rosenthal, and many others have written extensively about this.
Others have been hard at work attempting to devise alternatives to traditional inferential criteria.
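AL’s point (4) about p < .05 versus effect size can be illustrated with a short calculation. This is only a sketch under simplifying assumptions: two equal-sized groups, a known common standard deviation, and a normal (z-test) approximation. The function name and the two example scenarios are invented for illustration.

```python
import math


def two_sided_p_value(cohens_d, n_per_group):
    """Two-sided p-value for a difference in means, normal approximation.

    cohens_d is the mean difference in units of the common standard
    deviation. With n observations per group, the standard error of the
    standardized difference is sqrt(2 / n).
    """
    z = cohens_d / math.sqrt(2.0 / n_per_group)
    # P(|Z| > |z|) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2.0))


# Scenario A (hypothetical): a negligible effect measured on a huge sample.
tiny_effect_p = two_sided_p_value(cohens_d=0.02, n_per_group=100_000)

# Scenario B (hypothetical): a medium-sized effect measured on a small sample.
medium_effect_p = two_sided_p_value(cohens_d=0.5, n_per_group=20)

print(f"d = 0.02, n = 100000 per group: p = {tiny_effect_p:.6f}")
print(f"d = 0.50, n = 20 per group:     p = {medium_effect_p:.3f}")
```

Scenario A clears the p < .05 bar even though the effect is a fiftieth of a standard deviation, while Scenario B misses it despite an effect twenty-five times larger. The p-value mostly reflects sample size here, which is why reporting the effect size alongside it is more informative.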
JL: To AS, I like your take, and would expand on some of the things you’ve said. First, that most research relies on creating models based on observed behavior of a finite set (sample) of a larger “population” of available data. So, research is always selective, and based on what the individual doing the research is looking for (willing/ready to see/observe). That’s why all science only creates “theory.”
This is true even in Physics, which Manzi thinks is somehow less mutable than social science – a point I disagree with. Theories and even “laws” of Physics are subject to the conditions they were observed within. Newtonian physics does not apply in super-high-focus (speed/density) views.
Regarding human behavior, there are definitely solid rules that apply pretty consistently across the board. The problem, I think, is that they’ve been observed in so many different fields (psychology, sociology, evolutionary biology, physical education, education, behavioral economics, etc.) that very few people (if any) have created coherent statements of what those rules are or how to apply them.
The people who I think have done the best job of creating good, usable, and testable rules/theories about human behavior are marketers. Want to know how people “work,” read marketing papers/books. Aside from that, we have a lot of “common sense” rules about how people work – “You attract more bees with honey than with vinegar.” “Keep your mouth shut and don’t rat on your friends” (ok, that’s from Goodfellas), etc.
Will those explanations (or any) ever be absolute on/off “answers?” No! That’s the nature of Existence and the perspectival nature of our participation in that Existence…especially the polar nature of language. Is it useful to look for “on/off” functions in nature and in human behavior? Yes, I think so, as long as you don’t lose sight of the fact that what you’re doing when you look for those things (and create them) is tool-making, and that you (or we) made the tool, and that the tool is not reality…just a way for us to grasp reality so that we can manipulate it.
All conversations of “cause” are post-facto…let’s not forget that. So there is no clear “cause” of anything, just the selection of a series/sequence of events from what has been observed, and the attribution of “cause” to those events. I don’t mean to be vague or wishy-washy here. There are predictable sequences of events, and we can say that, if I drop a glass on concrete, the glass will (most likely) shatter. However, what is the “cause” of the glass shattering? My dropping it? It striking the concrete with a certain amount of force? The relative density of the glass versus the concrete? Gravity pulling the glass down at a certain velocity so that it strikes the concrete with enough force to shatter? Depending on my perspective, I’ll choose the explanation that suits me best.