Multiple Regression

Do you believe that every time a prisoner is executed in the United States, eight future murders are prevented? Do you believe that a 1% increase in the number of citizens carrying concealed weapons causes a 3.3% drop in a state’s murder rate? Do you believe that 10% to 20% of the decline in crime in the 1990s was caused by an increase in abortions in the 1970s? Or that the murder rate would have increased by 250% since 1974 if the United States had not built so many new prisons?

If you were misled by any of these studies, you may have fallen for a pernicious form of junk science: the use of mathematical models with no demonstrated predictive power to draw policy conclusions. These studies are superficially impressive. Written by reputable social scientists from well-known institutions, they often appear in peer-reviewed scientific journals. Full of complex statistical calculations, they provide precise numerical “facts” that can be used as debating points in policy arguments. But these “facts” are will-o’-the-wisps. Before the ink is dry on one study, another appears with completely different “facts”. Despite their scientific appearance, these models do not meet the basic criterion for a useful mathematical model: the ability to make predictions that are better than random chance.

Although economists are the leading practitioners of this arcane art, sociologists, criminologists and other social scientists have their own versions. It goes by various names, including “econometric modeling”, “structural equation modeling” and “path analysis”. All of these are ways of using the correlations between variables to make causal inferences. The problem, as anyone who has studied statistics knows, is that correlation is not causation. The correlation between two variables is often “spurious” because both are caused by some third variable. Econometric modelers try to overcome this problem by including all the relevant variables in the analysis, using a statistical technique called “multiple regression”. If one had perfect measures of all the causal variables, this would work. But the data are never good enough. Repeated efforts to use multiple regression to reach definitive answers to public policy questions have failed.
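A small simulation may make the argument concrete. The sketch below is an illustration on invented data, not a reconstruction of any study discussed here. It assumes a third variable z that drives both x and y, while x has no effect on y at all. The helper names (`slope`, `residuals`, `slope_controlling_for`) are hypothetical, and the "controlled" slope is computed by partialling z out of both variables, which gives the same coefficient on x as a multiple regression of y on x and z.

```python
import random


def slope(x, y):
    """Ordinary least squares slope of y on x (intercept included)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def residuals(control, target):
    """Part of `target` left over after a linear fit on `control`."""
    n = len(control)
    mean_c, mean_t = sum(control) / n, sum(target) / n
    b = slope(control, target)
    return [(t - mean_t) - b * (c - mean_c) for c, t in zip(control, target)]


def slope_controlling_for(control, x, y):
    """Coefficient on x in a regression of y on x and `control`.

    Partial `control` out of both x and y, then regress residual on
    residual (the Frisch-Waugh-Lovell result).
    """
    return slope(residuals(control, x), residuals(control, y))


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]        # the third variable
x = [zi + rng.gauss(0, 1) for zi in z]         # driven by z
y = [2 * zi + rng.gauss(0, 1) for zi in z]     # driven by z only; x has no effect
z_noisy = [zi + rng.gauss(0, 1) for zi in z]   # an imperfect measurement of z

naive = slope(x, y)                                   # ignores z entirely
perfect_control = slope_controlling_for(z, x, y)      # z measured perfectly
noisy_control = slope_controlling_for(z_noisy, x, y)  # z measured with error

print(f"y on x alone:                    {naive:.2f}")
print(f"y on x, controlling for z:       {perfect_control:.2f}")
print(f"y on x, controlling for noisy z: {noisy_control:.2f}")
```

Under these assumptions the three slopes are, in expectation, 1, 0 and 2/3. The spurious effect vanishes only when the third variable is measured perfectly; with a noisy measure, most of it survives the "control".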

But many social scientists are reluctant to admit failure. They have spent years learning and teaching regression modeling, and they continue to use regression to make causal arguments that their data do not justify. I call these arguments the myths of multiple regression, and I want to use four studies of murder rates as examples.

Myth 1: More guns, less crime.

Yale University economist John Lott used an econometric model to argue that “allowing citizens to carry concealed weapons deters violent crimes, without increasing accidental deaths.” Lott’s analysis involved “shall issue” laws, which require local authorities to issue a concealed weapons permit to any law-abiding citizen who applies for one. Lott estimated that each 1% increase in gun ownership in a population causes a 3.3% drop in the homicide rate. Lott and his co-author David Mustard posted the first version of their study on the Internet in 1997, and thousands of people downloaded it. It became the subject of policy forums, newspaper columns, and often quite sophisticated debates on the World Wide Web. In a book titled More Guns, Less Crime, Lott taunted his critics, accusing them of putting ideology ahead of science.

Lott’s work is an example of statistical one-upmanship. He has more data and a more complex analysis than anyone else studying the topic. He demands that anyone who wants to challenge his arguments become immersed in a very complex statistical debate, based on computations so difficult that they cannot be done on ordinary desktop computers. He challenges anyone who disagrees with him to download his data set and redo his calculations, but most social scientists do not think it worth their while to replicate studies that use methods that have repeatedly failed. Most gun control researchers simply brushed off Lott and Mustard’s claims and went on with their work. Two highly respected criminal justice researchers, Frank Zimring and Gordon Hawkins (1997), wrote an article explaining that:

just as Messrs. Lott and Mustard can, with one model of the determinants of homicide, produce statistical residuals suggesting that “shall issue” laws reduce homicide, we expect that a determined econometrician can produce a treatment of the same historical periods with different models and opposite effects. Econometric modeling is a double-edged sword in its capacity to facilitate statistical findings to warm the hearts of true believers of any stripe.

Zimring and Hawkins were right. Within a year, two determined econometricians, Dan Black and Daniel Nagin (1998), published a study showing that if they changed the statistical model slightly, or applied it to different segments of the data, Lott and Mustard’s findings disappeared. Black and Nagin found that when Florida was removed from the sample there was “no detectable impact of the right-to-carry laws on the rate of murder and rape.” They concluded that “inference based on the Lott and Mustard model is inappropriate, and their results cannot be used responsibly to formulate public policy.”
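The kind of fragility Black and Nagin reported can be illustrated with a leave-one-out check. The sketch below is not their method or their data: the eight state labels and all the numbers are invented for illustration, and the "effect" is a plain difference in means (the function `law_effect` is hypothetical) rather than a full econometric model. It shows only how a single unusual unit can carry an entire estimate.

```python
def mean(values):
    return sum(values) / len(values)


def law_effect(rows):
    """Mean change in states with the law minus mean change in states without."""
    with_law = [change for _, has_law, change in rows if has_law]
    without_law = [change for _, has_law, change in rows if not has_law]
    return mean(with_law) - mean(without_law)


# Invented numbers, for illustration only:
# (state label, has the law, change in murder rate)
rows = [
    ("A", True, -0.2), ("B", True, 0.1), ("C", True, 0.0), ("D", True, -6.0),
    ("E", False, 0.1), ("F", False, -0.1), ("G", False, 0.0), ("H", False, -0.2),
]

full_sample = law_effect(rows)

# Re-estimate the effect eight times, dropping one state each time.
leave_one_out = {
    dropped: law_effect([r for r in rows if r[0] != dropped])
    for dropped, _, _ in rows
}

print(f"all states: {full_sample:+.3f}")
for dropped, effect in leave_one_out.items():
    print(f"without {dropped}:  {effect:+.3f}")
```

In this made-up sample the apparent effect is large when all states are included and essentially zero once state D is dropped, while dropping any other state leaves it large. An estimate that depends on one unit in this way says more about that unit than about the law.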

John Lott, however, disputed their analysis and continued to promote his own. Lott had collected data for each county in the United States for each year from 1977 to 1992. The problem is that American counties vary enormously in size and social characteristics. A few large counties, containing major cities, account for a very large share of the murders in the United States. As it happens, none of these very large counties had “shall issue” laws. This means that Lott’s massive data set was simply unsuitable for his task. He had no variation in his key causal variable, the “shall issue” law, in the places where most murders occurred.

He did not mention this limitation in his book or articles. When I discovered the lack of “shall issue” laws in the major cities while examining his data myself, I asked him about it. He shrugged it off, saying that he had “controlled” for population size in his analysis. But introducing a statistical control in the mathematical analysis did not make up for the fact that he had no data for the major cities where the homicide problem was most acute.
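Why a control for population size cannot fill this gap can be seen in a toy example. The sketch below uses invented county data, not Lott's, and a hypothetical helper `law_effect` that compares mean murder rates with and without the law. Its one structural assumption is the feature described above: no large county has the law.

```python
def mean(values):
    return sum(values) / len(values)


def law_effect(rows):
    """Mean rate with the law minus mean rate without it.

    Returns None when one of the two groups is empty, because then
    there is nothing to compare.
    """
    with_law = [rate for _, has_law, rate in rows if has_law]
    without_law = [rate for _, has_law, rate in rows if not has_law]
    if not with_law or not without_law:
        return None
    return mean(with_law) - mean(without_law)


# Invented numbers, for illustration only:
# (county size, has the law, murder rate)
counties = [
    ("small", True, 3.0), ("small", True, 4.0), ("small", True, 5.0),
    ("small", False, 4.0), ("small", False, 5.0), ("small", False, 6.0),
    ("large", False, 20.0), ("large", False, 25.0), ("large", False, 30.0),
]

pooled = law_effect(counties)  # ignores county size
by_size = {                    # "controls" for size by comparing within it
    size: law_effect([c for c in counties if c[0] == size])
    for size in ("small", "large")
}

print("pooled:", pooled)
print("within small counties:", by_size["small"])
print("within large counties:", by_size["large"])
```

The pooled comparison is badly confounded by size. Comparing within size groups removes that confounding, but the resulting estimate comes entirely from the small counties; for the large counties the comparison is undefined, since none of them has the law. The control fixes the arithmetic without supplying the missing evidence.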

It took me some time to find this problem in his data, because I was not familiar with the gun control issue. But Zimring and Hawkins zeroed in on it immediately, because they knew that “shall issue” laws had been enacted in states where the National Rifle Association was powerful, mostly in the South, the West, and rural regions. These were states that already had few restrictions on guns. They observed that this legislative history frustrates “our capacity to compare trends in ‘shall issue’ states with trends in other states. Because the states that changed legislation are different in location and constitution from the states that did not, comparisons across legislative categories will always risk confusing demographic and regional influences with the behavioral impact of different legal regimes.” Zimring and Hawkins further observed that:

Lott and Mustard are, of course, aware of this problem. Their solution, a standard econometric technique, is to build a statistical model that will control for all the differences between Idaho and New York City that influence homicide and crime rates, other than the “shall issue” laws. If one can “specify” the major influences on homicide, rape, burglary, and auto theft in our model, then we can eliminate the influence of these factors on the different trends. Lott and Mustard build models that estimate the effects of demographic data, economic data, and criminal punishment on various offenses. These models are the ultimate in statistical home cooking in that they are created for this data set by these authors and only tested on the data that will be used in the evaluation of the right-to-carry impacts.