I have been reading a book by the British sociologist John Goldthorpe. He is now established and retiring, so he wrote a kind of disciplinary memoir called Pioneers of Sociological Science. The book is quite interesting on its own terms. As a memoir that picks out the important people in the discipline and organizes their foundational contributions, it tracks the intellectual history of sociology from an unusual angle. The conventional approach traces modern sociology to Durkheim, Weber, and Marx, and revisionist intellectual histories add identitarian figures such as early feminists or racial theorists to that trio. Goldthorpe instead, oddly but in a nice way, traces early sociological thinkers with a statistical focus. Basically, this book regards sociology as a population science that deals mostly with collections of individual data points, which collectively form the different boundaries of society.
One of the first figures he mentions is Adolphe Quetelet, who usually appears in statistics textbooks, not sociological ones. I knew Quetelet invented the idea of the average man. When I was reading The Seven Pillars of Statistical Wisdom by Stigler, I came across the notion of the average as a revolutionary and unconventional way of thinking about chaotic phenomena. Basically, the average man proposed by Quetelet does not exist anywhere in the real world. People initially had difficulty accepting an idea that describes a fictitious entity yet claims to carry the truth of observable realities. “Quetelet could use direct individual measurements, in metres or in kilogrammes, and it was in fact by taking the arithmetical means of such measurements that he arrived at the first formulation of his famous concept of l’homme moyen — the average man”.
If applying the average man simply means taking the arithmetic mean over a collection of individuals, then this is exactly how means are used when the media and laymen present statistical findings. Using the mean looks fine at first glance, but issues soon arise, and not because the mean is inaccurate or methodologically biased. The mean is problematic in its application because, again, it does not touch base with reality, either from the standpoint of common sense or from the perspective of modeling. Early readers, understandably, could not see how “the average number of children is 1.7” could make any sense within their lifeworld knowledge. In addition, I think that saying the arithmetic mean represents the average man does not agree with conditional probability.
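Let’s say height follows a normal (Gaussian) distribution with a mean of 170cm and an s.d. of 20. Strictly speaking, the probability of finding a man of exactly 170cm is zero for a continuous variable. What is true is that the density peaks at 170cm, so a man is more likely to fall in a narrow band around 170cm than in an equally narrow band anywhere else, and half of all men stand at or below 170cm. So, loosely, call a man “average” in height when he is at or below 170cm, an event A with a probability of 50%. 170cm is the average man with regard to height. Then let’s say the average years of education is 15 years, and it is also normally distributed, so that education ~ N(\mu=15, \sigma^2). Call being at or below 15 years event B, again with a probability of 50%.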
Since both have a 50% chance of being selected, should finding an average man of 170cm and 15 years of education be the joint probability P(A, B)=P(A)P(B)=.25? This should definitely not be the case, because being average in one trait is usually linked to being average in another trait. Education and height are not independent events. Most social traits are not independent events. In practice, though, we often talk about the average man as if these social traits were independent. For example, we would hear the science media say something like: the average American man is $100,000 in debt, makes $35,000, is 5'8" tall, and has a high school degree. Each number in this statement may be correct, but the statistical thinking behind putting them together is wrong. You cannot find the average of each trait separately, combine them, and then claim that these traits together define the average American man. In the whole population of American men, $35,000 is the average salary, and this is correct. But $35,000 is not the average salary conditional on the average education.
I ran a little test with the General Social Survey of 2018, which is a representative survey of the U.S. population. The average age of Americans is 48.9. The average number of children is 1.85 (just say 2 for simplicity). However, the average age among people with 2 kids is 52.64. This matters a great deal for interpretation: the average Joe with two kids is about 53 years old. Someone who is 49 and has two kids is actually a bit young for that group, or a bit too reproductive if age is the measuring rod.
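The gap between the overall mean and the conditional mean can be reproduced with a small simulation. This is only a sketch on synthetic respondents, not GSS data: the age range and the rule linking age to number of children are assumptions made up for illustration.

```python
import random
from statistics import mean

random.seed(0)

# Synthetic respondents (NOT GSS data): older people tend to have more children.
people = []
for _ in range(50_000):
    age = random.randint(18, 89)
    # Hypothetical rule: each of 8 "chances" to have a child succeeds with a
    # probability that grows with age and is capped at 0.375.
    p_child = min(0.375, (age - 18) / 120)
    kids = sum(random.random() < p_child for _ in range(8))
    people.append((age, kids))

# Marginal mean: average age over everyone.
overall_mean_age = mean(age for age, _ in people)
# Conditional mean: average age among people with exactly two children.
mean_age_two_kids = mean(age for age, kids in people if kids == 2)

print(f"mean age, everyone:       {overall_mean_age:.1f}")
print(f"mean age, given two kids: {mean_age_two_kids:.1f}")
```

Because age and number of children move together in this made-up population, the mean age among people with two kids sits above the overall mean age, which is the same pattern as the 48.9 versus 52.64 comparison.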
So the reason that the average man doesn’t exist should not be reduced to the minuscule joint probability of hitting a certain number of arithmetic means at once. If the average man of 170cm and 15 years of school could be found with a chance of only 25%, adding more averaged traits (salary, looks, etc.) would push the likelihood toward 0%. But this is not correct, because it multiplies the probabilities as if the traits were independent.
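To see how much the independence assumption matters as traits pile up, here is a hedged simulation sketch. It assumes standard normal traits with a common pairwise correlation `rho`, built from one shared factor, and reads “average” as “at or below the mean” so that each single trait has a 50% chance. The function name and the choice of `rho=0.5` are illustrative assumptions.

```python
import random

random.seed(1)

def share_average_on_all(k, rho, n=50_000):
    """Share of simulated people at or below the mean on all k traits.

    Each trait is standard normal; any two traits have correlation rho
    because they share one common factor plus trait-specific noise.
    """
    a, b = rho ** 0.5, (1 - rho) ** 0.5
    hits = 0
    for _ in range(n):
        shared = random.gauss(0, 1)
        if all(a * shared + b * random.gauss(0, 1) <= 0 for _ in range(k)):
            hits += 1
    return hits / n

for k in (1, 2, 5, 10):
    independent = 0.5 ** k  # product rule, valid only under independence
    correlated = share_average_on_all(k, rho=0.5)
    print(f"{k:2d} traits: independent {independent:.4f}, correlated {correlated:.4f}")
```

With this particular setup (`rho=0.5`) the exact answer is 1/(k+1), so the share shrinks far more slowly than 0.5 to the power k. The product rule understates the chance of being average on several linked traits at once.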
Instead, finding one average value should be conditional on another average value, rather than computed as a joint probability of two independent events. That is, P(A|B) = P(A and B) / P(B). However, we often do not know the probability of both values happening together, which is P(A and B). From a Bayesian perspective, then, P(A|B) = P(B|A) P(A) / P(B). Still, this is data-driven: without data we wouldn’t know P(B|A) either.
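A tiny worked example may help here. The counts below are hypothetical, made up only to show that the definition of conditional probability and Bayes’ rule give the same number, and that this number differs from what the independence shortcut would suggest.

```python
from fractions import Fraction

# Hypothetical counts for 1,000 people (made up for illustration).
# A = "average height", B = "average education".
n_total = 1000
n_a = 500          # people with A
n_b = 500          # people with B
n_a_and_b = 350    # people with both A and B

p_a = Fraction(n_a, n_total)
p_b = Fraction(n_b, n_total)
p_a_and_b = Fraction(n_a_and_b, n_total)

p_a_given_b = p_a_and_b / p_b      # definition: P(A|B) = P(A and B) / P(B)
p_b_given_a = p_a_and_b / p_a
bayes = p_b_given_a * p_a / p_b    # Bayes' rule: P(B|A) P(A) / P(B)

print(p_a_given_b, bayes, p_a * p_b)  # prints: 7/10 7/10 1/4
```

In these made-up counts, knowing that someone is average in education raises the chance that he is average in height from 1/2 to 7/10, and the joint share is 350/1000 rather than the 1/4 implied by P(A)P(B).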
With data, it gets a bit different and easier to implement. To find the average man, we take the arithmetic mean of one trait, then use that trait as a condition and find the arithmetic mean of a second trait conditional on the first trait being at its mean. But since the direction of conditioning can be reversed, we first have to decide which variable is the one being conditioned on. Usually it’s an immutable factor like gender or race. For example, the average age, as we showed with the GSS before, is 48.9. But after conditioning on a battery of other factors held at their means (I included income, education, children, and gender), the average age of Americans is 44.
> reg age rinc degree childs i.sex
> margins, atmeans

Expression   : Linear prediction, predict()
at           : rincome         =    10.40458 (mean)
               degree          =     1.80687 (mean)
               childs          =    1.687786 (mean)
               1.sex           =    .4656489 (mean)
               2.sex           =    .5343511 (mean)

------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       _cons |   44.22901   .3668238   120.57   0.000     43.50938    44.94864
------------------------------------------------------------------------------
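For readers without Stata, the idea behind a linear prediction at the covariate means can be sketched in plain Python. This is not the regression above: the data are synthetic, there is only one covariate, and the variable names simply echo the GSS ones for readability.

```python
import random
from statistics import mean

random.seed(2)

# Synthetic data (NOT the GSS): age depends loosely on number of children.
childs = [random.randint(0, 5) for _ in range(2_000)]
age = [35 + 4 * c + random.gauss(0, 10) for c in childs]

# Ordinary least squares with one covariate and an intercept.
x_bar, y_bar = mean(childs), mean(age)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(childs, age)) / sum(
    (x - x_bar) ** 2 for x in childs
)
intercept = y_bar - slope * x_bar

# Rough analogue of a prediction "at means": plug the covariate mean
# into the fitted line.
prediction_at_mean = intercept + slope * x_bar

print(f"prediction at mean of childs: {prediction_at_mean:.3f}")
print(f"sample mean of age:           {y_bar:.3f}")
```

One thing worth keeping in mind when reading the output: a least-squares line with an intercept passes through the point of means, so the prediction at the covariate means equals the mean of the outcome among the rows actually used in the fit. If the same holds for the regression above, part of the move from 48.9 to about 44 may reflect which respondents have non-missing income, education, children, and sex, rather than the conditioning alone.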