Seeing Selection

Written on September 7, 2021

A couple of times somebody has told me about the latest COVID super-spreader and then asked, ‘How come it’s always somebody who visits five different places in a single day? Who the hell are these people?’ Recent examples aren’t hard to find: so-called super-spreaders do get around. But when you think about it, this question has an obvious answer: highly mobile people are more likely to contract COVID in the first place. If they happen to also be mobile on the day they infect a bunch of other people, their super-spreading isn’t super-surprising.
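
To put rough numbers on this, here is a minimal Python sketch. Everything in it is an assumption for illustration rather than real epidemiological data: a population whose daily number of places visited is spread evenly from one to five, and a made-up 1% chance of catching COVID per place visited. It uses Bayes’ rule to ask how mobile the infected people are compared with everybody else.

```python
def visits_given_infected(visit_probs, risk_per_visit):
    """P(visits | infected) via Bayes' rule.

    visit_probs: hypothetical {places visited per day: share of population}
    risk_per_visit: hypothetical chance of infection at each place visited
    """
    # P(infected | v visits) = 1 - P(escaping infection at all v places)
    weights = {
        v: share * (1 - (1 - risk_per_visit) ** v)
        for v, share in visit_probs.items()
    }
    total = sum(weights.values())
    return {v: w / total for v, w in weights.items()}


def mean_visits(dist):
    """Average number of places visited under a {visits: probability} dict."""
    return sum(v * p for v, p in dist.items())


# Assumed population: equal shares visit 1, 2, 3, 4 or 5 places a day.
population = {v: 0.2 for v in range(1, 6)}
infected = visits_given_infected(population, risk_per_visit=0.01)

print(f"average places visited, everybody: {mean_visits(population):.2f}")
print(f"average places visited, infected:  {mean_visits(infected):.2f}")
```

Under these made-up numbers the infected group averages roughly 3.7 places a day against 3.0 for the population as a whole, and the five-places-a-day crowd makes up about a third of the infected despite being a fifth of the population. No coincidence required.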

Many ‘what are the odds?’ questions have similar answers. When something happens that seems unlikely, our first instinct is to assume the event was down to pure random chance. However, hidden selection effects can bias an outcome (such as increased mobility resulting in an increased chance of contracting and spreading COVID). In this post I want to convince you that selection bias explains far more than you might think.

Education

Selective Entry Schools

Selective-entry schools are so prestigious, and why wouldn’t they be? They produce some of the highest-performing graduates and typically funnel those grads into similarly prestigious universities.

You might think that the reason for these schools’ success is all the resources they have: high-paid teachers, better facilities, etc. Think again:

We find that almost all of the selective school advantage in GCSE can be explained by family SES, achievement, ability and EduYears GPS. After controlling for these factors, going to a grammar vs. a state non-selective school is associated with a mean GCSE grade increase of just 0.026 of a standard deviation and for private schools, 0.070 of a standard deviation. Furthermore, the variance in GCSE that school type explains falls from 7% to <1%.

Selective-entry schools produce such good grads because they select for smart students with supportive home environments. You could take all the students in a selective-entry school, swap them with all the students in a non-selective public school, and on average you would barely see a change in outcome for any given student.
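
Here is a small simulation of what ‘controlling for these factors’ does to a school gap. It is a sketch under invented assumptions, not a reanalysis of the study quoted above: a single `ability` score stands in for everything the study controlled for, the selective school admits the top 10% on a noisy entrance exam, and the true effect of attending is hard-coded to a tiny 0.03. The adjustment uses ordinary least squares with ability as a control, computed by regressing residuals on residuals.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def slope(x, y):
    """OLS slope of y on a single predictor x (with an intercept)."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = sum((a - mx) ** 2 for a in x)
    return num / den


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    b = slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def school_gaps(n=50_000, true_effect=0.03, seed=1):
    """Return (raw gap, ability-adjusted gap) for a simulated cohort."""
    rng = random.Random(seed)
    ability = [rng.gauss(0, 1) for _ in range(n)]
    # Hypothetical entrance exam: ability plus some noise.
    entrance = [a + rng.gauss(0, 0.5) for a in ability]
    cutoff = sorted(entrance)[int(0.9 * n)]  # top 10% get in
    selective = [1.0 if e >= cutoff else 0.0 for e in entrance]
    grade = [
        a + true_effect * s + rng.gauss(0, 0.5)
        for a, s in zip(ability, selective)
    ]
    # Slope on a 0/1 indicator equals the difference in group means.
    raw_gap = slope(selective, grade)
    # Strip ability out of both school type and grade, then compare.
    adjusted_gap = slope(residuals(ability, selective),
                         residuals(ability, grade))
    return raw_gap, adjusted_gap


raw, adjusted = school_gaps()
print(f"raw gap: {raw:.2f}, gap after controlling for ability: {adjusted:.2f}")
```

The raw gap between the two schools should come out well over one unit on the ability scale, while the adjusted gap should land near the 0.03 that was baked in. The impressive headline difference is almost entirely the admissions process.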

This has policy implications for charter schools, which are all the rage right now, but which themselves apply strong selection pressure.

Everybody Should Go To College

At some point it was decided that everybody should go to college. The argument goes: college grads have 57% more job opportunities and make more money.

Consider this analogy from Bryan Caplan: an audience of people all sit down to watch a concert. A few people decide to stand up so they can get a better view, obscuring the view for others still sitting. For those still sitting, you could argue that their view improves by 57% upon standing, which would be correct, until everybody stands and the concert is just as visible as it was when everybody was sitting. The only difference now is that everybody’s feet get sore.

Employers don’t select for college grads because they have a bunch of skills (except maybe in med/CS/engineering); they select for college grads because they’ve proven themselves smart and hard-working by completing college. As we’ve put more people through college, that signal has weakened and employers have found other ways of measuring aptitude, sometimes by considering extra-curricular achievements, but often by just bumping the requirement from graduate to post-graduate. Problem solved!

Yes, you do learn real skills at uni, but given the typical mismatch between those skills and what you’ll be doing on the job, most employers just want to know whether you’re switched on and passionate. Having everybody go to college is just a recipe for a student debt crisis and countless years of life wasted at uni, all thanks to a failure to recognise the underlying selection process.

Comment Quality

How come the civility level of comments differs so greatly between Reddit and Twitter?

On Reddit:

comments are ordered based on upvotes/downvotes, so the most agreeable comments rise to the top
threads exist within subreddits which have their own moderators and norms

Sure, there are still stupid and rude comments, but those are downvoted ‘to oblivion’ so you won’t come across them as often.

Compare this to Twitter where:

comments are shown in chronological order, meaning first-in-first-served
moderation is globally-applied, and weaker
there is no downvote button
posts are capped at a short character limit

You could have the exact same thread of comments on the two platforms and you would perceive Reddit’s thread as being far more civil just because of how those comments are presented. The difference in comment landscape is further exacerbated by a second-order effect selecting for the users on each platform: users who love conflict will gravitate towards Twitter and users who want more civility and consensus will gravitate towards Reddit.

Whether you’re reading through a comment thread on Reddit or Twitter, you won’t be seeing conversation that’s representative of the real world, given that only about 1% of users actually post anything while the other 99% just lurk. The 1% who do post (like myself) are likely to be far more opinionated and stubborn than the average person, and failing to recognise that selection effect might make you think we’re on the brink of a civil war.

Display Pictures

Anybody who’s used dating apps long enough realises that, excluding private parts, if somebody is hiding some feature, it’s probably unappealing. For example, I don’t have the best teeth, so you won’t catch me smiling in many photos. Bald guys might conveniently crop the photo just below the hairline or simply wear a hat, and overweight people stick to close-up shots. People with a lazy eye angle their face away from the camera, and facial blemishes are concealed with makeup. You can gain quite a lot of information about somebody just by taking note of what they don’t show.

Of course, we all know that what’s on the inside is what really matters, but this exemplifies how understanding selection pressure can yield new information.

Folk Wisdom

Cats Falling From Tall Buildings

Back when I was a kid I was told that cats were better off falling from over six stories than under six stories, given how long it takes them to twist their bodies around in order to land on their feet. Sounds perfectly plausible, except that the original study that posited the theory only considered the injuries of cats who had survived the fall. This is an example of survivorship bias, a specific kind of selection bias.

What about a study that does correct for the selection bias by also considering cats who died?

Falls from the seventh or higher stories, are associated with more severe injuries

Myth busted.
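
To see how counting only survivors can hide the danger of a high fall, here is a toy model in Python. It is entirely made up and is not taken from either study: a fall’s injury score is the height in stories multiplied by a ‘luck’ factor spread evenly between 0 and 1, and any score of 6 or more is assumed fatal.

```python
def mean_injury(height, survivors_only, fatal_threshold=6.0, steps=1000):
    """Average injury score for falls from `height` stories.

    Hypothetical model: injury = height * luck, with luck spread evenly
    over (0, 1). Cats scoring at or above `fatal_threshold` die, and are
    dropped from the sample when `survivors_only` is True.
    """
    injuries = [height * (i + 0.5) / steps for i in range(steps)]
    if survivors_only:
        injuries = [x for x in injuries if x < fatal_threshold]
    return sum(injuries) / len(injuries)


for height in (2, 4, 6, 10, 20):
    everyone = mean_injury(height, survivors_only=False)
    survivors = mean_injury(height, survivors_only=True)
    print(f"{height:>2} stories: all cats {everyone:.1f}, survivors {survivors:.1f}")
```

In this toy world the average injury across all cats keeps climbing with height, but the survivors-only average stops rising once falls become high enough to kill: past six stories it sits at about 3 no matter how tall the building is. A study that only sees the survivors would conclude that extra height stops mattering, when the worst cases have simply left the sample.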

Motivational Speakers

I’ve always had beef with motivational speakers who come to high schools to inspire students. ‘If I can do it, anybody can!’ is an inspirational line, but the speaker is not representative of all their competitors, given that the vast majority of business ventures fail. And no high school is going to invite a failure to give a demotivational speech about how ‘If I can’t do it, nobody can!’. This is another example of survivorship bias.

Being successful is not just a matter of believing in yourself; it’s a matter of believing in yourself and being really, really lucky.

Human Life

Fear of Death

Death is terrifying! We’re pretty comfortable being alive and having a self, but who knows what happens after death? Perhaps we just cease to exist. At any rate, given that any animal without an in-built fear of death would likely die without leaving offspring, it should be no surprise that the vast majority of humans have evolved to fear death. This should actually make us suspicious of our fear of death: if the fear is just a means to an end for our genes to continue propagating, perhaps we should take it with a grain of salt.

In fact, all arguments based on evolution are themselves arguments based on selection bias: natural selection is just another form of selection.

Anthropic Principle

What are the odds our universe supports sentient life? Pretty damn high given we’re sentient! Any universe which can’t support sentient life won’t have sentient beings asking themselves anything about the universe. This is known as the Anthropic Principle, which relates to selection of the actual observer (i.e. us). Given that slightly tweaking a fundamental property of our universe, like the mass of a proton, would almost certainly prevent life from arising, it seems that our universe is suspiciously fine-tuned for the purposes of supporting life. But if you believe there are infinitely many universes each with their own uniquely tuned fundamental properties, this becomes a whole lot less surprising: the universes which can’t support life don’t have any sentient beings pondering about their own universe, and the universes which can (such as our own) end up with everybody asking ‘how crazy is it that our universe supports life?’. Not crazy at all, so long as you’re on board with the multiverse theory of course.

So What?

In all these examples, we start with a strange phenomenon, which we assume is based on random chance, but then consider if selection is taking place which narrows down the possible outcomes, and all of a sudden the phenomenon is not so strange anymore.

The more you think about selection bias, the more you learn about how the world functions. Why are there so many anti-wrinkle ads in those Taboola links? Why are hidden sugars in food so common? Are localised crime stories by commercial news channels representative? By thinking about the underlying selection pressures at play, you can answer all these questions.

Of course, your answers may be wrong and you should verify them empirically. Maybe the COVID super-spreaders just had a busy day by coincidence. Maybe cats really do have an optimum falling height to allow for proper landing orientation. The power of considering selection bias is that you can generate new hypotheses, some of which will prove correct (or at least account for a slice of the full explanation).

Psychologically speaking, trying to explain everything with selection bias might give you a fatalistic impression that nobody improves, nothing changes, and everything you see is just concealed selection. I’m probably in that boat myself and need to move the needle in the other direction. But I suspect that the average person is on the other side of the spectrum, and that the world would be improved if more people learnt to spot selection bias in the wild.

Thanks for reading, until next time!

Shameless plug (which appears on every blog post, not just this one, so don't think that I'm opportunistically posting this specific post just for the sake of doing this plug): I recently quit my job to co-found Subble, a web app that helps you manage your company's SaaS software licences. Your company is almost certainly wasting time and money on shadow IT and unused/overprovisioned licences and Subble can fix that. Check it out at subble.com