A/B Testing and COVID
In times of uncertainty, it is better to run experiments than to pretend to know the answers to difficult questions
"A/B testing Dorito packets?" by Ben Terrett is licensed under CC BY-NC-ND 2.0.
Entrepreneur and author Eric Ries published a very interesting book on innovation called The Lean Startup in 2011. In the book, which has been very influential among entrepreneurs and investors in early-stage projects, the author defined startups as “a human institution designed to create new products and services under conditions of extreme uncertainty.”
Ries puts testing ideas at the heart of the way that startups are meant to work. “They exist to learn how to build a sustainable business. This learning can be validated scientifically by running frequent experiments that allow entrepreneurs to test each element of their vision.”
One of the most common ways of testing ideas is called split testing in the startup world. It is better known as A/B testing among marketing experts and has been used since the early 20th century. Ries says this:
Engineers, designers, and marketers are all skilled at optimization. For example, direct marketers are experienced at split testing value propositions by sending a different offer to two similar groups of customers so that they can measure differences in the response rate of the two groups.
So, imagine you have decided to commit to a contrarian bet by setting up a startup (as Peter Thiel recommends in Zero to One). Ries says that you should set up two different approaches and see which one gains more traction.
For example, in one early experiment, we changed our entire website, home page, and product registration flow to replace “avatar chat” with “3D instant messaging.” New customers were split automatically between these two versions of the site; half saw one, and half saw the other. We were able to measure the difference in behavior between the two groups. Not only were the people in the experimental group more likely to sign up for the product, they were more likely to become long-term paying customers.
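The comparison Ries describes boils down to checking whether two response rates differ by more than chance would explain. Here is a minimal sketch in Python, assuming a standard two-proportion z-test (my choice of method, not something Ries specifies). The visitor and sign-up counts are invented for illustration and are not figures from the book.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare sign-up rates for versions A and B of a page.

    Returns (rate_a, rate_b, z, p_value), where p_value is two-sided.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the assumption that A and B perform the same.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return rate_a, rate_b, z, p_value


# Hypothetical numbers: 1,000 visitors saw each version of the site.
rate_a, rate_b, z, p_value = two_proportion_z_test(50, 1000, 80, 1000)
print(f"A: {rate_a:.1%}  B: {rate_b:.1%}  z = {z:.2f}  p = {p_value:.4f}")
```

A small p-value suggests the gap between the two groups is unlikely to be a fluke. It says nothing about why one version worked better, which is where the entrepreneur's judgement comes back in.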
A/B testing is a very powerful protocol, which deserves to be much better known outside the world of product optimisation and disruptive startups. This week’s essay will explore a couple of applications in new areas in times of uncertainty: public health and regulatory policy.
The regulated investigation of new treatments often includes a version of split testing. Some Phase II trials are randomised, which means that some patients receive the new treatment, while others receive a placebo or the standard treatment. The decision on who receives what is done at random to avoid any selection effects. Randomised controlled trials (RCTs) are considered the gold standard of testing in medicine. The first modern RCT was in 1948.
However, this approach is far from common in frontline medicine - it is hard to imagine a family doctor rolling a die before writing out a prescription. Public health has adopted the paternalistic style of the frontline rather than the experimental style of clinical trials. This makes sense if public-health officials have all the information necessary to take decisions. The paternalistic style is meant to promote confidence and trust.
However, officials definitely did not have all the information during the recent COVID-19 pandemic, and they probably won’t in the next pandemic either. Public-health officials were dealing with extreme uncertainty at the tail end of 2019, throughout 2020 and into 2021 when the vaccines came online. How did the pandemic start? Would masks stop it spreading? Or lockdowns? What would be the costs of lockdowns? How, exactly, would the vaccines stop the disease spreading?
Unfortunately, none of the answers were clear. Official advice seemed to change with the wind, but was always offered with an air of unjustified certainty. This looked suspicious to many, largely because of intentionality bias (the idea that everything happens for a reason).
At the same time, many of us adopted “follow the rules” as a good baseline strategy to stay safe and keep our loved ones safe. Even so, those of us who did this sometimes found some of the rules a little strange. I well remember everyone in Spain basically ignoring silly guidelines on public masking long after it was known that this was ineffective.
Research shows that trust in public-health institutions and officials ebbed during this period as a result of officials acting with great certainty in times of uncertainty. The number of people who trust vaccines (one of the greatest discoveries in human history) also fell, which is a tragedy.
In my humble opinion, public-health officials should have reached for Ries’ book. They should have publicly admitted a state of extreme uncertainty. And they should have welcomed natural A/B experiments as a way of getting better information about what worked and why, rather than passionately defending one model.
When Sweden tried to manage the pandemic without lockdowns, other European countries should have welcomed the experiment. Which approach would be more effective? Get some data! Also, when Madrid regional president Isabel Díaz Ayuso tried to modify the terms of Spain’s national lockdowns to protect hospitality businesses and workers in the region, people who disagreed with her should have welcomed the chance to prove her wrong instead of shouting about her killing people.
Of course, we can only run one version of the universe at a time, so it is difficult to know for sure, but I strongly suspect that more honesty about uncertainty would have helped public trust in public health and vaccines. We can always try these ideas in the next pandemic and then compare the results with what happened during COVID-19.
How to deal with tourist flats
We can also see something similar in other policy areas. Airbnb was only founded in 2008 as a San Francisco-based startup. Within 16 years, the online marketplace for homestays had completely transformed life in many popular cities, including Barcelona (I live on the outskirts). There are 10,000 licensed tourist flats in the Catalan city, plus many more unlicensed ones. Barcelona’s metropolitan area is 3x as large as the city centre, which has just 1.6m residents. The city receives 32m tourists a year - 20 visitors for every resident in the centre.
Tourist flats have been driving up accommodation prices. In 2023, the average rental price in the city was €1,136 per month, up nearly 11% in a year and nearly 43% over 15 years. Meanwhile, the average monthly salary in Barcelona’s metropolitan area is €1,516. As a result of the mismatch, which is also seen in other cities in Spain, just 16% of young Spaniards between the ages of 18 and 29 have left the parental home.
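The size of the mismatch is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch using the figures quoted above. It assumes the rent and salary numbers are comparable, even though one is for the city and the other for the metropolitan area, and it treats "nearly 43%" as exactly 43%.

```python
# Figures quoted in the essay (euros per month).
average_rent_2023 = 1136
average_salary = 1516

# Cumulative rent rise over 15 years, taken as 43% for simplicity.
rise_over_15_years = 0.43
years = 15

# Share of an average salary swallowed by an average rent.
rent_share = average_rent_2023 / average_salary

# Constant yearly rise that would compound to the 15-year total.
annualised_rise = (1 + rise_over_15_years) ** (1 / years) - 1

print(f"Rent as a share of salary: {rent_share:.0%}")
print(f"Average yearly rent rise over {years} years: {annualised_rise:.1%}")
```

On these numbers, the average rent eats up roughly three-quarters of the average salary. The long-run rise works out at about 2.4% a year, which makes the latest jump of nearly 11% in a single year stand out.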
What is to be done? Clearly this is a new situation, which has been growing steadily over a decade and a half. Elected officials lack precedents. Barcelona’s Socialist Mayor since 2023, Jaume Collboni, has announced a bold plan to eliminate all tourist flats from the city by 2029. What will happen next? We have no idea! Tourism accounts for around 14% of the city’s gross domestic product (GDP) and the sector employs 150,000 people. There is a clear risk that Collboni will make flats more affordable for young people while simultaneously ruining the jobs market and the economy.
If you’ve read this far, you can probably guess what I think Collboni should have done instead. Barcelona has ten districts. He should have split them into two groups of five and then run an A/B test to compare a ban on tourist flats with some other more moderate policy. He could then see the unintended consequences of his bold policy preferences without going all-in before having all the information.
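The split itself should be left to chance, just as in a clinical trial, so that nobody can pick the districts most likely to flatter their preferred policy. Here is a minimal sketch in Python of how the ten districts might be divided. The function name, group labels and seed are my own illustrative choices, not part of any real proposal. Random assignment also does nothing to stop effects spilling over between neighbouring districts.

```python
import random

# Barcelona's ten administrative districts.
DISTRICTS = [
    "Ciutat Vella", "Eixample", "Sants-Montjuïc", "Les Corts",
    "Sarrià-Sant Gervasi", "Gràcia", "Horta-Guinardó", "Nou Barris",
    "Sant Andreu", "Sant Martí",
]


def split_districts(districts, seed):
    """Randomly divide the districts into two equally sized groups."""
    rng = random.Random(seed)  # a published seed makes the draw auditable
    shuffled = list(districts)  # copy, so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return sorted(shuffled[:half]), sorted(shuffled[half:])


# Hypothetical seed; any publicly announced number would do.
ban_group, moderate_group = split_districts(DISTRICTS, seed=2029)
print("Ban on tourist flats:", ", ".join(ban_group))
print("More moderate policy:", ", ".join(moderate_group))
```

Publishing the seed in advance would let anyone rerun the draw and confirm that the groups were not hand-picked.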
This experimental approach to policy might seem very different from what normally happens, but please don’t assume it is new. Karl Popper, the great liberal philosopher who promoted open societies, made a similar proposal as far back as 1945. He recommended “piecemeal social engineering,” which would be based on tackling the most urgent issues in society first and then experimenting to find which solution gives the best results.
Although the word “startup” was coined as far back as 1851, the modern startup scene began in earnest in the 1970s and 1980s. It began to take off in the 1990s, with a wobble in the dot-com crash, and then picked up speed in the 21st century. The experimental approach that Popper had recommended for policymakers has gained traction in Silicon Valley and other startup hubs in recent decades. It is time for public-health officials and elected officials to sit up and take notice. The comments are open. See you next week!
Previously on Sharpen Your Axe
Thanks to the author of this guest column for suggesting this week’s theme
Popper’s views on the open society (part one and part two)
Further Reading
The Open Society and its Enemies by Karl Popper
Zero to One: Notes on Startups, or How to Build the Future by Peter Thiel with Blake Masters
The Lean Startup: How Today’s Entrepreneurs Use Continuous Innovation to Create Radically Successful Businesses by Eric Ries
This essay is released with a CC BY-NC-ND license. Please link to sharpenyouraxe.substack.com if you re-use this material.
Sharpen Your Axe is a project to develop a community who want to think critically about the media, conspiracy theories and current affairs without getting conned by gurus selling fringe views. Please subscribe to get this content in your inbox every week. Shares on social media are appreciated!
Opinions expressed on Substack and Substack Notes, as well as on Bluesky, Mastodon and X (formerly Twitter) are those of Rupert Cocke as an individual and do not reflect the opinions or views of the organization where he works or its subsidiaries.
You need to be careful with AB testing. Sometimes your subjects communicate with each other, and are angry that different people get different web sites. Sometimes they dislike the B side of the test so much that after a decade or so using your service, they not only stop visiting you for two years, they also bad mouth you to everyone they know. (That last example was me, and Amazon. It took me months to realize that the egregious change they'd inflicted on me hadn't been inflicted on all their customers - the others got boiled slowly, with small increments of "sponsored content", rather than going instantly from a working search to one where 50% of results were irrelevant and labelled "sponsored", and another 25% were irrelevant but not labelled "sponsored".)
More relevantly, your A/B test proposal for Barcelona wouldn't produce results that compared the status quo with a ban on all tourist flats. You'd have half as many tourist flats across the whole city, or maybe a bit more, and a smaller shortage of non-tourist flats to be inhabited by local people. This would be the same as if you'd banned unlicensed tourist flats (with enforcement), and then issued licenses to only half of the previous number of would-be tourist flats. Except that your version has more per-neighbourhood distortion.
To get a real AB test, you need two cities, far enough apart that one can't easily live in one while working in the other. And other things need to be equal. Similar housing stock, similar incomes, similar levels of tourism.
I still don't use the company I now refer to as ScAmazon unless there's no alternative for a particular purchase. I doubt they measured that as anything other than "sometimes customers randomly stop using our services". And they are still offering me free trials of Amazon Prime.
Edited to add: this is, of course, analogous to my likely reaction if government health authorities announced untruths while testing some means of handling a pandemic in my locality. I can't imagine them saying "we don't know which is better, so in cities with names beginning with A thru M, do X, and in other cities do Y". What they'd say, to those in A-M, is that X is the very best course - while contradicting themselves in N thru Z. As we saw, to an extent, with individual countries and their non-identical policies. Every last one pretty well told its residents that their choice was the best possible.
Of course during the covid pandemic they announced a few untruths even while not doing much in the way of A/B testing. Sometimes they even knew they were wrong, rather than merely knowing that they were basically guessing.