Book: Weapons of Math Destruction by Cathy O’Neil

Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy
by Cathy O’Neil
published by Broadway Books (Penguin Random House), New York
2016 (afterword from 2017)

I’ve finally read this clear and well-organized book about the design of data-centric automation tools, and how their potential has so often been squandered or misused. We can do so much better!

O’Neil is a math Ph.D. and professor who went into industry and was distressed to see proprietary algorithms being used in potentially harmful real-life situations without thoughtful oversight. The mere fact that technology is involved leads to something like blind faith from the businesses and organizations that apply it. She firmly believes algorithms CAN be used for good, but won’t be under current approaches. You can’t have good outcomes if the goal is to make a quick buck, keep the approach secret, and never improve it! These tools are too often used in ways that only reinforce existing inequities.

Her examples are thoughtful and described in depth.

A major flaw in data automation is the use of proxy data, and I was glad to see this called out. How do you measure whether someone is a good teacher, whether they would be a good employee, whether they should receive a good deal on your product, or whether they are a risk to the community? Without a single, obvious thing to measure, people make up something that is easier to quantify, and then encode that wacky idea into an “objective” measurement that doesn’t really measure the subject at all. The wacky measurement is then hidden away as a proprietary secret and sold as a product to businesses, which want cheap answers more than they want accurate ones. The less regulated the industry, the wackier some of the data and measurements become.

For example, good teaching is hard to measure, so instead the system may measure the change in students’ test scores… but if the students were already getting all As, there is no improvement possible, so the teacher may be marked down without ever knowing why. Unscientific personality tests may be used to screen potential employees, or robots may simply scan applicant resumes for keywords, with no real evidence that those tools select better employees.
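To make that ceiling problem concrete, here is a tiny Python sketch of the kind of score-change proxy O’Neil criticizes. The scoring rule and all the numbers are hypothetical and mine, not from the book or from any real value-added model; it only shows how the proxy behaves.

```python
# Toy "value added" proxy: rate a teacher by the average change in student
# test scores (0-100 scale). Hypothetical rule and data, for illustration only.

def teacher_score(prior_scores, current_scores):
    """Average year-over-year change in test scores for one teacher's class."""
    deltas = [cur - prior for prior, cur in zip(prior_scores, current_scores)]
    return sum(deltas) / len(deltas)

# Teacher A's students started low and improved a lot.
teacher_a = teacher_score(prior_scores=[55, 60, 58], current_scores=[70, 75, 72])

# Teacher B's students were already near the ceiling, so there is almost no
# room to "improve" -- the proxy marks this teacher down despite good teaching.
teacher_b = teacher_score(prior_scores=[97, 98, 99], current_scores=[96, 98, 98])

print(f"Teacher A proxy score: {teacher_a:+.1f}")  # +14.7 -> looks great
print(f"Teacher B proxy score: {teacher_b:+.1f}")  # -0.7  -> looks "bad"
```

The proxy says nothing about how well either teacher actually taught; it mostly measures where their students started.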

Many of these approaches are NOT ready for real-world use, but are used just the same. O’Neil cites the Michigan automated unemployment auditing system, which falsely accused thousands of people of unemployment fraud and destroyed livelihoods (and marriages), as a prime example. That error is still playing out, and will keep playing out in the courts for a long time, per this Detroit Free Press article: Judge: Companies can be sued over Michigan unemployment fraud fiasco by Paul Egan & Adrienne Roberts (March 26, 2021). To quote from the article, “The state has acknowledged that at least 20,000 Michigan residents — and possibly as many as 40,000 — were wrongly accused of fraud between 2013 and 2015 by a $47-million computer system, purchased from FAST, that the state operated without human supervision and with an error rate as high as 93%.” Officials blindly launched this system without human checks, because yaaay, technology?

As someone who keeps being asked by one credit agency about cars I’ve never owned and pet insurance I’ve never purchased, I know that we’ve already automated some data projects badly. O’Neil cites other professional data scientists who have proposed sensible industry standards, and she adds her own, more specific suggestions on top of them.

I can hope that the popularity of this book, which was a NYT Bestseller, pushes decision makers toward better, more ethical, and fairer choices.