Showing posts with label data science. Show all posts
Showing posts with label data science. Show all posts

Sunday, March 30, 2025

Too much efficiency makes everything worse

From "Overfitting and the strong version of Goodhart's law":
Increased efficiency can sometimes, counterintuitively, lead to worse outcomes. This is true almost everywhere. We will name this phenomenon the strong version of Goodhart's law. As one example, more efficient centralized tracking of student progress by standardized testing seems like such a good idea that well-intentioned laws mandate it. However, testing also incentivizes schools to focus more on teaching students to test well, and less on teaching broadly useful skills. As a result, it can cause overall educational outcomes to become worse. Similar examples abound, in politics, economics, health, science, and many other fields.

[...] This same counterintuitive relationship between efficiency and outcome occurs in machine learning, where it is called overfitting. [...] If we keep on optimizing the proxy objective, even after our goal stops improving, something more worrying happens. The goal often starts getting worse, even as our proxy objective continues to improve. Not just a little bit worse either — often the goal will diverge towards infinity.

This is an extremely general phenomenon in machine learning. It mostly doesn’t matter what our goal and proxy are, or what model architecture we use. If we are very efficient at optimizing a proxy, then we make the thing it is a proxy for grow worse.

Thursday, July 13, 2023

Compression Schemes as Prediction Schemes

Awhile ago I mentioned inference in a Github post on my other blog in reference to security. But when we talk about inference in a wider informational sense, we often talk about complexity.

Using Python To Access archive.today, July 2025

It seems like a lot of the previous software wrappers to interact with archive.today (and archive.is, archive.ph, etc) via the command-line ...