Dependencies are a Liability

About eight years ago, I wrote a blog post that goes over my minimalist philosophy for developing software. One of the things I discussed in that post is that I like to minimize external dependencies. For context, when I wrote it, I was working at the Mila AI research institute, and there were multiple instances where I downloaded the source code for an AI research paper that had been published just six months earlier, only to find that the code was broken. One of the main sources of breakage was dependencies. They could cause issues in a variety of ways, including unpinned Python packages, C code that wouldn't compile, and system packages that wouldn't install. Sometimes I would also see a dependency on some random GitHub repository with no commit specified.
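To make the unpinned-package problem concrete, here is a minimal Python sketch that flags lines in a pip-style requirements list that don't pin an exact version, or that point at a Git repository without a full commit hash. The function name `unpinned_requirements` and the pattern `COMMIT_RE` are hypothetical names chosen for illustration, and the checks are deliberately simplistic: anything without `==` counts as unpinned, and pip option lines are skipped.

```python
import re

# Hypothetical helper pattern: a full 40-character commit hash after "@",
# followed by either a "#egg=..." fragment or the end of the line.
COMMIT_RE = re.compile(r"@[0-9a-f]{40}(?:#|$)")


def unpinned_requirements(lines):
    """Return the requirement lines that do not pin an exact version or commit.

    Simplified on purpose: a line counts as pinned only if it contains "=="
    or, for Git URLs, a full commit hash.
    """
    flagged = []
    for raw in lines:
        # pip treats " #" as the start of a trailing comment
        line = raw.split(" #", 1)[0].strip()
        if not line or line.startswith("#") or line.startswith("-"):
            continue  # skip blanks, comments, and pip options such as -r
        if "git+" in line:
            if not COMMIT_RE.search(line):
                flagged.append(line)
        elif "==" not in line:
            flagged.append(line)
    return flagged
```

Running this over the lines of a `requirements.txt` file gives a quick list of the entries most likely to resolve to something different six months from now.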

In the past, the main reason I wanted to minimize dependencies was to avoid breakage. It should be obvious, but if your software has 100 external dependencies, then in order for it to work, you need to be able to install all 100 of them on every machine you want it to run on. As such, it makes sense to carefully pick which dependencies you want to use. When I design a piece of software, I try, as much as possible, to rely on packages that have been around for some time and will likely continue to be around in a few years' time. I also make it a point to avoid introducing a dependency that only brings in a tiny amount of logic that I could easily implement myself. I generally dislike frameworks, because they tend to constrain the way you write software, and they often come with their own large set of dependencies.

Today, things are different. In the past few years, there have been a number of supply chain attacks in both the Python and npm ecosystems, along with the discovery of some very prominent Linux exploits, such as Dirty Frag and Copy Fail. I've been wanting to play with local AI coding harnesses, but I find myself hesitating at the thought of installing code that depends on a large number of packages. OpenCode likely has hundreds of direct and indirect dependencies. In the Python world, the LangChain package alone transitively installs around 267 packages.

When I discussed this with a friend, he pointed out something that should be obvious but isn't talked about enough: the Linux and macOS security model is very much outdated, and exposes us to unnecessary risks. The idea that when you clone some random GitHub repository and run a script, it has access to your entire home directory, on a computer where you may be doing your banking, is completely insane. Realistically, we need to move to a model where every program is sandboxed by default, and any access to the filesystem or sensitive system resources requires the program to ask for permission, like on mobile. I know there are tools to sandbox code execution, but sandboxing should be the default, not something that requires installing extra tools or writing a config file.

Supply chain attacks aren't new, but today, it's possible to use LLMs to automate the discovery of security vulnerabilities. It's also possible to create bots that relentlessly try to craft and introduce malicious pull requests into open source projects. There are people unironically advocating against reviewing AI-generated code at all. You have to wonder whether these people are carefully reviewing pull requests on their own repositories, and you'll likely come to the conclusion that the answer is no.

Realistically, when it comes to exploits and supply chain attacks, the use of AI creates an arms race. If AI tools make it much easier to discover and introduce security vulnerabilities, then it seems likely that better AI tools will also be created to continuously audit codebases and pull requests. The only defense against malicious AI is more AI. However, even with tools that can audit code, some exploits can slip through undetected, so maybe the best way to secure software is to make it simpler and to minimize unnecessary dependencies, effectively reducing the attack surface. When you run a server, you don't want unused services with exposed ports, because those services are a potential liability. Likewise, when you write software, you should be careful to avoid introducing code you don't actually need.