
The way installs used to work was simple: when running yarn install, Yarn would generate a node_modules directory that Node was then able to consume thanks to its built-in Node Resolution Algorithm. In this context, Node didn't have to know the first thing about what a "package" was: it only reasoned in terms of files. "Does this file exist here? No: Ok, let's look in the parent node_modules then.", and it kept going until it found the right one. This process was vastly inefficient for several reasons:

- The node_modules directories typically contained gargantuan amounts of files. Generating them could account for more than 70% of the time needed to run yarn install. Even having preexisting installations wouldn't save you, as package managers still had to diff the contents of node_modules with what they should contain.

- Because the node_modules generation was an I/O-heavy operation, package managers didn't have much leeway to optimize it beyond just doing a simple file copy - and even though they could have used hardlinks or copy-on-write when possible, they would still have needed to diff the current state of the filesystem before making a bunch of syscalls to manipulate the disk.

- Because Node had no concept of packages, it also didn't know whether a file was meant to be accessed. It was entirely possible that the code you wrote worked one day in development but broke later in production because you forgot to list one of your dependencies in your package.json.

- Even at runtime, the Node resolution had to make a bunch of stat and readdir calls to figure out where to load every single required file from. It was extremely wasteful and was part of why booting Node applications took so much time.

- Finally, the very design of the node_modules folder was impractical in that it didn't allow package managers to properly de-duplicate packages.
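To make the "look here, then in the parent node_modules" lookup concrete, here is a minimal sketch of that walk. It is not Node's actual implementation: the names `candidateDirs`, `resolvePackage`, and the `exists` callback are hypothetical, paths are assumed to be POSIX-style and absolute, and the real algorithm additionally handles file extensions, package.json entry points, and more. The `exists` callback stands in for the stat calls Node would make, and the probe counter shows how many of those checks a single lookup can cost.

```typescript
// Hypothetical sketch of the node_modules lookup walk (not Node's real code).
// Assumes POSIX-style absolute paths such as "/app/src/lib".

interface Resolution {
  location: string | null; // where the package was found, if anywhere
  probes: number;          // how many existence checks were needed
}

// List the node_modules directories to search, nearest first.
function candidateDirs(fromDir: string): string[] {
  const segments = fromDir.split("/").filter((s) => s.length > 0);
  const dirs: string[] = [];
  for (let i = segments.length; i >= 0; i--) {
    // A directory that is itself named node_modules gets no nested
    // node_modules candidate appended to it.
    if (i > 0 && segments[i - 1] === "node_modules") continue;
    const base = "/" + segments.slice(0, i).join("/");
    dirs.push(base === "/" ? "/node_modules" : base + "/node_modules");
  }
  return dirs;
}

// Walk up the candidates until one contains the package.
// `exists` stands in for a filesystem check, so each call models one stat.
function resolvePackage(
  name: string,
  fromDir: string,
  exists: (path: string) => boolean,
): Resolution {
  let probes = 0;
  for (const dir of candidateDirs(fromDir)) {
    const candidate = `${dir}/${name}`;
    probes += 1;
    if (exists(candidate)) return { location: candidate, probes };
  }
  return { location: null, probes };
}

// Usage: an in-memory set plays the role of the disk.
const installed = new Set(["/app/node_modules/lodash"]);
const result = resolvePackage("lodash", "/app/src/lib", (p) => installed.has(p));
console.log(result.location, result.probes);
```

In this toy setup the package is only found on the third check, after two misses in deeper directories. Multiply that by every required file in an application and the cost of the approach at boot time becomes easier to picture.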
