SteamPipe, and what Disk Busy Means
By aiusepsi
An explanation of how the SteamPipe content system works, and what the "Disk Busy" indicator means.
   
SteamPipe
SteamPipe is the system that Steam uses to manage the games you have installed, to let developers upload new versions of their games, and to manage your downloads. It replaced the system that Steam had used since it first launched in 2003.

The fundamental building blocks for games on Steam are chunks, which are approximately 1 MB of data each. When you install a new game, Steam fetches a manifest for the latest version of the game. The manifest specifies which chunks make up each of the files in the game. Your client then connects to several content servers, and begins downloading the chunks specified in the manifest.

Games are broken into chunks for a number of reasons. Chief among them is to aid patching: when a developer prepares a new patch, they run a Steam tool on the new version of the game's files. The tool builds a manifest for the new version that reuses as many chunks already uploaded to Steam as possible, and uploads only the chunks that have never been seen before.
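To make the reuse idea concrete, here is a minimal sketch of how an upload tool might decide which chunks are new. It is not Valve's actual tool: the fixed-size splitting, the SHA-1 chunk IDs, and the names build_manifest and uploaded are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 1024 * 1024  # roughly 1 MB, as in the guide; the exact size is an assumption


def build_manifest(data: bytes, uploaded: set, chunk_size: int = CHUNK_SIZE):
    """Return (manifest, to_upload) for one file.

    manifest  : ordered list of chunk IDs that make up the file
    to_upload : chunk ID -> bytes, only for chunks not already uploaded

    Hypothetical sketch: chunks are fixed-size slices identified by their
    SHA-1 digest. The real tool may split and identify chunks differently.
    """
    manifest = []
    to_upload = {}
    for start in range(0, len(data), chunk_size):
        chunk = data[start:start + chunk_size]
        chunk_id = hashlib.sha1(chunk).hexdigest()
        manifest.append(chunk_id)
        # Only chunks the servers have never seen need to be uploaded.
        if chunk_id not in uploaded and chunk_id not in to_upload:
            to_upload[chunk_id] = chunk
    return manifest, to_upload
```

Running it on version 1 of a file uploads every chunk; running it on version 2 with the version 1 chunk IDs passed in as uploaded returns only the chunks that changed.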

When the update is released on Steam, each client downloads the new version of the game's manifest. By comparing it against the manifest for the old version, the client can work out which chunks it has already downloaded as part of the existing install, and how many new chunks it has to download.
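The manifest comparison can be sketched in a few lines. This assumes a manifest is simply a mapping from file path to an ordered list of chunk IDs; the real manifest format is not described here, and plan_update is a made-up name.

```python
def plan_update(old_manifest: dict, new_manifest: dict):
    """Split the chunks needed by the new version into (reuse, download).

    Both manifests are assumed to map file path -> list of chunk IDs.
    This is an illustrative data model, not the real SteamPipe format.
    """
    # Every chunk present somewhere in the existing install.
    have = {cid for chunks in old_manifest.values() for cid in chunks}
    # Every chunk the new version needs.
    needed = {cid for chunks in new_manifest.values() for cid in chunks}
    reuse = needed & have      # can be copied from disk
    download = needed - have   # must come from a content server
    return reuse, download


old = {"game.pak": ["c1", "c2", "c3"], "game.exe": ["c4"]}
new = {"game.pak": ["c1", "c9", "c3"], "game.exe": ["c4"]}
reuse, download = plan_update(old, new)
print(sorted(reuse), sorted(download))
```

Note that a chunk counts as reusable if it exists anywhere in the old install, not only in the same file.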

The client then assembles the updated versions of any files by preallocating space for the new files, copying over any existing chunks from the existing install, and downloading any new chunks. It reassembles files this way rather than modifying the existing files in place, because in-place edits tend to cause fragmentation.
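As a rough sketch of that reassembly step, the function below builds a new file next to the old one. The layout tuples, the download callback, and the function name are all hypothetical; the client's real bookkeeping is surely more involved.

```python
def rebuild_file(old_path, new_path, total_size, layout, download):
    """Assemble new_path from reused and freshly downloaded chunks.

    layout   : list of (new_offset, length, old_offset, chunk_id);
               old_offset is None when the chunk must be downloaded
    download : callable taking a chunk_id and returning its bytes
               (a stand-in for fetching from a content server)
    """
    with open(old_path, "rb") as old, open(new_path, "wb") as new:
        # Reserve the final size up front. truncate() sets the file length;
        # whether blocks are physically allocated depends on the filesystem.
        new.truncate(total_size)
        for new_offset, length, old_offset, chunk_id in layout:
            if old_offset is not None:
                # Reused chunk: a disk read from the existing install.
                old.seek(old_offset)
                data = old.read(length)
            else:
                # New chunk: comes over the network.
                data = download(chunk_id)
            new.seek(new_offset)
            new.write(data)
```

Every chunk ends up being written to disk whether it was downloaded or copied, which is why a patch with few new chunks can still involve a lot of disk work.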

The old Steam system, by contrast, immediately deleted any files that were modified and redownloaded them in their entirety. It made no attempt to minimise the download by identifying reuse.
Disk Busy
So what does all this have to do with the Disk Busy status?

Think of the client as a bit like a production line. At the front end is a list of chunks which need to be either freshly downloaded from a content server or copied from the existing install.

These chunk jobs are removed from the list one at a time. A download job gets passed to a worker that knows how to download a chunk from a content server. A copy job gets passed to a worker that knows how to read an existing chunk from the game's install. Either way, when the worker is done, the chunk content ends up in memory (RAM), and it's passed down the production line.

The worker at the end of the production line knows how to write chunks from RAM to disk. If there's a chunk waiting, it grabs it and writes it to disk.

Now, think of what would happen in a real production line if step 1 was producing items twice as fast as step 2 could take items in. Either you'd have to build up an enormous stockpile of items between step 1 and step 2, or you'd have to slow step 1 down to half speed so that both steps process items at the same rate.

When most chunks are downloads, it's the disk writer at the end of the production line that ends up waiting. Because downloads are so much slower than disk writes, the disk writer spends most of its time doing nothing at all; it finishes jobs a lot faster than the download workers do.

However, it's not faster than the disk read (i.e. file copy) worker. If the disk read worker has a lot of work to do (where many more chunks are being reused than there are new chunks) then the disk writer is working at maximum rate continuously; the workers ahead of it in the production line have to slow down to match instead. This is exactly the state that "disk busy" indicates.
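The production line can be modelled with a bounded queue between the workers that produce chunks and the worker that writes them. This is only an illustration of the back-pressure idea, not how the Steam client is actually written; run_pipeline, fetch, write, and max_in_flight are invented names.

```python
import queue
import threading


def run_pipeline(jobs, fetch, write, max_in_flight=4):
    """Fetch chunks on one thread and write them on another.

    The bounded queue stands in for the limited RAM between the two steps.
    When the writer is the slow step, the queue fills up and put() blocks,
    which forces the fetching side to slow down to the writer's pace.
    """
    buffer = queue.Queue(maxsize=max_in_flight)
    done = object()  # sentinel telling the writer there is nothing more to come

    def producer():
        try:
            for job in jobs:
                buffer.put(fetch(job))  # blocks while the buffer is full
        finally:
            buffer.put(done)

    def consumer():
        while True:
            item = buffer.get()  # blocks while the buffer is empty
            if item is done:
                break
            write(item)

    threads = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

In this model, "disk busy" corresponds to the queue sitting full: the writer never waits for work, and the step ahead of it spends its time blocked in put().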

The condition where many more chunks are reused than there are new chunks occurs when small changes are made to existing files. For example, if a 1 MB change is made to a 1 GB file, then there are roughly 1,000 reused chunks against 1 new chunk. The vast majority of the patching process in that case consists of purely disk work, not downloading work.
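The arithmetic behind that example can be checked with a short sketch. It assumes chunks of exactly 1,000,000 bytes and a change that lines up neatly with chunk boundaries; a change that straddles a boundary would touch one more chunk. The function name patch_work is made up.

```python
CHUNK_SIZE = 1_000_000  # assumed chunk size in bytes (the guide says "approximately 1 MB")


def patch_work(file_size, changed_bytes, chunk_size=CHUNK_SIZE):
    """Return (reused_chunks, new_chunks) for a single patched file.

    Simplification: the changed bytes are assumed to be contiguous and
    aligned to chunk boundaries.
    """
    total = -(-file_size // chunk_size)                # ceiling division
    new = min(total, -(-changed_bytes // chunk_size))
    return total - new, new


reused, new = patch_work(1_000_000_000, 1_000_000)
share = new / (reused + new)
print(f"{reused} reused, {new} new, {share:.1%} of the file is downloaded")
```

Under these assumptions the count comes out at 999 reused chunks and 1 new one, so about 0.1% of the file is downloaded while the whole file still has to be read and rewritten on disk.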
What Valve Could Do
The biggest problem with the Disk Busy indicator is that it's really non-obvious what it means unless you already understand the system, and misunderstandings of it seem to be rife.

What Valve could do to ameliorate this is to instead add a second progress bar to each item on the downloads page which tracks the progress of chunk reuse as well as chunk download.

If they did this, then it would be immediately obvious why e.g. Payday 2 updates often spend much of their time not downloading; the "reuse" bar would show that Steam has gigabytes of work to do, and it would show that progress is indeed being made even while the download is not progressing.
2 Comments
Black_Blade 8 May, 2014 @ 3:53pm 
Thanks man.. needed something like these for a few forums.. now i have a source :D:
AlarminglySvelte 25 Apr, 2014 @ 9:39am 
This explains exactly what's going on. Makes total sense.