I have recently solved a problem which may trouble others, so I share the problem and its solution with you here.
The world of the problem: OpenStack on RamNode
There are a number of web hosting providers I'm fond of (having tried many others I'd never touch again). One of those good ones is RamNode. They sell many kinds of hosting products, but I use them when I want low cost virtual private servers that I can rely upon.
A year ago to the day, they launched a new product line, built on OpenStack. These new VPS are billed hourly, so you can run one for everything from a few minutes to several years. They have very similar configuration options to their previous line-up; you can choose standard VPS, premium ones (with faster CPU) and massive ones (with much more hard disk space).
OpenStack comes with the ability to take scheduled backups or manual snapshots of your "instances" (what it calls the VPS you have running). The result is an image file you can use to create another instance, either straight away (if you want to clone your server), or at a later date (if you want to destroy the instance, and get it back later).
As well as billing you for instances you have created, RamNode also bill you for storage. It's not expensive - $0.000075 per GB per hour. That's a fair way to do it - it means you pay for the storage you use, rather than the storage others use. You can also upload distribution ISO files and so on, and again just pay for what you use.
The problem - ever increasing storage
I had a development server, on a plan with 2 GB of RAM and 65 GB of disk space. When running, it was using about 8-9 GB of space. I had finished working on the software I was developing on it, so it was time to convert it to an image, and destroy the instance. That way, I just pay for the image storage, until such time as I want to work again on that project. I removed backups and other software source tarballs, and deleted my swap file. Now I was down to 5 GB of storage. I created my image file.
I was expecting an image file of about 5 GB. Instead, I got one of … 9.9 GB!
I opened a helpdesk ticket, and was told (in the friendly, prompt and efficient manner I've come to expect) that instance storage only grows, never shrinks, so an image of your instance disk will always be the largest size that instance has ever been.
This isn't a big problem. Paying for 10 GB would cost me about $0.50 per month, whereas 5 GB would be about $0.25 per month. The amounts are small, as is the difference. But as I use more servers for more things, this problem will scale up, so it would be nice to solve this if I could.
The Problem in More Detail
Let's dig into why this happens.
You probably know that if you delete a file from your computer, it may be recoverable (by a criminal, by the security services, or by some helpful PC technician if it's you who wants the file back). The reason is that the computer stores two pieces of information about each file. There is the data in the file itself, and there is a descriptor that tells the operating system the name of the file, its permissions, its exact location on the hard disk, and a few other bits as well.
When you run a simple "delete" command on a file, all that happens is that the descriptor gets removed. The file no longer exists as far as the operating system is concerned, but it does still exist. The bytes of data are still there. Eventually, you'll create other files that overwrite that data, but the operating system is lazy; it doesn't waste CPU and hard drive cycles overwriting that data; it just marks it as available if anyone wants to use it for something else.
So, before you dispose of an old hard drive, you should overwrite the entire hard drive with zeroes, or with random bytes.
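The same idea can be seen at the level of a single file. Here's a toy illustration (assuming GNU coreutils; note that on journalling or copy-on-write file systems, overwriting a file in place doesn't guarantee the old blocks on disk are wiped - this is just to show the principle):

```shell
# Write a "secret" to a file, then overwrite its bytes with zeroes
# before deleting it, so the data itself is destroyed.
secret=/tmp/secret-demo.txt
printf 'my password' > "$secret"
# Overwrite the file's bytes with zeroes, keeping its length.
dd if=/dev/zero of="$secret" bs=1 count=$(stat -c%s "$secret") conv=notrunc 2>/dev/null
grep -q 'password' "$secret" && echo "still there" || echo "wiped"
rm -f "$secret"
```

Running this prints "wiped": the bytes really are gone, not just the directory entry.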
OpenStack, in particular, is lazy. My swap file took up space on that hard disk. When I removed it, the link to it was removed, but all the old data was still there. So when I created a snapshot, all the data on my instance disk was dumped into it. The snapshot creation tool doesn't mount the file system, so it can't distinguish which data is actually referenced in the file system and belongs to a current file. It just dumps it all, exactly as it is, so that it can be restored, exactly as it was.
This is why the disk never shrinks.
It's also why the obvious solutions to the problem don't work. I tried, for example, creating a new VPS instance from the snapshot I'd taken, then taking a snapshot of that. But the same junk data was being faithfully copied across to the new instance, and then into the new snapshot.
The Solution: How to Shrink the Files
It can be done.
I'll now explain four steps to shrink an existing image file down to the size it actually needs to be.
If you still have your instance running, there is another way to do this that doesn't involve first creating an over-sized image file and then shrinking it. However, this other approach does have drawbacks. So I'll explain the alternative at the end of this post, including the drawbacks. Before you use the other approach directly on a running instance, you need to understand what we're doing to shrink an existing image file. So please read the whole post, and don't skip straight to the end.
The disk images are stored in the "qcow2" format, which is part of the "qemu" virtualisation suite. (QCOW stands for Qemu Copy On Write).
We need to overwrite the unused data on the disk with zeroes. Then qemu will know that those areas of the disk are not needed. We can then convert the resulting qcow2 file to another file, also in qcow2 format, and the zeroed parts of the disk will not be copied.
I did all this on a computer running Debian 10; there will be similar tools on other Linux flavours. Install the qemu tools on your machine:
apt-get install qemu-utils
Step 1: Mount the Snapshot
Download the snapshot from RamNode's OpenStack control panel, or that of your favourite OpenStack provider.
Rename the snapshot so it ends in the file extension .qcow2. (You probably don't need to do this, but I make fewer mistakes if files are named according to what they are.)
Upload it to the machine you're going to work on, in my case my Debian machine. You need at least:
- the amount of disk space currently available to your OpenStack instance, plus
- the amount of disk space you're actually using.
(In my example above, I was using an instance that gave me the use of 65 GB, of which I was using 5 GB. So I needed at least 70 GB of spare space. The file I just uploaded was the 10 GB file I was hoping to shrink back down to 5 GB.)
Now we need to mount it. We're going to mount it as a network block device. If the nbd kernel module isn't already loaded, load it with modprobe nbd first. Then run:
qemu-nbd -c /dev/nbd0 your-snapshot.qcow2
(Replace your-snapshot.qcow2 with the full path of wherever you uploaded your snapshot.)
This creates an NBD device, addressable at /dev/nbd0, linked to the snapshot file you uploaded.
I'm now going to assume your image only had a single partition on it. (If you had multiple partitions, this whole thing gets more complicated. You'd need to use a tool like fdisk to discover the partitions, and mount them all. Let's keep it simple.) Mount your partition:
mount /dev/nbd0p1 /mnt
(Mount partition 1 of nbd0 at mount point /mnt.) You don't have to use /mnt as your mount point; if you choose a different one, alter the commands below accordingly.
Now have a look at the files in /mnt - they should be exactly as you left your VPS instance disk. The first time I saw this work, I thought it was really quite cool. There are the contents of my RamNode VPS, on my own machine. I can read the files. I can change them too - and any changes are instantly replicated back to the .qcow2 file that really stores those files.
Step 2: Zero out the Unaddressed Disk Blocks
We're going to create a single file, called TestFile.bin, that is filled with zeroes and uses up all the unused space on your disk. That way, all the files you had used and deleted, but that had residual data, will be gone for good, and their space can then be reclaimed. Because we're working inside a mounted copy of your disk image, the file system is able to work out which blocks on the disk are free, and only write zeroes over those.
Here is why you needed all that disk space on the machine you're working on. Doing this will expand your qcow2 file up to the full amount of space it can have (in my case, the full 65 GB). The only way to get 10 GB down to 5 GB is to go up to 65 GB first.
Caution: Be very, very careful that the output file of this next command is correctly entered; you don't want to overwrite something else that you actually wanted!
dd if=/dev/zero of=/mnt/TestFile.bin
This will take a while, depending on the write speed of the hard disk of the machine you're working on. (Adding bs=1M to the dd command will speed it up considerably.) dd will finally stop with a "No space left on device" error - that's expected.
When it has finished, run
ls /mnt -lh
You should see a single, very large file, named TestFile.bin. It will be as big as it needs to be to ensure every last byte is used up. In my case it was the 60 GB needed to fill up a 65 GB disk that already had 5 GB in use.
But here's the beauty: every byte in that file is a zero.
So we can now get rid of the file:
rm /mnt/TestFile.bin
The operating system is lazy, so all that does is remove the file descriptor for TestFile.bin. But the data that is left behind is all zeroes, so the data is gone as well.
Step 3: Shrink the qcow2 File
Now we've done that, we can convert the qcow2 file to another one, also in qcow2 format.
First, we need to unmount the partition, and remove the qcow2 file from NBD, so that the file we're going to convert has no locks on it:
umount /mnt
qemu-nbd -d /dev/nbd0
Then run the magic command:
qemu-img convert -O qcow2 your-snapshot.qcow2 your-snapshot-shrunk.qcow2
This will also take a while, but the result will be a new file called your-snapshot-shrunk.qcow2. It should be no larger than it needs to be, and it is still a complete, usable snapshot of your VM.
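As an aside, qemu-img can also compress the output with the -c flag; the file comes out smaller still, at the cost of a slower conversion and slower reads in the guest. A sketch on a throwaway image (file names are illustrative - on your real snapshot the command is the same):

```shell
# -c writes a compressed qcow2 instead of a plain one.
qemu-img create -f qcow2 /tmp/demo.qcow2 64M
qemu-img convert -c -O qcow2 /tmp/demo.qcow2 /tmp/demo-compressed.qcow2
ls -l /tmp/demo.qcow2 /tmp/demo-compressed.qcow2
rm /tmp/demo.qcow2 /tmp/demo-compressed.qcow2
```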
At this point, if it were me, I'd test it, just to make sure it works. Let's mount it and take a peek inside, to make sure the shrunk file is still a valid qcow2 file and still contains the directories and files we were expecting:
qemu-nbd -c /dev/nbd0 your-snapshot-shrunk.qcow2
mount /dev/nbd0p1 /mnt
Have a look around /mnt, then release everything again:
umount /mnt
qemu-nbd -d /dev/nbd0
Step 4: Upload and Launch
Lastly, all you have to do is upload your new, shrunk, qcow2 file.
Put the file somewhere web accessible. (You know how to do that, right? If not, a guide like this probably isn't for you.)
Go to Images in the OpenStack control panel, click the "Plus" icon to create a new image, give it a name, choose a distro if you wish (it only affects the icon displayed with the image file in the control panel), specify the URL where OpenStack can find your qcow2 file, and wait for it to upload and save the image.
You can then launch the image as a new instance, just to test that it is a bootable image file with all your files.
Alternatively: Creating Smaller Snapshot Images
Remember the problem: There are blocks of data on the virtual disk that are not zero, but are not part of any current file.
Remember the solution: If we can zero those blocks, the snapshot generator won't store them.
Remember how we did it: We created a file populated with zeroes that used up all the available space; we did this inside the mounted file system so that only spare blocks were overwritten.
So maybe you are already thinking: Can we create a zeroed file within our running OpenStack instance before creating the snapshot? Won't the snapshot then be smaller for the same reason?
You'd be right. But there are several things to bear in mind here:
- Doing this will fill up every available byte of disk space on your running instance. It's generally not a good idea to let a computer use 100% of its disk space, because running processes can then behave erratically.
- Before you start, definitely shut down any database servers (MySQL, MariaDB, Percona, etc.). If an insert or update query were unable to complete because the disk ran out of space, you could be left with corrupt tables.
- Consider if there's anything else you must shut down. Some services may fail to run properly when the disk fills, but the real danger is where something gets left in an inconsistent state. Most services will just run properly again when the server has space.
- There's an outside possibility this approach doesn't work, so I'd suggest taking a snapshot before trying this approach (the 10 GB one, in my example above). That way, you can always revert to that if you need to, but if everything works, you can delete the larger snapshot. The kind of scenarios I'm thinking of are:
- For some reason, OpenStack doesn't see the zeroed disk blocks as unused, so instead of shrinking your backup (from 10 GB to 5 GB), it gets bigger (say, 65 GB), and forevermore you need 65 GB to take a snapshot of your instance.
- The full disk means you lose access to the server (say, you cannot log in over SSH because sshd cannot write to the system logs), so you cannot delete the giant zeroes file, and you're left with an unusable system.
- When you have finished this, and taken your smaller snapshot, the snapshot may contain error-log entries stemming from the fact that the system ran out of disk space. This isn't a problem, but it means your system will be less "clean". Just remember: you're taking a smaller backup, but it's now a backup of a server that recently ran its disk space right down to zero.
With those precautionary notes out of the way, it's easy. Log in to the VPS instance. Again, I'm assuming you have only the one partition. This time, we don't work inside /mnt, because we haven't mounted the disk image inside another computer; we're actually working inside the instance.
Fill the hard disk right up with zeroes:
dd if=/dev/zero of=~/TestFile.bin
When you get an error that the disk is full, check the file is what you think it is:
ls ~/TestFile.bin -lh
Then delete the file:
rm ~/TestFile.bin
Check how much disk space you're actually using:
df -h
You should see lots of free space because the test file has been deleted. Now, take a snapshot of your instance, and the image size should match the disk space you saw you were using.