the definitive guide to growing your EBS volumes
03 Jan 2014
This week my Graphite instance was falling apart.
Every query I tried and every dashboard I loaded was stuck; I couldn’t get anything done.
Now, I use Graphite every day. Every technical decision I make is based on a graph, and for every problem or alert on the servers I look at the graphs first, so naturally this was not a good place to be.
From the server’s collectd stats, I saw that the EBS drive had filled up: disk usage spiked to 100% and it was slowing everything down.
Now, I won’t go into Graphite, Carbon or any of these here, but I just want to go through how I solved it step by step, since every single post I read about it was partial, incomplete and inaccurate.
First, let’s set out the goals for replacing the drive on your EC2 instance:
- Minimal downtime
- Minimal data loss
- Fast (no copying data)
OK, so let’s start…
First, you need to snapshot the drive
You don’t have to stop the instance or any service; the server can keep running while this is happening.
For a full (100%) 500G drive, it took Amazon around 2 hours, which was agonizingly slow, but the server kept running collecting stats, so I didn’t really mind it so much.
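If you would rather script the snapshot than click through the console, the AWS CLI can do the same thing. This is only a minimal sketch, assuming the AWS CLI is installed and configured; the function names are mine and the volume ID in the usage line is a made-up placeholder. Since a big snapshot can take hours, it polls the snapshot state once a minute rather than relying on a single wait call.

```shell
# Start a snapshot of an EBS volume and print the new snapshot ID.
create_snapshot() {
  aws ec2 create-snapshot \
    --volume-id "$1" \
    --description "before growing $1" \
    --query SnapshotId --output text
}

# Poll once a minute until the snapshot is completed.
# Returns non-zero if the snapshot ends up in the error state.
wait_for_snapshot() {
  local state
  while true; do
    state=$(aws ec2 describe-snapshots --snapshot-ids "$1" \
      --query 'Snapshots[0].State' --output text) || return 1
    case "$state" in
      completed) return 0 ;;
      error) return 1 ;;
    esac
    sleep 60
  done
}

# Usage (vol-0123456789abcdef0 is a placeholder ID):
#   snap=$(create_snapshot vol-0123456789abcdef0) && wait_for_snapshot "$snap"
```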
After you have the snapshot, you just create a new drive from it
You can of course configure everything just like a normal drive: IOPS, size and Availability Zone, just like you would for a brand new one. Pick a size bigger than the old drive here, since that is the whole point.
The filesystem is already there, your data is intact and the sun keeps shining :)
Keep in mind, the drive HAS to be in the same Availability Zone as your instance (not just the same region), or you will not be able to attach it.
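An EBS volume can only be attached to an instance in the same Availability Zone, so if you script this step it helps to look the zone up instead of typing it. Here is a rough sketch with the AWS CLI, assuming it is installed and configured; the function names and the IDs in the usage line are placeholders I made up, and the volume type and IOPS are left at the defaults.

```shell
# Print the Availability Zone an instance runs in.
instance_az() {
  aws ec2 describe-instances --instance-ids "$1" \
    --query 'Reservations[0].Instances[0].Placement.AvailabilityZone' \
    --output text
}

# Create a volume of size_gib GiB from a snapshot, in the same
# Availability Zone as the instance, wait until it is available,
# and print the new volume ID.
create_volume_for_instance() {
  local snapshot_id="$1" instance_id="$2" size_gib="$3" az volume_id
  az=$(instance_az "$instance_id") || return 1
  volume_id=$(aws ec2 create-volume \
    --snapshot-id "$snapshot_id" \
    --availability-zone "$az" \
    --size "$size_gib" \
    --query VolumeId --output text) || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1
  echo "$volume_id"
}

# Usage (placeholder IDs, 1000 GiB as an example size):
#   create_volume_for_instance snap-0123456789abcdef0 i-0123456789abcdef0 1000
```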
It takes around 30s-1m for the drive to be available, then you just need to attach it to your machine
Then you need to select where you want to attach it
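The attach step can be scripted the same way. A minimal sketch, again assuming a configured AWS CLI, with a function name and IDs that are only placeholders. One thing to be aware of: on many Xen-based Linux instances, a device you request as /dev/sdp shows up on the machine as /dev/xvdp.

```shell
# Attach a volume to an instance under the given device name and
# wait until AWS reports the volume as in use.
attach_volume() {
  local volume_id="$1" instance_id="$2" device="$3"
  aws ec2 attach-volume \
    --volume-id "$volume_id" \
    --instance-id "$instance_id" \
    --device "$device" >/dev/null || return 1
  aws ec2 wait volume-in-use --volume-ids "$volume_id"
}

# Usage (placeholder IDs):
#   attach_volume vol-0123456789abcdef0 i-0123456789abcdef0 /dev/sdp
```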
At this point, every other post I read failed to explain it clearly, so I will try really hard to be clearer.
Now you have two drives: the old one at /dev/xvdl, for example, and the new one at /dev/xvdp. /dev/xvdl is mounted to /mnt and the new one is not mounted yet; it’s just attached to the server.
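Before touching anything, it is worth confirming which device is which. Running lsblk or df -h /mnt shows this interactively; the helper below is a small sketch for doing the same check in a script. The name mount_point_of is made up, and reading /proc/mounts assumes Linux.

```shell
# Print the mount point of a block device, or nothing if it is not mounted.
# The optional second argument is the mounts table to read
# (defaults to /proc/mounts).
mount_point_of() {
  awk -v dev="$1" '$1 == dev { print $2; exit }' "${2:-/proc/mounts}"
}

# Usage:
#   mount_point_of /dev/xvdl   # the old drive, expected to print /mnt
#   mount_point_of /dev/xvdp   # the new drive, expected to print nothing yet
```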
Now, you have two options
Option #1
sudo vim /etc/fstab
You will see this line:
/dev/xvdl /mnt xfs noatime,nobootwait 0 2
As you can see, /dev/xvdl is mounted to /mnt, like I said earlier. You can just replace it with /dev/xvdp and restart the machine.
Your new line should look like this:
/dev/xvdp /mnt xfs noatime,nobootwait 0 2
Then you have to reboot the machine
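If you prefer not to edit the file by hand, the fstab change from Option #1 can be sketched as a small function. This is just one way to do it, under the assumption that the device name is the first field on its line; swap_fstab_device is a name I made up. It keeps a .bak copy of the file, and it needs to run as root when pointed at /etc/fstab.

```shell
# Replace old_dev with new_dev in the first field of an fstab-style file.
# A backup of the original is kept next to it with a .bak suffix.
swap_fstab_device() {
  local old_dev="$1" new_dev="$2" file="${3:-/etc/fstab}"
  # Match the device only when it is followed by whitespace, so that
  # /dev/xvdl does not also match something like /dev/xvdl1.
  sed -i.bak "s|^${old_dev}\([[:space:]]\)|${new_dev}\1|" "$file"
}

# Usage (as root):
#   swap_fstab_device /dev/xvdl /dev/xvdp /etc/fstab
```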
Option #2
Stop all services that write to this disk
This is a super important step: you HAVE to stop all services that write to this mounted drive, or it will just not work. Linux won’t let you unmount a filesystem while processes still have files open on it, whether for reading or writing.
I just stopped Graphite and relevant services and then ran
sudo umount /mnt
This will unmount /mnt so you can continue.
After the old drive is unmounted, you will need to run
sudo mount /dev/xvdp /mnt
and then edit the /etc/fstab file, just like in Option #1, so the new drive is also the one mounted after a reboot.
Then start the services again
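The whole of Option #2 can be sketched as one function, which keeps the window where nothing is mounted as short as possible. This assumes your services are managed through the service command; swap_mounted_device and the carbon-cache service name in the usage line are placeholders, so substitute whatever actually writes to your drive. If umount still complains that the target is busy, sudo fuser -vm /mnt lists the processes holding it open.

```shell
# Stop the given services, swap the device mounted at mount_point,
# then start the services again. Stops at the first failing step.
swap_mounted_device() {
  local new_dev="$1" mount_point="$2" svc
  shift 2
  for svc in "$@"; do
    sudo service "$svc" stop || return 1
  done
  sudo umount "$mount_point" || return 1
  sudo mount "$new_dev" "$mount_point" || return 1
  for svc in "$@"; do
    sudo service "$svc" start || return 1
  done
}

# Usage (carbon-cache is a placeholder service name):
#   swap_mounted_device /dev/xvdp /mnt carbon-cache
```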
Next step
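Now, if you followed the steps, you are probably saying to yourself that it didn’t work, because the drive still shows up as 500G at 100%. That is expected: the filesystem copied from the snapshot still has its old size, even though the drive under it is bigger.
This is where you just need to run sudo xfs_growfs -d /mnt, which will just bump the filesystem up to the drive’s full capacity.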
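To check that the grow actually took, you can look at df right after it. A tiny sketch of the two commands together; grow_xfs is a made-up name, and it assumes the filesystem at the mount point is XFS, as in this post.

```shell
# Grow the XFS filesystem at mount_point to fill its device,
# then print the new size and usage.
grow_xfs() {
  local mount_point="$1"
  sudo xfs_growfs -d "$mount_point" || return 1
  df -h "$mount_point"
}

# Usage:
#   grow_xfs /mnt
```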
Summing up
When I did it, I had about 10 minutes of downtime on my stats machine, which didn’t take anything else down since everything writes to it over UDP. For this sort of maintenance, that seemed acceptable.
I didn’t lose any data except those 10 minutes where the stats server was down.
Feel free to comment or give feedback on anything.