bbcp for fast network copies

This is not strictly database or MySQL related, however DBAs often deal with large data sets so others may find it useful.

I recently discovered bbcp, a utility to copy files efficiently over a network. It has the ability to use parallel network streams and specify window sizes, which can dramatically increase the transfer rates. It copies via the ssh protocol, so it fits nicely into common security environments.

Installation:

For RHEL, Solaris x86, and MacOS they provide binaries, but I chose to compile my own copy. This was as simple as running make -f Makefile.amd (for the amd64 platform), which produces a single portable binary. I simply copied this file to /usr/local/bin on each host I wish to transfer between.
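For reference, the build-and-install steps amount to something like the following sketch. The source layout and the binary's output location vary by platform and bbcp version, so treat the paths here as assumptions and adjust for your system:

```shell
# Sketch: build bbcp from source and install the binary.
# Paths and the Makefile target are assumptions; check your source tree.
cd bbcp/src
make -f Makefile.amd          # amd64 target, as noted above
# The build drops a single binary; copy it somewhere on $PATH.
sudo cp bbcp /usr/local/bin/
```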

Usage:

For the most common case, usage is as simple as:
bbcp -s 16 -w 256k filename.txt user@remotehost:/path

Here, -s 16 specifies 16 parallel network streams, and -w 256k specifies a 256 KB window size. If you want to get fancy, you can enable progress updates, copy from one remote host to another, and more.
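A reasonable starting point for the window size is the bandwidth-delay product of the link (bandwidth times round-trip time). This is a rule of thumb of mine rather than anything from the bbcp documentation, and the link speed and RTT below are assumed example values:

```shell
# Sketch: estimate a per-stream window from the bandwidth-delay product.
# 1 Gbit/s link and 20 ms RTT are assumed example figures.
BANDWIDTH_BITS=1000000000        # link speed in bits/s (assumption)
RTT_MS=20                        # round-trip time in ms (assumption)
WINDOW=$((BANDWIDTH_BITS / 8 * RTT_MS / 1000))
echo "suggested window: ${WINDOW} bytes"   # 2500000 bytes, ~2.4 MB
```

With multiple parallel streams (-s), each stream carries a fraction of the traffic, so you can often get away with a smaller per-stream window than the full bandwidth-delay product.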

Punchline:

In my environments, using bbcp has increased my transfer rates by about 4-5x. This is repeatable across several different source/destination combinations.

In one particular case I was able to increase a file copy between production-class hardware within a LAN from 22MB/s to 116MB/s. Copying to/from production-class hardware across a WAN, I increased transfer speed from 8MB/s to 25MB/s. In another case, copying from my local desktop across a WAN, I went from 1.7MB/s to 10MB/s.

# generate 1GB file
dd if=/dev/urandom of=sample.dat bs=1024 count=1000000
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 234.145 s, 4.4 MB/s
# first copy via scp
scp sample.dat user@remotehost:
sample.dat 100% 1000MB 22.7MB/s 00:44
# now copy via bbcp
bbcp -v -s 16 -F -f -w 256k sample.dat user@remotehost:
File ./sample.dat created; 1024000000 bytes at 116776.3 KB/s

Nice, eh? Makes a huge difference when priming a new slave across the country from my InnoDB hot backups.

On Linux there are also a number of sysctl settings you can tune, but those are very specific to your particular network and usage pattern, so I won’t go into them here.

Anyone else have tips/tricks to increase copy speeds for large data sets?

2 Responses to “bbcp for fast network copies”

  1. Robert Hodges Says:

    Excellent find! Do you incidentally know of any tools for parallel transfer of SQL data? SQL-level bulk load is quite interesting for copying data across database types and versions.

  2. ryan Says:

    Robert, I’m not quite sure I understand your question. For parallel transfer of large files, this bbcp program is quite useful and would work on logical backups, i.e. files containing SQL data. For parallel dump/load of the SQL data itself, the Maatkit parallel dump/restore tools are great for MySQL but would not be portable to other database types.
