I’ve long known that ssh encryption has an effect on the speed of file transfers. So things such as rsync (which will use ssh by default) or even plain scp can be pretty darn slow, especially on large files and on systems with old or slow CPUs.
I also know about the recommendation to use a different cipher when transferring files. Some people recommend blowfish, others arcfour. So I thought I’d do a little bit of testing in a controlled environment.
I have two recent vintage HP servers with the following specs.
HP ProLiant DL360p Gen8
Dual quad core Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz (8 core, 16 threads total)
64G RAM
4 x 3TB, mdadm RAID10, formatted as XFS, mounted noatime,logbufs=8
Tigon ethernet NIC, connected as GigE, full duplex to HP ProCurve 2848 switch
(both servers connected to same switch)
The test file is:
3921247501 Mar 4 08:22 bigdata.tar.bz2 (3.8GB)
I am using OpenSSH_5.3p1, OpenSSL 1.0.0-fips 29 Mar 2010
Kernel is 3.8.1-1.el6.elrepo.x86_64 #1 SMP Thu Feb 28 19:15:22 EST 2013 x86_64 x86_64 x86_64 GNU/Linux
I am going to copy this file from hp1 to hp2 using scp, rsync, and FTP. With scp, I’ll try different ciphers, with compression off, to see how the cipher choice affects the transfer. For comparison purposes, I also timed a plain ole FTP transfer, which means no encryption and very little system processing; the timings prove that. I also tested the plain rsync protocol (direct to rsyncd, no ssh).
I ran each test 3 times. Without specifying a cipher, ssh/scp will use the default, which depends on the version of OpenSSH (for this version, the default is aes128-ctr). NOTE: the file is rm’ed at the destination before each copy.
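The measurement loop itself is simple. Here is a minimal sketch in Python of how the scp runs could be driven (hostnames, paths, and ciphers are the ones used in the table below; with dry_run it only collects the commands, since actually running them requires the two hosts):

```python
import subprocess, time

# Ciphers compared in the table; hp2 and the file paths are from the post.
CIPHERS = ["aes128-ctr", "arcfour", "blowfish-cbc", "aes128-cbc"]
SRC = "/data/bigdata.tar.bz2"
DEST = "hp2:/data/"

def scp_cmd(cipher):
    return ["scp", "-c", cipher, "-o", "Compression=no", SRC, DEST]

def run_trials(trials=3, dry_run=True):
    """Time each cipher `trials` times; with dry_run, just collect commands."""
    cmds = []
    for cipher in CIPHERS:
        for _ in range(trials):
            # The destination file would be rm'ed on hp2 before each copy.
            cmd = scp_cmd(cipher)
            cmds.append(" ".join(cmd))
            if not dry_run:
                t0 = time.time()
                subprocess.run(cmd, check=True)
                print(f"{cipher}: {time.time() - t0:.3f}s")
    return cmds
```
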
run | Xfer type | real | user | system |
---|---|---|---|---|
1 | scp -o Compression=no | 0m52.175s | 0m12.709s | 0m6.504s |
2 | scp -o Compression=no | 0m47.872s | 0m12.603s | 0m6.806s |
3 | scp -o Compression=no | 0m49.317s | 0m12.748s | 0m6.710s |
1 | scp -c arcfour -o Compression=no | 0m49.536s | 0m14.161s | 0m6.903s |
2 | scp -c arcfour -o Compression=no | 0m49.088s | 0m14.045s | 0m6.921s |
3 | scp -c arcfour -o Compression=no | 0m50.698s | 0m14.162s | 0m6.728s |
1 | scp -c blowfish-cbc -o Compression=no | 0m58.673s | 0m44.295s | 0m13.495s |
2 | scp -c blowfish-cbc -o Compression=no | 0m56.399s | 0m43.860s | 0m9.036s |
3 | scp -c blowfish-cbc -o Compression=no | 0m54.869s | 0m43.949s | 0m10.673s |
1 | scp -c aes128-cbc -o Compression=no | 0m49.776s | 0m14.641s | 0m7.083s |
2 | scp -c aes128-cbc -o Compression=no | 0m48.527s | 0m15.154s | 0m7.068s |
3 | scp -c aes128-cbc -o Compression=no | 0m50.554s | 0m15.334s | 0m6.983s |
1 | ncftpput -m -u ftptest -p 'XXXXXX' hp2 /data/ /data/bigdata.tar.bz2 | 0m34.306s | 0m0.141s | 0m4.062s |
2 | ncftpput -m -u ftptest -p 'XXXXXX' hp2 /data/ /data/bigdata.tar.bz2 | 0m33.351s | 0m0.160s | 0m3.863s |
3 | ncftpput -m -u ftptest -p 'XXXXXX' hp2 /data/ /data/bigdata.tar.bz2 | 0m33.839s | 0m0.154s | 0m3.732s |
1 | rsync --stats -a /data/bigdata.tar.bz2 hp2::data/bigdata.tar.bz2.1 | 0m33.485s | 0m10.221s | 0m6.692s |
2 | rsync --stats -a /data/bigdata.tar.bz2 hp2::data/bigdata.tar.bz2.2 | 0m33.490s | 0m10.234s | 0m6.703s |
3 | rsync --stats -a /data/bigdata.tar.bz2 hp2::data/bigdata.tar.bz2.3 | 0m33.497s | 0m10.163s | 0m6.545s |
In terms of speed, we have:
Average over 3 runs
RSYNC: real=33.491 user=10.206 sys=6.647
FTP: real=33.832 user=0.152 sys=3.886
AES128-CBC: real=49.619 user=15.043 sys=7.045
ARCFOUR: real=49.774 user=14.123 sys=6.851
AES128-CTR: real=49.788 user=12.687 sys=6.673
BLOWFISH-CBC: real=56.647 user=44.035 sys=11.068
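To put those averages in perspective, dividing the file size by the wall-clock time gives the effective throughput (a quick sketch; the numbers are the averaged "real" times from above):

```python
# Effective throughput implied by the averaged wall-clock ("real") times.
FILE_BYTES = 3_921_247_501  # size of bigdata.tar.bz2 from the listing above

avg_real = {  # seconds, averaged over the 3 runs
    "rsync": 33.491,
    "ftp": 33.832,
    "aes128-cbc": 49.619,
    "arcfour": 49.774,
    "aes128-ctr": 49.788,
    "blowfish-cbc": 56.647,
}

mib_s = {name: FILE_BYTES / secs / 2**20 for name, secs in avg_real.items()}
for name, rate in sorted(mib_s.items(), key=lambda kv: -kv[1]):
    print(f"{name:13s} {rate:6.1f} MiB/s")
```

rsync and FTP come out around 110-112 MiB/s, which is close to what a GigE link can deliver after protocol overhead, so the unencrypted methods look network-bound, while the scp variants at ~66-79 MiB/s are limited by the ssh protocol and crypto instead.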
So it looks like with modern OpenSSH on a CPU with hardware AES support, it’s pretty much a wash which cipher you use: aes128-ctr, aes128-cbc, and arcfour all land within a second of each other, and only blowfish-cbc clearly lags.
Note that the rsync protocol itself is pretty darn efficient, even slightly faster than FTP.
3/6/13 Update
AES in SSH. I’ve tested again from an old Pentium 4 Dell, with no AES support in hardware, to the fast HP, and the default aes128-ctr is much slower. The good news is that aes128-cbc is still faster than blowfish-cbc, though slightly slower than arcfour. As for FTP and rsync, they are neck and neck in speed, with no clear winner.
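The difference comes down to AES-NI. On Linux, one way to check whether the CPU advertises hardware AES support is to look for the aes flag in /proc/cpuinfo; a small sketch (the path parameter exists only so the function can be pointed at a sample file):

```python
def has_aes_ni(cpuinfo_path="/proc/cpuinfo"):
    """Return True/False if the CPU advertises AES-NI, or None if unknown."""
    try:
        with open(cpuinfo_path) as f:
            for line in f:
                # Feature flags appear on a line like "flags : fpu ... aes ..."
                if line.startswith("flags"):
                    return "aes" in line.split(":", 1)[1].split()
    except OSError:
        pass
    return None  # non-Linux, or /proc not readable
```

The Xeon E5-2670 advertises the aes flag; a Pentium 4 does not, which matches the slowdown seen with the default aes128-ctr on the old Dell.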
So my conclusion is that whether AES runs with hardware support (in newer Intel CPUs) or in software, the CBC variant of AES is usually good enough.
For a “from scratch” transfer with rsync, I recommend using ‘-W’ (copy files whole, skipping the delta algorithm). I forgot to benchmark that when I had the chance, but in my experience it is faster when it’s a fresh, “from scratch” rsync.
>”Note that rsync protocol itself is pretty darn efficient, slightly faster than FTP.”
Only if you are doing an update in my experience. Then it absolutely rocks.
But to make a ‘from scratch’ transfer of many files over ssh scream I haven’t found anything better than tar piped through ssh with arcfour encryption. It can pretty much saturate the network (as long as your hard drives are fast enough) on even pretty old CPUs. About 2x compared with rsync over ssh in my experience on the same hardware.
I found it so convenient that I actually wrote a Perl wrapper script some years ago for it:
http://snowhare.com/utilities/script_tricks/pull-sync-via-tar.pl
http://snowhare.com/utilities/script_tricks/push-sync-via-tar.pl
#!/usr/bin/perl
use strict;
use warnings;
use Pod::Usage qw(pod2usage);
use Getopt::Long qw(GetOptions);
$|++;
#our ($tar, $ssh) = ('/bin/tar', '/usr/bin/ssh -2 -c arcfour -T -x');
our ($tar, $ssh) = ('tar', 'ssh -2 -c arcfour -T -x');
our $tar_options = '--exclude=sys --exclude=proc --exclude=mnt --exclude=*/lost+found --exclude=cache';
my ($help, $man, $debug, $dry_run) = (0,0,0,0);
my $source_dir = '/';
my ($target_host, $target_dir);
GetOptions( 'target_host=s' => \$target_host,
'source_dir=s' => \$source_dir,
'target_dir=s' => \$target_dir,
'debug!' => \$debug,
'dry-run!' => \$dry_run,
'help|?' => \$help,
'man!' => \$man,
'tar=s' => \$tar,
'tar-options=s' => \$tar_options,
'ssh=s' => \$ssh,
) or pod2usage(2);
pod2usage(1) if $help;
my $errors = '';
if ((! defined ($target_dir)) || ($target_dir eq '')) {
$errors .= "Missing required --target_dir parameter\n";
}
if ((! defined ($target_host)) || ($target_host eq '')) {
$errors .= "Missing required --target_host parameter\n";
}
if ($errors ne '') {
print STDERR "\n$errors\n";
pod2usage( -exitstatus => 1, -verbose => 1);
}
if ($man) {
pod2usage( -exitstatus => 0, -verbose => 2);
}
if ($dry_run) { print "Dry run\n"; }
#######################################################################
my $cmd = "$tar --directory=$source_dir $tar_options -Scpf - . | $ssh $target_host '$tar --directory=$target_dir -Spxf -'";
if ($debug || $dry_run) { print "$cmd\n"; }
if (! $dry_run) { system($cmd); }
exit;
1;
#######################################################################
#######################################################################
#######################################################################
__END__
=head1 NAME
push-sync-via-tar.pl - Performs remote backups using tar over ssh
=head1 SYNOPSIS
push-sync-via-tar.pl --target_host=example.com --source_dir=/ --target_dir=/backups/data/production-servers/daily/box1.example.com
# Show full man page on program
push-sync-via-tar.pl --man
=cut
=head1 DESCRIPTION
Uses tar over an ssh connection to make a full image backup of a specified directory to a remote server.
=head2 Notes
By design this is an ‘as fast as we can possibly go’ implementation. It is I<intended> to make a full copy
to the specified remote directory B<every time it is run>. This means that it can and B<will> saturate
a network link between the machines while running if the machines and their disk drives
are at all reasonably fast. I have saturated a 100 megabit ethernet network when running this script between
fast machines.
It was written to run on Redhat Linux machines. It will probably work on any *nix type machine, although you will
probably need to adjust the --tar-options to exclude system-specific directories on non-Linux machines,
as well as the --ssh and --tar settings.
This is essentially a convenience automation of a simple shell script.
=head1 OPTIONS
The command line options are as follows:
=over 4
=item B<--help>
Prints a brief help message and exits
=back
=over 4
=item B<--man>
Prints the manual page and exits
=back
=over 4
=item B<--target_host>
Specifies the remote target host’s name (ie. ‘example.com’)
Ex.
--target_host=example.com
This is a B<required> option.
=back
=over 4
=item B<--source_dir>
Specifies the source directory on the source host (ie. ‘/’). Defaults to ‘/’ if not specified.
Ex.
--source_dir=/
=back
=over 4
=item B<--target_dir>
Specifies the target directory where the copy of the source dir will be put.
Ex.
--target_dir=/backups/data/napa-servers/daily/example.com
This is a B<required> option.
=back
=over 4
=item B<--ssh>
Override for the ssh command and parameters. The script assumes that
the openssh version of ssh is in your path. The default is ‘ssh -2 -c arcfour -T -x’
=back
=over 4
=item B<--tar>
Override for the tar command. The default is ‘tar’. The script
assumes the GNU version of tar is in your path.
=back
=over 4
=item B<--tar-options>
Override for the options passed to the source-side tar command. The default is ‘--exclude=sys --exclude=proc --exclude=mnt --exclude=*/lost+found --exclude=cache’
=back
=over 4
=item B<--dry-run>
Flag for turning on the ‘dry run’ mode. In the dry run mode, the program
prints its actions, but does not actually do them.
=back
=over 4
=item B<--debug>
Flag for turning on debugging information
=back
=head1 AUTHOR
Benjamin Franz
=head1 TODO
Nothing.
=head1 LICENSE
This program is free software; you can redistribute it and/or modify it
under the same terms and conditions as Perl itself.
This means that you can, at your option, redistribute it and/or modify
it under either the terms of the GNU General Public License (GPL) version 1 or
later, or under the Perl Artistic License.
See http://dev.perl.org/licenses/
=head1 DISCLAIMER
THIS SOFTWARE IS PROVIDED “AS IS” AND WITHOUT ANY EXPRESS
OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE.
Use of this software in any way or in any form, source or binary,
is not allowed in any country which prohibits disclaimers of any
implied warranties of merchantability or fitness for a particular
purpose or any disclaimers of a similar nature.
IN NO EVENT SHALL I BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT,
SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OF THIS SOFTWARE AND ITS DOCUMENTATION (INCLUDING, BUT NOT
LIMITED TO, LOST PROFITS) EVEN IF I HAVE BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE
=head1 SEE ALSO
L<tar>, L<ssh>
=cut
I don’t have access to the equipment anymore to do more testing. It would be interesting to compare against hpn-ssh.
Another thing I missed in my testing was rsync over a “local” filesystem and an NFS-mounted fs. That would give a nice baseline for comparison.
Hi:
I have found that file transfer over an hpn-ssh tunnel is way faster than any other method.
1) hardware firewall with NAT, got around 200KB/s.
2) linux firewall with NAT, got around 200KB/s.
3) bbcp over SSH tunnel, got around 1.0MB/s. (rsync with this tunnel got 200KB/s)
4) rsync with hpn-ssh tunnel, got around 1.4MB/s. That is 7x the speed of regular rsync.