Rsync

Rsync is the Swiss Army knife of file synchronization. It uses a rolling hash algorithm to transfer only the differences between source and destination, and it works well even on big files. The algorithm is computationally cheap, so it remains worthwhile even on fast local networks.
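
To illustrate the "rolling" part, here is a minimal sketch of a weak checksum in the style described in Tridgell's thesis: two 16-bit sums that can be updated in constant time when the window slides by one byte. The function names weak_sum and roll are hypothetical, and this is an illustration of the idea, not rsync's actual code.

```shell
# Sketch of a rolling weak checksum (illustration only, not rsync's code).
# a = sum of bytes, b = sum of (len - i) * byte, both modulo 65536.

# weak_sum STRING: compute "a b" from scratch over the bytes of STRING.
weak_sum() (
  len=$(( $(printf '%s' "$1" | wc -c) ))
  a=0 b=0 i=0
  # od prints each byte of the string as an unsigned decimal number.
  for x in $(printf '%s' "$1" | od -An -v -tu1); do
    a=$(( (a + x) % 65536 ))
    b=$(( (b + (len - i) * x) % 65536 ))
    i=$(( i + 1 ))
  done
  echo "$a $b"
)

# roll A B LEN OUT_BYTE IN_BYTE: slide the window one byte to the right
# without re-reading the bytes that stay inside it.
roll() (
  a=$(( (($1 - $4 + $5) % 65536 + 65536) % 65536 ))
  b=$(( (($2 - $3 * $4 + a) % 65536 + 65536) % 65536 ))
  echo "$a $b"
)

# Usage:
#   weak_sum abcd              # checksum of the window "abcd"
#   roll 394 980 4 97 101      # drop "a" (97), add "e" (101)
#   weak_sum bcde              # same result as the roll above
```

Because the update only needs the byte that leaves and the byte that enters, the checksum can be evaluated at every offset of a big file at low cost. In the algorithm described in the thesis, a stronger hash then confirms the candidate matches found by the weak one.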

For more details on the internals of the tool, see the thesis of Andrew Tridgell (the author of rsync): Efficient Algorithms for Sorting and Synchronization.

ZFS

ZFS is a next-gen file-system.

It supports a lot of features; the ones that interested me most are:
  • Checksumming of all data, including file-system structures, to detect data corruption and hardware failures that would otherwise go unnoticed. This is clearly the one I care about most.
  • Native support for snapshots, which store the state of the content at a given point in time even while the file-system is mounted
  • On-the-fly compression (with a choice of several compression algorithms)
  • Data deduplication

Server setup

On Debian Stretch, installation has become extremely easy because ZFS is now part of the official packages. The kernel module is built locally, which can take some time, especially if your machine is not powerful. ZFS is designed for 64-bit machines; use it at your own risk on 32-bit architectures.

sudo apt-get install zfs-dkms zfsnap

For rsync, simply install the rsync package:

sudo apt-get install rsync

ZFS pool preparation

First, format the drive:

# Use *sudo dmesg | grep sd* to display all drives detected.
zpool create tank /dev/sdb # Will use the entire drive /dev/sdb, replace with the drive you want to use.
zfs set compression=lz4 tank # Enable lz4 compression.

Deduplication can be enabled with

zfs set dedup=on tank # Beware: this can slow writes down a lot!

Deduplication is an interesting feature but, sadly, it requires a HUGE amount of RAM. If you don’t have it, the filesystem will need, at worst, one random read for each block written. In my case, the average write speed dropped to 7-8 Mb/s.
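
To get a feel for "HUGE", here is a rough back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 320 bytes of RAM per unique block in the deduplication table and the default 128 KiB record size. Both figures are assumptions, not measurements from my setup, and the function name ddt_ram_bytes is hypothetical.

```shell
# ddt_ram_bytes DATA_BYTES [BLOCK_BYTES]
# Rough estimate of the RAM needed to hold the dedup table in memory.
ddt_ram_bytes() (
  data=$1
  block=${2:-131072}   # 128 KiB, the default ZFS recordsize
  entry=320            # ASSUMED bytes per table entry (rule of thumb)
  echo $(( data / block * entry ))
)

# Usage: 1 TiB of unique data stored in 128 KiB blocks
#   ddt_ram_bytes 1099511627776    # 2684354560 bytes, i.e. 2.5 GiB
```

Smaller blocks make this much worse: the same call with a block size of 8192 gives sixteen times that amount. Before enabling deduplication on an existing pool, zdb -S tank can simulate it and report the ratio you would actually get.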

To see some information about your pools:

zpool list # to show a quick summary of all pools
zpool status -v tank # To show details about *tank*

ZFS snapshots

The version of zfSnap included in Debian Stretch is 1.11, whereas the main zfSnap website documents version 2.x. For the Debian version, consult the wiki: zfSnap 1.x.

To get daily and hourly snapshots, the simplest approach is to add the following lines to /etc/crontab:

0 * * * * root /usr/sbin/zfSnap -r -p hourly -a 1m tank
0 7 * * * root /usr/sbin/zfSnap -r -p daily -a 6m tank
0 6 * * * root /usr/sbin/zfSnap -d -r -p daily -p hourly tank

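The first two lines create hourly snapshots kept for one month (-a 1m) and daily snapshots kept for six months (-a 6m). The last line deletes the snapshots whose lifetime has expired.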
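
To check that the cron jobs actually produce snapshots, a small helper around zfs list can be handy. This is only a sketch: the function name list_snapshots is hypothetical, and it assumes the snapshot names start with the prefix given to -p.

```shell
# list_snapshots POOL PREFIX: print the snapshots under POOL whose
# snapshot name starts with PREFIX (for example "hourly" or "daily").
list_snapshots() {
  # -H: no header line, -o name: names only, -r: include child datasets
  zfs list -H -t snapshot -o name -r "$1" | grep "@$2"
}

# Usage:
#   list_snapshots tank hourly            # show the hourly snapshots
#   list_snapshots tank daily | wc -l     # count the daily snapshots
```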

Rsync server

The configuration of the rsync server lives in /etc/rsyncd.conf. Each option is described in the rsync documentation (man rsyncd.conf).

In my case, I wanted a simple way to synchronize files within my LAN:

uid = nobody
gid = nogroup
max connections = 1
socket options = SO_KEEPALIVE

[tank]
        path=/tank
        auth users = user1 user2
        secrets file = /etc/rsyncd.secrets
        read only = no

Replace user1 and user2 with the logins you want to use for the file synchronization.

Create a file /etc/rsyncd.secrets:

user1:password1
user2:password2

Again, replace user1 and user2. Passwords do not need to be identical to the login.

Finally, make sure the file can only be read by root, then enable and start rsync:

chmod 600 /etc/rsyncd.secrets
systemctl enable rsync.service # Start the daemon at boot.
systemctl start rsync.service # Start it right now.
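
If you prefer not to edit the secrets file by hand, the step can be scripted. The sketch below uses a hypothetical helper, add_rsync_user, that appends a login and keeps the file private to its owner; it does not check for duplicate logins.

```shell
# add_rsync_user FILE USER PASSWORD: append a "user:password" line to FILE
# and make sure only the owner can read or write it.
add_rsync_user() (
  umask 077                          # a new file is created owner-only
  printf '%s:%s\n' "$2" "$3" >> "$1"
  chmod 600 "$1"                     # also fix an existing, laxer file
)

# Usage (as root):
#   add_rsync_user /etc/rsyncd.secrets user1 password1
#   add_rsync_user /etc/rsyncd.secrets user2 password2
```

The permissions matter: with the default strict modes setting, the rsync daemon refuses a secrets file that other users can read.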

Client usage

Usage is straightforward; consult the main rsync documentation for the details. A simple command like this one is enough:

# Replace <source_folder> & <host>
rsync -avx  <source_folder> rsync://<host>/tank/ --delete
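
For unattended runs (from cron, for instance), the login and password can be supplied without a prompt. The sketch below wraps the command in a hypothetical function, backup_to_tank, and assumes a password file that contains only the password and is not readable by other users.

```shell
# backup_to_tank SOURCE HOST USER PASSFILE
# Push SOURCE to the "tank" module, authenticating as USER.
backup_to_tank() {
  rsync -avx --delete --password-file="$4" "$1" "rsync://$3@$2/tank/"
}

# Usage (the path, host and file names are examples):
#   printf '%s\n' 'password1' > ~/.rsync-pass && chmod 600 ~/.rsync-pass
#   backup_to_tank /home/me/documents nas.local user1 ~/.rsync-pass
```

Note the usual rsync subtlety: a source written without a trailing slash is copied as a directory inside the destination, while a trailing slash copies only its contents. Since --delete removes files on the server that no longer exist in the source, a first run with -n (dry run) is a cheap safety check.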

Cold storage

To keep a backup with all the intermediate snapshots on a second drive, the simplest solution is to use syncoid, available on GitHub as part of Sanoid.
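
As a rough sketch, a replication run could look like the following. It assumes a second pool, here hypothetically named cold, was already created on the other drive with zpool create; the function name replicate_to_cold is also made up for the example.

```shell
# replicate_to_cold SOURCE TARGET
# Copy SOURCE and its child datasets (-r), snapshots included, to TARGET.
replicate_to_cold() {
  syncoid -r "$1" "$2"
}

# Usage (as root, with the cold-storage drive plugged in):
#   replicate_to_cold tank cold/tank
```

Later runs are incremental, so only the snapshots created since the previous run are sent.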

Have fun!