Rsync is the Swiss Army knife of file synchronization. It uses a rolling hash algorithm to transfer only the differences between two versions of a file, and it works well even on big files. The computational cost of the algorithm is low, so it remains worthwhile even on fast local networks.
For more details on the internals of the tool, you can consult the thesis of Andrew Tridgell (the author of rsync): Efficient Algorithms for Sorting and Synchronization.
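To give an idea of what "rolling" means, here is a minimal bash sketch of a weak checksum in the style described in the thesis: two 16-bit sums that can be updated in constant time when the window slides by one byte. The helper names (byte_at, weak_sum, roll) are mine, the input is limited to ASCII for readability, and this is an illustration of the idea, not rsync's actual code.

```shell
#!/usr/bin/env bash
# Sketch of an rsync-style weak rolling checksum (two 16-bit sums, a and b).
# Hypothetical helper names; ASCII input only, for readability.
M=65536

# Decimal value of the character at offset $2 of string $1.
byte_at() { printf '%d' "'${1:$2:1}"; }

# Checksum of the window of length $3 starting at offset $2, computed from
# scratch. Prints "a b".
weak_sum() {
  local s=$1 start=$2 len=$3 a=0 b=0 i x
  for ((i = 0; i < len; i++)); do
    x=$(byte_at "$s" $((start + i)))
    a=$(( (a + x) % M ))
    b=$(( (b + (len - i) * x) % M ))
  done
  echo "$a $b"
}

# Slide the window by one byte in constant time.
# Arguments: a b len outgoing_byte incoming_byte. Prints the new "a b".
roll() {
  local a=$1 b=$2 len=$3 old=$4 new=$5
  a=$(( (a - old + new + M) % M ))
  b=$(( (b - len * old + a + len * M) % M ))
  echo "$a $b"
}

data="the quick brown fox"
len=8
read -r a b <<< "$(weak_sum "$data" 0 "$len")"
rolled=$(roll "$a" "$b" "$len" "$(byte_at "$data" 0)" "$(byte_at "$data" "$len")")
direct=$(weak_sum "$data" 1 "$len")
echo "rolled: $rolled"
echo "direct: $direct"
```

Both lines print the same pair of numbers: the checksum of the window starting at offset 1 is obtained from the previous one without re-reading the whole window. This is what makes it cheap to test every byte offset of a big file against the block checksums of the other side.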
ZFS is a next-gen file-system.
It supports a lot of features, but the ones that interested me most are:
- Checksumming of all data, including file-system structures, to detect data corruption and hardware failures that would otherwise go unnoticed. This is the one I care about above all.
- Native support for snapshots, which preserve a particular state of the content even while the file-system is mounted
- On-the-fly compression (with several compression algorithms)
- Data deduplication
On Debian Stretch, the installation has become extremely easy as ZFS is now part of the official packages (in the contrib section, which must be enabled in your APT sources). The kernel module is built locally through DKMS. This operation can take some time, especially if your machine is not powerful. ZFS is designed to work on 64-bit machines. Use at your own risk on 32-bit architectures.
sudo apt-get install zfs-dkms zfsnap
For rsync, simply install the rsync package:
sudo apt-get install rsync
First, format the drive:
# Use "sudo dmesg | grep sd" to display all drives detected.
zpool create tank /dev/sdb # Will use the entire drive /dev/sdb, replace with the drive you want to use.
zfs set compression=lz4 tank # Enable lz4 compression.
Deduplication can be enabled with
zfs set dedup=on tank # Beware: this can slow writes down considerably!
Sadly, deduplication is an interesting feature, but it requires a HUGE amount of RAM. If you don’t have it, the deduplication table no longer fits in memory and the filesystem will require, at worst, 1 random read for each block written. In my case, the average write speed dropped to 7-8 Mb/s.
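To get a feeling for what "HUGE" means, here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of about 320 bytes of RAM per unique block in the deduplication table, and that every block is unique and has the full record size (128 KiB is the ZFS default). These are assumptions for the estimate, not numbers I measured, and the function name is mine.

```shell
# Rough estimate of the RAM needed to keep the deduplication table in memory.
# Assumption: about 320 bytes per unique block (rule of thumb, not measured).
# Usage: ddt_ram_mib <data_bytes> <block_bytes>   -> prints a size in MiB
ddt_ram_mib() {
  data_bytes=$1
  block_bytes=$2
  blocks=$(( data_bytes / block_bytes ))
  echo $(( blocks * 320 / 1048576 ))
}

# 1 TiB of unique data stored in 128 KiB records:
ddt_ram_mib $(( 1024 * 1024 * 1024 * 1024 )) $(( 128 * 1024 ))   # prints 2560
```

With smaller blocks (many small files, or a smaller recordsize) the number of blocks, and therefore the memory needed, grows accordingly.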
To see some information about your pools:
zpool list # To show a quick summary of all pools
zpool status -v tank # To show details about tank
To get daily and hourly snapshots, the simplest is to add the following lines to /etc/crontab:
0 * * * * root /usr/sbin/zfSnap -r -p hourly -a 1m tank
0 7 * * * root /usr/sbin/zfSnap -r -p daily -a 6m tank
0 6 * * * root /usr/sbin/zfSnap -d -r -p daily -p hourly tank
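The last line deletes old snapshots, that is, the ones whose lifetime given with -a (1 month for the hourly ones, 6 months for the daily ones) has expired.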
The configuration of the rsync server can be found in /etc/rsyncd.conf. The details of each configuration option can be found in the rsync documentation (the rsyncd.conf man page).
In my case, I wanted a simple way to synchronize files over my LAN:
uid = nobody
gid = nogroup
max connections = 1
socket options = SO_KEEPALIVE

[tank]
path = /tank
auth users = user1 user2
secrets file = /etc/rsyncd.secrets
read only = no
Replace user1 and user2 with the logins you want to use for the file synchronization.
Create a file /etc/rsyncd.secrets with one login:password pair per line, using the same logins as in auth users.
Again, replace user1 and user2. Passwords do not need to be identical to the login.
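As an illustration of the file format, here is a minimal sketch that writes such a file. The passwords are placeholders, and the default path points to a temporary location so the snippet can be tried without root; on the server the target would be /etc/rsyncd.secrets. The SECRETS_FILE variable is my own naming, not something rsync reads.

```shell
# Hypothetical target for trying the format out; on the server this would be
# /etc/rsyncd.secrets (writing there requires root).
SECRETS_FILE="${SECRETS_FILE:-${TMPDIR:-/tmp}/rsyncd.secrets.example}"

# Create the file in a subshell with a restrictive umask, so it is never
# readable by other users, not even for a moment.
(
  umask 077
  cat > "$SECRETS_FILE" <<'EOF'
user1:change-me-1
user2:change-me-2
EOF
)
chmod 600 "$SECRETS_FILE"

# Show the resulting permissions and the logins (without the passwords).
ls -l "$SECRETS_FILE"
cut -d: -f1 "$SECRETS_FILE"
```

Each line holds one login and its password separated by a colon. The logins must match the ones listed in auth users in /etc/rsyncd.conf.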
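Finally, make sure the file can only be read by root, then enable and start the rsync service:
chmod 600 /etc/rsyncd.secrets
systemctl enable rsync.service # Start the service at boot
systemctl start rsync.service # Start it right now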
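The usage is straightforward; you can consult the main rsync documentation for the full list of options. A simple command like this one is enough to mirror a folder to the server (be careful: --delete removes the files on the server that no longer exist in the source folder):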
# Replace <source_folder> & <host>
rsync -avx <source_folder> rsync://<host>/tank/ --delete
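Since --delete is destructive, it can be worth previewing a run with -n (--dry-run) first. Here is a small local sketch with throwaway directories. The directory and file names are made up for the demonstration, and it runs against a local folder rather than the rsync server, but the options behave the same way.

```shell
# Local demonstration with throwaway directories (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/src" "$work/dst"
echo "keep me" > "$work/src/keep.txt"
echo "only on the destination" > "$work/dst/stale.txt"

if command -v rsync >/dev/null 2>&1; then
  # -n (--dry-run) lists what would be copied or deleted without doing it.
  rsync -avxn --delete "$work/src/" "$work/dst/"
  [ -f "$work/dst/stale.txt" ] && echo "dry run: stale.txt is still there"

  # Same command without -n: keep.txt is copied and stale.txt is removed.
  rsync -avx --delete "$work/src/" "$work/dst/"
  [ ! -e "$work/dst/stale.txt" ] && echo "real run: stale.txt was deleted"
else
  echo "rsync is not installed; skipping the demonstration"
fi
```

The same -n flag can be added to the command against rsync://<host>/tank/ to check what would be deleted on the server before running it for real.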