Saturday, November 30, 2013

Setting up the Volumes in a NAS

In the case of a local disk, there are typically two file system layers: the partitions and for each partition the directory. A NAS is a storage system and as such has many more layers. Unfortunately the terminology is completely different. For example, we do not talk about disks but about volumes. Instead of abstraction layers, we talk about virtualization. Instead of drives, in the case of NAS we talk of bays.

Let us consider a storage system with two bays and some system flash memory. We will focus on the Synology system with the DSM operating system (built on top of Debian). The flash memory holds the operating system and the system configuration. This memory is not visible to the user or administrator. When the system is first powered up, you run the Synology Assistant downloaded from the web (it is also on the CD-ROM, but not necessarily in the latest version), which scans the LAN to find the Synology products. You assign a fixed IP address and install the DSM image you also downloaded from the Synology site.

After the DSM has been installed and booted up, you log into your NAS from a browser and do all the administration from there. The browser has to have Java enabled, because the GUI for DSM is an applet in a browser. You keep the Assistant around because it makes a nice status display with minimal overhead.

The next step is to format the disks and set up the directories. The first time around you do a low-level formatting, because you have to map out the bad sectors. After that, you can do quick formatting, because you just have to build the directories at the various virtualization layers and map the layers. Unfortunately the terminology in the DSM manual is a little confusing. Setting up the storage carriers is called "formatting," although you will do a much more elaborate task. You do not use one of the DSM control panels, but a separate application not shown by default on the desktop and called Storage Manager.

By default, you get a Synology Hybrid RAID, but in the case of a system with two equal disks you do not want that. At the lowest level we have a bunch of blocks on a bunch of disks. Locally, DSM runs the ext4 file system, which sees this as a number of disks with partitions and each partition having an ext4 file system.

In the hybrid configuration, each drive is divided into partitions of 500 GB size. A virtualization layer is then created where all partitions are pooled and the partitions are regrouped as follows. In the case of RAID1, one partition is made the primary, and another partition on a different drive is made its secondary onto which the primary is mirrored. This process continues until all partitions are allocated, then a new virtualization is performed where all primaries are concatenated and presented as a single primary volume. Similarly, the secondaries are concatenated in a single secondary volume.
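The pairing described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Synology's actual allocation algorithm: the greedy rule (always pair a partition from the two drives with the most unallocated partitions) and the names `pair_mirrored_chunks` and `usable_gb` are assumptions made for this example.

```python
import heapq

CHUNK_GB = 500  # partition size quoted in the text


def pair_mirrored_chunks(drive_sizes_gb):
    """Pair each primary partition with a secondary on a different drive.

    Returns a list of (primary_drive, secondary_drive) index pairs.
    Assumed greedy rule: always take one partition from each of the two
    drives that still have the most unallocated partitions.
    """
    # heapq is a min-heap, so store negative counts to pop the largest first.
    heap = [(-(size // CHUNK_GB), i)
            for i, size in enumerate(drive_sizes_gb)
            if size >= CHUNK_GB]
    heapq.heapify(heap)
    pairs = []
    while len(heap) >= 2:
        a_left, a = heapq.heappop(heap)
        b_left, b = heapq.heappop(heap)
        pairs.append((a, b))
        # Put a drive back only if it still has unallocated partitions.
        if a_left + 1 < 0:
            heapq.heappush(heap, (a_left + 1, a))
        if b_left + 1 < 0:
            heapq.heappush(heap, (b_left + 1, b))
    return pairs


def usable_gb(drive_sizes_gb):
    """Mirrored capacity: one partition's worth of space per pair."""
    return len(pair_mirrored_chunks(drive_sizes_gb)) * CHUNK_GB
```

With two drives of 4000 GB and 2000 GB this sketch yields 2000 GB, the size of the smaller drive, because every secondary has to live on the only other drive. With three drives of 4000, 2000 and 2000 GB it yields 4000 GB, since the large drive can mirror onto both small ones.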

The reason Synology has introduced the hybrid RAID concept is that in practice the customers of its very inexpensive products fill them up with whatever unused disks they have lying around. In a conventional RAID, the size of the NAS would be that of the smallest disk, with the remaining sectors of the larger disks wasted. If you have a server with 4 or 5 bays, with hybrid you can use all or almost all sectors available.

With 4 TB HDDs available in the red flavor for about $200, a 2-bay NAS should do it in the typical SOHO application. At that price, you want to get two equal-size drives. By the way, with only 2 drives, even with hybrid you can only use the sectors available in the smallest drive, because you have only one other drive on which to put the secondary volumes for mirroring. Finally, you get a RAID because disks fail, so you want to buy a third, spare disk for when the first disk breaks. Since statistically the disks from a manufacturing batch tend to have the same life expectancy, you want to buy the spare disk at a slightly later time.

In the case of RAID0, all odd blocks will be on the first disk and the even blocks will be on the second disk. This is called striping and it is useful, for example, if two users stream the same movie but start at different times. With striping you have less contention for the head and the NAS will run faster. However, you do not have a mirror copy, so when a disk fails, you lose your whole NAS.
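As a minimal sketch of this striping rule, the following Python function maps a logical block number to a disk and to a position on that disk. The function name `stripe_location` and the 1-based block numbering are assumptions made for the illustration; a real RAID0 stripes in larger units than a single block.

```python
def stripe_location(block_number, disks=2):
    """Map a 1-based logical block to (disk_index, offset_on_disk).

    With two disks, odd blocks land on disk 0 (the first disk) and
    even blocks on disk 1 (the second disk).
    """
    index = block_number - 1          # switch to 0-based counting
    return index % disks, index // disks


# Blocks 1..6 alternate between the two disks.
layout = [stripe_location(n) for n in range(1, 7)]
```

Consecutive blocks of a file sit on different disks, which is why two readers at different positions in the same file tend to compete less for one head.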

Therefore, you want RAID1, where each sector has a mirror on the secondary drive. When the first disk fails, you just replace it with the spare drive and keep going. You want to buy a new spare disk right away, because statistically the second drive will fail shortly thereafter. By the way, in the above scenario, depending on the operating system your I/O rate can actually be faster than with RAID0, because the OS will serve the data for the first user from the primary and that for the second user from the secondary, so there will be no latency due to head positioning. In a well-designed system the bottleneck will be in the processor and the network.

In the case of a 2-bay RAID, you want to create a RAID1 with 3 volumes. The reason for 3 volumes is that you want to have a separate volume to present outside your firewall; in my case, I allocated 100 GB. The second volume is for backup, because typically backups behave like a gas: they fill the available volume. Modern backup programs fill an entire volume, and when it is full will start deleting the oldest versions of files. This leaves you with no space for your actual data.

The typical IT approach to backups behaving like a gas is to allocate quotas to each user. However, allocating a separate backup user with a limited quota will cause the backup program to keep nagging with a "quota exhausted" message. By creating a separate backup volume, you let the backup software quietly manage the available space.
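The gas-like behavior of backups is easy to model. The sketch below is not the logic of any particular backup product; the function `store_backup` and the sizes in the usage note are hypothetical and only illustrate "delete the oldest versions until the new one fits."

```python
from collections import deque


def store_backup(versions, label, size_gb, capacity_gb):
    """Add a backup version, deleting the oldest ones to make room.

    `versions` is a deque of (label, size_gb) tuples, oldest first.
    Returns the labels of the versions that were deleted.
    """
    if size_gb > capacity_gb:
        raise ValueError("backup is larger than the whole volume")
    used = sum(size for _, size in versions)
    deleted = []
    # Drop the oldest versions until the new one fits on the volume.
    while used + size_gb > capacity_gb:
        old_label, old_size = versions.popleft()
        used -= old_size
        deleted.append(old_label)
    versions.append((label, size_gb))
    return deleted
```

On a hypothetical 1000 GB backup volume, three successive 400 GB backups labeled Monday, Tuesday and Wednesday leave only Tuesday and Wednesday on the volume: the third backup silently evicts the first. On a dedicated volume this is harmless, while on a shared volume the same logic would compete with your actual data for space.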

The third volume is the one with your actual data and you allocate the remaining sectors to it. Because the second and third volumes contain all your private data, you definitely want to encrypt them, so a thief cannot easily get to the data. The best encryption is disk encryption. If that is not available, make sure your NAS server has encryption hardware, otherwise it can be slow.
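The arithmetic for the three volumes can be written down as a short Python sketch. Only the 100 GB for the public volume comes from the text; the 1000 GB backup size, the 4000 GB nominal capacity in the usage note, and the function name `plan_volumes` are assumptions made for the example.

```python
def plan_volumes(usable_gb, public_gb=100, backup_gb=1000):
    """Split the mirrored capacity into the three volumes.

    public_gb: volume presented outside the firewall (100 GB in the text).
    backup_gb: assumed size for the backup volume, pick your own.
    The data volume receives whatever is left.
    """
    data_gb = usable_gb - public_gb - backup_gb
    if data_gb <= 0:
        raise ValueError("no space left for the data volume")
    return {
        "volume 1 (public)": public_gb,
        "volume 2 (backup)": backup_gb,
        "volume 3 (data)": data_gb,
    }


plan = plan_volumes(4000)  # nominal capacity of one 4 TB drive in RAID1
```

With these assumed numbers the data volume ends up with 2900 GB. The real figure will be somewhat lower, because the formatted capacity of a drive is less than its nominal size.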

On a NAS with multiple volumes, the volumes are called a disk group. Therefore, in the Storage Manager wizard, you select the mode "custom" and then the action "multiple volume on RAID." Selecting "create disks" will ask you to specify the disks for the group, and of course you select all disks. For "RAID type" you select "1" and the first time around you want to select to perform a disk check, which means low-level formatting.

I already gave numbers for the capacity allocation for each disk group. After formatting the first disk group yielding the first volume, you run the wizard again and get volumes 2 and 3. Unfortunately you cannot rename them, so you have to remember their purpose.

At this point, the NAS has two disk drives with an ext4 file system, which at the higher virtualization level does not yet give you any storage, because you do not yet have file systems at the higher level. You can quit the storage manager and continue with the DSM. In local storage parlance, the next step is to install a file system on each partition, which in the local case is done automatically after a disk has been formatted and partitioned. In the case of remote storage, this is achieved by creating a shared folder on each volume. On your local system this folder shows up as the mounted disk; you want to choose the name carefully, so you know what that disk icon on your desktop is.

You will probably use your NAS by mounting it on your computers. The embedded operating system on the NAS will implement various network filing protocols and translate their file operations through the various virtualization layers to ext4 operations. On your client workstation, the operating system will make your mounted remote file system look like and behave like an external disk.

For security reasons you will want to activate as few protocols as possible. If you have Unix or Linux clients, you have to activate NFS. If you have Windows PCs, the protocol to activate is SMB a.k.a. CIFS. If you have a Mac, you may want to activate AFP, which will allow you to use Time Machine for backups.
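The rule of thumb above amounts to a small lookup table. The following Python sketch is just a way of making the checklist explicit; the names `PROTOCOL_FOR_CLIENT` and `protocols_to_enable` are invented for the illustration and have nothing to do with the DSM interface.

```python
# Client type -> file protocol to activate, as listed in the text.
PROTOCOL_FOR_CLIENT = {
    "unix": "NFS",
    "linux": "NFS",
    "windows": "SMB",   # a.k.a. CIFS
    "mac": "AFP",       # needed for Time Machine backups
}


def protocols_to_enable(client_types):
    """Return the smallest set of protocols covering the given clients."""
    return sorted({PROTOCOL_FOR_CLIENT[c.lower()] for c in client_types})


needed = protocols_to_enable(["Windows", "Linux", "linux"])
```

For a household with one Windows PC and two Linux machines, only NFS and SMB need to be switched on; everything else stays off.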

Other than for Time Machine, when you mount a remote system on a Mac OS system without specifying a protocol, Mac OS will first try to use SMB-2, because it is the highest-performance protocol. On a Mac, at first sight it would make no big difference which protocol you are using. However, the semantics of ACLs are very different between the -nix operating systems and Windows, so which protocol you use matters, and SMB appears to be the way of the future.

Other notable protocols are HTTP, HTTPS, WebDAV and FTP. The first two you would enable through an Apache Tomcat server. WebDAV and its relatives like CalDAV, etc. are only necessary if you need to edit data directly on the NAS. You want to avoid activating and using FTP, because it sends the password in clear text: use SSH instead.