
Highly Available File Server Design in Azure: a delicate dance between possibility and limitation

Traditional file servers require a lot of maintenance, and expanding or updating the existing infrastructure is generally difficult, expensive and inefficient. That’s why we created a technical design that takes these worries away. But how do you design such file servers in Azure? We’ll illustrate the process with one of our customer cases.

Our customer scenario

Recently, we’ve been asked to offer a solution to migrate on-premises file servers to the cloud. These servers were spread all over Belgium.

The solution had to be:

  • Highly available
  • Centralized
  • Scalable
  • Easy to maintain
  • Easy to recover

Furthermore, we needed to take the following requirements into account:

  • Data size: 8 TB (11 million files), spread over 10 different locations in Belgium
  • NTFS permissions needed to be transferred
  • Files needed to be retained for at least 10 years

After a thorough analysis and a successful proof of concept, we came up with the following technical design:

In the following sections, we’ll delve deeper into why we’ve made some of our choices.

Our Azure solution

Azure Files

At first sight, Azure Files seemed to be the most fitting solution. However, one crucial feature was missing: to be able to make use of NTFS ACLs, Azure Active Directory Domain Services needs to be set up. Since this was not an option for our customer, we had to take an alternative route to achieve our goals.

Windows Server, with the power of Azure

What about the Windows Server Configuration?

Let’s start at the beginning: first, we needed to create two file servers in Azure. Preferably, this is done with an ARM template: it provides an automated way of redeploying the file servers, and it doubles as a document in which anyone who needs to can easily look up the specifics of the design.
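As a minimal sketch, deploying such a template comes down to a single Azure PowerShell call (the resource group, template and parameter file names below are just placeholders):

    # Deploy both file servers from the ARM template (names and paths are illustrative)
    New-AzResourceGroupDeployment `
        -ResourceGroupName 'rg-fileservers' `
        -TemplateFile '.\fileservers.json' `
        -TemplateParameterFile '.\fileservers.parameters.json'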

We needed a general-purpose VM. After careful consideration, we concluded that a “Standard_D4s_v3” would fit our needs. Should we need more or less power, we could always resize it.

Furthermore, we made sure to place the servers in two different Availability Zones to achieve the desired redundancy. This means that even if a whole data centre were to go down for some reason, we would still have one of the virtual machines running in another data centre.

Lastly, we also added two 4 TB disks to store our data.
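If the data disks aren’t part of the template itself, attaching them afterwards could look roughly like this with Azure PowerShell. This is only a sketch: the resource names, region, zone and the Premium_LRS disk SKU are assumptions on our part.

    # Create two empty 4 TB (4095 GB) managed disks in the same zone as the VM and attach them
    $vm = Get-AzVM -ResourceGroupName 'rg-fileservers' -Name 'fs01'

    foreach ($i in 1..2) {
        $diskConfig = New-AzDiskConfig -Location 'westeurope' -Zone '1' `
            -CreateOption Empty -DiskSizeGB 4095 -SkuName 'Premium_LRS'
        $disk = New-AzDisk -ResourceGroupName 'rg-fileservers' `
            -DiskName "fs01-data$i" -Disk $diskConfig
        $vm = Add-AzVMDataDisk -VM $vm -Name "fs01-data$i" `
            -ManagedDiskId $disk.Id -Lun ($i - 1) -CreateOption Attach
    }

    Update-AzVM -ResourceGroupName 'rg-fileservers' -VM $vm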

Why do we use two 4 TB disks?

We did this to work around an important limitation of Azure VM Backup at the time: “Large disk support – Now you can backup VMs with disk sizes up to 4 TB (4095 GB), both managed and unmanaged.” With 8 TB of data, a single large disk would exceed that limit, so splitting the storage across disks of at most 4 TB keeps the VMs eligible for backup.

One deployment later…

One of the customer’s problems was that the disks on the existing file servers were nearly full. To remedy this, one of the things we aimed for was to let our customer’s dataset keep growing for the foreseeable future.

We achieved this by using Windows Storage Spaces. The data disks of these file servers are pooled and configured as a striped, thin-provisioned virtual disk (simple storage layout). This means that additional disks can be attached to the VM at any time and the virtual disk can be expanded when needed, with no downtime. This way, each Storage Space can grow dynamically. An additional benefit is that we don’t have to overprovision storage space (read: waste too much money); we only provision the storage space we’ll need in the very near future.
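On each server, the pool and the thin-provisioned virtual disk can be set up with a few Storage cmdlets. A minimal sketch, where the pool, disk and volume names as well as the 10 TB virtual size are illustrative:

    # Pool all data disks that are eligible for pooling
    $disks = Get-PhysicalDisk -CanPool $true
    New-StoragePool -FriendlyName 'DataPool' `
        -StorageSubSystemFriendlyName (Get-StorageSubSystem).FriendlyName `
        -PhysicalDisks $disks

    # Thin-provisioned, striped (Simple) virtual disk; the size may exceed the
    # physical capacity because space is only allocated as it is actually used
    New-VirtualDisk -StoragePoolFriendlyName 'DataPool' -FriendlyName 'DataVDisk' `
        -ResiliencySettingName Simple -ProvisioningType Thin -Size 10TB

    # Initialise, partition and format the new virtual disk
    Get-VirtualDisk -FriendlyName 'DataVDisk' | Get-Disk |
        Initialize-Disk -PartitionStyle GPT -PassThru |
        New-Partition -AssignDriveLetter -UseMaximumSize |
        Format-Volume -FileSystem NTFS -NewFileSystemLabel 'Data'

    # Later, after attaching extra disks to the VM and adding them to the pool,
    # the virtual disk can be grown online, e.g.:
    # Resize-VirtualDisk -FriendlyName 'DataVDisk' -Size 20TB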

With Windows Storage Spaces, we’ve made our solution scalable.

Over to high availability

To keep these file servers in sync with each other, we used Azure File Sync.

How do we know which server employees are working on? What happens if two employees work on the same file, but on different servers?

To overcome these challenges, DFS (Distributed File System) comes to the rescue. We created a namespace folder per site, but the namespace layout can follow whatever topology suits your data.

The trick is to set a referral priority on the namespace targets. When we give priority to the target on Fileserver 1, for example, all traffic is routed to Fileserver 1 unless that connection becomes unavailable. This way, we create an active-passive setup. If Fileserver 1 goes down for some reason (patching, an Azure data centre outage…), the DFS namespace routes all traffic to Fileserver 2 until the connection is restored.
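A hedged sketch of what this looks like with the DFSN cmdlets, assuming a domain-based namespace \\contoso.com\Files and one share per site (the domain, server and folder names are placeholders):

    # Namespace folder for one site, with both file servers as targets
    New-DfsnFolder       -Path '\\contoso.com\Files\SiteA' -TargetPath '\\FS01\SiteA'
    New-DfsnFolderTarget -Path '\\contoso.com\Files\SiteA' -TargetPath '\\FS02\SiteA'

    # Prefer Fileserver 1; Fileserver 2 only receives referrals when FS01 is unreachable
    Set-DfsnFolderTarget -Path '\\contoso.com\Files\SiteA' -TargetPath '\\FS01\SiteA' `
        -ReferralPriorityClass GlobalHigh
    Set-DfsnFolderTarget -Path '\\contoso.com\Files\SiteA' -TargetPath '\\FS02\SiteA' `
        -ReferralPriorityClass GlobalLow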

What about the Azure configuration?

We had two file servers with DFS. The key difference from an on-premises file server setup is that we use Azure File Sync to keep the files on both servers in sync with each other.

To make use of Azure File Sync, we first needed a storage account. Since the files in the storage account are duplicated on both servers anyway and are only used passively to keep the servers in sync, standard performance with locally redundant storage (LRS) is sufficient.

We created a file share for each site; this was another choice not to be made lightly. With another Azure limitation in mind (a file share in Azure can have a maximum size of 5 TiB) and some basic arithmetic, we concluded that with roughly 10 TiB spread over 10 sites, each site holds about 1 TiB on average, which leaves every site room to grow by another 4 TiB before action needs to be taken. Another advantage is that we kept it simple by not creating an extra abstraction layer (by, for example, creating a file share for each disk).
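A minimal sketch of this part of the setup with Azure PowerShell (the account name, region and site names are placeholders):

    # Standard performance, locally-redundant storage account for Azure File Sync
    $storage = New-AzStorageAccount -ResourceGroupName 'rg-fileservers' `
        -Name 'stfilesyncweu' -Location 'westeurope' `
        -SkuName 'Standard_LRS' -Kind 'StorageV2'

    # One file share per site, each with the (then) maximum quota of 5 TiB
    $sites = 'sitea', 'siteb', 'sitec'   # one entry per site
    foreach ($site in $sites) {
        New-AzRmStorageShare -ResourceGroupName 'rg-fileservers' `
            -StorageAccountName $storage.StorageAccountName `
            -Name $site -QuotaGiB 5120
    }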

Next, we needed to install the Azure File Sync agent on both servers and register them with the Storage Sync Service. The installation can be scripted with a handful of PowerShell cmdlets, so we won’t go into detail about these steps; you can find more information in the Azure File Sync documentation.
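A hedged sketch of what that looks like on each server, assuming the agent MSI has already been downloaded and the Az.StorageSync module is installed (all resource names are placeholders):

    # Silent install of the Azure File Sync agent
    Start-Process msiexec.exe -ArgumentList '/i StorageSyncAgent.msi /quiet /norestart' -Wait

    # Sign in and register this server with the Storage Sync Service
    Connect-AzAccount
    Register-AzStorageSyncServer -ResourceGroupName 'rg-fileservers' `
        -StorageSyncServiceName 'sss-fileservers'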

The result

Sync groups

The endpoints
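The same sync groups and endpoints can also be created with PowerShell instead of the portal. A rough sketch for one site, assuming the storage account and Storage Sync Service from earlier; all names and local paths are placeholders:

    $syncService = Get-AzStorageSyncService -ResourceGroupName 'rg-fileservers' -Name 'sss-fileservers'
    $storage     = Get-AzStorageAccount     -ResourceGroupName 'rg-fileservers' -Name 'stfilesyncweu'

    # One sync group per site
    $group = New-AzStorageSyncGroup -ParentObject $syncService -Name 'SiteA'

    # Cloud endpoint: the Azure file share for this site
    New-AzStorageSyncCloudEndpoint -ParentObject $group -Name 'sitea' `
        -StorageAccountResourceId $storage.Id -AzureFileShareName 'sitea'

    # Server endpoints: the local data folder on each registered file server
    $servers = Get-AzStorageSyncServer -ResourceGroupName 'rg-fileservers' `
        -StorageSyncServiceName 'sss-fileservers'
    foreach ($server in $servers) {
        New-AzStorageSyncServerEndpoint -ParentObject $group `
            -Name "$($server.FriendlyName)-sitea" `
            -ServerResourceId $server.ResourceId `
            -ServerLocalPath 'D:\SiteA'   # cloud tiering deliberately left off (see below)
    }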

Next up: migrating all the data. It’s very important to get these steps right. To make the transfer of roughly 10 TB go as smoothly as possible, it’s essential to copy all data to only one server and let Azure File Sync (AFS) replicate the files to the second server. Transferring the same data to both servers simultaneously will inevitably cause sync conflicts.

To migrate all data, we used Robocopy, a proven robust file copy solution. Robocopy has some very powerful switches that made it the perfect tool for our case.

  • /mir switch: Mirrors directory tree – very useful for deltas and incremental migrations.
  • /mt:[n] switch: Creates multi-threaded copies with N threads. N must be an integer between 1 and 128. This significantly sped up the process.
  • /copy:<copyflags> with flags D=Data, A=Attributes, T=Time stamps, S=NTFS access control list (ACL), O=Owner information and U=Auditing information. It’s essential for all these values to also be copied over.

While there are other very interesting switches, these three were crucial to achieving our goal. More information can be found in the Robocopy documentation.
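Put together, a typical invocation per site could look like this. The source and destination paths, thread count, retry (/r, /w) and log switches are just placeholders and sensible extras, not the exact command we ran:

    robocopy \\oldfs-sitea\Data D:\SiteA /mir /copy:DATSOU /mt:32 /r:2 /w:5 /log+:C:\Logs\robocopy-sitea.log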

Azure provides us with some very nice visual charts, and Microsoft is continuously improving them. They’re an easy way to see what’s going on.

To Cloud Tier or not to Cloud Tier?

Cloud tiering is an optional, yet powerful feature of Azure File Sync in which frequently-accessed files are cached locally on the server while all other files are tiered to Azure Files based on policy settings. When a file is tiered, the Azure File Sync file system filter (StorageSync.sys) replaces the file locally with a pointer or reparse point. The reparse point represents a URL to the file in Azure Files.
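For illustration, cloud tiering is toggled per server endpoint. A hedged sketch with the Az.StorageSync cmdlets, where the resource names and the free-space percentage are purely illustrative:

    # Enable cloud tiering on one server endpoint, keeping at least 20% of the volume free locally
    Set-AzStorageSyncServerEndpoint -ResourceGroupName 'rg-fileservers' `
        -StorageSyncServiceName 'sss-fileservers' -SyncGroupName 'SiteA' `
        -Name 'fs01-sitea' `
        -CloudTiering -VolumeFreeSpacePercent 20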

While it’s perfectly possible to make use of this feature, we bumped into another limitation of Azure, which forced us to abandon it (for now). You’ll read more about that in the next section on backup and retention.

What about back-up and retention?

For compliance and regulatory reasons, all files need to be backed up and retained for at least 10 years. There are several ways to go about this; let’s discuss some of them.

Microsoft Azure Backup Server (MABS)

While DPM/MABS provides full granularity for backup and recovery, in practice we realised that MABS couldn’t handle this data size: backing up any virtual machine containing a dataset larger than approximately 3 million files resulted in continuously failing backups. Furthermore, DPM/MABS requires additional maintenance, which we tried to avoid as much as possible.

Azure Files Backup (preview)

We can back up all files by backing up the Azure File Shares (our Azure File Sync Cloud Endpoints). It’s easy to recover any file or files from a specific day. In the preview, it’s only possible to create a scheduled backup once per day. Another important limitation is that backups can only be retained for a maximum of 180 days, while our customer needs to retain all data for at least 10 years. Since it provided us with a very easy method to recover short-term data, we still decided to make use of it.
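Enabling this protection for a file share only takes a few Azure PowerShell calls. A minimal sketch, assuming a Recovery Services vault and a daily backup policy already exist (all names are placeholders):

    # Point the backup cmdlets at the Recovery Services vault
    $vault = Get-AzRecoveryServicesVault -ResourceGroupName 'rg-fileservers' -Name 'rsv-fileservers'
    Set-AzRecoveryServicesVaultContext -Vault $vault

    # Protect one Azure file share with an existing daily policy
    $policy = Get-AzRecoveryServicesBackupProtectionPolicy -Name 'DailyFileShares'
    Enable-AzRecoveryServicesBackupProtection -StorageAccountName 'stfilesyncweu' `
        -Name 'sitea' -Policy $policy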

Azure Virtual Machine Backup

As mentioned before, Azure Virtual Machine Backup only supports VMs with disks of up to 4 TB (4095 GB). Since we had taken this into account at the start of the project, it was smooth sailing from here on.

Azure VM Backup provided us with all the features we needed. We could retain backups for at least 10 years, make daily backups and easily recover our VMs and data without any additional maintenance. There was only one catch: if we’d made use of Cloud Tiering earlier, a lot of files would not have been backed up, since they would’ve only been stored in Azure Files and only pointers would’ve been stored on the VM itself. As previously stated, this forced us to abandon the Cloud Tiering feature.

Regardless, Azure Virtual Machine Backup provided us with a nice backup solution that fit our needs.
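A minimal sketch of enabling it for both file servers, assuming a Recovery Services vault and a backup policy with the required 10-year retention already exist (all names are placeholders):

    $vault = Get-AzRecoveryServicesVault -ResourceGroupName 'rg-fileservers' -Name 'rsv-fileservers'
    Set-AzRecoveryServicesVaultContext -Vault $vault

    # Policy assumed to keep daily recovery points plus yearly points for 10 years
    $policy = Get-AzRecoveryServicesBackupProtectionPolicy -Name 'Daily-10yRetention'

    foreach ($vmName in 'fs01', 'fs02') {
        Enable-AzRecoveryServicesBackupProtection -ResourceGroupName 'rg-fileservers' `
            -Name $vmName -Policy $policy
    }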

Our conclusion

A complete file server environment was set up using Azure File Sync (AFS) and DFS to provide high availability. The infrastructure is resilient, performant, monitored, flexible, and easy to maintain and recover. Azure provides a lot of great features to make every project a success.

Do you want to know more about our Azure solutions? Don’t hesitate to have a look at our website.
