Sunday, December 3, 2017

ZFS-based NAS and home lab [Intro]

I have been running a small home server for many years now. The current version is an i3-based small form factor custom build that mainly provides basic storage and services like streaming, a Nextcloud instance and a bunch of other things I need on my local network. This setup worked perfectly fine for a long time, but as my demands shifted I needed a different approach.



So far I do not run any RAID configuration: I keep extensive backups and did not want to add disks just to mirror data that is not exactly important. Besides that, the case I currently use does not provide a lot of space for disks anyway.

By now I have worked with ZFS in various job and private scenarios, and I have to say it makes your life a lot easier. I'll come to my exact use a bit later, but the option to create snapshots and sync those to remote locations alone justifies using it. The downside of ZFS is that you need the hardware resources if you want to use it seriously. Yes, you can run it on a single disk or even on a file backend, but you can do a lot more if you do it right. I do not want to go into the details of ZFS here, as these alone could fill multiple blog posts, but since some of its features have a direct impact on hardware decisions, I'll introduce those aspects as we go.


OK, so why do I actually want to extend my setup?

  • more space
  • run VMs on the server, not on my workstation
  • backups from remote locations (e.g. my brother's company)
  • I have fun building such things :)

More space is an easy concept that most likely everybody understands. Over time a lot of data accumulates, and while much of it could live on an external drive, it is very convenient to have at least the relevant stuff available online all the time.

I have a bunch of VMs tailored to specific use cases. For example, there is a Windows VM for photo processing, one for Windows coding, and a number of different Linux setups to test things out. Currently all of that runs on my local machine, which is OK, but if I go for a real server I can move those as well.

Remote ZFS snapshots are currently pulled from my brother's company to a file-backed ZFS pool on my server. As ZFS transfers just the incremental data between snapshots, this is an easy and fast solution for off-site backups: you sync the full content once and from then on you only add the daily changes. This is fast even over a thin internet line, but you still get full off-site backups. So far I can only use this approach in one direction, because not all of my data is on ZFS. I do push my important data to his server, but I have to rsync over ssh, which works OK; the ZFS approach is a lot nicer.
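For illustration, here is a minimal sketch of such a send/receive cycle; pool, dataset and host names are made up for the example, and in my case the direction is the other way around:

    # one-time full sync of a dataset to the remote pool
    zfs snapshot tank/data@2017-12-01
    zfs send tank/data@2017-12-01 | ssh backup-host zfs receive backup/data

    # from then on only the delta between two snapshots travels over the line
    zfs snapshot tank/data@2017-12-02
    zfs send -i tank/data@2017-12-01 tank/data@2017-12-02 | ssh backup-host zfs receive backup/data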


OK, now that I have laid out the basic motivation for a new server, let's find out what I need (or want) to get this working.
My current server has roughly 4T of space, which is OK, but since I am building new let's double that to a target size of 8T. I want to be on the safe side of data integrity, so I'll run mirror vdevs, which means at least two disks per vdev. 8T disks exist but are expensive, so I'll go with 4 * 4T in two mirrored pairs.
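As a rough sketch, a pool made of two mirrored pairs would be created along these lines (pool and device names are placeholders):

    # two mirror vdevs of 2 x 4T each -> roughly 8T usable capacity
    zpool create tank \
      mirror /dev/disk/by-id/ata-disk1 /dev/disk/by-id/ata-disk2 \
      mirror /dev/disk/by-id/ata-disk3 /dev/disk/by-id/ata-disk4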

ZFS is memory hungry because it basically operates in memory. Data that should be written to the pool first goes to RAM, is then committed to the ZIL, and only afterwards actually written to the disks. ZIL is short for ZFS Intent Log; it can live on the pool disks, but since a write can only be acknowledged once it has hit something persistent - this only applies to synchronous writes, async I/O completes as soon as the data is in the write queue - I want something faster than a rotating disk here. So let's add an SSD for that, and as data integrity is crucial I'll go for a second one in a mirror layout. These can be pretty small, as they only hold data that has not yet been written to the pool but will be within a few seconds, so not a lot piles up there.
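Once the pool exists, such a mirrored log device can simply be attached to it; a minimal sketch with placeholder device names:

    # mirrored SLOG holding the ZIL; only synchronous writes go through it
    zpool add tank log mirror /dev/disk/by-id/ata-ssd1 /dev/disk/by-id/ata-ssd2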

Memory is not just used to buffer writes but also for reads. The ZFS ARC is basically a cache, but it is a lot smarter than plain FIFO: it tracks which blocks are accessed frequently and keeps both recently used and frequently used data in memory. Besides that, memory is also used for the metadata that maps the storage, so more is better here.
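Since the VMs need memory too, the ARC usually gets an upper limit. On ZFS on Linux this is a module parameter; the 48 GiB below is just an example value for a 64G box, not a recommendation:

    # cap the ARC at 48 GiB so the KVM guests keep enough RAM (example value, ZFS on Linux)
    echo "options zfs zfs_arc_max=51539607552" > /etc/modprobe.d/zfs.conf
    # the same value can be set at runtime via /sys/module/zfs/parameters/zfs_arc_max
    # current ARC size and hit/miss counters:
    grep -E '^(size|hits|misses) ' /proc/spl/kstat/zfs/arcstats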

There is a limit to how much memory I can give to ZFS alone, so I'll add two more SSDs for L2ARC, an extension of the ARC that holds data which no longer fits in memory but might still be useful to have available quickly. As this space has to be managed too, it is important to size it with the available memory in mind: it makes no sense to run only 16G of RAM and add 512G of L2ARC, because the headers tracking the L2ARC contents would eat a substantial part of the available memory. Since I can't get an SSD below 120G these days, I'll go with two of them, but this time as a stripe set.
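Cache devices are simply added to the pool and ZFS stripes across them on its own; there is no mirroring for L2ARC, and losing one is harmless since it only holds copies of pool data (placeholder names again):

    # two cache devices for L2ARC, used as a stripe
    zpool add tank cache /dev/disk/by-id/ata-ssd3 /dev/disk/by-id/ata-ssd4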

OK, so the disks for the new server will be:
  • 4 * 4T rotating HDDs
  • 2 * 120G SSDs for the ZIL, mirrored
  • 2 * 120G SSDs for L2ARC, striped
I need a good amount of memory for ZFS as well as for KVM, so the lower bound is 64G, and more is better.
I'll also need some CPU power for the KVM guests and for ZFS compression, although with lz4 that won't amount to much.
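For reference, compression is a per-dataset property; set on the pool root, every new dataset inherits it (pool name is a placeholder):

    # enable lz4 pool-wide and check the resulting ratio later
    zfs set compression=lz4 tank
    zfs get compression,compressratio tank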

At this point you start checking shops for hardware and soon realize that this gets pretty expensive if you buy everything new. I don't want to spend many thousands of euros on this build, but I also don't want to compromise on the feature set, so let's look at used hardware. Used consumer hardware is cheap, but its specs do not match my needs, so I need something else. All of my requirements are obviously covered by enterprise server hardware, and as companies usually retire those machines once the warranty period is over, you can find a lot of them on eBay for decent prices.

Server hardware has some downsides though. A server is not meant to sit in the living room but in a rack in a server room, so the form factor is 19" and these machines tend to be very loud, because nobody cares about noise in a server room anyway.

I think I'll break this into a series now, as this is already a long post and I have not even gotten to the build yet. So stay tuned for the second part, where this story continues.

A small outlook for this series might be appropriate. This first part covered my motivation and requirements. The second part will go into the details of the hardware I use, and in the third part I'll introduce the build of my silencer tower case for the 19" server. After that comes a software part describing what I run, followed by a home lab part covering OpenNebula as the VM management layer and Jenkins as a cron replacement and as the builder of the base images used in OpenNebula. A final part will look at how it all performs in the end.


