Saturday, October 20, 2012

ZFS Home Storage Network at 10GbE

About a year ago I decided to put all my data on a home-built ZFS storage server. The growing number of devices around my household called for an easier and much faster way to share data. Since then the box has been happily serving both CIFS and iSCSI over a 1GbE network without any issues.

I had been keen on upgrading to 10GbE for quite some time, as both my server and my clients could easily saturate a 1GbE link whenever ZFS had all the required data in ARC. The 32GB of RAM in my storage server usually left me with an ARC of about 20GB, which in most cases was enough to cache the entire working set. The bottom line is that the box rarely had to touch the disks, and even when it did there was a 120GB L2ARC SSD to smooth out the bumps, which is itself capable of maxing out a 1GbE link.

It so happened that I managed to get my hands on a pair of 10GBASE-T Emulex OCe11102-NT NICs at a significant discount. With 10GBASE-T switches still costing several thousand dollars (even used), I decided to just run a pair of CAT6 crossover cables from the ZFS storage box to my workstation and do some tests to see what this configuration is capable of.

Storage Server

My storage server is running Solaris 11 and the storage pool is built from 4x3TB Hitachi 5400RPM drives in a RAID10 layout (striped ZFS mirrors). The box has 32GB of RAM and a 120GB Vertex 3 MAX IOPS SSD for L2ARC. As mentioned above, the cache subsystem is enough to keep the box from hitting the disks most of the time. All of that is driven by an Intel Core i7-3770 CPU (Ivy Bridge).

iSCSI network

I decided to dedicate the 10GbE adapters to the iSCSI network between the storage box and my workstation. First of all, this is where I need all the speed I can get. Secondly, I can utilize both ports with iSCSI MPIO, thus achieving 20Gb/s of available bandwidth. This is probably total overkill, but since my cards are dual-ported I may as well use both ports, as all I need is an extra CAT6 cable. The network uses 9K jumbo frames. The ZFS volume uses a 4K block size to match the cluster size of the NTFS file system built on top of the iSCSI volume. COMSTAR serves as the iSCSI target, with the Microsoft iSCSI Initiator on the client.
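To put a rough number on what 9K jumbo frames buy on this kind of link, here is a back-of-the-envelope sketch in Python. It assumes standard Ethernet framing without a VLAN tag, IPv4 and TCP headers without options, and it ignores iSCSI PDU headers and any host-side limits, so treat the output as a best-case ceiling rather than a prediction. The function name `tcp_payload_rate` is made up for this illustration.

```python
# Fixed per-frame costs in bytes (standard Ethernet, no VLAN tag).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12  # preamble/SFD + header + FCS + interframe gap
IP_TCP_HEADERS = 20 + 20             # IPv4 + TCP headers without options (assumption)
LINK_RATE_BYTES = 10e9 / 8           # one 10GbE link in bytes per second


def tcp_payload_rate(mtu, links=1):
    """Best-case TCP payload rate in decimal MB/s for a given MTU."""
    # Payload bytes carried per frame divided by bytes occupied on the wire.
    efficiency = (mtu - IP_TCP_HEADERS) / (mtu + ETHERNET_OVERHEAD)
    return links * LINK_RATE_BYTES * efficiency / 1e6


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_payload_rate(mtu):.0f} MB/s per link, "
          f"{tcp_payload_rate(mtu, links=2):.0f} MB/s over two links")
```

Under these assumptions jumbo frames raise the payload share of each frame from roughly 95% to roughly 99%. The larger practical benefit is usually fewer packets for the CPU to process per megabyte moved.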

Test Results - IOPS

I'll start with IOPS results with 100% random read access over 20GB of data using Iometer at different block sizes and worker counts. Each worker was set to do 16 outstanding I/Os.

IOPS
With 4K blocks the system is able to achieve a quite impressive 226K IOPS! The storage server CPU is simply running flat out at this point, so I'm confident the network has more to give. At 16K blocks the system is pushing over 1.5GB/s of random I/O, which is equally impressive and clearly goes beyond what a single 10GbE link is capable of, so the second link is certainly being put to good use.
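The relationship between IOPS, block size and throughput behind these figures is simple multiplication, sketched below. This is only an illustration: it assumes 226K means 226,000, that 1.5GB/s is a decimal figure, and that a 4K block is 4096 bytes. Iometer's own unit conventions may differ slightly. The helper names are hypothetical.

```python
def throughput_mb_s(iops, block_bytes):
    """Throughput in decimal MB/s implied by an IOPS figure and block size."""
    return iops * block_bytes / 1e6


def iops_for(throughput_bytes_s, block_bytes):
    """IOPS implied by a throughput figure and block size."""
    return throughput_bytes_s / block_bytes


SINGLE_LINK_MB_S = 10e9 / 8 / 1e6  # raw 10GbE line rate: 1250 MB/s

four_k_mb_s = throughput_mb_s(226_000, 4 * 1024)
sixteen_k_iops = iops_for(1.5e9, 16 * 1024)

print(f"226K IOPS at 4K   = {four_k_mb_s:.0f} MB/s")
print(f"1.5GB/s at 16K    = {sixteen_k_iops:.0f} IOPS")
print(f"One 10GbE link    = {SINGLE_LINK_MB_S:.0f} MB/s raw")
```

Read this way, the 4K result moves less data than a single link can carry, which fits the observation that the CPU is the limit there, while the 16K result exceeds the raw rate of one link and so has to be using both.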

Test Results - Bandwidth

For the bandwidth test I simply set Iometer to do 1MB sequential reads with 16 outstanding I/Os per worker.

Throughput

Even with a single worker the system can push 2085MB/s across the wire, which is getting quite close to the maximum practical speed you can get out of 2x10GbE links, so I'm quite happy with this result!
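For a sense of how close that is, the sketch below compares 2085MB/s with the raw signalling rate of two 10GbE links. Whether the reported MB is decimal (10^6 bytes) or binary (2^20 bytes) is not stated, so both readings are shown as assumptions, and protocol overhead is deliberately left out of the comparison.

```python
RAW_TWO_LINKS_MB_S = 2 * 10e9 / 8 / 1e6  # 2 x 10Gb/s = 2500 decimal MB/s


def utilization(measured_mb_s, bytes_per_mb):
    """Fraction of the raw two-link rate used by a measured throughput."""
    measured_decimal_mb_s = measured_mb_s * bytes_per_mb / 1e6
    return measured_decimal_mb_s / RAW_TWO_LINKS_MB_S


decimal_reading = utilization(2085, 1_000_000)  # if MB means 10^6 bytes
binary_reading = utilization(2085, 1_048_576)   # if MB means 2^20 bytes

print(f"Decimal MB: {decimal_reading:.1%} of raw 2x10GbE")
print(f"Binary MB:  {binary_reading:.1%} of raw 2x10GbE")
```

Either way the result lands somewhere above 80% of the raw line rate, before any allowance for Ethernet, TCP/IP and iSCSI overhead.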

Conclusion

I'll be doing more testing in the coming days, but so far it appears that the upgrade was totally worth it. Having a home system capable of pushing 226K IOPS and 2GB/s of bandwidth is an impressive show of how far you can push a system consisting mostly of consumer-grade components. Keep in mind that the only way I could get the above numbers is by making sure all the data is available in the ZFS ARC, which was the initial goal of my setup.

15 comments:

  1. Micke, 6:09 AM

    Hi,

    Those are pretty impressive numbers. You should do a few runs with SLOB and see where you end up.

    I'm doing some SLOB testing at the moment with my home lab server (ESX 5.1) and a couple of OCZ Vertex 4 drives in RAID0, and so far with 48 readers I'm pushing ~70000 IOPS (8K) to an OL6.3 VM with 4 cores/16GB.
    We'll see where this ends.

    regds
    /M

  2. Micke,

    thanks for your comment. I'll see if I can get some time to do the SLOB run. I started with Iometer since these results are more important to me, given that I'm using my storage server as general-purpose storage across my entire household. Nothing gets put on local storage for me these days. Whatever Oracle workload happens to hit it is coming from the VMs running on my workstation.

    The setup seems to be capable of pushing 300K+ IOPS with 4K blocks but getting there will require a platform change (as I'm already running the fastest CPU you can get for Socket 1155) and is not worth it for me at the moment.

  3. Micke, 1:56 PM

    Hi,

    I understand, at home I'm putting everything except VMs on a central NAS. I wouldn't mind putting the VMs on the NAS as well, but I like to not be constrained by GigE ;-)

    And for completeness' sake, I ended up slightly north of 82000 8K read IOPS (72 readers) with my SLOB runs and I'm pretty happy with that. Next stop, writes.

    Good post!
    regds
    /M

  4. Hi! Interesting numbers, but you are just hitting ARC. Do you mind configuring the Vertex as ZIL and trying a random write benchmark please? Thanks!

  5. ES,

    the entire setup is designed to hit the ARC most of the time, so that's what matters to me. I have the pool set to sync=disabled (the server is behind a UPS), so the ZIL doesn't fit into the picture.

  6. Anonymous, 10:38 PM

    What a complete overkill for a bunch of cheapo 5400 drives! Still, a nice toy;) I'm impressed.

    I put my ZFS server inside VMware should I ever need to access it from Windows, though Windows is crashed at the moment and I didn't really bother reinstalling. Still, it sucks that vmxnet doesn't really work well on BSD, so I'm running Linux at the moment. So far it seems stable. Btw, vmxnet3 does 5Gbit/s on my Linux guest.

  7. To the post above... I believe that trying to scale the performance of a home system using spinning media is counterproductive. The goal instead should be to use a (relatively) small number of low-power, high-capacity spinning disks and then outfit the system with enough cache that it doesn't have to hit the spinning media most of the time. At which point, who cares what kind of disks are there. Helps the electric bills as well.

  8. Hi Alex, awesome post ^_^ How about adding an SSD to the test stand?
    I'm now thinking about such a device for myself. In your opinion, what is better to use: 4x1TB 5400 HDDs, or 3x1TB + one 240GB SSD?

  9. If you meant using the SSD as the main storage then that's unlikely to happen any time soon since buying 12TB of SSDs (to match current raw capacity the system has) will get very expensive very fast (for home use, that is). They also won't do much for performance since ARC is faster any way you look at it. Also the system already has an SSD for L2ARC.

    In your case the best value would probably be to do RAID Z on 3x1TB + 240GB SSD for L2ARC and then I would go for 32GB of RAM.

  10. The random IOPS is massive. Awesome! Which OS? FreeBSD 9 or 10 or something else? And what NICs? Did you do anything special to get crossover/crosslink (without switch) working? Special CAT6 cables? Thanks a bunch for this article, Tobias

  11. Just saw (had somehow overlooked that): "10GBASE-T Emulex OCe11102-NT" and "CAT6 crossover cables". So one needs special cables, right? Is that crossover capability in any way dependent on the NIC as well? Say, should it work in principle with an Intel X540-T1 (single-port) too (no MPIO then, of course)?

  12. These are regular CAT6 Ethernet cables, so no, no special cabling is required. 10GbE does not require any special cables for crossover, so you just hook two cards together and away you go ;-)

  13. Ah, ok. Thanks for pointing this out! I didn't know that. And the OS? Which OS did you run on the ZFS storage server?

  14. Replies
    1. Awesome results! I also have two Intel X540-T2s coming. I was wondering how I tell COMSTAR to use the Intel NIC instead of the NIC supplied by VMware. I am using VMware ESXi 5.1 and running OpenIndiana with napp-it.
