by wil421 1307 days ago
I’m running a FreeNAS box on an i3-8100. Right now I’m converting the NAS and my desktop to server chassis and putting them in a rack. Once I get a 10Gb Unifi switch and NICs off eBay, I’m debating running my desktop and servers diskless using iSCSI backed by RAID0 NVMe drives.

Whatever floats your boat, but iSCSI is limited to 1500 MTU (9k? Are you sure you can boot with 9k enabled?), and while you can have 10Gbit throughput, that doesn't mean you will always get it, e.g. 100 IO operations would generate 100 packets, and it doesn't matter whether each was 1500B or only 100B.

And you wouldn't see the speed improvement from RAID0 NVMe drives except in extremely rare, fully sequential operations lasting at least tens of seconds.

You can also try it just by running a VM with iSCSI boot on your current desktop.

It's been a long time since anything iSCSI related didn't handle 9k, for boot or otherwise.

But I look at it this way: you need 40Gbit networking for a single PCIe 3.0 NVMe drive (and newer drives can saturate that, or close).
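For a rough sense of where the 40Gbit figure comes from, here is a back-of-the-envelope sketch in Python. It assumes the drive sits on a PCIe 3.0 x4 link (typical for M.2 NVMe, but an assumption here) and ignores protocol overhead on both the PCIe and the Ethernet side. The function names are made up for illustration.

```python
# PCIe 3.0 signals at 8 GT/s per lane and uses 128b/130b line encoding.
PCIE3_GT_PER_LANE = 8.0
PCIE3_ENCODING = 128 / 130

# Common Ethernet link speeds in Gbit/s.
ETHERNET_SPEEDS_GBIT = [1, 10, 25, 40, 100]


def pcie3_gbit(lanes: int = 4) -> float:
    """Line rate of a PCIe 3.0 link in Gbit/s, before protocol overhead."""
    return PCIE3_GT_PER_LANE * lanes * PCIE3_ENCODING


def smallest_link_for(gbit: float) -> int:
    """Smallest Ethernet speed from the list above that covers `gbit`."""
    return next(speed for speed in ETHERNET_SPEEDS_GBIT if speed >= gbit)


nvme = pcie3_gbit(4)  # assumption: the drive uses 4 lanes
print(f"PCIe 3.0 x4 line rate: {nvme:.1f} Gbit/s")
print(f"Smallest Ethernet speed that covers it: {smallest_link_for(nvme)} Gbit/s")
print(f"Share of that rate a 10 Gbit link can carry: {10 / nvme:.0%}")
```

Real drives land somewhat below the line rate, so treat the numbers as an upper bound rather than a benchmark.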

And because you're throttling throughput you'll see much more frequent, longer, queuing delays, on the back of a network stack that ( unless you're using rdma ) is already 5x-10x slower than nvme.

It'll be fast enough for lots of things, especially home/lab use, and it'll be amazing if you're upgrading from sata spinning disk.. but 10gbit is slow by modern storage standards.

Of course, that's not the only consideration. Shared storage and iscsi in particular can be extremely convenient! And sometimes offers storage functionality that clients don't have ( snapshots, compression, replication )

> It's been a long time since anything iSCSI related didn't handle 9k, for boot or otherwise.

I don't have anything on hand to check whether the boot firmware even allows setting 9k, and I haven't touched iSCSI boot in a long time, so I'll take your word for it.

> But I look at it this way. You need 40gbit networking ... is already 5x-10x slower than nvme.

This one.

> It'll be fast enough for lots of things, especially home/lab use

Yep, in OP's case I would consider just leaving the OS on the local [fast enough] drive and using iSCSI (if for some reason NFS/SMB doesn't fit) for any additional storage. It would be fast enough for almost everything, while completely eliminating any iSCSI boot shenanigans /me shudders in Broadcom flashbacks.

Another neat thing about iSCSI is that you can connect or reconnect it to any device on the network in a couple of minutes (the first time; even faster later). Sometimes that comes in really handy.

> Whatever floats your boat, but iSCSI is limited to 1500 MTU (9k? Are you sure you can boot with 9k enabled?), and while you can have 10Gbit throughput, that doesn't mean you will always get it, e.g. 100 IO operations would generate 100 packets, and it doesn't matter whether each was 1500B or only 100B.

Ugh, iSCSI does have queueing, so you can have many operations in flight, and one operation doesn't really translate to one packet in the first place. The kernel will happily pack a few smaller operations written to the TCP socket into one packet when there is load.
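To put rough numbers on "one operation is not one packet", here is a simplified Python sketch that counts full-size TCP segments per I/O at 1500 and 9000 MTU. It assumes IPv4 and TCP headers without options (40 bytes), one 48-byte iSCSI basic header segment per operation, no digests, and that the whole transfer goes out as a single PDU. Real initiators may split large transfers, so this is an estimate only, and `packets_for_io` is a made-up helper.

```python
import math

IP_TCP_HEADERS = 40  # IPv4 (20 B) + TCP (20 B), assuming no options
ISCSI_BHS = 48       # iSCSI basic header segment


def packets_for_io(io_bytes: int, mtu: int) -> int:
    """Estimate the TCP segments needed to carry one I/O of `io_bytes`."""
    payload_per_packet = mtu - IP_TCP_HEADERS
    return math.ceil((io_bytes + ISCSI_BHS) / payload_per_packet)


for io_bytes in (512, 4096, 65536):
    print(
        f"{io_bytes:>6} B I/O: "
        f"{packets_for_io(io_bytes, 1500)} packet(s) at MTU 1500, "
        f"{packets_for_io(io_bytes, 9000)} packet(s) at MTU 9000"
    )
```

Under these assumptions a single 4 KiB read already spans several packets at 1500 MTU, while a 512 B one fits in one either way.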

The single queue is the problem here, but the dumb admin trick is just to bring up more than one IP on the server and connect to all of them via multipath.

> kernel will happily pack few smaller operations to TCP socket into one packet when there is load.

And here comes the latency! shining.jpg

It wouldn't be a problem for desktop use of course[0], especially considering that 90% of operations are just read requests.

My example is crude and was meant more to highlight that iSCSI, by virtue of running over Ethernet, inherently has a limit on how many concurrent operations can be in flight at any one moment. It's not a problem for an HDD-packed SAN (the HDDs would impose a lower ceiling anyway, because spinning rust is spinning), but for NVMe (especially with a single target) it could diminish the benefits of such fast storage.
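One way to reason about that ceiling is Little's law: sustained IOPS is roughly the number of operations in flight divided by the per-operation latency. The Python sketch below uses purely hypothetical numbers (a 20 µs local latency and a queue depth of 32) together with the 5x-10x network slowdown mentioned upthread; none of them are measurements.

```python
def max_iops(queue_depth: int, latency_us: float) -> float:
    """Little's law: throughput = operations in flight / latency per operation."""
    return queue_depth / (latency_us * 1e-6)


QUEUE_DEPTH = 32         # hypothetical number of outstanding operations
LOCAL_LATENCY_US = 20.0  # hypothetical local NVMe latency, not a measurement

print(f"local:        {max_iops(QUEUE_DEPTH, LOCAL_LATENCY_US):>12,.0f} IOPS")
for slowdown in (5, 10):  # the 5x-10x network stack figure from the thread
    remote = max_iops(QUEUE_DEPTH, LOCAL_LATENCY_US * slowdown)
    print(f"{slowdown:>2}x slower:   {remote:>12,.0f} IOPS")
```

With a fixed queue depth, every multiple of added latency divides the achievable IOPS by the same factor, which is why more queues (or more connections) help.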

> The single queue is the problem here but dumb admin trick is just to up more than one IP on the server and connect all of them via multipath

Even on a single physical link? Could work if the load is queue bound...

[0] hell, even on 1Gb link you could run multiple VMs just fine, it's just when you start to move hundreds of GBs...

>> kernel will happily pack few smaller operations to TCP socket into one packet when there is load.

>And here comes the latency! shining.jpg

Not really. If you get data faster than you can send packets (link full), there wouldn't be much extra latency from that (at most one packet length, which at 10Gbit speeds is very short), and it would be more than offset by the savings.
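"One packet length" can be put in numbers as the time it takes to clock a frame onto the wire. A minimal Python sketch, ignoring Ethernet framing overhead (preamble, header, FCS, inter-frame gap) for simplicity; the function name is made up:

```python
def serialization_delay_us(frame_bytes: int, link_gbit: float) -> float:
    """Microseconds needed to put `frame_bytes` on a link of `link_gbit` Gbit/s."""
    bits = frame_bytes * 8
    bits_per_us = link_gbit * 1000  # 1 Gbit/s is 1000 bits per microsecond
    return bits / bits_per_us


for link in (1, 10):
    for frame in (1500, 9000):
        delay = serialization_delay_us(frame, link)
        print(f"{frame} B at {link} Gbit/s: {delay:.1f} us")
```

At 10Gbit a 1500 B packet takes about 1.2 µs and a 9000 B one about 7.2 µs, both small next to typical storage latencies.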

Then again I'd guess that's mostly academic as I'd imagine not very many ISCSI operations are small enough to matter. Most apps read more than a byte at a time after all, hell, you literally can't read less than a block from a block device which is at least 512 bytes.

>> The single queue is the problem here but dumb admin trick is just to up more than one IP on the server and connect all of them via multipath

> Even on a single physical link? Could work if the load is queue bound...

You can also use it to drive multiple NICs without bonding/teaming, although it is easier to have them in separate networks. IIRC Linux had some funny business where, if you didn't configure it correctly, for traffic in the same network it would pick the "first available" NIC to send on, and it needed a /proc setting to change that.

To elaborate, the default setting for /proc/sys/net/ipv4/conf/interface/arp_ignore (and arp_announce) is 0, which for arp_ignore means

> 0 - (default): reply for any local target IP address, configured on any interface

and for arp_announce means

> 0 - (default) Use any local address, configured on any interface

IIRC to do what I said required

    net.ipv4.conf.all.arp_ignore=1
    net.ipv4.conf.all.arp_announce=2

which basically changed that to "only send/respond to ARPs from NICs where actual address exists, not just ones with the address in same network" and fixed the problem.

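To see what a box is currently set to before changing anything, the values can be read straight from /proc. A small Python sketch, assuming a Linux host; `read_arp_settings` and `mismatches` are made-up helper names, and the wanted values are just the two settings quoted above:

```python
from pathlib import Path

# The two values suggested above.
WANTED = {"arp_ignore": 1, "arp_announce": 2}


def read_arp_settings(interface: str = "all",
                      base: str = "/proc/sys/net/ipv4/conf") -> dict:
    """Read arp_ignore/arp_announce for one interface; None if unreadable."""
    settings = {}
    for name in ("arp_ignore", "arp_announce"):
        path = Path(base) / interface / name
        try:
            settings[name] = int(path.read_text().strip())
        except (OSError, ValueError):
            settings[name] = None  # not Linux, no such interface, or bad content
    return settings


def mismatches(current: dict, wanted: dict = WANTED) -> dict:
    """Map each differing setting to a (current, wanted) pair."""
    return {key: (current.get(key), value)
            for key, value in wanted.items()
            if current.get(key) != value}


current = read_arp_settings("all")
print("current:", current)
print("differs from suggested:", mismatches(current))
```

This only reads the values; changing them still goes through sysctl or the files under /proc/sys as described above.
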
> I'd guess that's mostly academic

It is. That mattered on 1Gbit links with multiple clients, i.e. any disk operations in VMs while vMotion was running on the same links: you could see everything start to crawl (and recover once vMotion completed). For 10Gbit you need way, way more load for it to matter.

> You can also use it to use multiple NICs without bonding/teaming

You MUST (as in RFC) use multiple links without bonding, and I learned not to use LACP the hard way (yeah, reading docs beforehand is for pussies).

After the second attempt I understood the implication (multiple NICs in the same IP network), but this is usually a self-inflicted wound. You don't even need physically separate networks (VLANs); using separate IP networks works fine, and it's up to the initiator to use RR/LB across them.

> it would pick "first available" NIC to send it

Yep, the usual magic of making things easier for average folks. In the same vein, you need to disable Proxy ARP in any modern non-flat network or you will get shenanigans that would drive you mad.

I’m out of SATA ports and I have 2 M.2 slots available. When I can test with a VM on my current desktop I will.

That's a lot of effort to put a silent piece of silicon a few metres away from the machine.

iSCSI is going to eat some of your CPU if you don't have a card with offload (you're changing "send a request to the disk controller and wait" to "do a bunch of work to create a packet, send it over the network, and get it back"). It also might not be fast enough to get the most out of NVMe, even more so in RAID0.

And, uh, just don't keep anything important there...

It’s an i3 with 2 M.2 slots available. Enough for the home. SATA becomes the limit.