

by andrewcanis 2945 days ago

Nothing was dialed back. My previous post just confirmed that our ElastiCache was not misconfigured. I used the latest GitHub memcached with 14 worker threads, as you suggested, and your benchmark script gives the same results that we reported for the r4.4xlarge.

1) Actually, your earlier test confirms our point that the CPU does not saturate the 10Gbps network with small value sizes. For example, in your 10-byte value example you got 15M req/sec with 16 cores. That rate is 1.2Gbps of value data (15M req/sec x 10 bytes x 8 bits), well below the network limit of 10Gbps. The FPGA would still be ~8X faster at line rate.
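To make the arithmetic in point 1 easy to check, here is a minimal Python sketch. It counts only value bytes, the same simplification as the figure above, so keys, protocol framing, and packet headers are ignored. The function and constant names are made up for illustration.

```python
def payload_gbps(requests_per_sec: float, value_bytes: int) -> float:
    """Value-payload throughput in Gbps (decimal), ignoring keys and headers."""
    return requests_per_sec * value_bytes * 8 / 1e9


# Figures quoted in the comment: 15M req/sec, 10-byte values, 10Gbps link.
LINE_RATE_GBPS = 10.0

cpu_gbps = payload_gbps(15e6, 10)
headroom = LINE_RATE_GBPS / cpu_gbps  # how far below line rate the CPU result sits

print(f"CPU payload rate: {cpu_gbps:.1f} Gbps")
print(f"Line rate / CPU rate: {headroom:.1f}x")
```

The ratio comes out a little above 8, which is where the "~8X" estimate comes from. Counting real per-request overhead would change both sides of the comparison, so treat this as a rough bound rather than a measurement.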

2) Regardless of the get/set ratio, the FPGA will hit line rate. However, thanks for pointing out that GET requests are much faster in the latest version of memcached; we didn't know that. It looks like the FPGA is only 3X faster for this 10:1 GET-to-SET workload.

3) Users would prefer if there were no pipelining at all. We only used pipelining to get around packet/sec limitations on AWS. If the FPGA were connected directly to the network, we could hit line rate without any pipelining.

We aren't trying to mislead. We are just showing what's possible with the FPGA on AWS: line-rate processing of incoming requests at close to 10Gbps. The cool part, as I mentioned, is that the FPGA is still under-utilized, so we could add encryption without affecting requests/sec at all, because hardware cores execute in parallel. Another idea is to compress the data on the fly.

Agreed that RAM is the expensive part, which is why we picked a CPU instance that had similar RAM to the FPGA instance. Yes, we have heard of users caching data to SSD to save cost.