by bnprks
777 days ago
It seems the abstract's speed and energy-efficiency claims relative to an RTX 3090 hold when the GPU is running at a batch size of 1. I wonder if someone with more experience can comment on how much throughput a GPU can gain from a larger batch size without severely harming latency (and how power consumption might change). From a hardware cost perspective, the AWS f1.2xlarge instances they used are $1.65/hr on-demand, vs. say $1.29/hr for an A100 from Lambda Labs. Using FPGAs is a very interesting line of thinking, but I'm not sure this really describes a viable competitor to GPUs, even for inference-only scenarios.

AWS instance prices are more of a supply/demand/availability thing; it would be more interesting to compare from a total cost of ownership / perf-power-area perspective.