|
|
|
|
|
by datadrivenangel
378 days ago
|
|
GPU type and utilization mean that the costs likely rise only logarithmically or sub-linear. If you commit to buying enough inference over long enough, someone can buy a rack of the newest custom inference chips and run them at 100% for you, which may be a lot cheaper per request than doing them on a cpu somewhere. |
|