| If I followed the links correctly, this benchmark was run on a 16xH200 system. At current prices I'd assume a system price of around $750,000. A year has 86400*365 = 31536000 seconds, so at ~2000 tokens/s, 63072000000 tokens can be generated. As pricing is usually given per 1M tokens generated, that is 63072 such packages. Now let's write off the investment over 3 years: $750,000/3 = $250,000 per year, and 250,000/63072 = 3.96. So almost $4 per 1M tokens generated, with prompt processing included. The model was a Deepseek 671B 32B MoE. Looks to me like $20 for a month of coding is not very sustainable - let's enjoy the party while VCs are financing it! And keep an eye on your consumption... Electricity costs seem negligible at ~$10,000 per year at 10cts per kWh, but the overall cost would be ~10% higher if electricity is more like 30cts, as it is in Europe. Edit: as other commenters pointed out, it is 2200t/s per single GPU, so the result needs to be divided by 16: $4/16 = $0.25. This actually somewhat matches the Deepseek API pricing. |
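The depreciation arithmetic in the quoted comment can be reproduced with a short sketch. The function name and parameters are made up for illustration; the inputs ($750,000 system price, 3-year write-off, 2200t/s per GPU on 16 GPUs) come from the comment, the 2000t/s is the rate implied by its 63072000000 tokens per year, and 100% utilization is assumed throughout.

```python
SECONDS_PER_YEAR = 86400 * 365  # 31,536,000

def hardware_cost_per_million(system_price_usd, writeoff_years, tokens_per_second):
    """Depreciation cost in USD per 1M generated tokens (assumes full utilization)."""
    yearly_cost = system_price_usd / writeoff_years
    millions_per_year = tokens_per_second * SECONDS_PER_YEAR / 1_000_000
    return yearly_cost / millions_per_year

# First estimate: 2000 t/s for the whole system (implied by the comment's token count).
first_estimate = hardware_cost_per_million(750_000, 3, 2_000)

# Corrected estimate: 2200 t/s per GPU, 16 GPUs.
corrected = hardware_cost_per_million(750_000, 3, 2_200 * 16)

print(f"first estimate: ${first_estimate:.2f} per 1M tokens")
print(f"corrected:      ${corrected:.2f} per 1M tokens")
```

Note that $4/16 = $0.25 keeps the 2000t/s baseline; plugging in 2200t/s per GPU directly gives about $0.23, which does not change the conclusion.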
An 8xH200 DGX H200 system draws ~14kW in its peak (CTS) configuration/utilization. Over one year at maximum utilization, that is roughly 123,000 kWh per DGX H200 unit. The 16xH200 configuration in question needs two such units, so about 246,000 kWh/year. This is ~$25,000 at 10cts per kWh and ~$74,000 at 30cts per kWh. At ~1,110,000 batches of 1M tokens per year (2200t/s x 16 GPUs), this gives us: (1) ~$0.02 - $0.07 per 1M tokens in energy cost and (2) ~$0.25 per 1M tokens assuming the same HW depreciation rate. In total, this is ~$0.3 per 1M tokens.
Seems sustainable?
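For anyone who wants to plug in their own electricity price, here is a rough sketch of the energy part of the estimate. The function name is hypothetical; the 28 kW (two units at ~14kW each), the 2200t/s per GPU on 16 GPUs and the 10cts/30cts prices are the figures from this thread, and constant peak draw at full utilization is an assumption.

```python
HOURS_PER_YEAR = 24 * 365                  # 8,760
SECONDS_PER_YEAR = 3600 * HOURS_PER_YEAR   # 31,536,000

def energy_cost_per_million(power_kw, usd_per_kwh, tokens_per_second):
    """Electricity cost in USD per 1M generated tokens (assumes constant peak draw)."""
    yearly_kwh = power_kw * HOURS_PER_YEAR
    millions_per_year = tokens_per_second * SECONDS_PER_YEAR / 1_000_000
    return yearly_kwh * usd_per_kwh / millions_per_year

system_tps = 2_200 * 16  # 16 GPUs at 2200 t/s each
cheap = energy_cost_per_million(28, 0.10, system_tps)
expensive = energy_cost_per_million(28, 0.30, system_tps)

print(f"energy at 10cts/kWh: ${cheap:.3f} per 1M tokens")
print(f"energy at 30cts/kWh: ${expensive:.3f} per 1M tokens")
```

Both values land inside the ~$0.02 - $0.07 range above, so energy stays a small fraction of the hardware depreciation.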