by fpdavis 1013 days ago
That is a very high volume, especially for something still in development. Here are a few things we have done:

  * Use local machine learning models wherever possible.
  * Summarize and consolidate calls whenever possible (i.e. reduce token counts using language analytics).
  * Log all calls/responses so it is possible to reuse them and/or to train your ML models. This can cut down on duplicate calls.
  * Monitor your API call logs to make sure the system isn't making calls it shouldn't.
  * Throttle your calls by introducing delays/bottlenecks in the user interface (by far my least favorite).
  * Charge more for your services to decrease demand.
  * Contact your account rep and see what options they can offer at a higher price tier.
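
To make the logging/reuse and throttling points concrete, here is a rough sketch, not what we actually run. It assumes you already have some function that makes the paid call; `call_api`, `CachedThrottledClient`, and `min_interval` are made-up names for illustration. Normalizing prompts by lowercasing and collapsing whitespace is also an assumption and may be too aggressive if case matters for your use.

```python
import hashlib
import time


class CachedThrottledClient:
    """Wraps a hypothetical call_api(prompt) -> str function.

    Logs every response so repeated prompts are served from the log,
    and enforces a minimum delay between real API calls.
    """

    def __init__(self, call_api, min_interval=0.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.call_api = call_api          # assumed: the function that costs money
        self.min_interval = min_interval  # seconds between real calls
        self.clock = clock
        self.sleep = sleep
        self.log = {}                     # prompt hash -> logged response
        self.api_calls = 0                # real calls made, for monitoring
        self._last_call = None

    @staticmethod
    def _key(prompt):
        # Assumption: prompts differing only in case/whitespace are duplicates.
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, prompt):
        key = self._key(prompt)
        if key in self.log:
            # Duplicate call: reuse the logged response, no API hit.
            return self.log[key]

        # Throttle: wait until min_interval has passed since the last real call.
        if self._last_call is not None:
            wait = self.min_interval - (self.clock() - self._last_call)
            if wait > 0:
                self.sleep(wait)

        response = self.call_api(prompt)
        self._last_call = self.clock()
        self.api_calls += 1
        self.log[key] = response
        return response
```

In practice the log would live in a database or on disk rather than in a dict, so it survives restarts and can double as training data for a local model. Watching `api_calls` against the number of `ask` calls gives a quick read on how many duplicates you are avoiding.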