Hacker News new | ask | show | jobs
by ndriscoll 1336 days ago
> how do you deliver a callback from server there?

You don't. The callback is on the server before it ever responds to the request. The client sees a synchronous response with a delay of a few extra milliseconds.

Here, I slapped this together to demonstrate the technique[0]

Your HTTP route handler becomes

  override def create(body: String): Task[Long] = {
    for
      rspP <- zio.Promise.make[Nothing,Long]
      _ <- createQueue.offer(InsertRequest(body, rspP))
      rsp <- rspP.await
    yield (rsp)
  }
i.e. make a promise, put your work on a queue, and have the HTTP response be the result of the promise. Then you have a background worker:

      _ <- ZStream.fromQueue(createQueue)
        .groupedWithin(8192, 10.milliseconds)
        .run(ZSink.foreach(repo.createChunk)).forkDaemon
Which processes up to 8k requests at a time, waiting up to 10 ms for a batch.

The worker does a bulk db insert, and completes the promises with the generated ids.

Similar techniques should work on read batching, but I haven't tried that. You can also speed that up some more with the COPY protocol, but IIRC you need to be more careful about escaping/SQL injection. The example I wrote uses prepared statements/parameter binding.

On my 6 year old mid-range desktop (this CPU[1] and this disk[2]) this program can process ~30k `create`s per second. For about $1500, I could buy a new computer with a Ryzen 9 7950 with 4x the core count/8x the thread count and 2x the single-threaded performance, so around ~10x more processing power, 128 GB of RAM, and a Samsung 980 Pro SSD, which can do 1M Write IOPS (25x more than my SSD) or 5GB/s sequential writes (10x more). So a $1500 computer with a single disk should be able to do around 300k/s. PCIe gen 5 is now coming out, which will allow for another doubling of disk performance.

128GB of RAM means you can keep at least 100M rows worth of index in memory. It's not that expensive (under $10k) to build a server with 1TB of RAM.

Totally feasible for a hobbyist to do without tons of tricky optimization (the code I posted is purely functional Scala!); people spend $20k on a jetski or $80k on a truck. Like I said, the most expensive part is going to be the storage, but you could do something like only store the most recent 1000 tweets per person, and charge $10 to bump that up to the most recent 10 million tweets or something. You'd come out at a substantial profit with that model if you got a few thousand takers. Similarly you could charge to let someone follow more than a few thousand people so you could pay for a read replica or two.

[0] https://github.com/ndriscoll/twit/commit/19b245677b978b42a6f...

[1] https://www.cpubenchmark.net/cpu.php?cpu=Intel+Core+i5-6600K...

[2] https://www.disctech.com/SanDisk-SDSSDHP-256G-256GB-SATA-SSD