Hacker News
Low-Latency Inference with Speculative Decoding on D-Matrix Corsair and GPU (gimletlabs.ai)
1 point by nserrino 103 days ago