User: mezark | HN Mirror

user: mezark
created: 2023-02-27
karma: 126

submissions:

What happens when you run a CUDA kernel?

294 points | 32 comments

A running list of reasons to move to open source

6 points | 0 comments

MoE inference optimizations: 15% lower expert load by request reordering

3 points | 0 comments

Tensor Network Attention

2 points | 0 comments

Redundant Information in LLM Weights

5 points | 0 comments

tANS: Precomputing rANS

3 points | 0 comments

Also-RANS: Asymmetric Numeral Systems for Entropy Coding

25 points | 0 comments

70x faster cold(ish) starts for SGLang

4 points | 0 comments

QueueSpec – drafting speculation tokens while a request queues

1 point | 0 comments

ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism

1 point | 0 comments

Parallel Primitives for Multi-Agent Workflows

1 point | 0 comments

New fastest AI Model Gateway – 450x less overhead than LiteLLM

2 points | 0 comments

Should GPUs Make Free Trade Agreements?

3 points | 1 comment