Hacker News
user: mezark
created: 2023-02-27
karma: 28

submissions:

MoE inference optimizations: 15% lower expert load by request reordering
3 points | 0 comments
Tensor Network Attention
2 points | 0 comments
Redundant Information in LLM Weights
5 points | 0 comments
tANS: Precomputing rANS
3 points | 0 comments
Also-rANS: Asymmetric Numeral Systems for Entropy Coding
25 points | 0 comments
70x faster cold(ish) starts for SGLang
4 points | 0 comments
QueueSpec – drafting speculation tokens while a request queues
1 point | 0 comments
ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism
1 point | 0 comments
Parallel Primitives for Multi-Agent Workflows
1 point | 0 comments
New fastest AI Model Gateway – 450x less overhead than LiteLLM
2 points | 0 comments
Should GPUs Make Free Trade Agreements?
3 points | 1 comments
Controlled generation of OS LLMs – without impacting latency
7 points | 1 comments
Takeoff Inference Server Is Now Open Source
3 points | 1 comments
Falcon 7B running real time on CPU
11 points | 3 comments