Hacker News
user: mezark
created: 2023-02-27
karma: 28
submissions:
MoE inference optimizations: 15% lower expert load by request reordering
3 points | 0 comments
Tensor Network Attention
2 points | 0 comments
Redundant Information in LLM Weights
5 points | 0 comments
Tans: Precomputing RANS
3 points | 0 comments
Also-RANS: Asymmetric Numeral Systems for Entropy Coding
25 points | 0 comments
70x faster cold(ish) starts for SGLang
4 points | 0 comments
QueueSpec – drafting speculation tokens while a request queues
1 point | 0 comments
ZeroDP: Just-in-Time Weight Offloading over NVLink for Data Parallelism
1 point | 0 comments
Parallel Primitives for Multi-Agent Workflows
1 point | 0 comments
New fastest AI Model Gateway – 450x less overhead than LiteLLM
2 points | 0 comments
Should GPUs Make Free Trade Agreements?
3 points | 1 comment
Controlled generation of OS LLMs – without impacting latency
7 points | 1 comment
Takeoff Inference Server Is Now Open Source
3 points | 1 comment
Falcon 7B running real time on CPU
11 points | 3 comments