|
I've been building an HTTP server from scratch in Rust, designed around zero heap allocations on the hot path. No async runtime, no framework: just io_uring, a custom HTTP parser, and a thread-per-core architecture.

The parser is the part I'm most proud of. Instead of allocating strings for each parsed field, everything is a Span { off: u16, len: u16 }, a 4-byte view into the original buffer. The full header table is [Header; 64] on the stack (640 bytes). During parsing, it also extracts content-length/chunked/keep-alive and builds an O(1) known-header index (21 common headers tracked in a fixed array). Header lookup after parsing is a single array dereference: about 0.6 ns vs 20-23 ns for a linear scan.

I benchmarked head-to-head against httparse (the parser behind hyper/axum/actix-web), same machine, same inputs, Criterion:
- Small request (35B): 42 ns vs 52 ns - 1.25x faster
- Medium request (368B, 9 headers): 200 ns vs 230 ns - 1.15x faster
- Large request (733B, 20 headers): 420 ns vs 466 ns - 1.11x faster

synapserve does strictly more work per parse than httparse (semantic extraction + header indexing) and is still faster. The gap widens to 1.38-1.46x when you add equivalent semantic extraction to httparse.

SIMD scanning (AVX2/SSE4.2 with runtime detection, NEON on ARM64) handles header name validation, header value validation, and URI scanning at 16-32 bytes per instruction.

The I/O layer uses io_uring with:
- Multishot accept (one SQE, N connections)
- Multishot recv with provided buffer rings (kernel picks the buffer, no userspace allocation)
- Zero-copy send (SEND_ZC) and splice for static files and proxy relay
- kTLS: rustls does the TLS 1.3 handshake, then session keys are installed in the kernel via setsockopt(SOL_TLS). After that, the kernel handles encrypt/decrypt transparently, so SEND_ZC and splice still work through TLS.

Each worker thread owns its connections, buffers, and ring. Connection state is a flat array indexed by slot, with generation counters for stale CQE detection.

What works today: HTTP/1.1 request handling, radix-tree router, virtual hosts, static file serving (ETag, Range, Brotli), reverse proxy with upstream load balancing (weighted round-robin, least-conn, IP hash, health tracking, automatic failover, zero-copy splice relay), TLS 1.3 with kTLS.

Static file serving benchmarks (wrk, 256 connections): 205K req/s on small files (+79% vs nginx), 14.5MB RSS.

What doesn't exist yet: HTTP/2, HTTP/3, WebSocket. These are next.
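For anyone wondering what "flat array indexed by slot, with generation counters" looks like in practice, here is a minimal sketch. It is not the synapserve code: `ConnTable`, `Slot`, `pack`, and `unpack` are hypothetical names, and packing the slot index and generation into the 64-bit `user_data` of a submission is an assumed encoding. The idea is that a completion arriving after its slot was closed and reused carries an old generation and gets dropped.

```rust
/// One entry in the per-worker connection table (hypothetical layout).
#[derive(Clone, Copy, Default)]
struct Slot {
    generation: u32,
    in_use: bool,
}

/// Fixed-size table, allocated once at startup and never resized.
struct ConnTable {
    slots: Vec<Slot>,
}

impl ConnTable {
    fn new(capacity: usize) -> Self {
        Self { slots: vec![Slot::default(); capacity] }
    }

    /// Claim a free slot and return the token to attach to submissions.
    /// The linear scan keeps the sketch short; a free list would avoid it.
    fn open(&mut self) -> Option<u64> {
        let idx = self.slots.iter().position(|s| !s.in_use)?;
        self.slots[idx].in_use = true;
        Some(pack(idx as u32, self.slots[idx].generation))
    }

    /// Release a slot. Bumping the generation invalidates every token
    /// that was handed out for the previous connection in this slot.
    fn close(&mut self, idx: u32) {
        let slot = &mut self.slots[idx as usize];
        slot.in_use = false;
        slot.generation = slot.generation.wrapping_add(1);
    }

    /// Map a completion's token back to a live slot, or None if it is stale.
    fn resolve(&self, user_data: u64) -> Option<u32> {
        let (idx, generation) = unpack(user_data);
        let slot = self.slots.get(idx as usize)?;
        (slot.in_use && slot.generation == generation).then_some(idx)
    }
}

/// Assumed encoding: generation in the high 32 bits, slot index in the low 32.
fn pack(idx: u32, generation: u32) -> u64 {
    ((generation as u64) << 32) | idx as u64
}

fn unpack(user_data: u64) -> (u32, u32) {
    (user_data as u32, (user_data >> 32) as u32)
}
```

In this sketch a worker would call `resolve` on every completion before touching connection state, and simply ignore completions that return `None`.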
Honest limitations:
- Linux-only (io_uring). No plans for macOS/Windows support.
- HTTP/1.1 only for now. HTTP/2 is in progress.
- The parser uses u16 spans, so max header area is 64KB. Fine for real traffic, but it's a hard limit.
- Single-machine only. No clustering or distributed config.
- Not production-battle-tested yet. It works and benchmarks well, but it hasn't handled real traffic at scale.

All the benchmark code is a separate crate with the exact same inputs for both parsers, nothing cherry-picked. The parser deep dive with methodology is on the site.

https://synapserve.io
Parser benchmark writeup: https://synapserve.io/posts/http-parser-performance/
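To make the u16 span limit and the known-header index from the post concrete, here is a rough sketch. Only the `Span { off: u16, len: u16 }` layout comes from the post. `HeaderIndex`, `Known`, `NO_HEADER`, and the three-entry enum are hypothetical stand-ins (the real index tracks 21 headers, and the real `Header` layout is not shown here), so treat this as an illustration of the technique rather than the actual parser.

```rust
/// A 4-byte view into the request buffer (layout as described in the post).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Span {
    off: u16,
    len: u16,
}

impl Span {
    /// Returns None when the range cannot be addressed with u16 fields,
    /// which is where the 64 KB ceiling on the header area comes from.
    fn new(off: usize, len: usize) -> Option<Span> {
        let end = off.checked_add(len)?;
        if end > u16::MAX as usize {
            return None;
        }
        Some(Span { off: off as u16, len: len as u16 })
    }

    /// Borrow the bytes this span refers to: no copy, no allocation.
    fn slice<'a>(&self, buf: &'a [u8]) -> Option<&'a [u8]> {
        let start = self.off as usize;
        buf.get(start..start + self.len as usize)
    }
}

/// Hypothetical subset of well-known headers (the post tracks 21).
#[derive(Clone, Copy)]
enum Known {
    Host = 0,
    ContentLength = 1,
    Connection = 2,
}
const KNOWN_COUNT: usize = 3;
const NO_HEADER: u8 = u8::MAX; // sentinel: header not present

/// Fixed-capacity header values plus a direct index for known names.
struct HeaderIndex {
    values: [Span; 64],
    len: u8,
    known: [u8; KNOWN_COUNT], // position in `values`, or NO_HEADER
}

impl HeaderIndex {
    fn new() -> Self {
        Self {
            values: [Span { off: 0, len: 0 }; 64],
            len: 0,
            known: [NO_HEADER; KNOWN_COUNT],
        }
    }

    /// Record a header value; returns false once the table is full.
    fn push(&mut self, known: Option<Known>, value: Span) -> bool {
        if self.len as usize == self.values.len() {
            return false;
        }
        self.values[self.len as usize] = value;
        if let Some(k) = known {
            self.known[k as usize] = self.len;
        }
        self.len += 1;
        true
    }

    /// Lookup by known header: two array reads, no scan over names.
    fn get(&self, k: Known) -> Option<Span> {
        match self.known[k as usize] {
            NO_HEADER => None,
            i => Some(self.values[i as usize]),
        }
    }
}
```

How a header name gets classified as `Known` during parsing is left out; the sketch only covers what happens once that decision has been made.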
Happy to answer any questions about the architecture, the io_uring integration, or the SIMD scanning approach. |
Reading your website and the investor deck, one of my main questions was "who is behind this (and is this just AI-generated)?" It would be useful to put more of a bio on there.
Not having macOS/Windows support is going to make it hard to develop with. Would it be possible to build some kind of shim on top of other libraries that mirrors the API, even if it doesn't match the performance?
Also, one of the advantages of using a popular HTTP server stack is getting lots of battle-tested middleware that other people have developed and tested. Is there any way to leverage any of that or do you need to build everything from scratch? Granted, that is certainly somewhat easier in the AI era, but still.