| HN Mirror


by MorphisCreator 4000 days ago

Thank you very much!

Yes. On my 2nd-generation low-power i3, with my normal desktop running at the same time, I get ~200ms response time per individual request, and that is with 115 32-kbyte requests per second.

I am rewriting the high-level protocol code, which is a bit rickety because it snowballed from the earliest code in the project (other than the asyncio SSH library I implemented from scratch).

When that rewrite is done (a week or two), it should greatly decrease latency and greatly improve efficiency, and thus should even improve throughput (although it can already max out your pipe with actual data, since it is low overhead).

If I switch from 32k blocks to 512k blocks, which I did some testing on (Freenet is 1meg blocks), that gives me a 10x !! throughput improvement with the same CPU usage and no increase in per-request latency.
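To put those figures in bandwidth terms, here is a back-of-the-envelope sketch. It assumes "32k" and "512k" mean KiB and that every request carries one full block; the function name is made up for illustration and is not from the MORPHiS code.

```python
KIB = 1024  # assumption: "32k" / "512k" are KiB


def throughput_bytes_per_s(requests_per_s, block_bytes):
    """Payload throughput if every request carries one full block."""
    return requests_per_s * block_bytes


# Figures quoted above: 115 requests/s at 32k blocks.
baseline = throughput_bytes_per_s(115, 32 * KIB)

# The quoted "10x" improvement, and the request rate it would imply
# if each request then carried a 512k block.
improved = 10 * baseline
implied_rate = improved / (512 * KIB)

print(f"baseline: {baseline / KIB**2:.2f} MiB/s ({baseline * 8 / 1e6:.1f} Mbit/s)")
print(f"10x:      {improved / KIB**2:.2f} MiB/s")
print(f"implied request rate at 512k blocks: {implied_rate:.1f}/s")
```

Under these assumptions a 10x gain with 16x larger blocks corresponds to fewer, larger requests per second rather than more requests.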

The only reason I went with 32k blocks originally is that the ssh protocol has a 35k max packet size, and I don't want to break spec, so that the traffic can hide as normal ssh traffic :) In the 512k block test I sent each block as multiple packets, and it was 10x faster because that means 512k per FindNode operation instead of just 32k :) I have been sitting on the fence about switching to it because I want to rewrite the high-level code first; multiple data packets per request complicate that snowballed code even more :)
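A minimal sketch of the multiple-packets idea: split one 512k block into chunks that each fit under the packet limit, then reassemble in order. The 32 KiB chunk size and the function names are assumptions for illustration, not the actual MORPHiS wire format.

```python
# Assumption: 32 KiB of payload per packet, which stays under the ~35k
# SSH packet limit mentioned above once headers are added.
MAX_CHUNK = 32 * 1024


def split_block(block, max_chunk=MAX_CHUNK):
    """Split one data block into packet-sized chunks, preserving order."""
    return [block[i:i + max_chunk] for i in range(0, len(block), max_chunk)]


def join_chunks(chunks):
    """Reassemble the chunks, assuming they arrive in order."""
    return b"".join(chunks)


block = bytes(512 * 1024)  # a dummy 512k block of zero bytes
chunks = split_block(block)
print(len(chunks), "packets for one 512k block")
```

A real protocol would also need to tag each chunk with its request and position so the receiver can detect missing or reordered packets, which is the extra complexity described above.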

Also, I am using pycrypto, which initial tests show is actually much slower than the other library I will likely switch to (it is called simply cryptography; it wraps the platform's openssl instead of implementing the primitives itself as pycrypto does). I went with pycrypto to minimize dependencies. I will have it detect whether you have cryptography installed and use that if so. I've already abstracted the pycrypto api so the backend can easily be switched at runtime. This should decrease latency a good amount as well.
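The optional detection could look roughly like this. It is only a sketch: the function name and backend labels are hypothetical, and the real abstraction layer in MORPHiS may be organized differently.

```python
import importlib.util


def pick_crypto_backend():
    """Return the name of the preferred installed backend, or None."""
    # Prefer `cryptography` (OpenSSL-backed) and fall back to pycrypto,
    # which installs the top-level package `Crypto`. Note that PyCryptodome
    # installs a `Crypto` package too, so this check cannot tell them apart.
    candidates = (("cryptography", "cryptography"), ("pycrypto", "Crypto"))
    for name, module in candidates:
        if importlib.util.find_spec(module) is not None:
            return name
    return None


BACKEND = pick_crypto_backend()
print("crypto backend:", BACKEND)
```

Checking with `find_spec` avoids importing either library until a backend has been chosen.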

1 comments

anotherangrydev 4000 days ago

Ok, try testing with packets of around 128-1024 bytes.

A 'good' result for an i5-i7 core is to get at least 10k requests per second in that situation.

You are gonna live or die by this measure dude, so work on improving it. You are around 100x away from it, but if you're lucky you can get there with 'just' two 10x improvements. I suggest you look into Flame Graphs [1], they are awesome. I have used them to pinpoint exactly where the 10-100x bottlenecks in my code are and unclog them.

Also, about your website, just make it sound less like an infomercial and you'll be fine.

And last but not least, best of luck!

[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html

MorphisCreator 4000 days ago

I should have mentioned one very important point with the figure I provided. That is 115 requests per second on /one/ i3 /core/. The network itself isn't sweating. Every user of a distributed Reddit, for example, would be handling those 115 requests per second themselves if they weren't a sizable portion of the network. So 100 users and your figure is matched. The bittorrent mainline DHT has multiple millions of simultaneous nodes at any one time. MORPHiS as is already deprecates bittorrent, so it is destined to absorb all those nodes. Also, the network scales logarithmically, and not just log base 2. Kademlia has an 'accelerated lookup' where you can control the base, at a memory cost, to achieve O(log base 2^b) lookups. A b=3 is a reasonable value for memory usage.
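To make the scaling argument concrete, here is a rough sketch of how the hop count shrinks with b. It treats ceil(log base 2^b of n) as the hop estimate, which ignores constants and real-world routing table state; the function name and the node count are illustrative assumptions only.

```python
import math


def lookup_hops(n_nodes, b=1):
    """Approximate hop bound: ceil(log base 2^b of n) = ceil(log2(n) / b)."""
    return math.ceil(math.log2(n_nodes) / b)


n = 2 ** 21  # roughly two million nodes, an illustrative network size
for b in (1, 3):
    print(f"b={b}: about {lookup_hops(n, b)} hops")

# The per-node figure from above, summed over 100 nodes.
aggregate = 100 * 115
print("aggregate requests/s across 100 nodes:", aggregate)
```

With b=3 each hop resolves three bits of the key instead of one, so the estimate drops to a third of the base-2 value.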

I should have mentioned, it is currently limited to one core, due to Python. I will make it multicore, probably before 1.0. It will be relatively easy to do with the block-based design of morphis and the multiprocessing pool api of Python. I already use multiprocessing to great effect in the proof of work and prefix generation.

Also, this is written in a scripting language, Python. Also, it is a first draft and unoptimized.
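As a sketch of why a block-based design maps well onto the multiprocessing pool api: independent blocks can be farmed out to worker processes with `Pool.map`. The SHA-512 key function, the block size, and all names here are assumptions for illustration, not the actual MORPHiS code.

```python
import hashlib
from multiprocessing import Pool

BLOCK_SIZE = 32 * 1024  # assumption: 32 KiB blocks


def block_key(block):
    """CPU-bound per-block work; here simply a SHA-512 digest (assumed)."""
    return hashlib.sha512(block).hexdigest()


def split_blocks(data, size=BLOCK_SIZE):
    """Cut the data into independent fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def hash_blocks_parallel(data, workers=None):
    """Hash every block in a pool of worker processes, keeping block order."""
    with Pool(processes=workers) as pool:
        return pool.map(block_key, split_blocks(data))


if __name__ == "__main__":
    keys = hash_blocks_parallel(bytes(4 * BLOCK_SIZE), workers=2)
    print(len(keys), "block keys computed")
```

Because each block is processed without shared state, the only coordination cost is pickling the blocks to and from the workers, so this pays off mainly when the per-block work is heavy.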

I don't see the 115 requests per second being a problem, because that is per node, not for the whole network. If it is a problem and can't be improved enough with Python, the idea was originally to port it to Rust. Rust wasn't even 1.0 when I started coding, never mind that their async io library didn't exist until a couple of months ago. Come the Rust port, using their newly released async io library, which performs nearly as well as the well-known libev C one, we will be talking about the kind of performance you are talking about, although 10k requests per second just isn't needed on one node. It is certainly doable though, given the time! Remember, a distributed app doesn't run on one node, and thus doesn't need all 10k requests per second on one node.

MorphisCreator 4000 days ago

Thanks for that link; it is a good idea to use that instead of just Python profilers. I will give it a try when I have the time to do optimizations.

Thanks also for the wish of luck!