| HN Mirror

	by lunixbochs 2606 days ago
	Note: this was implemented without referencing any ZFS source code and should not be subject to the CDDL.

3 comments

josteink 2606 days ago

So we port this Python to something not slow, and all the kernel people can shut up about ZFS being terrible ;)

aidenn0 2606 days ago

I can port from Python -> Common Lisp at a rate of ~100 LoC per hour, and that's a pretty friendly port (the yield expression is the only thing that can't just be done line-by-line; truthiness is the only real "gotcha", as there are a lot of "false" values in Python, but only 1.5 in Lisp[1]).

[1]: I say 1.5 because there is only one false value, but it has 2 idiomatic meanings: nil (equivalent to Python's None) and the empty list.
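
To illustrate the truthiness gotcha, here is a minimal Python sketch (added for clarity; it is not code from the project, and `is_missing` is a hypothetical helper). It lists some of the values Python treats as false, which a line-by-line port would have to map onto Lisp's single false value:

```python
# Some of the values Python treats as false in a boolean context.
# The list is illustrative, not exhaustive.
FALSY_VALUES = [None, False, 0, 0.0, "", b"", [], (), {}, set()]


def is_falsy(value):
    """Return True if Python would treat `value` as false in an `if`."""
    return not value


# Every entry above is falsy in Python.
all_falsy = all(is_falsy(v) for v in FALSY_VALUES)


def is_missing(value):
    """Hypothetical helper: True only for None, unlike a bare truth test."""
    return value is None
```

A port that translates `if x:` literally would treat `0` and `""` as true in Lisp, so each truth test needs a decision about which of these cases the original code actually meant.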

j88439h84 2606 days ago

Don't forget, PyPy is fast.

cure 2606 days ago

PyPy is faster than Python, yes. But Go, C and many other (compiled) languages are way faster than PyPy. Plus, if you use a language like Go or Rust then you avoid Python's GIL and you'll have much more reasonable memory usage. Best of all, deploying is a matter of copying a binary, rather than having to deal with the absolute disaster that is Python packaging.

deaddodo 2605 days ago

> Plus, if you use a language like Go or Rust then you avoid Python's GIL

No, but then you run into Go's GC and green threads. File systems fit squarely in the realm of "systems programming" (old definition [1], not new), which is the territory of languages like Ada, Pascal, C/C++, Rust and D (without GC).

[1] - https://en.wikipedia.org/wiki/System_programming_language

klyrs 2606 days ago

Filesystem with a GIL, what could possibly go wrong? /s

emmelaich 2605 days ago

A lot less will go wrong than a filesystem without a GIL.

GIL is for safety and correctness, not speed.

y4mi 2605 days ago

Uh, no?

Python's global interpreter lock was added for single-threaded speed and for C library integrations, which often can't be used from multiple threads.

There was some talk recently about removing it to improve Python's multithreaded performance, and Guido said something along the lines of:

> "I'll remove it as long as single threaded performance doesn't suffer"

Which nobody has succeeded in doing.

nine_k 2606 days ago

Go? A GC'd language in kernel? (Well, yes, this has been done, from Lua to Haskell, but only experimentally.)

hu3 2606 days ago

I wouldn't advise writing low level stuff in Go but people do enjoy a challenge from time to time: https://news.ycombinator.com/item?id=18399389

pjmlp 2605 days ago

I wouldn't consider the workstations sold by Xerox, TI, Connecting Machines, the OS research department at ETHZ or Microsoft’s natural language search service for the West Coast and Asia to be just experiments.

weberc2 2606 days ago

Python is also a GC’d language...

twa927 2605 days ago

CPython is mostly reference-counted.
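
As a rough sketch of what "mostly reference-counted" means in practice (added for illustration and assuming CPython; `Node` is a hypothetical placeholder class), an object with no cycles is freed the moment its last reference disappears, while a reference cycle has to wait for the cycle collector:

```python
import gc
import weakref


class Node:
    """Hypothetical placeholder class used only for this demonstration."""


# Keep the automatic cycle collector out of the way for the demonstration.
gc.disable()

# Case 1: no cycle. On CPython the object is freed as soon as the last
# reference goes away, without the cycle collector running.
obj = Node()
ref = weakref.ref(obj)
del obj
freed_by_refcount = ref() is None

# Case 2: a reference cycle. Reference counting alone cannot free it,
# so it stays alive until the cycle collector is run explicitly.
a, b = Node(), Node()
a.other, b.other = b, a
cycle_ref = weakref.ref(a)
del a, b
alive_before_collect = cycle_ref() is not None
gc.collect()
freed_by_collector = cycle_ref() is None

gc.enable()
```

Other implementations such as PyPy use a tracing collector instead, so the first case is not guaranteed to behave this way there.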

fnord123 2605 days ago

> PyPy is faster than Python, yes.

Python is only slow if you use it wrong:

https://apenwarr.ca/diary/2011-10-pycodeconf-apenwarr.pdf

GuB-42 2606 days ago

Maybe plenty fast for most applications but a filesystem is not one of these IMHO, especially for something as naturally resource hungry as ZFS.

A good filesystem implementation requires tight memory management and good control of what happens at the OS level. I am not saying it can't be done in Python, but it clearly isn't the right tool for the job.

I meant that for a production implementation. Python is perfectly fine for a proof of concept, in fact, it may be better than jumping straight down to C. But keeping it for production is foolish IMHO.

spullara 2606 days ago

Faster than CPython doesn't mean it is fast.

twa927 2606 days ago

I was trying to speed up a log processing service running on PyPy by rewriting it in Java. I was surprised that the result was about twice as slow (I know Java quite well and I didn't see any obvious optimizations; most of the time was spent in GC). So PyPy can be quite fast even in more absolute terms (compared with other VM languages), at least for some types of code.

spullara 2606 days ago

If more than 1% of your time is in GC you are doing something very wrong.

michaelmrose 2606 days ago

The fact that a single implementation was better than Java says less about the languages and more about the particular software.

tigershark 2606 days ago

I know that it’s asking a lot, but any chance that you can post a minimum reproducible sample? From what I know it is quite smelly...

twa927 2606 days ago

I don't have access to this codebase now but I'll try to write some benchmark.

d0mine 2605 days ago

I would have expected it: https://stackoverflow.com/questions/9371238/why-is-reading-l...

int_19h 2605 days ago

I had a binary parser written in Python that took around 30 seconds on typical input on CPython. PyPy took that down to about 10 seconds. Rewriting it in C# took it down to 200 ms.

twa927 2604 days ago

If this was using a loop that processes a single byte per iteration, I would expect a greater speedup on PyPy. I've seen 100x speedups in such cases.
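
For readers unfamiliar with the pattern, a byte-per-iteration loop looks roughly like the sketch below (a hypothetical example, not the parser discussed above; no timings are claimed here). Every iteration is executed by the interpreter, which is the kind of work a tracing JIT can compile away:

```python
def checksum_bytewise(data: bytes) -> int:
    """Sum all bytes modulo 2**32, one interpreted iteration per byte."""
    total = 0
    for byte in data:  # iterating over bytes yields ints in Python 3
        total = (total + byte) & 0xFFFFFFFF
    return total


def checksum_builtin(data: bytes) -> int:
    """Same result, but the loop runs inside the built-in sum()."""
    return sum(data) & 0xFFFFFFFF
```

On CPython the usual workaround is the second form, which pushes the loop into a built-in; on PyPy the first form is the case where large speedups are typically reported.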

mehrdadn 2606 days ago

I routinely fail to get speedup on PyPy. In fact I frequently get slowdowns. I imagine it's only fast if your code is slower than it needs to be to begin with.

twa927 2605 days ago

It works well for tight loops processing a lot of data, or heavy object-orientation (multiple levels of class hierarchies). It probably won't work well for regular Django webapps or scripts. Also, real-world Python numerical/AI code uses numpy/ML libs so there's not much to optimize in Python...

tanilama 2606 days ago

Only when comparing to CPython

fragmede 2606 days ago

PyPy is fast, but even were this written in C, there's still the kernel-userland boundary to contend with.

moonbug 2606 days ago

FUSE

yjftsjthsd-h 2605 days ago

If you're going to run the file system in user space, there's no reason not to just use normal ZFS. The problem with ZFS licensing is only in combining CDDL+GPL in one unit. If you're working across the kernel/userspace boundary, there's already no problem. ZoL even already ships a fuse version that works fine.

Conan_Kudo 2605 days ago

> ZoL even already ships a fuse version that works fine.

This is not true. A FUSE implementation is wanted though: https://github.com/zfsonlinux/zfs/issues/8

newnewpdro 2606 days ago

Would the CDDL matter for a Python implementation that will never become part of the kernel?

CaliforniaKarl 2606 days ago

You'd want a reverse-engineering lawyer to be certain. But my (IANAL) guess is: If this is a proper reverse-engineered implementation, you could then convert _this_ implementation to C, and contribute _that_ into the kernel.

Except, it seems this is BSD-licensed, so I'm not sure how that would work in the kernel (which is GPLv2).

loeg 2606 days ago

BSDL code is fine in the GPLv2 kernel. E.g., most of the DRM drivers are dual BSD-GPL licensed.

lunixbochs 2606 days ago

BSD is a subset of GPL's restrictions, so you can include BSD-licensed code in a GPL work.

loeg 2606 days ago

Subset isn't quite accurate, but "GPL compatible" might be a good way to describe it.

dr0verride 2606 days ago

It's clear to me that the solution is to integrate Python into the kernel.

yjftsjthsd-h 2605 days ago

I assume you mean this as a joke, but I would point out that at least one of the BSD family has gone and baked Lua into their kernel. Granted, Lua is rather meant for that kind of thing and Python isn't, but it is entertaining to point out an interpreted language that has been stuck into a Unix kernel :)

riffraff 2605 days ago

There was Lua in Linux too, with lunatik: https://github.com/lunatik-ng/lunatik-ng

lelf 2606 days ago

Bad title. It’s not a ZFS implementation, so hold your horses.

AHTERIX5000 2606 days ago

What do you mean? Can't you mount & read ZFS filesystems with this one?
