I can port from Python to Common Lisp at a rate of ~100 LoC per hour, and that's a pretty friendly port (the yield expression is the only thing that can't just be done line by line; truthiness is the only real "gotcha", as there are a lot of "false" values in Python but only 1.5 in Lisp[1]).
[1] I say 1.5 because there is only one false value, NIL, but it has two idiomatic meanings: nil (equivalent to Python's None) and the empty list.
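A small Python sketch of both porting issues. The first half lists the values whose truthiness changes once there is only one false value; `lisp_truthy` is a hypothetical helper that assumes the port maps `None`, `False` and `[]` to NIL (in Common Lisp, `0` and `""` are true). The second half shows why `yield` doesn't port line by line: the generator's paused state has to become explicit state, shown here as a Python iterator class (in Common Lisp, one common shape is a closure over the loop variable).

```python
# Porting sketch: Python truthiness vs. Common Lisp's single false value.
# ASSUMPTION: the port maps None, False and [] to NIL; lisp_truthy is a
# hypothetical helper, not part of any real library.


def python_truthy(value) -> bool:
    """Python's rule: None, False, zeros and empty containers are all false."""
    return bool(value)


def lisp_truthy(value) -> bool:
    """Common Lisp's rule under the mapping above: only NIL is false."""
    if value is None or value is False:
        return False
    return not (isinstance(value, list) and len(value) == 0)


SAMPLES = [None, False, [], 0, 0.0, "", (), {}, set(), 1, "nil", [0]]

# Values whose truthiness flips when moved to Lisp; every `if x:` on one of
# these needs an explicit test in the port, e.g. (zerop x) or (string= x "").
MISMATCHES = [v for v in SAMPLES if python_truthy(v) != lisp_truthy(v)]


# A generator: `yield` suspends the function in the middle of its loop.
def evens_upto(n):
    i = 0
    while i <= n:
        yield i
        i += 2


# The same iteration with the suspended state (i) made explicit, which is
# the restructuring a language without `yield` forces on the port.
class EvensUpto:
    def __init__(self, n):
        self.n = n
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i > self.n:
            raise StopIteration
        value = self.i
        self.i += 2
        return value


print(MISMATCHES)           # [0, 0.0, '', (), {}, set()]
print(list(evens_upto(7)))  # [0, 2, 4, 6]
print(list(EvensUpto(7)))   # [0, 2, 4, 6]
```

The mismatch list is the checklist for a line-by-line port: anything that can be a number, string, tuple, dict or set and is tested for truthiness needs rewriting.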
PyPy is faster than CPython, yes. But Go, C and many other (compiled) languages are way faster than PyPy. Plus, if you use a language like Go or Rust then you avoid Python's GIL and you'll have much more reasonable memory usage. Best of all, deploying is a matter of copying a binary, rather than having to deal with the absolute disaster that is Python packaging.
> Plus, if you use a language like Go or Rust then you avoid Python's GIL
No, but then you run into Go's GC and green threads. File systems fit squarely in the realm of "systems programming" (old definition [1], not new), which calls for languages like Ada, Pascal, C/C++, Rust and D (without GC).
I wouldn't consider the workstations sold by Xerox, TI and Thinking Machines, the OS research department at ETHZ, or Microsoft’s natural language search service for the West Coast and Asia just experiments.
Maybe plenty fast for most applications, but a filesystem is not one of those IMHO, especially something as naturally resource-hungry as ZFS.
A good filesystem implementation requires tight memory management and good control over what happens at the OS level. I am not saying it can't be done in Python, but it clearly isn't the right tool for the job.
I meant that for a production implementation. Python is perfectly fine for a proof of concept; in fact, it may be better than jumping straight down to C. But keeping it for production is foolish IMHO.
I was trying to speed up a log processing service running on PyPy by rewriting it in Java. I was surprised that the result was about twice as slow (I know Java quite well and didn't see obvious optimizations; most of the time was spent in GC). So PyPy can be quite fast even in more absolute terms (compared with VM languages), at least for some types of code.
I had a binary parser written in Python that took around 30 seconds on typical input on CPython. PyPy took that down to about 10 seconds. Rewriting it in C# took it down to 200 ms.
I routinely fail to get speedup on PyPy. In fact I frequently get slowdowns. I imagine it's only fast if your code is slower than it needs to be to begin with.
It works well for tight loops processing lots of data, or for heavily object-oriented code (multiple levels of class hierarchies). It probably won't help much for regular Django webapps or scripts. Also, real-world Python numerical/AI code uses numpy/ML libraries, so there's not much left to optimize in Python...
If you're going to run the file system in user space, there's no reason not to just use normal ZFS. The ZFS licensing problem only arises when combining CDDL and GPL code in one unit. If you're working across the kernel/userspace boundary, there's already no problem. ZoL even already ships a FUSE version that works fine.
You'd want a reverse-engineering lawyer to be certain. But my (IANAL) guess is: if this is a proper reverse-engineered implementation, you could then convert _this_ implementation to C and contribute _that_ to the kernel.
Except, it seems this is BSD-licensed, so I'm not sure how that would work in the kernel (which is GPLv2).
I assume you mean this as a joke, but I would point out that at least one of the BSD family has gone and baked Lua into its kernel. Granted, Lua is rather meant for that kind of thing and Python isn't, but it is entertaining to point out an interpreted language that has been stuck into a Unix kernel :)
But I'm surprised this is possible without a specification. How can you test a filesystem through hexdumps? Surely the effects of some operations are going to be pretty far-reaching?
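One partial answer to the hexdump question: decode a structure you can see in the dump, then check that the decoded fields are self-consistent. A minimal sketch, assuming the `uberblock_t` layout from Sun's ZFS On-Disk Specification (magic 0x00bab10c, then version, txg, guid sum and timestamp as 64-bit integers, written in the byte order of the host that wrote them, so the magic also tells you the byte order). The function names are hypothetical, not taken from the project being discussed, and the sample bytes are synthetic.

```python
import struct

# ASSUMPTION: leading fields of uberblock_t per the ZFS On-Disk Specification:
# ub_magic, ub_version, ub_txg, ub_guid_sum, ub_timestamp (five uint64 values).
UBERBLOCK_MAGIC = 0x00BAB10C


def parse_uberblock_header(raw: bytes) -> dict:
    """Decode the leading uberblock fields, detecting byte order from the magic."""
    if len(raw) < 40:
        raise ValueError("need at least 40 bytes")
    for prefix, order in (("<", "little"), (">", "big")):
        magic, version, txg, guid_sum, timestamp = struct.unpack_from(prefix + "5Q", raw, 0)
        if magic == UBERBLOCK_MAGIC:
            return {"byteorder": order, "version": version, "txg": txg,
                    "guid_sum": guid_sum, "timestamp": timestamp}
    raise ValueError("no uberblock magic at offset 0")


def hexdump(raw: bytes, width: int = 16) -> str:
    """Offset plus hex bytes: the view the reverse engineering starts from."""
    rows = []
    for offset in range(0, len(raw), width):
        chunk = raw[offset:offset + width]
        rows.append(f"{offset:08x}  " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(rows)


# Synthetic data with made-up field values, not bytes from a real pool.
sample = struct.pack("<5Q", UBERBLOCK_MAGIC, 5000, 1234, 42, 1_600_000_000)
print(hexdump(sample[:16]))
# 00000000  0c b1 ba 00 00 00 00 00 88 13 00 00 00 00 00 00
print(parse_uberblock_header(sample)["txg"])  # 1234
```

Against a real device you would read the bytes at the uberblock offsets and compare the decoded txg and timestamp with what the pool reports; disagreement points at a wrong guess about the layout.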
Does someone know whether it would be legal for someone to go through the ZFS code and write a specification of the features this author hasn’t figured out yet? I.e. could someone write a detailed description of the missing functionality that doesn’t include any details about the implementation so other people can implement it in non-CDDL code?
Original comment: I could swear this was actually the standard practice for writing an implementation of an unknown file format or interface without infringing on copyright. But I don't remember the term for it.
That's called a clean room implementation and was the standard way to make x-compatible products (for example, the BIOS on an IBM PC clone). Not sure what the current legal standing of that method is.
EDIT: Ninjad because I left the reply in a tab without posting.
Reverse engineering is legal in the US, but you had better have detailed records proving no one who knew the insides of the original product ever influenced the clone. And be prepared to explain that in court.
That PDF says “Unless otherwise licensed, use of this software is authorized pursuant to the terms of the license found at: http://developers.sun.com/berkeley_license.html”. That link is broken, but it seems that’s a Berkeley license (whatever that means for a specification, and which variant?).
This is the greatest thing ever.
I wish I could just write code for the fun of it. Every time I wonder whether people will use it and give up before I even get started.
This was a really simple project but tickled all my fancies: Python, low-level, networking, reverse engineering, system administration.
Just do it! Who cares if people use it?
Alternatively, contribute something to some open source project you use. I’ve done that too. Just small stuff here and there but that’ll guarantee someone uses your code if that’s what’s important to you. It only takes 39 commits to get on this page:
I had this problem too. I’ve been able to get over it by coming up with a scenario, even if it’s completely fabricated, where what I am doing can be useful. I also make sure that I incorporate something new that I want to learn in the project. Whether it’s a language, library, whatever. Then I give myself a date I can quit. Normally it’s about two months. This makes me really consider whether I want to take something on, because if I do I force myself to dedicate two months of time to it. If I still enjoy it at the end of two months then I continue; otherwise I move on to another idea. At least in that time I become a little better at whatever I was trying to do. That’s the real goal anyway.
Exactly! Write code that you find interesting and/or need for something and then share it. If someone uses your stuff, then great, if not, at least you've become a slightly better programmer! It's a win-win!
Absolutely, some of the most fun I've had coding was reverse engineering/implementing known protocols. Although something this big may be a little overboard :)
It's capable of doing IO against a real ZFS array without any other code. ARC is an implementation detail and not necessary for correctness. If you removed ARC from ZoL it would still work, just slower. ARC is far from the most interesting milestone for a reimplementation effort because an ARC implementation doesn't need to be anything like the Sun version internally, as long as it offers similar performance.
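To make the point concrete: a read cache sits in front of block reads and changes only how often the disk is hit, never what a read returns. The sketch below uses a plain LRU policy, not ARC (a swapped-in, much simpler technique), and `LRUBlockCache` and the `read_block` callback are hypothetical names, not part of ZoL or of this project.

```python
from collections import OrderedDict
from typing import Callable


class LRUBlockCache:
    """Read-through cache in front of a block reader.

    NOTE: plain LRU eviction, not ARC; it only stands in for "some cache".
    """

    def __init__(self, read_block: Callable[[int], bytes], capacity: int):
        self._read_block = read_block  # hypothetical backend: block number -> bytes
        self._capacity = capacity      # assumed to be at least 1
        self._entries = OrderedDict()  # block number -> bytes, least recent first
        self.misses = 0

    def read(self, block_no: int) -> bytes:
        if block_no in self._entries:
            self._entries.move_to_end(block_no)  # mark as most recently used
            return self._entries[block_no]
        self.misses += 1
        data = self._read_block(block_no)
        self._entries[block_no] = data
        if len(self._entries) > self._capacity:
            self._entries.popitem(last=False)  # evict least recently used
        return data


# Same results with or without the cache; only the number of backend reads differs.
disk = {n: bytes([n]) * 4 for n in range(4)}
cache = LRUBlockCache(disk.__getitem__, capacity=2)
reads = [0, 1, 0, 2, 1]
assert [cache.read(n) for n in reads] == [disk[n] for n in reads]
print(cache.misses)  # 4
```

Swapping the eviction policy (LRU, ARC, or no cache at all) leaves every returned block unchanged, which is why the cache can be left out of a reimplementation without affecting correctness.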
This project is cool not because you're going to run the Python in your kernel today, but because someone can use it as a documented reference implementation of all of the data structures and transactions that are not covered by the CDDL, so another implementation based on this can live in the Linux kernel without problem.
If the GPLv2 GRUB ZFS code[1] wasn't enough to get someone started, then I doubt this will make any difference in porting ZFS to GPL, given there would be more work involved in turning this into a usable kernel driver.
Not taking anything away from the work that the author has done though. It's a nice project. I just think a little pragmatism is needed before we get carried away with the ZFS GPL comments.