We believe that Rust supports all the ingredients necessary to implement the techniques in this paper: it has a strong static side, and its traits system supports zero-cost abstractions which can be provably eliminated by the compiler. It has a great pointer aliasing model, a suitable mid-level IR, a vibrant and engaging community, and a great open language evolution process.
A concern with using Rust is that a strong goal of this project is to appeal to the entire TensorFlow community, which is currently pervasively Python based. We love Rust, but it has a steep learning curve that may exclude data scientists and other non-expert programmers who frequently use TensorFlow. The ownership model is really great, but mostly irrelevant to the problems faced by today’s machine learning code implemented in Python.
As I pointed out in two lengthy comments on day one[1][2], that reasoning is nonsense. If Chris wants to use the language he created in this new endeavor for machine learning simply because he made it, that's totally fine and completely his prerogative, but he should just say so, rather than trying (and failing) to convince people that other languages aren't better suited for this task.
From my point of view, a weak justification is worse than no justification in cases like this.
Rust is much better suited to this task than Swift from a technical point of view. The far superior platform support for Windows and Linux is ample reason to say Rust is better suited for this task, since very few data scientists will be training models on macOS. However, that's only one of several areas where Swift has shortcomings for a project like this. Swift is great for iOS and macOS development, of course, since it was designed for that. I don't think Swift is a bad language by any means, and with enough effort, it can be reshaped to be good for TensorFlow... the GitHub document just provides zero useful justification for the work required to make it good for TensorFlow.
EDIT: to some of the replies talking about Rust's learning curve, that mostly applies when you start trying to design efficient, interlinked data structures involving ownership. For most applications of machine learning, this simply wouldn't be a problem. The library would provide the data structures, you just have to use them. Rust can provide simple interfaces to complicated things.[3] The compiler's error messages are usually incredibly helpful.
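To make that concrete, here is a minimal sketch of what I mean. Everything in it (`Tensor`, `from_vec`, `add`, `scale`, `sum`, `user_code`) is a name made up for illustration and not the API of any real crate; a real library would add shapes, dtypes, and device handling. The only point is that the ownership details live inside the library, while the calling code contains no lifetimes or manual memory management.

```rust
/// Hypothetical tensor type invented for this sketch; not a real crate's API.
#[derive(Debug, Clone, PartialEq)]
pub struct Tensor {
    data: Vec<f32>,
}

impl Tensor {
    /// Takes ownership of the vector; the caller never frees anything.
    pub fn from_vec(data: Vec<f32>) -> Self {
        Tensor { data }
    }

    /// Element-wise sum. Borrows both inputs and returns a new owned tensor.
    pub fn add(&self, other: &Tensor) -> Tensor {
        assert_eq!(self.data.len(), other.data.len(), "shape mismatch");
        let data: Vec<f32> = self
            .data
            .iter()
            .zip(&other.data)
            .map(|(a, b)| a + b)
            .collect();
        Tensor { data }
    }

    /// Multiplies every element by `factor`, returning a new tensor.
    pub fn scale(&self, factor: f32) -> Tensor {
        let data: Vec<f32> = self.data.iter().map(|x| x * factor).collect();
        Tensor { data }
    }

    /// Sum of all elements.
    pub fn sum(&self) -> f32 {
        self.data.iter().sum()
    }
}

/// What user code looks like: plain method chaining, no lifetime annotations.
pub fn user_code() -> f32 {
    let a = Tensor::from_vec(vec![1.0, 2.0, 3.0]);
    let b = Tensor::from_vec(vec![0.5, 0.5, 0.5]);
    a.add(&b).scale(2.0).sum()
}
```

Someone coming from Python only has to learn that `&b` means "lend this to the function"; the harder ownership questions only come up for whoever writes the library.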
The learning curve of Rust should not be relevant here, compared to Swift, which is also full of idiosyncrasies. Swift and Rust both have a large learning curve for someone coming from Python. This is because they're statically typed languages that are just different from a scripting language. For an application like this, I would say those learning curves are roughly equal at the language level, but as I pointed out in my comments, Swift adds an enormous learning curve by requiring many data scientists to either install and learn Linux, or throw out their current computer, buy a Mac, and learn macOS.
My point here is not that Rust is the most suitable language for Tensorflow (although it could be), but rather I'm making the point that Rust is more suitable than Swift for a project like this, and therefore this document is just annoying. It would be better for them to delete this document and just say "we're using Swift because our team has a lot of experience with it and because the creator of Swift is leading this project, so we would lack enthusiasm and momentum if we were using something else, even if it were more suitable."
Julia would be really interesting to see explored further, since it would appeal much better to many existing data scientists who would be transitioning from Python. The times that I've played with Julia, I was amazed at how slow the JIT is for even tiny scripts. LLVM is powerful stuff, but it is painfully slow at everything. It would be nice if Julia offered an alternative backend for rapid development.
I personally find Rust to have quite a learning curve (which I guess is also an opinion shared by others). The language is great though.
I do agree with your criticism of the document here, though. It feels very much like Swift happens to check many boxes, but the lack of Windows support is baffling. It's simply table stakes to be able to run, fully supported, on Windows, macOS, and major Linux distributions. That should be the very first thing anyone considers.
But beyond that, I think even with Rust's macro system it could be difficult to make it work for Tensorflow in a way that feels appropriate for Rust programmers _and_ for TensorFlow. This was explored in F# for TensorFlow research[0], and a completely different approach[1] was taken because making a type system suitable for TensorFlow got too unwieldy.
> But beyond that, I think even with Rust's macro system it could be difficult to make it work for Tensorflow in a way that feels appropriate for Rust programmers _and_ for TensorFlow.
It seems likely that the justification is retrofitted and the team's familiarity with Swift was the bigger driver.
I am surprised they didn't find Scala to be a good fit, given that it has already been used with great success in Spark, which I presume has similar technical requirements. Can anyone shed light on the short explanation below? Does it really apply to Scala?
"Java / C# / Scala (and other OOP languages with pervasive dynamic dispatch): These languages share most of the static analysis problems as Python: their primary abstraction features (classes and interfaces) are built on highly dynamic constructs, which means that static analysis of Tensor operations depends on "best effort" techniques like alias analysis and class hierarchy analysis. Further, because they are pervasively reference-based, it is difficult to reliably disambiguate pointer aliases."
More to the point, static typing is just not that important for data scientists. Arguably it's not that important for backend devs either (e.g. Lisp, Erlang).
Having done user research on this by speaking to data scientists, I can say that static typing is desired by a nonzero number of people who practice what we would consider to be data science and machine learning. Much like how TypeScript is seen as a revelation by hordes of JavaScript programmers who have never used static types before, the ability to get some level of correctness verification at design time matters.
The more time I spend with strongly typed languages, the more I am convinced it is the right way to go. For modern languages with good type inference, and good tools for protocols/interfaces not tied to an inheritance hierarchy, it is at worst a minor inconvenience for a huge benefit.
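As a rough illustration of both points, here is a small Rust sketch. The trait `Summary`, the struct `Column`, and the function `describe` are hypothetical names invented for this example. The trait is implemented for an existing standard-library type and for a new struct without any shared base class, and the element type of the `vec!` literal at the call site is inferred rather than written out.

```rust
/// Hypothetical trait for this sketch: anything that can report a mean.
pub trait Summary {
    fn mean(&self) -> f64;
}

// Implemented for an existing standard-library type; no base class involved.
impl Summary for Vec<f64> {
    fn mean(&self) -> f64 {
        if self.is_empty() {
            return 0.0; // arbitrary choice for the sketch
        }
        self.iter().sum::<f64>() / self.len() as f64
    }
}

/// Hypothetical named column, unrelated to `Vec` by inheritance.
pub struct Column {
    pub name: String,
    pub values: Vec<f64>,
}

impl Summary for Column {
    fn mean(&self) -> f64 {
        self.values.mean()
    }
}

/// Accepts any type implementing `Summary`; checked at compile time.
pub fn describe<T: Summary>(x: &T) -> String {
    format!("mean = {:.2}", x.mean())
}

/// Call sites carry no type annotations; the compiler infers them.
pub fn example() -> (String, String) {
    let raw = vec![1.0, 2.0, 3.0];
    let col = Column {
        name: "loss".to_string(),
        values: vec![0.5, 1.5],
    };
    (describe(&raw), describe(&col))
}
```

Passing something that does not implement `Summary`, such as a `&str`, is rejected by the compiler before anything runs, which is the design-time check being argued for.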
> I can say that static typing is desired by a nonzero number of people who practice what we would consider to be data science and machine learning
Who would trade static typing for fast prototyping any time.
Data science is a really nebulous term covering many drastically different domains of CS. Many data scientists I've talked with don't really produce code; they write code to produce analysis, which is the actual deliverable. For them, code is ad hoc and disposable, created on demand and left in the dust until it is rediscovered when the need arises.
Some of the code does survive and reach production, which I guess is where they would seek some assurance from static typing. But I think they could mitigate most of the pain by committing to writing some unit or functional tests, yet such awareness is rare among the data scientists I know and have worked with.
So all in all, yes static typing MIGHT help, in some way, but I don't think it addresses the underlying pain point as much.
It's fairly well accepted that Rust has a steep learning curve, and their target users are not software engineers, so I wouldn't say their point is nonsense.
> If Chris wants to use the language he created in this new endeavor for machine learning simply because he made it, that's totally fine and completely his prerogative, but he should just say so, rather than trying (and failing) to convince people that other languages aren't better suited for this task.
Do you have any insider knowledge that Chris Lattner had the unilateral power to choose Swift for this project? I would imagine with the importance of TensorFlow at Google, the decision to go in this direction had to be agreed on by a number of people.
> The learning curve of Rust should not be relevant here, compared to Swift, which is also full of idiosyncrasies. Swift and Rust both have a large learning curve for someone coming from Python.
How exactly would Rust-Python interoperability work? Swift for TensorFlow allows any Python library to be called like a native library in Swift. Could you do that in Rust?
> I wonder if Swift could be replaced with Rust for iOS development?
If you like the pain of using an unsupported language without all the Xcode, UIBuilder, CoreData, Instruments, Metal shader debugging,... goodies, then yes.
Chris Lattner is the driving technical force behind the project and he wrote Swift. So they were able to fix any issues with Swift, which makes the trade study “unfair” in that regard.