If you don't click through to read about this: you can write an FPGA image in Verilog/VHDL and upload it... and then run it. To me that seems like magic.
This is so awesome, I can't even. I wrote arachne-pnr [0] to learn about FPGAs to get ready for this day. Just signed up, can't wait to play with these!
I hope the growing popularity of FPGAs for general-purpose computing will help push the vendors to open up bitstreams and invest in open-source design tools.
Wow, Clifford, is that you? I hope this, exciting as it may be, won't make you leave open FPGA efforts for the dark side (saw your talk at FOSDEM last year, it was very exciting).
Cotton is the author of arachne-pnr. Clifford is the author of Yosys and IceStorm, which are all separate projects. Not the same person.
FWIW, Clifford has recently started reverse-engineering the bitstreams of the modern Xilinx FPGA series. So, stay tuned for a Xilinx IceStorm-equivalent sometime down the road (a few years, probably...)
No, Clifford is cliffordvienna on HN. He wrote Yosys (an amazing piece of software) and did the iCE40 reverse engineering (amazing work). I wrote the place-and-route tool, arachne-pnr.
I'm very curious if/how you have managed to make the developer experience sane and enjoyable. I have experience with an FPGA cluster of ~800 FPGAs, and it definitely does not get used to its full potential because of the tooling around it.
What others have said is true, and also note that Amazon bought Twitch 2 years ago, so I'm sure Amazon can run their own product announcements through Twitch if they want :)
I'm guessing it is covered under the Twitch Creative Conduct, since there's now an entire, increasingly popular Creative category where people do painting, cosplay, digital art, etc.
So it's tied to the PCIe bus. How do you interact with your FPGA once you've programmed it? Are there generic drivers you can use, or do you also have to write a Linux driver to talk to your FPGA?
Xilinx provide software drivers and IP for PCIe DMA and memory-mapped interfaces. These are fairly easy to integrate (probably not the best for latency, though; I've developed my own, but I have a specific use case: low latency, and I don't care about bandwidth).
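For the simple register-poking case, Linux can also expose a PCIe BAR to user space without a custom kernel driver. Below is a minimal C sketch of that generic approach (not the Xilinx drivers, and nothing AWS-specific), assuming the FPGA's BAR shows up as a sysfs resource file such as `/sys/bus/pci/devices/<address>/resource0` and that you run it as root. The helper names `map_bar`, `reg_read`, `reg_write` and `unmap_bar` are hypothetical, and the register offsets depend entirely on your own design.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: map the first `len` bytes of a PCIe BAR exposed by
 * Linux as a sysfs resource file, e.g. /sys/bus/pci/devices/<address>/resource0.
 * Returns NULL on failure. Real devices need root. */
static volatile uint32_t *map_bar(const char *path, size_t len) {
    int fd = open(path, O_RDWR);
    if (fd < 0) {
        perror("open");
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after closing the descriptor */
    if (p == MAP_FAILED) {
        perror("mmap");
        return NULL;
    }
    return (volatile uint32_t *)p;
}

/* 32-bit register accessors; `off` is a byte offset into the BAR.
 * The register layout is whatever your FPGA design defines. */
static void reg_write(volatile uint32_t *bar, size_t off, uint32_t val) {
    assert(off % 4 == 0); /* keep accesses 32-bit aligned */
    bar[off / 4] = val;
}

static uint32_t reg_read(volatile uint32_t *bar, size_t off) {
    assert(off % 4 == 0);
    return bar[off / 4];
}

static void unmap_bar(volatile uint32_t *bar, size_t len) {
    munmap((void *)bar, len);
}
```

Usage would be something like `map_bar("/sys/bus/pci/devices/<address>/resource0", 4096)` followed by `reg_write(bar, 0x10, 1)`. This only covers control registers; bulk data would still go through a DMA engine like the ones Xilinx provide.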
The magic part is the thing we have gotten used to with the cloud: virtual hardware you never see and rent by the minute. Imagine having an FPGA idea and not needing to make a board, pay for a dev board, or even find one in your lab... Like your idea and need more? Spin up 100 more right now...
Exactly what I thought. This is amazing. FPGAs are commonly used in embedded systems to perform application-specific tasks, and now application developers have access to this power too. I guess many machine learning applications might take advantage of that power instead of using comparatively expensive graphics hardware.
> To the best of our knowledge, state-of-the-art performance for forward propagation of CNNs on FPGAs was achieved by a team at Microsoft. Ovtcharov et al. have reported a throughput of 134 images/second on the ImageNet 1K dataset [28], which amounts to roughly 3x the throughput of the next closest competitor, while operating at 25 W on a Stratix V D5 [30]. This performance is projected to increase by using top-of-the-line FPGAs, with an estimated throughput of roughly 233 images/second while consuming roughly the same power on an Arria 10 GX1150. This is compared to high-performing GPU implementations (Caffe + cuDNN), which achieve 500-824 images/second, while consuming 235 W. Interestingly, this was achieved using Microsoft-designed FPGA boards and servers, an experimental project which integrates FPGAs into datacenter applications.
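Since the excerpt mixes throughput and power, it helps to normalize to images per joule (images/s divided by W). A small C sketch using only the figures quoted above; the one assumption, flagged in the code, is that the projected Arria 10 number runs at the same 25 W, which is how I read "roughly the same power".

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct accel {
    const char *name;
    double images_per_s;
    double watts;
};

/* Figures from the excerpt above. */
static const struct accel kQuoted[] = {
    {"Stratix V D5 (measured)", 134.0, 25.0},
    /* Assumption: "roughly the same power" read as 25 W. */
    {"Arria 10 GX1150 (projected)", 233.0, 25.0},
    {"GPU Caffe+cuDNN (low end)", 500.0, 235.0},
    {"GPU Caffe+cuDNN (high end)", 824.0, 235.0},
};

/* (images/s) / W = (images/s) / (J/s) = images per joule. */
static double images_per_joule(const struct accel *a) {
    assert(a->watts > 0.0);
    return a->images_per_s / a->watts;
}

static void print_efficiency(const struct accel *rows, size_t n) {
    for (size_t i = 0; i < n; i++) {
        printf("%-28s %6.0f img/s %6.0f W %6.2f img/J\n", rows[i].name,
               rows[i].images_per_s, rows[i].watts,
               images_per_joule(&rows[i]));
    }
}
```

By this measure the Stratix V comes out at about 5.4 images/J (9.3 for the projected Arria 10) versus roughly 2.1 to 3.5 for the GPUs, so the GPUs win on raw throughput while the FPGA wins per joule. This ignores any difference in numeric precision between the implementations.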
That's hard to compare. Typically FPGAs are doing fixed-point math, so they can do more operations with less power. GPUs have traditionally done floating point. However, with the new Pascal architecture, certain cards (P4/P40) support 8-bit integer dot products, which give a massive boost in performance/W. It's still fairly high at 250W, but that's for an entire card with 24GB of memory. You'd have to compare that to an FPGA with that much memory on a PCIe card if you're doing apples to apples. Something like this is appropriate for comparison: http://www.nallatech.com/store/fpga-accelerated-computing/pc...
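To make the fixed-point point concrete, here is a minimal C sketch of an 8-bit integer dot product with 32-bit accumulation, the kind of arithmetic an FPGA multiply-accumulate pipeline (or Pascal's int8 dot-product instructions) does instead of float32 math. The symmetric quantization scheme and the scales `sx`/`sy` are my own illustrative assumptions; choosing scales well is its own problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_LEN 256

/* Symmetric quantization: q = round(x / scale), clamped to [-127, 127].
 * The scheme and scales are illustrative assumptions. */
static int8_t quantize(float x, float scale) {
    float r = x / scale;
    float q = r >= 0.0f ? r + 0.5f : r - 0.5f; /* round half away from zero */
    if (q > 127.0f) q = 127.0f;
    if (q < -127.0f) q = -127.0f;
    return (int8_t)q; /* truncation toward zero completes the rounding */
}

/* int8 x int8 products summed in an int32 accumulator. */
static int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n) {
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++) {
        acc += (int32_t)a[i] * (int32_t)b[i];
    }
    return acc;
}

/* Approximate a float dot product: quantize, multiply-accumulate in
 * integers, then apply a single float rescale at the end. */
static float dot_via_int8(const float *x, const float *y, size_t n,
                          float sx, float sy) {
    int8_t qx[MAX_LEN], qy[MAX_LEN];
    assert(n <= MAX_LEN);
    for (size_t i = 0; i < n; i++) {
        qx[i] = quantize(x[i], sx);
        qy[i] = quantize(y[i], sy);
    }
    return (float)dot_i8(qx, qy, n) * sx * sy;
}
```

With `x = {0.5, -0.25, 1.0}`, `y = {1.0, 2.0, -0.5}`, `sx = 0.01f` and `sy = 0.02f`, the integer path reproduces the float result of -0.5 to within float rounding, while every multiply happens on 8-bit integers.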
This is very awesome. Could you add some more thoughts on the tooling and the development workflow? Is it possible to target the Xilinx hardware using only open source (or AWS proprietary) tools? Or is Vivado still required for advanced stuff?
Vivado is required for all advanced features, and for programming Xilinx chips in general; as the sibling post said, there is no open FPGA toolchain for Xilinx devices, especially extremely high-end ones like those offered on F1 (I expect they'd run at like, several thousand USD per device, on top of a several-thousand-dollar Vivado license for all the features).
It doesn't look like there's much AWS proprietary stuff here, though we'd have to wait for the SDK to be properly opened up to be sure. I imagine it's mostly about packaging everything so it's easy to consume, maybe with some extra IP cores for common tasks, and lots of examples. If you're already using Vivado, I imagine using F1 in the cloud won't introduce any major changes to what you expect.
> I expect they'd run at like, several thousand USD per device...
You're guessing about an order of magnitude too low, actually. The VU9P FPGAs Amazon is using cost between $30,000 and $55,000 each, depending on the speed grade.
Yes, this means a fully equipped F1 instance costs nearly half a million dollars. Don't count on the instances being cheap to run.
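Rough arithmetic behind "nearly half a million", assuming the largest F1 size (f1.16xlarge) carries 8 FPGAs. The constants below are just the quoted price range and that assumed count, and the host server is not included.

```c
#include <assert.h>

/* Quoted VU9P price range per device, in USD. */
#define VU9P_PRICE_LOW 30000L
#define VU9P_PRICE_HIGH 55000L

/* Assumption: the largest F1 size (f1.16xlarge) has 8 FPGAs. */
#define FPGAS_PER_LARGEST_F1 8L

/* FPGA cost only; the host server, memory, etc. are not included. */
static long fpga_cost(long n_fpgas, long unit_price_usd) {
    assert(n_fpgas >= 0 && unit_price_usd >= 0);
    return n_fpgas * unit_price_usd;
}
```

That gives 8 × $30,000 = $240,000 up to 8 × $55,000 = $440,000, so at the top speed grade the FPGAs alone get close to half a million.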
"This AMI includes a set of developer tools that you can use in the AWS Cloud at no charge. You write your FPGA code using VHDL or Verilog and then compile, simulate, and verify it using tools from the Xilinx Vivado Design Suite (you can also use third-party simulators, higher-level language compilers, graphical programming tools, and FPGA IP libraries)."
So basically, buying a copy of Vivado is the minimum. There aren't any open source tools that directly output Xilinx FPGA bitstreams that I know of.
It looks like the FPGA Developer AMI includes Vivado and a license explicitly for use on these platforms (look at the PuTTY screenshot in the blog post; it has a customized MOTD). You just need to set up the license server that Vivado will use and point it to the right license.
So I guess the real question is: what exactly is granted by the Vivado license on these AMIs? Do we get things like SDSoC, SDAccel, etc., and all the libraries? [1] The blog seems to imply you can program these things with OpenCL too (AKA SDAccel), so I'm guessing that these features are all enabled, but details about the included Vivado license in the AMI would be nice.
I'm currently working on this, and the speedup is around 2x for most operations. Not kidding, quite a few startups are currently trying to optimize typical data operations with special algorithms.
You can find 'equivalents' to CPU data structures for FPGAs and speed up operations on/with them while still saving power.
There's a lot of trouble with how buffers are used and how memory is accessed, so it's not a trivial task. But IF you can optimize generic data structures and replace the existing ones, you basically get 2x the speed or half the energy consumption for any DB.
[0] https://github.com/cseed/arachne-pnr