Faster, potentially cheaper, and more expensive to produce?
The history of the bitcoin miner has the details and gives us a real-world example of the software on x86 -> FPGA -> custom ASIC process. It's easy to find the relative performance of bitcoin miners running on everything from Raspberry Pis to CUDA clusters [1].
Note that the article uses the very flexible DE2-115, and there are lots of interesting trade-offs made to fit a bitcoin miner in only 115,000 gates... iirc, if you have 250k gates, it can run 4x (???) faster due to optimizations during synthesis.