Currently working on an accurate model of the MIT CADR in VHDL, and merging the various System source trees into one that should work for both Lambda and CADR.
Maybe try replacing the ALU with one written directly in Verilog; I suspect this will run a lot faster than building it up from 74181 + 74182 components.
The current state is _very_ fast in simulation, to the point where writing a behavioral model of the '181/'182 is uninteresting (there are other things to figure out).
~100 microcode instructions take about 0.1 seconds to run.
I was thinking more of a behavioral model of the whole ALU, just so that the FPGA tools can map it onto a collection of the smaller ALUs built into each slice.
What clock speed does your latest design synthesize at?
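For anyone following along, here is a rough sketch of the structural vs. behavioral distinction being discussed, in Python rather than an HDL. It is only an illustration: the 32-bit width, the function names, and the plain ripple carry between slices are all assumptions (a '182 would produce the same carries via lookahead), and it covers only addition, not the full '181 function table.

```python
# Hypothetical illustration: a wide adder built from 4-bit slices
# versus a single behavioral "a + b" over the whole word.

WIDTH = 32                   # assumed datapath width, not taken from the thread
MASK = (1 << WIDTH) - 1
MASK4 = 0xF


def slice_add(a4, b4, carry_in):
    """One 4-bit adder slice: returns (4-bit sum, carry out)."""
    total = (a4 & MASK4) + (b4 & MASK4) + carry_in
    return total & MASK4, total >> 4


def structural_add(a, b, carry_in=0):
    """Chain WIDTH/4 slices, passing the carry from one slice to the next."""
    result, carry = 0, carry_in
    for i in range(WIDTH // 4):
        shift = 4 * i
        nibble, carry = slice_add(a >> shift, b >> shift, carry)
        result |= nibble << shift
    return result, carry


def behavioral_add(a, b, carry_in=0):
    """Describe the whole operation at once and let the tools pick the structure."""
    total = (a & MASK) + (b & MASK) + carry_in
    return total & MASK, total >> WIDTH


if __name__ == "__main__":
    print(structural_add(0xFFFFFFFF, 1))
    print(behavioral_add(0xFFFFFFFF, 1))
```

Both functions return the same (sum, carry out) pair for any inputs; the behavioral form just leaves the choice of carry structure to whatever maps it onto hardware.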