Well, I tested three different memory configurations:

- darkriscv@75MHz, cache=off, 0 wait states, 2-stage pipeline, 2-phase clock: 6.40us
- darkriscv@75MHz, cache=on, 3 wait states, 3-stage pipeline, 1-phase clock: 9.37us
- darkriscv@50MHz, cache=on, 3 wait states, 2-stage pipeline, 2-phase clock: 13.84us

The first configuration works in a zero-wait-state environment, with separate high-speed synchronous memories for instructions and data clocked on a different clock phase than the core (weeeeeird!). As long as there is no latency, this configuration runs at 75MIPS with a 2-stage pipeline, which means only one clock is lost when a branch flushes the pipeline.

The second configuration uses a small high-speed cache (256 bytes for instructions and 256 bytes for data), a 3-stage pipeline, which means two clocks are lost when a branch flushes the pipeline, a more conventional single-phase clock architecture, and a memory with roughly 3 wait states. Although the core still runs at 75MHz (a 75MIPS peak), the cache misses and the longer pipeline decrease the performance to around 51MIPS.

The third configuration uses the core from the first scenario, but with the small high-speed cache and the 3-wait-state memory from the second scenario. In this configuration, the clock frequency drops to 50MHz and, according to my calculations, the performance is around 34MIPS.

So, if it is possible to work only with the internal FPGA memory, the first configuration is better; otherwise, use the second configuration. I guess it is also possible to create a fourth configuration with the 3-stage pipeline and zero wait states (no cache), but I need to implement a two-clock load instruction. In that case, I guess it could peak at around 100MHz.
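The MIPS estimates above can be reproduced from the measured times if you assume that all three runs execute the same instruction stream and that the first configuration sustains its 75MIPS peak: throughput then scales with the inverse of the execution time. A minimal Python sketch of that arithmetic follows; the names `effective_mips` and `CONFIGS` are hypothetical and not part of darkriscv.

```python
# Sketch: estimate effective MIPS for each memory configuration by scaling
# the baseline (configuration 1) by the ratio of measured execution times.
# Assumptions (not measured): config 1 sustains ~75 MIPS at 75 MHz, and all
# configurations run exactly the same instruction stream.

BASELINE_MIPS = 75.0

# (description, measured execution time in microseconds)
CONFIGS = [
    ("75MHz, no cache, 0 wait states, 2-stage, 2-phase clock", 6.40),
    ("75MHz, cache, 3 wait states, 3-stage, 1-phase clock", 9.37),
    ("50MHz, cache, 3 wait states, 2-stage, 2-phase clock", 13.84),
]


def effective_mips(time_us: float, baseline_us: float,
                   baseline_mips: float = BASELINE_MIPS) -> float:
    """Same instruction count, so throughput is inversely proportional to time."""
    return baseline_mips * baseline_us / time_us


def main() -> None:
    baseline_us = CONFIGS[0][1]
    for name, t in CONFIGS:
        print(f"{name}: {t:.2f} us -> ~{effective_mips(t, baseline_us):.1f} MIPS")


if __name__ == "__main__":
    main()
```

This gives about 75.0, 51.2 and 34.7 MIPS, consistent with the ~51MIPS and ~34MIPS figures quoted for the second and third configurations.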