If, for fun, I wanted to train an ML model on a ton of CPU instructions (with each predicted state/label being the state of the registers), does anyone have any clue how to gather that kind of data?
QEMU isn’t cycle-accurate, but would be a good start (and probably good enough). Just run some benchmarks and whatnot there, and use a tracing tool like Cannoli to capture instructions.
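To make the "instruction in, register state out" idea concrete, here is a rough sketch of turning a trace into training pairs. It assumes you have already dumped the trace to a plain-text format of your own design, one line per executed instruction followed by the register values after it ran. That line format, and the names parse_line and make_samples, are made up for illustration; this is not the output format of Cannoli or any other tool.

```python
from typing import Dict, Iterable, List, Optional, Tuple

RegState = Dict[str, int]
# One training sample: (registers before, instruction text, registers after)
Sample = Tuple[RegState, str, RegState]


def parse_line(line: str) -> Tuple[str, RegState]:
    """Parse one line of the hypothetical format:
    '<instruction> | reg=hex reg=hex ...'
    """
    insn, _, regs = line.partition("|")
    state: RegState = {}
    for field in regs.split():
        name, _, value = field.partition("=")
        state[name] = int(value, 16)  # base 16 accepts an optional 0x prefix
    return insn.strip(), state


def make_samples(lines: Iterable[str],
                 initial: Optional[RegState] = None) -> List[Sample]:
    """Pair each instruction with the register state before and after it."""
    samples: List[Sample] = []
    prev: RegState = dict(initial or {})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the trace
        insn, after = parse_line(line)
        samples.append((prev, insn, after))
        prev = after  # this "after" is the next instruction's "before"
    return samples


# Tiny hand-written trace in the assumed format (not real tool output).
trace = [
    "mov rax, 0x1 | rax=0x1 rbx=0x0",
    "add rbx, rax | rax=0x1 rbx=0x1",
]
samples = make_samples(trace, initial={"rax": 0, "rbx": 0})
```

Each sample then gives you the model input (previous registers plus the instruction) and the label (the resulting registers). How you encode the instruction, as raw bytes or as disassembled text, is a separate choice.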
If you need real instructions (without an emulator like QEMU doing its own translation and messing up timing), you could use a simulator like gem5. That’s a bit more work and a lot more compute per simulated instruction.