you don't port it, you wrap it.
you can put anything in an rl environment. emulators are usually handled with BizHawk and some Lua. worst case, there's ffi or screen capture.
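rough sketch of what "wrapping" means here, in Python. everything in it is made up for illustration: `EmulatorBackend` is a hypothetical interface (not BizHawk's actual API), and `FakeEmulator` is a toy stand-in so the code runs without an emulator. a real backend would implement the same methods by talking to a Lua script, calling through ffi, or grabbing the screen.

```python
from typing import List, Protocol, Tuple


class EmulatorBackend(Protocol):
    """Hypothetical interface the wrapper needs from any emulator bridge."""

    def load_state(self) -> None: ...
    def press(self, buttons: int) -> None: ...
    def advance_frame(self) -> None: ...
    def read_observation(self) -> List[int]: ...
    def read_score(self) -> int: ...
    def is_game_over(self) -> bool: ...


class FakeEmulator:
    """Toy stand-in (assumption): a 10-frame 'game' with a score counter."""

    def __init__(self) -> None:
        self.frame = 0
        self.score = 0
        self.buttons = 0

    def load_state(self) -> None:
        self.frame = 0
        self.score = 0
        self.buttons = 0

    def press(self, buttons: int) -> None:
        self.buttons = buttons

    def advance_frame(self) -> None:
        self.frame += 1
        # Toy rule: holding button bit 0 earns one point per frame.
        if self.buttons & 1:
            self.score += 1

    def read_observation(self) -> List[int]:
        return [self.frame, self.score]

    def read_score(self) -> int:
        return self.score

    def is_game_over(self) -> bool:
        return self.frame >= 10


class EmulatorEnv:
    """Gym-style reset/step wrapper around any EmulatorBackend."""

    def __init__(self, backend: EmulatorBackend, frame_skip: int = 1) -> None:
        self.backend = backend
        self.frame_skip = frame_skip
        self._prev_score = 0

    def reset(self) -> List[int]:
        self.backend.load_state()
        self._prev_score = self.backend.read_score()
        return self.backend.read_observation()

    def step(self, action: int) -> Tuple[List[int], float, bool]:
        # Hold the chosen buttons for frame_skip frames.
        self.backend.press(action)
        for _ in range(self.frame_skip):
            self.backend.advance_frame()
        # Reward is the score gained since the previous step.
        score = self.backend.read_score()
        reward = float(score - self._prev_score)
        self._prev_score = score
        return self.backend.read_observation(), reward, self.backend.is_game_over()
```

the agent only ever sees `reset()` and `step()`, so swapping the fake backend for a real bridge shouldn't touch the training code.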
well, that's the golden issue with rl: sample efficiency. training is env bounded, i.e. limited by how fast the environment can produce samples, so you want an architecture that extracts the maximum possible information from each collected sample, avoids catastrophic forgetting, and prioritizes samples according to relevance.
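one common way to get at the "prioritize samples by relevance" part is a prioritized replay buffer. minimal sketch below, assuming proportional prioritization by absolute TD error; the class name, the `alpha`/`eps` defaults, and the O(n) sampling are my own simplifications, and the importance-sampling correction that usually goes with it is left out. replaying old transitions is also one of the usual ways to soften forgetting, though it is not a full fix.

```python
import random
from typing import Any, List, Sequence, Tuple


class PrioritizedReplay:
    """Ring buffer that samples transitions in proportion to priority**alpha."""

    def __init__(self, capacity: int, alpha: float = 0.6,
                 eps: float = 1e-3, seed: int = 0) -> None:
        self.capacity = capacity
        self.alpha = alpha          # 0 = uniform sampling, 1 = fully proportional
        self.eps = eps              # keeps every priority strictly positive
        self.items: List[Any] = []
        self.priorities: List[float] = []
        self.next_slot = 0
        self.rng = random.Random(seed)

    def add(self, transition: Any) -> None:
        # New samples get the current max priority so they are replayed
        # at least once before being judged.
        priority = max(self.priorities, default=1.0)
        if len(self.items) < self.capacity:
            self.items.append(transition)
            self.priorities.append(priority)
        else:
            # Buffer full: overwrite the oldest slot.
            self.items[self.next_slot] = transition
            self.priorities[self.next_slot] = priority
        self.next_slot = (self.next_slot + 1) % self.capacity

    def sample(self, batch_size: int) -> Tuple[List[int], List[Any]]:
        # Sample with replacement, weighted by priority**alpha.
        weights = [p ** self.alpha for p in self.priorities]
        indices = self.rng.choices(range(len(self.items)),
                                   weights=weights, k=batch_size)
        return indices, [self.items[i] for i in indices]

    def update(self, indices: Sequence[int], td_errors: Sequence[float]) -> None:
        # Larger TD error means the sample is more "surprising", so replay it more.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

typical loop: `add()` each transition from the env, `sample()` a batch, train, then `update()` the sampled indices with the new TD errors.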