Hacker News
by Jasper_ 2247 days ago
I've reverse engineered a number of content encryption schemes. It's always a ton of fun, and you get to see the large amount of psychological warfare at play at the higher tiers.

A very common trick that I've seen in a lot of Japanese games, for offline material, is to combine a hashing system and encryption. That is, the game will attempt to load "main.script", which is a custom bytecode scripting language. The file stored on disk would have the filename of a SHA1 hash of "main.script", but the contents would be encrypted with a private key like "tpircs.niam". "main.script" then loads a number of other files using its scripting system, so it's a very annoying process to take the whole thing apart, as you need to hunt down the original filename through the scripting system. Either that or you guess at filenames.
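A rough sketch of that scheme, and of the "guess at filenames" attack, might look like the following. The XOR cipher is only a stand-in (the comment doesn't say which cipher these games use), and the reversed-filename key derivation and all function names here are assumptions for illustration.

```python
import hashlib
from itertools import cycle


def disk_name(logical_name: str) -> str:
    """Name of the file on disk: the SHA-1 hex digest of the logical name."""
    return hashlib.sha1(logical_name.encode("utf-8")).hexdigest()


def derive_key(logical_name: str) -> bytes:
    """Hypothetical key derivation: the logical name reversed."""
    return logical_name[::-1].encode("utf-8")


def xor_crypt(data: bytes, key: bytes) -> bytes:
    """Stand-in cipher (assumption): repeating-key XOR, its own inverse."""
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


def guess_names(candidates, names_on_disk):
    """Map on-disk hashes back to logical names by hashing each guess."""
    found = {}
    for name in candidates:
        digest = disk_name(name)
        if digest in names_on_disk:
            found[digest] = name
    return found
```

Because both the on-disk name and the key depend on the logical name, a file stays opaque until that name is recovered, either from a script that loads it or from a lucky guess.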

You tend to see some really high-level effort put into systems, like the one game I took apart that had its own custom scripting language with classes and coroutines.

https://gist.github.com/magcius/bff948b13128b70695e3841e2084...

One game I found had a custom bytecode system that drove me nuts for weeks. The opcodes were specifically picked so that a large number of the popular ones were reflections of each other in dec, hex and binary. So you'd go "I've seen opcode 0x0353 before", but alas, you had actually seen opcode decimal 353. Similarly, there were opcodes 101 and 0x101 and 0b101 and they all did slightly different things. You think you could stick to hex, but there's enough slop in the process and your brain is so used to pattern-matching that it was pretty effective.
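To make the trap concrete, here is a small sketch of how one digit string maps to different opcodes depending on the base you read it in. The handler names are made up; only the numeric values come from the description above.

```python
# Hypothetical opcode table: the same digits "101" name three different opcodes.
OPCODES = {
    101: "op_dec_101",    # decimal 101
    0x101: "op_hex_101",  # hexadecimal 101, i.e. 257
    0b101: "op_bin_101",  # binary 101, i.e. 5
}


def readings(digits: str) -> dict:
    """Return every value the digit string could denote in base 2, 10 and 16."""
    out = {}
    for label, base in (("bin", 2), ("dec", 10), ("hex", 16)):
        try:
            out[label] = int(digits, base)
        except ValueError:
            # The digits are not valid in this base (e.g. "3" in binary).
            pass
    return out
```

For example, `readings("353")` gives `{"dec": 353, "hex": 851}`, so a note that just says "353" is ambiguous unless the base was written down with it.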


Indeed, it's often more fun than playing the game itself; but then again, I'm someone who has been taking things apart since I was very young (and, not surprisingly, got into trouble a few times for it...). I suspect RE is closer to what those in the other sciences do, i.e. analysis and thinking about how and why things are the way they are, rather than what they can build. That may be why few developers (who almost always build, except when they have to debug) seem to have much interest or skill in RE.
I've previously described physicists as reverse engineers of the universe (and biologists as reverse engineers of nature) :-)

They really are much closer disciplines than most people might imagine.

Just trying to read a mathematical paper is like RE. You could take the approach that you try to follow every step and pause until you have the full specification, but that would take months. And more importantly, it's unlikely that you are willing or able to just do that one paper over the course of 6 months.

Or, like people do in RE, you fall back to things you know (OP knew hex, decimal and binary before he encountered the reflected codes he talks about) and you try to force the paper through your personal veil. I guess when people reverse engineer hardware you follow the routes you took (maybe taking months) the first time you took something apart.

Usually, it makes sense to do so because the reason why you read the paper in the first place is because you think that it has some connection with your own work.

In Quantum Mechanics circles, many authors have different mathematical backgrounds, so just translating what they are doing and thinking is already RE. A good example of this is logical semantics: There are countless flavours of how to write logic down, each with their own symbols and motivations. I would prefer if any logic that you end up with is the internal logic of a category, but analogously this would be like Apple forcing everyone to use their hardware connector pins.

A paper usually does have a path that is chosen by the author, but the RE component is inevitable if you want the paper to be in context with your own research. Otherwise I guess it would be more like a class or university module, where you are following along, but you don't really have an intention of building on the subject matter in your own time. Science also has the disadvantage to newcomers that you don't know how much has already been done, and hence you are forced to have endless lectures to just bring you up to speed.

The real reason is that open source software completely eliminates the need for reverse engineering. RE only makes sense when you don't have access to the source code. Doing things like private server emulation for an MMORPG might sound cool but the reality is that half the game content exists server side. You're not able to invest enough resources to rebuild the original experience and even if you could why not spend those resources on your own game? The demand for reverse engineering is pretty small.
There are many MMOs that now officially only exist as memory, either shut down or altered in such a way that they aren't the same games anymore. Some people don't want to make their own game, they want to play the game they used to with friends/family years ago that is now unavailable. The amount of community effort put into some private servers is impressive. I wouldn't spend my own time doing it, but I'm glad there are people that do.
I just find that more disappointing because the copyright holder is positioned to be able to open-source or public-domain-dedicate the material but they don't for various reasons and it rots on a backup drive somewhere or gets lost, and now unnecessary duplication of effort must occur to recreate it.

Often it seems rather paradoxical in nature because the main reason they don't open-source it is that they're waiting for a time in the future when the demand rises and it will be worth something again, and yet the increased demand only happens because of the efforts of reverse engineers keeping the community alive. It's almost always impossible for the reverse engineers to legally get paid for this work too. The only hopes for that are either for the copyright holders to raise enough money and decide to hire them, or to do an anonymous Patreon and hope it doesn't attract the wrong kind of attention.

Indeed, and conversely that's why to a researcher like me RE feels like the only branch of coding that actually achieves something (emotionally; intellectually I know of course that's not true).

One of the reasons I was attracted to computers when I was a kid was figuring out Windows secrets.

That comparison was really insightful to me.
I'm reminded of the 'xcodes' bytecode-interpreter/virtual-machine used in the original Xbox, as part of its security model.

See [0] and page 25 of [1]

[0] (PDF) https://cs.oberlin.edu/~ctaylor/classes/341F2012/xbox.pdf

[1] (PDF) https://events.ccc.de/congress/2005/fahrplan/attachments/674...

Which parts are the coroutines?
Multiple methods in that snippet use something like:

    while(true) {
        Animate(image);
        wait 200;
    }
implies there are multiple synchronous execution stacks. It's not threading, as the synchronization points are explicit.
>synchronization points are explicit

Hmmm, I'd have just called it cooperative threading/multitasking then. Or do you think that would be wrong?

It wouldn't. That's literally what coroutines are. In-process, lightweight, cooperative multitasking, built in as a language feature (or at least close enough to it).
It is not. Coroutines are a much more fundamental feature. They can be used to implement cooperative multitasking, and since that is such a common use case, they have gotten confused with the concept of cooperative multitasking.

Actual coroutines are a lot more flexible and interesting than just cooperative multitasking, though.
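As one small illustration of a coroutine that has nothing to do with multitasking, here is a sketch using a Python generator. Python generators are a restricted (asymmetric) form of coroutine, and `running_average` is just a made-up example: the routine suspends at `yield`, keeps its local state, and exchanges values with its caller each time it is resumed.

```python
def running_average():
    """Coroutine that receives numbers via send() and yields the mean so far."""
    total = 0.0
    count = 0
    average = None
    while True:
        # Suspend here: hand back the current average, wait for the next value.
        value = yield average
        total += value
        count += 1
        average = total / count


avg = running_average()
next(avg)              # run to the first yield so the coroutine can receive
first = avg.send(10)   # 10.0
second = avg.send(20)  # 15.0
```

There is no scheduler and no second task here, just two routines passing control and data back and forth.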

Erm, I'm pretty sure I can implement coroutines using cooperative multitasking. Also, the other way around. So I'd say they're equivalent in that sense.
Coroutines and cooperative threading/multitasking are indeed threading. I assume they meant to say that it was neither preemptive nor parallel, which makes it somewhat limited. Some people think a system has to be one or both of those to be called threading.
For me, and for a vast majority of people I'd say, the word "threading" in common parlance implies pre-emptive scheduling with implicit synchronization points.
Cooperative threading is just coroutines plus a scheduler.
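A minimal sketch of that idea, using Python generators as the coroutines and a simulated clock in place of real sleeping. The `animate` task loosely mirrors the `wait 200` loop quoted earlier; the names and the timing model are assumptions, not how that game's engine works.

```python
import heapq


def animate(name, frames, delay, log):
    """Coroutine: record one frame, then ask to be woken after `delay` ticks."""
    for frame in range(frames):
        log.append((name, frame))
        yield delay  # explicit synchronization point, like `wait 200`


def run(tasks):
    """Tiny cooperative scheduler: always resume the task with the earliest wake time."""
    # Entries are (wake_time, sequence_number, task); the unique sequence
    # number breaks ties so generator objects are never compared.
    queue = [(0, i, task) for i, task in enumerate(tasks)]
    heapq.heapify(queue)
    seq = len(tasks)
    clock = 0
    while queue:
        clock, _, task = heapq.heappop(queue)
        try:
            delay = next(task)  # run the task until its next yield
        except StopIteration:
            continue            # task finished, drop it
        heapq.heappush(queue, (clock + delay, seq, task))
        seq += 1
    return clock
```

Each task only gives up control at its own `yield`, so nothing is preempted; the scheduler's whole job is deciding which suspended coroutine to resume next.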
>a lot of Japanese games

What kind of games are we talking about? Computer games or mobile ones?

Some Japanese arcade games are known to have byzantine content protection schemes meant to prevent cabinet bootlegging. There's been a lot written about Capcom's CPS-2 and CPS-3 systems in particular.