| The approach Ryan used was to look at ObjC classes, but you're right that many SDKs and tools don't show up there. We actually scan millions of apps from the app stores. Here's a public view of the top 500 in the US, and if you choose the cross-platform category, you can see apps like this. For example, "Pac-Man 256" is rank 28 (Unity), Amazon is rank 30 (Cordova), etc. https://sourcedna.com/stats/ It's actually very difficult to accurately match code written in so many different languages, because compilers discard a lot of the info you need. I spent a lot of time researching and evaluating different ways to fingerprint libraries, as well as to reconstruct the boundaries of internal modules when there weren't any symbols. We match code by running a similarity search across all the components we've ever seen. Since code written in C can be compiled to x86 or ARM, we disassemble the code into an intermediate language. Then we reconstruct control-flow graphs, data dependencies, and other platform-independent features. We index these in a custom search engine, which allows quick lookup and matching. It's very difficult, but ultimately a really fun problem to solve. Most of our engineers got started with exactly the exercise Ryan did here. :-) |
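
To make the matching idea above a bit more concrete, here is a minimal Python sketch of similarity search over control-flow-graph features. It is not SourceDNA's implementation. The CFG layout, the abstract op names, the feature choices, the Jaccard scoring, the `FingerprintIndex` class, and the `libfoo.parse` / `libbar.init` entries are all assumptions made up for illustration, and a real system would presumably use a proper index rather than the linear scan shown here.

```python
def cfg_features(cfg):
    """Turn a control-flow graph into a set of architecture-neutral features.

    cfg maps a basic-block id to (abstract op names, successor block ids).
    This layout is a hypothetical stand-in for a real intermediate language.
    """
    features = set()
    for ops, succs in cfg.values():
        # Block shape: number of ops and number of outgoing edges.
        features.add(("shape", len(ops), len(succs)))
        # Adjacent op pairs, independent of registers and encodings.
        for a, b in zip(ops, ops[1:]):
            features.add(("bigram", a, b))
        # Edge feature: out-degree of the source and of the destination.
        for s in succs:
            features.add(("edge", len(succs), len(cfg[s][1])))
    return features


def jaccard(a, b):
    """Jaccard similarity of two feature sets (1.0 if both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class FingerprintIndex:
    """Toy index: stores feature sets and scans them all on lookup."""

    def __init__(self):
        self._entries = {}

    def add(self, name, cfg):
        self._entries[name] = cfg_features(cfg)

    def best_match(self, cfg):
        """Return (score, name) of the most similar known component."""
        query = cfg_features(cfg)
        scored = [(jaccard(query, feats), name)
                  for name, feats in self._entries.items()]
        return max(scored) if scored else (0.0, None)


# A known library function, already lifted to abstract ops (hypothetical).
known = {
    0: (["load", "cmp", "branch"], [1, 2]),
    1: (["add", "store"], [2]),
    2: (["ret"], []),
}
# An unrelated known function.
other = {0: (["call", "ret"], [])}
# The same function as `known`, built for another target: same graph shape,
# different block ids, and one extra "mov" in the middle block.
unknown = {
    "a": (["load", "cmp", "branch"], ["b", "c"]),
    "b": (["mov", "add", "store"], ["c"]),
    "c": (["ret"], []),
}

index = FingerprintIndex()
index.add("libfoo.parse", known)
index.add("libbar.init", other)
score, name = index.best_match(unknown)
print(name, round(score, 3))
```

Because the features ignore block ids and concrete instructions, the query still lands on `libfoo.parse` even though the two versions are not identical. Set-overlap scoring is just one simple choice; anything that tolerates small compiler-induced differences would fit the same slot.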