by ggcdn 2328 days ago
A slightly related question for HNers: Is there any easy tool for a non-cs guy to reverse engineer a binary file containing numbers and text in some specific format?

I have to work with some old structural analysis software. The material and element definitions come in an obscure file format ".PF3CMP". I know it contains text like the material names, and numbers/letters for the material properties.

Ultimately it's my goal to be able to write these files from MATLAB or Python, instead of using the horribly clunky user interface. But first I need to know the structure of the file, and I'm not even sure how to begin figuring that out.

[0] is what it looks like when opened in a hex editor

[0] https://imgur.com/a/jvqV3k8

7 comments

I don't know of any straightforward tools; most people I've seen reverse engineer a format do it with a hex editor and custom scripts. It's not directly relevant, but the best I've seen is this presentation about reverse engineering the protocol used to communicate within a car: https://www.youtube.com/watch?v=KkgxFplsTnM

It uses some techniques that might be relevant, like monitoring different parts of a file as you make different changes (like accelerating or decelerating). In your case it might be possible to compare between different material definitions for example.
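
To make the "change one thing and compare" idea concrete, here is a minimal Python sketch. It assumes you can save two files that differ in a single setting and that the change doesn't shift everything after it. `diff_runs` is just a name I made up for illustration.

```python
def diff_runs(a: bytes, b: bytes):
    """Return (start, end) offset ranges (end exclusive) where a and b differ.

    Only the overlapping part of the two inputs is compared, so check the
    lengths separately: a length change means data was inserted or removed.
    """
    runs = []
    start = None
    limit = min(len(a), len(b))
    for i in range(limit):
        if a[i] != b[i]:
            if start is None:
                start = i          # a new run of differing bytes begins here
        elif start is not None:
            runs.append((start, i))
            start = None
    if start is not None:          # the run reached the end of the overlap
        runs.append((start, limit))
    return runs
```

Read both files with `open(path, "rb").read()` and pass the two byte strings in. A handful of short runs usually points at the field you changed; one run that covers most of the file suggests the lengths differ or the data is encoded in some way.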

Ok thanks, I'll take a look. It's possible for me to generate these files for each of the various material settings so I can manually 'diff' them, similar to what you're describing.
It sounds like you might eventually be able to write a kaitai struct [0] for the resulting format which would make it fairly easy to use the format in your language of choice.

[0]: https://kaitai.io/

If minor changes produce massive differences in the file, that can be a clue that the data is compressed or encrypted in some manner.
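
Another rough check for compression or encryption is byte entropy. A minimal sketch in Python, using only the standard library (`shannon_entropy` is my own helper name): values close to 8 bits per byte hint at compressed or encrypted data, while plain structs with text and padding tend to score much lower. Treat it as a rule of thumb, not proof.

```python
import math
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Bits of entropy per byte: 0.0 for constant data, 8.0 for uniform bytes."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # how often each byte value occurs
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```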

A good test: if you can name, tag, or comment items in the file, search the raw bytes for those strings.

I don’t think there’s compression or encryption. I can search and find the hex representation of text and values that I expect to be there. I guess I need to bite the bullet and spend some time tagging the parameters I know, then figuring out the pattern of padding that is in between.
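
The tagging step can be partly scripted. The sketch below lists every offset where a known string or number appears. The encodings it tries (little-endian float32, float64, int32) are guesses on my part; the real file may use something else entirely.

```python
import struct


def find_all(data: bytes, needle: bytes):
    """Return every offset at which needle occurs in data."""
    offsets = []
    i = data.find(needle)
    while i != -1:
        offsets.append(i)
        i = data.find(needle, i + 1)
    return offsets


def candidate_encodings(value: float):
    """Common little-endian encodings of a number (assumed, not confirmed)."""
    encodings = {
        "float32 LE": struct.pack("<f", value),
        "float64 LE": struct.pack("<d", value),
    }
    if float(value).is_integer() and -2**31 <= value < 2**31:
        encodings["int32 LE"] = struct.pack("<i", int(value))
    return encodings


def locate_value(data: bytes, value: float):
    """Map each candidate encoding name to the offsets where it was found."""
    return {name: find_all(data, enc)
            for name, enc in candidate_encodings(value).items()}
```

For example, `find_all(data, b"Steel")` locates a material name and `locate_value(data, 29000.0)` shows which encoding, if any, matches a property you typed into the program. Values like 0 or 1 will produce many false hits, so pick distinctive numbers.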
Have you tried the 'file' command on various *nix systems (can download for Windows too)? It mightn't know this format but I think it will tell you if it finds compressed (zipped) data streams in common formats, which will be your first step since many files have some form of compression.

I'll also echo the other comment about reverse engineering the reading functions. Some formats only include certain structures if necessary so even if you have a lot of files you might be missing some example data to complete the picture.

Depending on how weird the format is, it might be more efficient to reverse-engineer the file-reading routines of that program which can work with these files.
Thanks but this sounds... above my level of computer competence
Something like this may help: https://ide.kaitai.io/, but I've found it a bit overwhelming.

It might be easiest to just start writing a utility that parses it, first making guesses and then refining as you generate and test more files like you mentioned in another reply. You already know what the magic bytes are at the start of the file - PF3CMP.
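
A possible first version of such a utility, assuming the file really does begin with the bytes `PF3CMP` as stated above. Everything past that is unknown, so this only verifies the header and lists printable text runs with their offsets as landmarks. The function names are made up for this sketch.

```python
import re

MAGIC = b"PF3CMP"  # taken from the thread; the rest of the layout is unknown


def check_magic(data: bytes) -> None:
    """Raise ValueError if the data does not start with the expected magic."""
    if not data.startswith(MAGIC):
        raise ValueError("unexpected header: %r" % data[:len(MAGIC)])


def printable_runs(data: bytes, min_len: int = 4):
    """Return (offset, text) for each run of printable ASCII of min_len or more."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [(m.start(), m.group().decode("ascii"))
            for m in pattern.finditer(data)]
```

From here you can replace guesses with real fields one at a time, rerunning the script on each new test file you generate.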

The Linux tool “od” might help you here. The -c flag will print ASCII characters.

You can get it with WSL on Windows, or even just install git and you’ll get git-bash for another easy option.
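
If you'd rather stay in Python, a small dump function gives a similar view. This is only a rough stand-in, not a reimplementation of `od`: it prints the offset, the hex bytes, and the printable characters for each row.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format data as rows of: offset, hex bytes, printable ASCII ('.' otherwise)."""
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        # Pad the hex column so the text column lines up on a short last row.
        lines.append(f"{offset:08x}  {hex_part:<{width * 3 - 1}}  {text_part}")
    return "\n".join(lines)
```

Calling `print(hexdump(data[:256]))` on the start of a file is usually enough to spot the header and the first few text fields.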

If it's helpful in any way, lots of tool-specific file formats like that are basically C structs dumped to a file, then read straight back into memory when the file is loaded.
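
If that turns out to be the case, Python's `struct` module can read and write such records directly. The layout below (a 16-byte name followed by two doubles) is entirely hypothetical and is NOT the real PERFORM-3D format; it only shows the mechanics.

```python
import struct

# Hypothetical record, roughly equivalent to the C struct:
#   struct material { char name[16]; double modulus; double poisson; };
# "<" means little-endian with no alignment padding. A real struct dump may
# contain padding bytes, which show up as "x" in the format string.
RECORD = struct.Struct("<16sdd")


def read_record(data: bytes, offset: int = 0):
    """Unpack one record starting at offset; the name is cut at the first NUL."""
    raw_name, modulus, poisson = RECORD.unpack_from(data, offset)
    name = raw_name.split(b"\x00", 1)[0].decode("ascii", errors="replace")
    return name, modulus, poisson


def write_record(name: str, modulus: float, poisson: float) -> bytes:
    """Pack one record; struct pads the name with NULs up to 16 bytes."""
    return RECORD.pack(name.encode("ascii"), modulus, poisson)
```

Once the real field order and sizes are known, only the format string and the field names need to change, and the same pair of functions covers both reading and writing.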
Possibly related? What domain is this file from?

https://techdocs.broadcom.com/content/broadcom/techdocs/us/e...

Thanks, but sadly not; it's from a structural analysis program called PERFORM-3D.

I've contacted the developer but they will not release the format of the files to me.