by derefr
705 days ago
> It's just a mess, hence why I sort of gave up at some point and for some "esoteric" syscall I just hardcode them.

Presuming you don't want to keep doing this forever, but would rather do insane amounts of up-front work if it would enable you to never have to touch this again:

1. Have you considered writing some code that takes a configured + built kernel source tree; finds the intermediate build artifacts pertaining to the code unit that contains the syscall handler; and parses those? And then taking the resulting IR data-structure / AST / whatever, and doing some symbolic interpretation of it, to enable you to essentially do an xpath-like expression match on "does something specific with a concrete syscall number that isn't already in the known set for the arch"? AFAICT you could generate your own syscall table from that, and it would be exhaustive.

2. Have you considered dropping a little bit of driver-program code into the kernel source tree, that just "does syscall handling according to the passed-in parameters", i.e. where the artifact built from compiling this file would be an EFI-app pseudo-unikernel that naively pretends all kernel services were already initialized (they weren't); would do one syscall operation, calling directly into the syscall handler; and then would immediately halt afterward, and then feeding the resulting "executable" to https://github.com/google/AFL ?
> finds the intermediate build artifacts pertaining to the code unit that contains the syscall handler
Hmm, I think this is unneeded: vmlinux already has all the code. Also, things move around too much across kernel versions and archs, so I can't easily pinpoint which object files to choose. Additionally, you would need an entire built kernel source tree, which is a lot more than a built vmlinux plus an optional non-built kernel source dir (which is what I use right now). Just as an example: I currently keep some 600 kernel images with debug info for reference, which take around 76 gigabytes of disk space. Having 600 built kernel trees would require a lot more space, on the order of terabytes.
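To put rough numbers on that, here is a back-of-the-envelope sketch. Only the 600 images and the 76 GB total come from my actual setup; the per-tree sizes in the loop are hypothetical placeholders, not measurements, since the real size depends heavily on the config and on whether debug info is enabled.

```python
GB_PER_TB = 1000  # decimal units, good enough for an estimate


def average_size_gb(total_gb: float, count: int) -> float:
    """Average size of one item, in GB."""
    return total_gb / count


def total_size_tb(per_item_gb: float, count: int) -> float:
    """Total size of `count` items of `per_item_gb` each, in TB."""
    return per_item_gb * count / GB_PER_TB


if __name__ == "__main__":
    images = 600
    images_total_gb = 76
    per_image = average_size_gb(images_total_gb, images)
    print(f"average stored image: {per_image:.3f} GB")

    # Hypothetical sizes for one fully built tree (assumed, not measured).
    for tree_gb in (2, 5, 10):
        print(f"{tree_gb} GB per tree -> {total_size_tb(tree_gb, images):.1f} TB")
```

Even at the low end of those assumed sizes, the total is already more than a terabyte, versus roughly 0.13 GB per stored image today.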
> taking the resulting IR data-structure / AST / whatever, and doing some symbolic interpretation of it
I have been thinking about this a lot. I do a simplified version of this for x86 >= v6.9, because the syscall table was removed and turned into a giant switch statement, which I symbolically emulate to extract syscall numbers. That's pretty simple though, and definitely not an exhaustive analysis (some other stuff could be happening before reaching the handler). The problem is that this kind of solution is very hard to implement, and I think it would be way too slow in the general case. There also aren't even decent symbolic execution engines to do this for some archs. You are right when you say "insane amounts of up-front work": that is definitely too much for me for a hobby project like this :').
The first main problem, however, is that all of this starts from the assumption that you have already built a kernel with all the syscalls available. This is not the case unless you meticulously configure it accordingly, which is not so simple and requires constant manual (sigh) updates to the build configuration with each kernel release. There isn't a way to e.g. pretend that "all kernel services were already initialized" as you say in point #2. If a kernel is built w/o a certain syscall, the code will simply not be there. Kernel configuration also remains a problem for your point #1. The only real solution I see would be submitting a kernel patch that adds a target in the root Makefile to enable all syscalls with their related configs, and hoping kernel devs like it (doubt it).
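To illustrate the configuration problem, here is a minimal sketch that checks a `.config` for a handful of options that gate syscalls. The helper names (`parse_kconfig`, `missing_options`) and the `WANTED` list are made up for this example, and the list is nowhere near complete; keeping such a list complete and current is exactly the manual work I'm complaining about.

```python
import re

# The two line shapes that matter in a kernel .config file.
_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
_UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")

# Illustrative, deliberately incomplete list of syscall-gating options.
WANTED = [
    "CONFIG_FUTEX",
    "CONFIG_EPOLL",
    "CONFIG_IO_URING",
    "CONFIG_BPF_SYSCALL",
    "CONFIG_USERFAULTFD",
]


def parse_kconfig(text):
    """Map each option in the .config text to its value ('n' if unset)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        match = _SET.match(line)
        if match:
            options[match.group(1)] = match.group(2)
            continue
        match = _UNSET.match(line)
        if match:
            options[match.group(1)] = "n"
    return options


def missing_options(options, wanted):
    """Return the wanted options that are not built in (anything but 'y')."""
    return [name for name in wanted if options.get(name, "n") != "y"]


if __name__ == "__main__":
    sample = """\
CONFIG_FUTEX=y
CONFIG_EPOLL=y
# CONFIG_IO_URING is not set
CONFIG_BPF_SYSCALL=y
"""
    print(missing_options(parse_kconfig(sample), WANTED))
```

An option that does not appear in the file at all is treated as disabled here, which is how `CONFIG_USERFAULTFD` ends up reported alongside the explicitly unset `CONFIG_IO_URING`. For a kernel built from that sample, the code for those syscalls is simply not in vmlinux, and no amount of analysis of the image will find it.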