Ask HN: What's the prerequisite to become an exploit developer?
10 points by Qrius 3307 days ago
I want to learn reverse engineering (RE) and exploit development.

There are great resources for both of them (like MBE http://security.cs.rpi.edu/courses/binexp-spring2015/ and RE https://github.com/fdivrp/awesome-reversing).

The only problem is that there is hardly any article which actually lays out a path for a complete beginner. I want to understand what are the ACTUALLY NECESSARY topics required and in RIGHT ORDER to MINIMIZE the TIME WASTING and wandering in between topics so that the knowledge acquired is more practical in context of current vulnerabilities rather than being more theoretical.

Something like programming fundamentals>python>C>Assembly>Computer Organization>Windows Internals>Reversing>Fuzzing etc. (it's only an example, please teach me the correct order)

Actually there's an article http://www.myne-us.com/2010/08/from-0x90-to-0x4c454554-journey-into.html but it is quite outdated.

Your opinions along with updated resource links and views on them will be greatly appreciated. Please focus only on topics that are actually necessary, because RE and exploit development are vast topics in themselves (e.g.: Do I actually need to learn computer architecture or organization? AFAIK those circuits and architecture-specific things are more helpful for an electrical or electronics engineer than for a reverse engineer. We need to learn the intricacies of a particular architecture like x86 or ARM, but in my opinion those circuits won't help any reverse engineer in his daily reversing schedule). So please focus only on the required parts, like the one I know: OS fundamentals.

2 comments

EDIT: Please also describe your best practices and things you learned from your mistakes.
[Part 1 of 2]

I'm sorry this didn't get more responses, it's a worthy question.

First, I've tackled this topic a few times: https://www.reddit.com/r/AskNetsec/comments/5i73db/path_to_e... is probably the most concise post of mine, but I'll give a more complete (opinionated) guide below.

> There are great resources for both of them (like MBE http://security.cs.rpi.edu/courses/binexp-spring2015/ and RE https://github.com/fdivrp/awesome-reversing).

I like the RPI course, but without lectures it feels like there are too many gaps for it to be something I actually recommend to people.

> Actually there's an article http://www.myne-us.com/2010/08/from-0x90-to-0x4c454554-journ... but it is quite outdated.

Don't worry about content being dated. You will not find any good resources that cover right up to modern exploits; what you will find are resources that can train you up in the necessary foundations, so that you can understand modern exploits with a bit of your own legwork. The "From 0x90 to 0x4c454554" article actually looks like a very good collection of resources. Dated resources are still valuable because modern exploits are still doing the same thing (trying to get control of the IP register, i.e. the instruction pointer). It's just that now there are mitigations in place that require extra steps or force you to follow certain restrictions to be successful, but if you don't understand the foundation you can't learn the modern stuff.

> I want to understand what are the ACTUALLY NECESSARY topics required and in RIGHT ORDER to MINIMIZE the TIME WASTING and wandering in between topics

The problem is that this time wasting, and learning other topics that are not immediately useful, is immensely beneficial on the whole. Honestly, a big part of exploit development is spending hours researching dead ends...seriously, that's a big part of it. It requires a wide breadth of knowledge, not just a bunch of tips and tricks specific to exploit development. In order to craft sophisticated exploits against a system you need to understand that system, how it's built and how it works. This comes from that wandering research; it may not be immediately valuable, but doing it often leads to building up a set of topics that you are deeply knowledgeable in. And you can draw from that knowledge when exploiting certain interactions later on. That wandering, beating your head against a wall, getting stuck not knowing what to do is part of exploit development. Trying to avoid it honestly doesn't sound like a good idea; you'll end up knowing a bunch of tricks but lacking the background to apply them creatively.

Nonetheless, the first thing you need is some development background. If you want to break software, start by learning how software is built.

So, I recommend starting by learning C. C is the lingua franca of exploit development and reverse engineering. If you understand C you can break any high-level software down into its 'C parts', and from that C you can determine what the machine code probably looks like. C is the perfect middle ground: thinking in machine code or assembly is too tedious, and languages like Python or Java are too high level to capture what's going on at the CPU level. Software exploits are ultimately about controlling the IP register on the CPU, which is why being able to understand the whole stack is important.
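To make the "controlling the IP register" idea a bit more concrete, here is a toy model in Python. It is only a sketch: nothing here touches a real stack, the layout (a 16-byte buffer followed directly by a 4-byte saved return address) is an assumption chosen for illustration, and the names `make_frame`, `unchecked_copy` and `saved_ip` are made up. It just shows why a copy without a length check lets input data decide where execution goes next.

```python
import struct

BUF_SIZE = 16          # assumed size of the local buffer in the pretend frame
RET_SLOT = BUF_SIZE    # assumption: the saved return address sits right after it


def make_frame(return_address: int) -> bytearray:
    """Build a pretend stack frame: a zeroed buffer plus a 4-byte saved IP."""
    return bytearray(BUF_SIZE) + bytearray(struct.pack("<I", return_address))


def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy data into the buffer with no length check, like strcpy() in C."""
    frame[0:len(data)] = data


def saved_ip(frame: bytearray) -> int:
    """Read back the address the 'CPU' would jump to when the function returns."""
    return struct.unpack_from("<I", frame, RET_SLOT)[0]


if __name__ == "__main__":
    frame = make_frame(0x08048424)
    unchecked_copy(frame, b"hello")                 # fits in the buffer
    print(hex(saved_ip(frame)))                     # saved address is untouched

    overflow = b"A" * BUF_SIZE + struct.pack("<I", 0xDEADBEEF)
    unchecked_copy(frame, overflow)                 # 4 bytes too long
    print(hex(saved_ip(frame)))                     # now attacker-chosen
```

In real C the same thing happens in memory rather than in a bytearray, and modern mitigations get in the way, but the core idea is the same.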

Start with C, but also learn a higher-level scripting language, something that is used for quick jobs, prototyping, etc. While C is nice, it also requires a lot of lines to do some tasks that are very simple in higher-level languages (string manipulation and network communication, for example). So it's useful to know a scripting language to actually do quick jobs in. Python and Ruby are the two common choices right now; historically Perl was the most common choice. Now everyone I work with knows Python, but Ruby has its place too.
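As an example of the kind of quick job where a scripting language saves a lot of lines, here is a small hex dump helper in Python. It is a sketch, not a replacement for a real tool; the function name `hexdump` and the output layout are my own choices for illustration.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex column and printable-ASCII column."""
    pad = width * 3 - 1                      # width of a full hex column
    lines = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as '.' in the text column.
        text_part = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{offset:08x}  {hex_part:<{pad}}  {text_part}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(hexdump(b"GET / HTTP/1.1\r\nHost: example\r\n\r\n"))
```

Doing the same in C means manual buffer handling and format strings for every column, which is exactly the tedium the scripting language avoids.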

If you struggle to learn C, you might find it useful to start with a scripting language, which is often said to be better for learning to program with. Maybe it is, but I tend to side with Evan Miller, who writes in http://www.evanmiller.org/you-cant-dig-upwards.html that C is a better place to start even if the results are not as immediate. Still, if C isn't working for you, start with the scripting language and then learn C.

Further, once you know C and a scripting language, learn one of the modern workhorses, Java or C# (I lean towards Java). Most developers will know one or both of the languages, and getting an understanding of Object Oriented Programming is useful for understanding how software is built. You find OOP design patterns in most software.

Once you know a native language (C, C++), an intermediate language (Java, C#, etc.) and a scripting language (Python, Ruby, Perl, etc.) you'll be able to work with the majority of code bases out there; even if you don't know the specific language in use, you'll understand the core concepts that underlie most languages. There are some other paradigms seen in less common languages like Prolog (logic programming) or Haskell (functional programming) that you won't understand, but those three languages cover practically all modern software you'll encounter.

Recommended Books: you don't need to read all of these, but it's a selection you might find useful. Also, I am a fan of the Head First series, which usually are not very thorough, and some find the writing to be too casual and annoying, so YMMV.

1. Head First C, covers the basics of C in an easily digested format. Some I've recommended it to have found its section on pointers useful.

2. The C Programming Language, the Bible of the C language. You'll pick up some bad habits from it and it's dated, but it's a concise overview of C.

3. Violent Python, it has a bit of a steep learning curve, but it teaches Python with a security focus so it fits in very well.

4. Head First Programming - If you struggle with the basics of coding, it might be worth checking this book out.

5. Dive Into Python - http://www.diveintopython.net

6. Design Patterns: Elements of Reusable Object-Oriented Software - Once you understand the programming it's time to learn the architecture of software; this is the classic book on the topic.

7. Head First Design Patterns - If the last book was a bit too steep for you, the Head First book might be a bit easier.

8. Beej's Guide to Network Programming - You need to understand sockets, and socket programming, no exceptions and for C this is the best guide available.
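For a feel of what socket programming looks like before opening Beej's guide, here is a minimal sketch in Python rather than C. The calls mirror the C API the guide teaches (socket, bind, listen, accept, connect, send, recv). The helper names `serve_once` and `echo_roundtrip` are made up, and the code assumes the loopback interface is available and that the message is small enough to arrive in a single `recv`.

```python
import socket
import threading


def serve_once(server: socket.socket) -> None:
    """Accept one client, echo back whatever it sends, then close."""
    conn, _addr = server.accept()
    with conn:
        data = conn.recv(1024)
        conn.sendall(data)


def echo_roundtrip(message: bytes) -> bytes:
    """Start a local echo server on an ephemeral port and send it one message."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))    # port 0: let the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]

        worker = threading.Thread(target=serve_once, args=(server,), daemon=True)
        worker.start()

        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(message)
            reply = client.recv(1024)    # a real client would loop until done

        worker.join()
        return reply


if __name__ == "__main__":
    print(echo_roundtrip(b"ping"))
```

Writing the same echo server in C is a good first exercise once you have worked through the guide.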

While you're learning to code you should also be trying to get some experience working with Linux. Even if Windows is your daily driver (as it is mine), Linux is inescapable: you'll rarely come across a network that doesn't use Linux anywhere (and similarly, you'll rarely come across a corporate network without some Windows servers, so don't ignore Windows Server). I don't have any book recommendations on this; just set up your own Linux box, learn the different package managers and how to do what you need from the terminal.

I do recommend working through the following to get some general experience.

1. http://overthewire.org/wargames/leviathan/ - You'll learn about Linux with this one, and it doesn't require any programming ability so it's a fair starting place.

2. http://overthewire.org/wargames/bandit/ - Bandit is an easy server/wargame to start with; it'll teach you a bit about Linux and require a bit of coding, but there is nothing that requires technical exploit development knowledge.

3. https://exploit-exercises.com/nebula/ - Nebula is a little more difficult than Bandit but still non-technical and imo a little more fun.

Working through these gives you some comfort working in a terminal and some experience getting into the attacker's mindset and breaking things. You can do these while learning the stuff above.

You may also want to get some practice programming with challenges such as those at:

1. ProjectEuler.net - These often require some mathematical insight but they're fun, so I'm including it.

2. https://leetcode.com - This is used by some developers to practice before an interview; it has various levels, so it'll work just for getting some experience with programming.

Now that you've got a foundation in software development we can move on to exploit development. Well, not quite: first you need to bridge your knowledge from building software to breaking it. For this you need to start learning about how software works at a lower level, the CPU level. There is a good book: Computer Organization and Design. It's solid, but it's also a textbook and covers a lot more detail than you need (though it's still valuable to know). It covers MIPS and how the CPU works; you can probably skip the stuff about the hardware and microcode. MIPS is a simpler assembly language than Intel's, so it's a nice starting place.

Intel is what you'll most often encounter though, so you do need to learn x86 and x86_64.

OpenSecurityTraining.info provides a number of courses that are valuable for this bridge to breaking software.

1. Life of Binaries - http://opensecuritytraining.info/LifeOfBinaries.html - This helps you go from understanding software to understanding the system around the software and the context in which software runs.

2. Introductory Intel x86 - http://opensecuritytraining.info/IntroX86.html - Really boring/dry class on x86 instructions but gives you the introduction you need.

3. Introductory Intel x86-64 - http://opensecuritytraining.info/IntroX86-64.html - Just slides this time, good to review and get a sense of the differences between 32-bit and 64-bit Intel assembly.
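One 32-bit versus 64-bit difference you will run into constantly is pointer width: addresses are 4 bytes on x86 and 8 bytes on x86-64, and both are stored little-endian. A short Python sketch using the standard `struct` module shows what that looks like in memory. The helper names and the two example addresses are made up for illustration.

```python
import struct


def pack32(address: int) -> bytes:
    """Lay out a 32-bit address the way x86 stores it (little-endian, 4 bytes)."""
    return struct.pack("<I", address)


def pack64(address: int) -> bytes:
    """Lay out a 64-bit address the way x86-64 stores it (little-endian, 8 bytes)."""
    return struct.pack("<Q", address)


if __name__ == "__main__":
    print(pack32(0x0804A010).hex())          # least significant byte comes first
    print(pack64(0x00007FFFF7A52390).hex())  # note the two zero bytes at the end
```

The zero bytes at the top of typical 64-bit user-space addresses matter in practice, because string functions like strcpy() stop copying at the first zero byte.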

Once you've gotten the basics it's finally time to move on to learning the actual exploit development skills.

1. Introduction to Software Exploits - http://opensecuritytraining.info/Exploits1.html - In my opinion this is simply the best resource out there to learn the basics. It uses the book "The Shellcoder's Handbook" as its textbook and I completely recommend the book.

2. Hacking: The Art of Exploitation - This is the most often recommended book. It's great and has a much better introduction than The Shellcoder's Handbook, but if you can make it through the course above without problems you can probably skip it, since the course and its textbook cover more content. Still, this book is one of the best introductions available.

3. Corelan's Exploit Development Tutorial Series - https://www.corelan.be/index.php/2009/07/19/exploit-writing-... - It'll start in familiar territory but it'll get into some new stuff; overall a good series.

4. Exploitation in the Windows Environment - http://opensecuritytraining.info/Exploits2.html - You'll find some overlap between Corelan's tutorial series and this course, so you might want to take this course and reference the tutorials as you go.

5. A Bug Hunter's Diary - Excellent book that covers some similar topics as the previous resources but spends a bit more time on actually finding vulnerabilities, not just exploiting them, and goes into more mitigations than the previous resources. Skip the stuff you already know.

While learning all this exploit development stuff, there is another necessary skill to actually finding vulnerabilities: reverse engineering.

There are two books that I frequently recommend on the topic:

1. Reversing: Secrets of Reverse Engineering - This is the most popular recommendation and it's a great resource to work through.

2. Practical Reverse Engineering - This is a newcomer (2014) but I quite like it. It isn't as 'complete' as Reversing is, but it covers a wider range of topics that I find more useful.

3. (Bonus) Malware Analyst's Cookbook - Malware analysis is probably the most RE-heavy field you can be in, so this is a solid book on the topic. Just because of its name I didn't give it a fair chance when I was reviewing books to recommend, but I did review it recently and want to give it a plug: it has a lot of practical information and labs to work on.

By this point you should have a reasonably solid foundation and a good understanding of exploitation. You will not be up to writing the latest browser 0day, but you'll have the foundation necessary to understand (and learn from) modern sophisticated exploits so you can find and develop them yourself. There are no resources to fill in the final gap; you have to go out, do your research on a system and apply what you've learned to find some way to break it and develop that weakness into an exploit.

To get experience, there are a few resources I can recommend:

1. Exploit-Exercises, I already mentioned Nebula, Protostar should be accessible to you once you've done the first Software Exploits course, and Fusion after the second one.

2. Over the Wire, I've already mentioned a couple of their servers, check out the rest of them.

3. Pwnable.kr - Challenges are at various levels; use the harder ones to challenge yourself.

4. Capture-The-Flag competitions - Every year several CTFs are run; sign up and play in them. What is nice about CTFs is that they are bite-sized challenges: still difficult, still involving modern techniques (the ones worth the most points at least), but not tedious, and they don't require a big time investment to find a weakness in. The focus of the challenge is on the exploit development rather than on finding vulnerabilities.

5. CVE lists - Find software that interests you, find a known vulnerability and try to build your own exploit for it.

6. Real world software, go and break something of interest to you, learn how it works, find a vuln and exploit it.

You may need to learn a new language, or research some new techniques to handle some mitigations, but you should have the foundation necessary to figure out what you don't know and how to learn what you need.

...and with all this content I never even touched on breaking web applications, so I must at least give a mention to "The Web Application Hacker's Handbook". Cover that book and practice against any of the many deliberately vulnerable, meant-to-be-hacked web apps out there (Damn Vulnerable Web App, OWASP Mutillidae 2, HackThisSite, HellboundHackers, Enigma-Group, HackThis.co.uk, etc.)

Good Luck!

I'm extremely grateful and was not at all expecting such an explanation.

I wanna explain a few things.

Let me rephrase what I meant by "minimize the time wasting". You see, there is a lot of great advice available online. You ask something on a subreddit or here and then people will share great resources. I love this and this kind of learning. My concern is that sometimes these resources and advice are given along the lines of "although it's not completely necessary, it'll still be an experience in itself".

The problem here is that this kind of learning sometimes wastes too much time and leaves you confused. People ask so many questions about CompSci daily, and you'll find books starting from the complete basics of computers like Code https://www.amazon.com/dp/0735611319, the Nand2tetris course http://www.nand2tetris.com etc. to something very sophisticated like AI. I hope you can understand that if a person spends too much time on these kinds of things, given that he's got a job or he's a student in university with a sweet CompSci curriculum (you know what I mean), then it's a problem. Although the above-mentioned resources are exceptional, there are others too which teach the same thing. Can a person read all of them one by one "just to satisfy his curiosity and thinking that it'll help him in future"?

RE is already an extremely sophisticated and vast field which requires computer mastery. I'm in college and it has made me hate things I loved. I'm an extremely curious guy and can easily spend 10-20 hours in front of a PC. I've ~6 years of experience with Linux. Now I'm literally not in a state to read 2-3 books of 400-800 pages each on a single topic which I don't even know would be required in RE. There are some topics which are quite difficult, but at least if we have an idea that it IS mandatory for RE then we can be sure and refer to other resources. If you don't even know what your syllabus is, how can you concentrate on it and master it, let alone learn it? RE requires you to study every minute detail of a computer system, but wasting too much time on that horrible digital logic and design is really not worth it.

So my purpose is to make it completely clear what I actually need to know so that I can focus on it, instead of reading each and every topic in complete detail thinking that if I miss the direction of even a single electron in I/O I won't be able to do efficient reversing. I'm literally fed up with those architecture diagrams with arrows and cramming those definitions ROM, EEROM, EEPROM.............. again and again for tests and assignments.

I've few questions for you:

You mentioned Computer Organization and Design, which I think is authored by Patterson and Hennessy and is used by almost all universities. I'm just curious about its not-so-good-looking Amazon reviews. Also, what's your opinion on Tanenbaum's books, which you've mentioned in that reddit link?

Now let's summarize what I've understood (PLEASE help me correct if I'm wrong)

>>>> UNDERSTANDING the system you want to hack

> Learn the most used fundamental programming languages. (the way we TALK with computers)

1. C (also C++ in some cases)

2. Python or Ruby (given its dominance in industry right now thanks to its productive nature, also being used in exploit writing)

3. Java or C# (object oriented programming which, along with the above languages, completes our programming fundamentals)

4. Assembly (obviously needed in RE)

I think it need not be mentioned that we need to have a good grasp of Data Structures and Algorithms with the above languages (obviously not all)

> Understand each and every data flow and HOW a computer system works

Computer Organization and Design and Architecture

(OS fundamentals, memory management, virtual memory, paging, caching etc, Linux(macOS too) and Windows internals part I think comes here)

You restored my faith in humanity when you said I can skip the hardware and microcode part (please explain what specific topics, I swear I won't look at them again until I'm done with required topics.)

> Network Fundamentals and Programming

Basics of http, TCP/IP and other protocols.... Socket programming

>>>> THE HACKING PART

> Learning WHAT loopholes are there in the above process of data read/write

Types of attacks (buffer overflows, heap overflows....)

> HOW those loopholes are exploited

>Reverse Engineering (Learning tools of trade: IDA, gdb.....) learning and practising reversing. Fuzzing

>Exploiting the bugs

making exploits.
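Since fuzzing is on the list, a very small sketch may help show what it means in practice: feed slightly mutated inputs to a parser and record the ones that make it fail. Everything here is hypothetical. `parse_record` is a toy parser with a planted bug (it trusts a length byte), and `mutate` and `fuzz` are made-up helper names. Real fuzzers are far more sophisticated, but the loop is the same idea.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical toy parser: byte 0 is a length, then that many payload bytes.

    Planted bug: it trusts the length field and reads past the end of the input.
    """
    length = data[0]
    payload = bytearray()
    for i in range(length):
        payload.append(data[1 + i])      # IndexError when the length byte lies
    return bytes(payload)


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Return a copy of seed with one randomly chosen byte replaced."""
    out = bytearray(seed)
    pos = rng.randrange(len(out))
    out[pos] = rng.randrange(256)
    return bytes(out)


def fuzz(seed: bytes, rounds: int, rng_seed: int = 0) -> list:
    """Feed mutated inputs to the parser and collect the ones that crash it."""
    rng = random.Random(rng_seed)        # fixed seed so runs are repeatable
    crashes = []
    for _ in range(rounds):
        candidate = mutate(seed, rng)
        try:
            parse_record(candidate)
        except IndexError:
            crashes.append(candidate)
    return crashes


if __name__ == "__main__":
    found = fuzz(b"\x03abc", rounds=2000)
    print(len(found), "crashing inputs, e.g.", found[:1])
```

In C the out-of-bounds read would not raise a tidy exception; it would read whatever memory follows the buffer, which is where the exploitable part begins.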

Please review and correct. Thanks again.

Shameless self-promotion. I have a YouTube channel where I basically try to offer a path for learning exploitation. I'm done covering all the basics, and we will soon move to more advanced stuff. I have videos on various security topics, but here is probably the most relevant playlist: https://www.youtube.com/playlist?list=PLhixgUqwRTjxglIswKp9m...
I know your channel very well. It's praised everywhere because of such good content. I'd be happy if you went through my main concern in detail and read the above discussion. Thanks again for such a wonderful channel. I'll surely learn from it once I've covered the prereqs needed to understand what you're saying in those videos.
> I want to understand what are the ACTUALLY NECESSARY topics required and in RIGHT ORDER to MINIMIZE the TIME WASTING and wandering in between topics so that the knowledge acquired is more practical in context of current vulnerabilities rather than being more theoretical.

To be honest with you? I consider that sentence almost offensive. I hear you, but I think you have absolutely wrong expectations. What you want to learn is not a profession like plumbing, where a really good expert can teach you everything you need to know with all the little tricks learned over the years. The field is sooo huge, diverse and complicated that this won't work. I think my playlist offers a rough outline that you can follow, but without going down rabbit holes left and right, and getting stuck many many times, you won't become good at it.

I understand the frustration that you don't want to "waste time" and that you are busy already. But everybody I know who is good in this field, and my own experience too, shows me that nobody learns this stuff through a straight path. And everybody knows that most of the time will be spent chasing rabbits through a labyrinth and getting stuck.

Also there is no clear path. It's a complicated web you have to learn to traverse. For example, "Learn C" - what the f* does that even mean? To what extent? Hello World? Drivers? Or an operating system? "Learn assembler" - which assembler? Have you looked into the Intel instruction spec once? I doubt any human knows every instruction. Also, who said that Intel is the way to go? Why not ARM or AVR? All of these fields offer a lifetime of studying in themselves.

The "art" in becoming good at security and RE is to get a broad knowledge of a lot of things and try to simultaneously go deeper 'n deeper in all of them. And if you are interested in a specific field, put more weight on those topics.

You know how long it takes to reverse engineer something? People stare at IDA for weeks or months at a time. You can't learn RE just by reading a book or a blog. You gotta just start doing it, and hopefully find a few blogs and people to keep up the spirit.

EDIT: I don't know why the edit is not updating.

"Basics of http" and "making exploits" are from next line. Thanks for bearing with me. ;)