Hacker News
by spxneo 815 days ago
I think pairing this tool with something that recursively clicks through an app would be insanely helpful. (The latter is what I have trouble finding.)
3 comments

One slightly related thing you can do is to test the API with schemathesis[0]

[0] https://github.com/schemathesis/schemathesis

This seems pretty simple to me to do. Search the html of the main page for anchor tags. Add the links in those tags to an array as your exploration frontier. Once done parsing that html, load the next link. Add deduplication to avoid loops and just run a depth-first search. What am I missing?
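
A rough sketch of that idea in Python, standard library only. The names `AnchorParser`, `crawl`, `fetch_html` and `max_pages` are made up for illustration, and restricting the crawl to the starting host is my own assumption. It only follows `<a href>` links in the returned HTML, so anything rendered or triggered by JavaScript is invisible to it.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class AnchorParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def fetch(url):
    """Default fetcher; swap in anything that returns HTML text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_html=fetch, max_pages=100):
    """Depth-first crawl of same-host links; returns URLs in visit order."""
    host = urlparse(start_url).netloc
    frontier = [start_url]  # used as a stack, so the search is depth-first
    seen = {start_url}      # deduplication to avoid loops
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.pop()
        visited.append(url)
        parser = AnchorParser()
        parser.feed(fetch_html(url))
        for href in parser.hrefs:
            # Resolve relative links and drop "#fragment" so /b and /b#top dedupe.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

Popping from the front of the list instead (or using `collections.deque`) would turn this into a breadth-first search; the `seen` set is what prevents loops either way.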
For brochure / static content sites this is definitely the beginnings of a web crawler, but it can be a lot trickier for web apps.

For example, clicking a link which loads some data, then clicking edit (which isn't even an anchor), typing in & clicking stuff, then clicking the save button (don't click the cancel button!) would not be an interaction that would get picked up with your suggestion. Detecting loops becomes much more ambiguous and backtracking to get all the permutations of interactions becomes a whole other problem to solve.

In many web apps there are going to be buttons and links that are not represented as <a>. You would realistically have to enumerate everything that has any kind of event handler attached since it could potentially trigger an API call.
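
To make that concrete, here is a minimal sketch that lists candidate interactive elements from static markup. `InteractiveFinder` and `find_interactive` are hypothetical names, and the heuristics (a few clickable tags, ARIA roles, inline `on*` attributes) are my own assumptions. The big caveat: handlers attached from JavaScript via `addEventListener` never show up in the HTML, so a real tool would most likely need to inspect the live page in a browser instead.

```python
from html.parser import HTMLParser

CLICKABLE_TAGS = {"a", "button", "summary"}
CLICKABLE_INPUT_TYPES = {"submit", "button", "reset", "image"}


class InteractiveFinder(HTMLParser):
    """Records (tag, id, reason) for elements that look clickable."""

    def __init__(self):
        super().__init__()
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        reason = None
        if tag in CLICKABLE_TAGS:
            reason = "clickable tag"
        elif tag == "input" and (attrs.get("type") or "text").lower() in CLICKABLE_INPUT_TYPES:
            reason = "clickable input"
        elif attrs.get("role") in {"button", "link"}:
            reason = "ARIA role"
        elif any(name.startswith("on") for name in attrs):
            # Inline handlers such as onclick="..." or onchange="..."
            reason = "inline handler"
        if reason:
            self.candidates.append((tag, attrs.get("id"), reason))


def find_interactive(html):
    """Return the candidate interactive elements found in an HTML string."""
    finder = InteractiveFinder()
    finder.feed(html)
    return finder.candidates
```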

You would also have to fill and submit forms with valid and invalid data. You would have to toggle checkboxes, change radio buttons, click buttons (e.g. "Apply filters" after changing values in a product filter section), and generally go through many combinations of inputs to find all valid parameters and possible responses.
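
A small sketch of why the combinations get out of hand, using `itertools.product`. The `FILTER_FIELDS` spec and the `input_combinations` helper are invented for illustration; a real crawler would have to discover the fields and plausible values itself.

```python
from itertools import product

# Hypothetical product-filter form: each field maps to the values to try,
# including deliberately invalid ones ("", "-1", "abc") for the price box.
FILTER_FIELDS = {
    "in_stock": [True, False],
    "sort": ["price", "rating"],
    "max_price": ["50", "", "-1", "abc"],
}


def input_combinations(fields):
    """Yield one dict per combination of field values (Cartesian product)."""
    names = list(fields)
    for values in product(*(fields[name] for name in names)):
        yield dict(zip(names, values))


combos = list(input_combinations(FILTER_FIELDS))
print(len(combos))  # 2 * 2 * 4 = 16 submissions for three small fields
```

The count is the product of the number of values per field, so even a modest form quickly needs sampling or pairwise selection rather than exhaustive enumeration.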

Open to ideas! We're thinking of adding agent/crawler suggestions to the GitHub repo if there's a package that clicks around in that fashion.
Forgive the naive question, but to pair with the GP, thoughts:

1. Wouldn't this also be helpful for understanding, from a UX perspective, the exact nature of all traffic/calls made against a particular page or by a user workflow moving through your site?

2. Could one make a proxy from this on a local home egress, such that you could see the nature of outbound network traffic to the sites you visit (and, more importantly, traffic heading to 3rd-party trackers'/cookies' APIs as a result of those visits)?

3. Could it be used to nefariously map open API endpoints against a system one is (white-hat) pen testing?