Hacker News
by jnguyen64 1661 days ago
Hey, posted on your other comment asking for advice, so thought I'd return the favor. I haven't built a scraper for Workday yet, but I looked briefly at a Workday board to figure out the process before writing this comment.

1.) Navigate to a Workday page (for example, https://broadinstitute.wd1.myworkdayjobs.com/broad_institute...)

2.) Open the developer console in Chrome (Ctrl+Shift+J on Windows) and navigate to the Network tab.

3.) Change the filter to Fetch/XHR

4.) Refresh the page

5.) You should see a few requests pop up; the one you care about is the clientRequestId request

6.) Take a look at the response payload of that request (throw it in http://jsonprettyprint.net/ for readability)

7.) You get a JSON payload that gives you the job positions you're looking for

8.) In addition to that, go back to the original web page and scroll down. You'll see a new request pop up, which shows you the format to use for paging through the next batch of positions.
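
If you'd rather not paste the payload from step 6 into a website, here is a minimal Python sketch that pretty-prints it locally. The helper name pretty is made up for illustration, and it assumes you have copied the raw response text out of the Network tab.

```python
import json


def pretty(raw_text):
    """Return the JSON text re-indented for reading (keys sorted)."""
    return json.dumps(json.loads(raw_text), indent=2, sort_keys=True)


if __name__ == "__main__":
    # Stand-in payload; replace with the response copied from the Network tab.
    print(pretty('{"b": 1, "a": [1, 2]}'))
```

Sorting the keys is only a readability choice; drop sort_keys=True if you want the fields in the order the server sent them.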

Hope this helps!


It is helpful! Annoyingly, there's some site-to-site variation in how companies structure results in their Workday instance. I get similar (but not identical) results when I look at NXP's Workday site, for example:

https://nxp.wd3.myworkdayjobs.com/en-US/careers

I'm going to try this technique with individual posting results. It's been challenging to get them to render as well, but I think that's more a JavaScript thing than a requests thing.

Ah okay, I see what you mean with that one. I think the way I would approach Workday is to sort the companies that use it into buckets. The example I gave would be one bucket, and the one you gave would be a different bucket. I would write a script for each bucket instead of trying a one-size-fits-all approach. The approach I'd use for the website you linked would be something like:

Create a request mimicking this curl command:

curl 'https://nxp.wd3.myworkdayjobs.com/wday/cxs/nxp/careers/jobs' \
  -H 'Connection: keep-alive' \
  -H 'sec-ch-ua: " Not A;Brand";v="99", "Chromium";v="96", "Google Chrome";v="96"' \
  -H 'Accept: application/json' \
  -H 'Content-Type: application/json' \
  -H 'Accept-Language: en-US' \
  -H 'sec-ch-ua-mobile: ?1' \
  -H 'User-Agent: Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Mobile Safari/537.36' \
  -H 'sec-ch-ua-platform: "Android"' \
  -H 'Origin: https://nxp.wd3.myworkdayjobs.com' \
  -H 'Sec-Fetch-Site: same-origin' \
  -H 'Sec-Fetch-Mode: cors' \
  -H 'Sec-Fetch-Dest: empty' \
  -H 'Referer: https://nxp.wd3.myworkdayjobs.com/en-US/careers?p=4' \
  --data-raw '{"limit":20,"offset":80,"searchText":"","appliedFacets":{}}' \
  --compressed

Increase the offset in the --data-raw body (second to last row) by 20, which matches the limit, on each request until you've collected the desired number of jobs. It may need some changes, but that's the general approach!
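
Roughly, the offset loop could look like this in Python. It's a sketch, not something tested against the live site: the URL and request body come from the curl command, but the function names are made up, only two of the headers are sent, and the "jobPostings" response key is an assumption you should check against the payload in your Network tab.

```python
import json
import urllib.request

# Endpoint and body shape are taken from the curl command above.
JOBS_URL = "https://nxp.wd3.myworkdayjobs.com/wday/cxs/nxp/careers/jobs"


def build_payload(offset, limit=20):
    """Request body for one page, mirroring the curl --data-raw body."""
    return {"limit": limit, "offset": offset, "searchText": "", "appliedFacets": {}}


def post_json(url, payload):
    """POST a JSON body and decode the JSON response (stdlib only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def fetch_jobs(url, max_jobs, limit=20, post=post_json, results_key="jobPostings"):
    """Collect up to max_jobs postings, stepping offset by `limit` each time.

    results_key is an ASSUMPTION about the response shape, not something
    confirmed in this thread.
    """
    jobs = []
    offset = 0
    while len(jobs) < max_jobs:
        page = post(url, build_payload(offset, limit)).get(results_key, [])
        if not page:  # an empty page means there is nothing left to fetch
            break
        jobs.extend(page)
        offset += limit
    return jobs[:max_jobs]
```

Calling fetch_jobs(JOBS_URL, max_jobs=100) would request offsets 0, 20, 40, 60 and 80. The post argument is there so you can swap in a fake for testing, or a different HTTP client, without touching the loop.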
Thanks for going down the Workday scraping rabbit hole with me. :)

Did you pull this from the browser console's "Copy as cURL" function?

I tried this with some success (there's even a utility for translating cURL to Python - imagine that! https://curlconverter.com/) but I had some issues after a while, probably because the cookie/session token expired.
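
If the expiry guess is right, one thing to try is picking up fresh cookies at the start of each run instead of reusing the ones copied from the browser. A minimal sketch with the standard library follows. The helper names are made up, and it is an assumption (not something I've verified) that visiting the careers page first is enough for the jobs endpoint to keep answering.

```python
import http.cookiejar
import json
import urllib.request

# Both URLs appear earlier in this thread.
CAREERS_PAGE = "https://nxp.wd3.myworkdayjobs.com/en-US/careers"
JOBS_URL = "https://nxp.wd3.myworkdayjobs.com/wday/cxs/nxp/careers/jobs"


def new_session():
    """Return (opener, jar): an opener that stores and resends cookies."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def build_jobs_request(offset, limit=20):
    """Build the POST request for one page of results."""
    body = {"limit": limit, "offset": offset, "searchText": "", "appliedFacets": {}}
    return urllib.request.Request(
        JOBS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        method="POST",
    )


def fetch_page(offset, limit=20):
    """Load the careers page to collect fresh cookies, then query the jobs API."""
    opener, _jar = new_session()
    opener.open(CAREERS_PAGE, timeout=30).close()  # cookies land in the jar
    with opener.open(build_jobs_request(offset, limit), timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For a long scrape you would probably keep one opener around and only build a new one when a request starts failing, rather than reloading the careers page for every offset.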