by xando 3342 days ago
Hey, a friendly reminder: I’m parsing this thread, and all job offers posted here are also available on the map at

https://whoishiring.io or, for HN items only, at https://whoishiring.io/search/36.0440/-90.8984/4?source=hn

If you post here, please use the format below to help me with parsing. If you don’t, no worries, I will do my best to get everything right.

  1) {company} | {job title} | {locations} | {attrs: ONSITE, REMOTE, INTERNS, VISA, SALARY, company-url}
  Google | Software Developer | SF | VISA https://google.com
  DuckDuckGo | Software Developer | Paoli PA | REMOTE, VISA, SALARY:100k-120k
  Facebook | Web-developer | Zurich | SALARY:120k CHF 
  Google | Site Reliability Engineer | London | SALARY:120k GBP, VISA, REMOTE
or

  2) {company} | {job title} | {location}
  Google | Site Reliability Engineer | Sydney
  Facebook | Web-developer | Zurich
I’m using this regex to parse the first line; you can test it here: https://regex101.com/r/relwQD/3

  \s*(?P<company>[^|]+?)\s*\|\s*(?P<title>[^|]+?)\s*\|\s*(?P<locations>[^|]+?)\s*(?:\|\s*(?P<attrs>.+))?$
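As a rough sketch of how that pattern behaves in Python (the `(?P<name>...)` group syntax is Python-style), the snippet below applies it to a single line. The function name `parse_header` is hypothetical and only for illustration; it is not the site's actual parser.

```python
import re

# The first-line pattern quoted above, unchanged.
HEADER_RE = re.compile(
    r"\s*(?P<company>[^|]+?)\s*\|\s*(?P<title>[^|]+?)\s*\|"
    r"\s*(?P<locations>[^|]+?)\s*(?:\|\s*(?P<attrs>.+))?$"
)


def parse_header(line):
    """Hypothetical helper: return the named groups as a dict, or None."""
    match = HEADER_RE.match(line)
    return match.groupdict() if match else None


# Format 1: the attrs group captures everything after the third pipe.
print(parse_header("DuckDuckGo | Software Developer | Paoli PA | REMOTE, VISA, SALARY:100k-120k"))

# Format 2: no fourth field, so attrs comes back as None.
print(parse_header("Google | Site Reliability Engineer | Sydney"))
```

A line with fewer than two pipes does not match at all, so `parse_header` returns `None` for it.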
Check below for the SALARY regex.

  SALARY:(?P<salary_min>\d+(?:k|K)?)(?:\s*\-\s*(?P<salary_max>\d+(?:k|K)?)?)?(?:\s?(?P<currency>[A-Z]{3}))?
You can test it as well: https://regex101.com/r/SRWkMz/2/
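Here is a minimal sketch of the SALARY pattern in use, assuming it is searched for inside the attrs field. The helpers `to_number` and `parse_salary` are hypothetical names, and converting "120k" to 120000 is an assumption about how the values might be normalized.

```python
import re

# The SALARY pattern quoted above, unchanged.
SALARY_RE = re.compile(
    r"SALARY:(?P<salary_min>\d+(?:k|K)?)"
    r"(?:\s*\-\s*(?P<salary_max>\d+(?:k|K)?)?)?"
    r"(?:\s?(?P<currency>[A-Z]{3}))?"
)


def to_number(amount):
    """Hypothetical helper: '120k' -> 120000, '95000' -> 95000, None -> None."""
    if amount is None:
        return None
    if amount[-1] in "kK":
        return int(amount[:-1]) * 1000
    return int(amount)


def parse_salary(attrs):
    """Hypothetical helper: find the first SALARY attribute in an attrs string."""
    match = SALARY_RE.search(attrs)
    if match is None:
        return None
    return {
        "min": to_number(match.group("salary_min")),
        "max": to_number(match.group("salary_max")),
        "currency": match.group("currency"),
    }


print(parse_salary("REMOTE, VISA, SALARY:100k-120k"))
print(parse_salary("SALARY:120k GBP, VISA, REMOTE"))
```

One thing to watch for: the currency group accepts any three capital letters after an optional space, so "SALARY:120k VISA" (no comma) would capture "VIS" as the currency. Separating attributes with commas, as in the examples, avoids that.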
5 comments

I always want to use this but find too many false positives. Is there a way to flag something for your attention? Mostly my issues have been finding jobs near me only to click and find that it's been matched to the wrong location.
This sounds like a really good feature to have. I will try to build something to flag "broken" jobs.
For example, right now there are 7 jobs displayed in Martinez, CA on the map that are really in Mountain View, CA.
What about multiple locations? I started my post with this:

"Southeast USA including: Texas (Austin and San Antonio), Virginia (Arlington and Dulles), Alabama (Huntsville), Florida (beach east of Melbourne), South Carolina (Greenville), Maryland (Annapolis Junction), and possibly others, all ONSITE."

Currently you have me as Austin. That is just 1 of 8 locations. I don't really see how I could fix this. We actually are hiring multiple people at multiple sites. I suppose I could post 8 times, but they'd probably get marked as dupe posts due to being identical except for the location.

I think I get it. I didn't want to stretch HN formatting too much, but since this was raised before, it looks like a real issue. My thoughts:

- We could extend the locations format to comma-separated values. However, some people will write "London, UK" or "Philadelphia, PA", so a comma doesn't look like a good separator.

- Use ";" instead of "," to separate values, e.g. " ... Lead Engineer | London, UK; Philadelphia, PA | ... "

Other ideas welcome.
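For illustration, the ";" proposal could be sketched like this. It is only a guess at how the locations field might be split, and `split_locations` is a hypothetical name.

```python
def split_locations(field):
    """Hypothetical helper: split a locations field on ';' and trim whitespace.

    Commas are left alone, so "London, UK" stays a single location.
    """
    return [part.strip() for part in field.split(";") if part.strip()]


print(split_locations("London, UK; Philadelphia, PA"))
print(split_locations("Sydney"))
```

A field without any ";" comes back as a one-item list, so posts in the current single-location format would keep working.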

Be looser. Grab every location you see, or at least until you hit a sentence without a location. That means city, state, and country.

Next, look for geographic nesting. If you see Moscow and Maine, it isn't Russia. If you see Georgia and Tbilisi, it isn't the USA. If you see Melbourne and Florida, it isn't Australia. Country codes and state codes may conflict, so try both ways. The larger area may come first or second. For example: "Austin, TX" or "We're a Texas company in Austin, Houston, and Dallas."
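The nesting idea could be sketched roughly as follows. The `CITY_PARENTS` table is a tiny made-up gazetteer for illustration only, `resolve` is a hypothetical name, and the plain substring check is an assumption that ignores state and country codes such as "TX" or "FL".

```python
# Toy gazetteer (illustrative only): city -> possible enclosing regions,
# with the default guess listed first.
CITY_PARENTS = {
    "Moscow": ["Russia", "Maine"],
    "Melbourne": ["Australia", "Florida"],
    "Austin": ["Texas"],
}


def resolve(city, text):
    """Hypothetical helper: pick the enclosing region for a city.

    Prefer a candidate region that is also mentioned in the text;
    otherwise fall back to the first (default) candidate.
    """
    parents = CITY_PARENTS.get(city, [])
    for parent in parents:
        if parent in text:
            return parent
    return parents[0] if parents else None


post = "Florida (beach east of Melbourne), South Carolina (Greenville)"
print(resolve("Melbourne", post))         # region mentioned in the post wins
print(resolve("Melbourne", "Melbourne"))  # no region mentioned: default guess
```

Because the check looks at the whole text, it does not matter whether the larger area comes before or after the city.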

If you are left with a geographic region that is larger than a city, log the problem and investigate. Note that DC (also "D.C.") and Vatican City are tiny, so they are OK as is.

See what you can do about stuff like NOVA (Northern Virginia), Research Triangle, Twin Cities, Cape Cod, beltway (probably around DC), and Bay Area. Maybe they are small enough. Hawaii might go by island name.

Hi, is there a problem with the location? I put "Bangkok" and the post on whoishiring is showing in "Orlando, FL". I just changed it to "Bangkok, TH"; hopefully this will help. Does the scraper update posts when it finds a difference?
How often do you scrape the thread?
Every 20 minutes from https://hacker-news.firebaseio.com/. Update delays are sometimes caused by the HN-to-Firebase push, which also happens on its own interval.
Thanks. Strangely, I can't see my posting on your site (posted around noon PST on the 1st). The posting is for ZestFinance (Sr. Devops Engineer).

Might be related to the Firebase API failures...

I keep getting "There was an error completing your request".
Please send me console logs / screenshot. sebastian@whoishiring.io
Thank you for providing this!