by throwaway2016a 3181 days ago
> On cloud software that's simply not possible, due to the way that everything is scheduled.

As someone who works in cloud software this makes me cringe a little.

I have no doubt this is how existing cloud SEO crawlers work, but with elastic scaling, web sockets, and serverless, there is no reason why this has to be true.

It is not a limitation of cloud software. It is a sign of devs and/or product owners deciding that instant results are not a priority for the product.

Edit: I hear that a lot from industries that are not intimately familiar with web apps. "You can't do that on the cloud"... A typical web software engineer will not be able to do it, but there are people out there who can. They are more expensive than your typical developer, but depending on your product they are worth it.

1 comments

Sorry, maybe I misread, but I kind of read the comment as 'what separates this from other cloud products on the market?'

So I wasn't trying to argue what is and isn't possible with cloud architecture, simply what is and isn't possible with (our) cloud-based competitors.

The process is along the lines of: 'Click Start', get taken to a screen which says 'Initializing' or similar, then maybe 2-3 minutes later you'll see something start to happen. But there is little to no data on which URLs are actually being crawled.

Sitebulb, and desktop crawlers in general, have a much quicker feedback loop.

> Sitebulb, and desktop crawlers in general, have a much quicker feedback loop.

I wasn't denying that. I'm sure it does. I am confident this is way better than most (if not all) current cloud solutions.

I just think it is unfortunate, because there is no technical limitation that prevents it from being instant on the cloud as well.

The cloud can't handle spikes well (say, 1,000 customers all unexpectedly trying to scan at once), but if the load is predictable, linear, or easily done in parallel, which I suspect it is for this use case, then it is perfectly doable with no delay on the cloud.
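
To make the "easily done in parallel" part concrete, here is a rough sketch of the idea. It is not how Sitebulb or any existing cloud crawler works. It crawls URLs concurrently and hands back each result the moment it finishes, instead of waiting for the whole batch. The names `fake_fetch`, `crawl`, `collect`, and the `concurrency` parameter are all made up for illustration, and `fake_fetch` only simulates a request (no network is used).

```python
import asyncio
from typing import AsyncIterator, Iterable, List, Tuple


async def fake_fetch(url: str) -> Tuple[str, int]:
    """Hypothetical stand-in for a real HTTP request; it only sleeps."""
    await asyncio.sleep(0.01)
    return url, 200


async def crawl(urls: Iterable[str], concurrency: int = 4) -> AsyncIterator[Tuple[str, int]]:
    """Yield (url, status) pairs as soon as each simulated fetch completes."""
    sem = asyncio.Semaphore(concurrency)  # cap the number of in-flight fetches

    async def worker(url: str) -> Tuple[str, int]:
        async with sem:
            return await fake_fetch(url)

    tasks = [asyncio.ensure_future(worker(u)) for u in urls]
    # as_completed hands back results in completion order, not submission order
    for finished in asyncio.as_completed(tasks):
        yield await finished


async def collect(urls: Iterable[str]) -> List[Tuple[str, int]]:
    """Consume the stream; a real service might push each item over a web socket here."""
    seen = []
    async for url, status in crawl(urls):
        seen.append((url, status))
    return seen


if __name__ == "__main__":
    demo_urls = ["https://example.com/a", "https://example.com/b"]
    for url, status in asyncio.run(collect(demo_urls)):
        print(status, url)
```

The point is only that the first result is available after one fetch completes rather than after the whole job does. Whether a given product can do this in practice depends on how its scheduling is set up.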

Deepcrawl does that, but again it depends entirely on the website type and infrastructure you have.