by pierrefar 5641 days ago
As the contact page and the FAQ require signing in, I can't comment on the exact implementation. I suspect they monitor Google's cache as that's the only publicly available data that can power this service. This is the wrong way to do it for many reasons:

1. There can be a delay between crawling and the page showing up in the cache. I've seen it take several days in some cases, even though the page was already ranking well and getting traffic!

2. Email is not a good log format. It's just not. I suspect that for this service you'll need to enter each URL you want to monitor manually (yikes!) and then you get an email for each one. If your sitemaps are any good, you already have a good list of your website's URLs, so this could be one massive copy-paste operation, but then you'll get bombarded with emails. One time when I implemented such a system, my Gmail account was locked for days.

3. A much better solution is to do it server side with a proper log store such as MySQL, MongoDB or whatever. You can do this with a cron job that analyzes the HTTP server's log files at the end of each day, does some clever number crunching, generates a report and emails it to you.

Clever number crunching means calculating numbers that affect your business's bottom line. You can monitor the percentage of pages indexed, segment by section, analyze trends over time, and do cleverer stuff like measuring the time between a crawl and the first Google-referred traffic.
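To make the end-of-day log crunching concrete, here is a rough sketch in Python. It assumes the server writes Apache/nginx "combined" format logs, and it treats any request whose user agent contains "Googlebot" as a crawl and any request whose referrer contains "google." as Google-referred traffic. Both are assumptions (user agents can be spoofed), and a crawl is only a proxy for being indexed. The function names and the `known_paths` list are made up for illustration.

```python
import re
from datetime import datetime

# Combined Log Format:
# host ident user [time] "METHOD path protocol" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return (timestamp, path, referer, agent), or None if the line doesn't match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    when = datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z")
    return when, m.group("path"), m.group("referer"), m.group("agent")


def crawl_report(lines, known_paths):
    """Return (percent of known paths crawled, {path: crawl-to-first-visit lag})."""
    first_crawl = {}  # path -> earliest request that looks like Googlebot
    first_visit = {}  # path -> earliest request referred by a Google domain
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        when, path, referer, agent = parsed
        if "Googlebot" in agent:
            target = first_crawl
        elif "google." in referer:
            target = first_visit
        else:
            continue
        if path not in target or when < target[path]:
            target[path] = when

    crawled = [p for p in known_paths if p in first_crawl]
    percent = 100.0 * len(crawled) / len(known_paths) if known_paths else 0.0
    # Only report a lag when the Google-referred visit came after the crawl.
    lag = {
        p: first_visit[p] - first_crawl[p]
        for p in crawled
        if p in first_visit and first_visit[p] >= first_crawl[p]
    }
    return percent, lag
```

A cron job could feed the day's log file into `crawl_report`, store the result in whatever database you like, and send one summary email instead of one email per URL.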

1 comments

Great feedback, thanks a lot!

All valid points, and if this grows into something larger than I intended, I'll certainly look into some more advanced/clever ways of handling the email side of things.

Index Ping was intended for top-level domain names only, i.e. "Bob" launches a new website and wants to know when it is indexed. Not down to the individual page level, just when he "breaks" into the index. If people want more granular control over this, I'll take a look.

Cheers!

Two things for you to consider:

1. Turn this into an API that webmasters can send their data to, which you then crunch into useful reports. This just might be a decent money earner.

2. Add a sitemap importer to kick-start the reporting. It should be quite easy, as sitemaps are either some flavour of XML or plain text.
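For what it's worth, a sitemap importer along those lines could be as small as the sketch below. It assumes the sitemap content has already been fetched into a string, and that it is either a sitemaps.org XML file or a plain-text file with one URL per line. `import_sitemap` is a made-up name, not part of any existing service.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org XML format.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def import_sitemap(text):
    """Return the list of URLs found in an XML or plain-text sitemap."""
    stripped = text.lstrip()
    if stripped.startswith("<"):
        root = ET.fromstring(stripped)
        # Every <loc> element holds one URL. In a sitemap index file these
        # are the URLs of child sitemaps rather than of pages.
        return [
            loc.text.strip()
            for loc in root.iter(SITEMAP_NS + "loc")
            if loc.text and loc.text.strip()
        ]
    # Plain-text sitemap: one URL per line, blank lines ignored.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

The resulting URL list can seed the set of pages to report on, so nobody has to paste URLs in by hand.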