Hacker News new | ask | show | jobs
by 13pixels 125 days ago
Distinguishing 'AI Research' (crawling) from 'AI Referral' (user clicks) is the hardest part. Most agents (OAI-SearchBot, ClaudeBot) declare themselves in UA, but the actual click-through often strips the referrer or shows as direct/none. We've had some luck correlating 'time of crawl' with 'time of visit' to fingerprint AI traffic, but it's noisy.

Self-hosted is definitely the way to go for raw logs though. GA4 obfuscates too much of this.