Hacker News new | ask | show | jobs
by joevandyk 4714 days ago
I started saving all my page views in a postgresql database. Schema is pretty simple.

I have the following tables:

    sessions
      session_id (uuid type)
      created_at
    
 
    page_views
      page_view_id
      session_id
      created_at
      site_id
      path
      query_string (hstore)
      user_agent
      referral_url
      ip_address
      user_id
      http_method (get, post, etc)
      details (hstore, used to tag page views/actions)
   
This allows me to simply query all my page views against data in my live database. I can see the path a user took to place an order. I can easily integrate a/b tests. If someone uses a coupon on the site and we want to see if they later came back and viewed/purchased more, we can easily write a sql query to figure that out. We can simply figure out lifetime customer value, even if not logged in. If we're getting a large amount of traffic from a certain affiliate, we can alert our staff.

It's really awesome to be able to have your data in the same place. Having analytics data spread out to GA made it difficult to match that data against ours. If we need to scale out to multi-terabytes, postgres_fdw will make querying against the analytical database simple.

Since we're also tracking affiliate purchases to pay out commissions, I also have another table that that stores additional information about a page view if they came from an affiliate site (click id, the affiliate network, etc).

Here's the plpgsql function I use for saving the sessions and page views: https://gist.github.com/joevandyk/f63523cdd1a3aa75d0ec

1 comments

Yeah, we do that kind of stuff as well. At least you know what your data means. But when you start getting millions of hits a day, you won't necessarily want to spend some time scaling your system... In that case leaving it to the pros and focusing instead on your product may prove the most sensible move.
It should be pretty easy to scale out a simple set of data like this.

"Leaving it to the pros" means you don't control your data and you can't easily combine it with your other data about products, orders, whatever.