Hacker News
by graue 4926 days ago
Thanks for this, I was immediately curious what's included and how it works.

Would you be willing to share an example of the JSON structure for a tweet?


Here's my first ever tweet. There's a separate JS file for every month of tweets. The JSON object itself is named according to the month.

  Grailbird.data.tweets_2006_12 = 
  [{
    "source" : "web",
    "entities" : {
      "user_mentions" : [ ],
      "media" : [ ],
      "hashtags" : [ ],
      "urls" : [ ]
    },
    "geo" : {
    },
    "id_str" : "547413",
    "text" : "counting down the seconds until 5",
    "id" : 547413,
    "created_at" : "Sat Dec 02 00:57:17 +0000 2006",
    "user" : {
      "name" : "Jim Ray",
      "screen_name" : "jimray",
      "protected" : false,
      "id_str" : "35623",
      "profile_image_url_https" : "https://si0.twimg.com/profile_images/1234214846/avatar_normal.jpg",
      "id" : 35623,
      "verified" : false
    }
  } ]
The CSV data is much more basic:

  547413,2006-12-02 00:57:17 +0000,counting down the seconds until 5,
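
If you want to read one of the monthly JS files outside the bundled viewer, something like the following might work. It is only a rough sketch: it assumes each file is a single assignment whose right-hand side is valid JSON, and `parseGrailbirdMonth` is a name made up for this example, not part of the archive.

```javascript
// Sketch: pull the JSON array out of a monthly archive file without eval.
// Assumption: the file looks like "Grailbird.data.tweets_YYYY_MM = [ ... ]".
// parseGrailbirdMonth is a hypothetical helper, not something the archive ships.
function parseGrailbirdMonth(source) {
  const start = source.indexOf('[');    // first bracket after the assignment
  const end = source.lastIndexOf(']');  // closing bracket of the array
  if (start === -1 || end === -1 || end < start) {
    throw new Error('no JSON array found in file');
  }
  return JSON.parse(source.slice(start, end + 1));
}

// Trimmed-down copy of the tweet shown in this thread.
const sample = `Grailbird.data.tweets_2006_12 =
[{
  "id_str" : "547413",
  "text" : "counting down the seconds until 5",
  "created_at" : "Sat Dec 02 00:57:17 +0000 2006"
}]`;

const tweets = parseGrailbirdMonth(sample);
console.log(tweets[0].id_str, tweets[0].text);
```

Slicing between the first `[` and the last `]` avoids executing the file, at the cost of breaking if the layout ever differs from the assumption above.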
Does anyone know the purpose of having both the "id" and "id_str" attributes?
It's probably because modern tweet IDs are larger than a 32-bit integer. Presumably some JSON parsers aren't too hot on parsing bigints, so they give you the option of having a string instead.
Ahh, that makes sense. Thanks! (And thanks to andrewf as well!)
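In JavaScript, every number is a double under the hood (Lua before 5.3 works the same way, and PHP converts integers that overflow its native int type to doubles). This means integers can only be represented exactly up to 53 bits; outside that range distinct values start to alias. You can see this in effect in a JS REPL: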

  > 10000000000000001 === 10000000000000000
  true
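
To tie that back to "id" versus "id_str", here is a small sketch of what happens when a payload with a large ID goes through `JSON.parse`. The payload is invented for illustration (the ID is just the number from the REPL line, not a real tweet), and it assumes a runtime with `BigInt` support.

```javascript
// Hypothetical payload: the ID is made up to sit above 2^53.
const raw = '{"id": 10000000000000001, "id_str": "10000000000000001"}';
const tweet = JSON.parse(raw);

// Largest integer a double can hold without gaps: 2^53 - 1.
console.log(Number.MAX_SAFE_INTEGER);

// The numeric field was rounded during parsing, so it no longer
// matches the string field.
const numericIdIsSafe = Number.isSafeInteger(tweet.id);
const numericMatchesString = String(tweet.id) === tweet.id_str;
console.log(numericIdIsSafe, numericMatchesString);

// The string field survives intact and can be turned into a BigInt
// if arithmetic or ordering on the ID is needed.
const exactId = BigInt(tweet.id_str);
console.log(exactId.toString() === tweet.id_str);
```

For IDs as small as 547413 both fields agree, so the difference only shows up once IDs pass 2^53.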