Hacker News
Ask HN: How do you view large JSON files?
55 points by chuchuva 3630 days ago
I use JSON Viewer [1] to view data in JSON format but it freezes on files larger than 1 MB. I want to be able to easily collapse and expand elements to understand the structure of the data.

[1] https://jsonviewer.codeplex.com/

32 comments

I view them unfavorably.
Ha! It is funny because it is true. I remember listening to one of Joe Armstrong's talks (React 2014 conf), where at some point ( https://youtu.be/rQIE22e0cW8?t=2009 ) he talks about how parsing is rather expensive CPU-wise and also bandwidth-expensive, especially on mobile networks. The company he works for controls data paths from smartphones to the internet, and they sweat every wasted bit because it eats into the precious bandwidth available to consumers -- and what do developers do? -- they shove JSON through that channel at the application level!

It was a silly observation but it is also true at some level. JSON might be easy to read, but reading a 100 MB JSON file still needs a special editor.

Another funny observation Joe made at some point, in response to someone objecting to him calling JSON out because "after all, you can see JSON", was that your eyes cannot see JSON; they see photons bouncing from the screen. You still use an editor or some other translation program to display it and read it. So at some point you might as well use binary (thrift, protobufs, sqlite, ...).

I personally think that's one of the more sane replies.

It is really important to chop up your JSON files into smaller sub-files. This will not only make them easier to back up and read manually but will usually give you a speed boost (you can read and write more than one part of the "db" at a time).

This is probably sensible. JSON is a text-based format that requires you to parse the entire thing just to get an outline, unlike well-designed binary formats.
Well-designed text formats too?
Not necessarily. (I'd actually argue "not at all.") Presumably you have a text format in the first place because you want your representation to be human-readable [with common tools like a text editor], and very likely human-writable as well. Those are the real constraints of a (useful) text format, and they tend to be in direct conflict with high-performance parsing, or partial parsing.

For an arbitrary example, a binary format could have an index table of objects at the start of the file, and then you could perform partial reads to access only the subset of objects you care about. That's something you could do in a text format too, but if the file is edited in a text editor you can't guarantee that the user remembered or bothered to update the index when they added a new object. The parser would effectively not be able to trust the index, and have to parse the entire file. (I suppose you could use CRCs or something to enforce this, but then you'd end up with a very brittle format that people get frustrated when trying to edit.)
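
A minimal sketch of that index-table idea in Python. The layout (a 4-byte record count, then fixed-size offset/length pairs, then JSON-encoded payloads) is invented here purely for illustration and is not any real format; `write_indexed` and `read_record` are hypothetical names.

```python
import io
import json
import struct

# Hypothetical layout: [count: u32] [offset: u64, length: u64] * count, then payloads.
HEADER = struct.Struct("<I")
ENTRY = struct.Struct("<QQ")

def write_indexed(stream, objects):
    """Write objects as JSON payloads preceded by an index of (offset, length)."""
    payloads = [json.dumps(obj).encode("utf-8") for obj in objects]
    offset = HEADER.size + ENTRY.size * len(payloads)
    stream.write(HEADER.pack(len(payloads)))
    for payload in payloads:
        stream.write(ENTRY.pack(offset, len(payload)))
        offset += len(payload)
    for payload in payloads:
        stream.write(payload)

def read_record(stream, i):
    """Read only record i: one seek into the index, one seek into the payloads."""
    stream.seek(0)
    (count,) = HEADER.unpack(stream.read(HEADER.size))
    if not 0 <= i < count:
        raise IndexError(i)
    stream.seek(HEADER.size + ENTRY.size * i)
    offset, length = ENTRY.unpack(stream.read(ENTRY.size))
    stream.seek(offset)
    return json.loads(stream.read(length).decode("utf-8"))

buf = io.BytesIO()
write_indexed(buf, [{"name": "joe"}, {"name": "fred"}, {"name": "ann"}])
print(read_record(buf, 1))  # only the second payload is read and parsed
```

The reader trusts the index completely, which is exactly the assumption a hand-edited text file would break.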

Really, the true advantage of a binary format is you generally assume that nobody messed with the data behind your back, so you can have duplicate data (like an index) if you want without worrying that it's out of sync. This pretty much goes hand in hand with the fact that you can't just open it in a text editor and fiddle with stuff.

TLDR: Human-writability and high-performance are arguably mutually exclusive features.

> Really, the true advantage of a binary format is you generally assume that nobody messed with the data behind your back

I would rephrase that a bit and say the true advantage is flexibility, as you're not subject to the constraints of textual data.

The integrity of the data is a separate matter, and should be carefully verified rather than trusted implicitly. A huge number of security vulnerabilities, and program crashes in general, come from wrongly assuming that user-supplied data is correct.

Indeed. Another example is that you could have a length field in a text format that precedes a string, so you can skip over it without parsing it. But humans will forget to update it, or update it incorrectly.
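
A rough Python sketch of such a length-prefixed text encoding. The `LENGTH:BODY` syntax and the `nth_string` helper are made up for this example; the point is only that the reader jumps over each body using the declared length instead of scanning it.

```python
def nth_string(data, n):
    """Return the n-th length-prefixed string from data like '5:hello3:abc'."""
    pos = 0
    for index in range(n + 1):
        colon = data.index(":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        if index == n:
            return data[start:start + length]
        pos = start + length  # jump over the body without inspecting it

encoded = "5:hello11:with:colons3:abc"
print(nth_string(encoded, 2))
```

If someone edits `hello` to `hello!` by hand and leaves the `5` alone, the next length field is read from the wrong position and parsing derails, which is the failure mode described above.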
There are add-ons/extensions for Chrome and Firefox, both also called JSON Viewer (different authors). For really large files, though, I use a command-line tool plus grep and awk: https://stedolan.github.io/jq/
jq looks promising but I still need to view the JSON file first. I want to be able to somehow get the general understanding of the structure very quickly and interactively. The tree control works very well for that. JSON Viewer is almost perfect but it doesn't work with local files.
I usually use jq to understand the structure of the file. `keys` will get you all the key names in an object; `with_entries(.value |= type)` will replace each element on an object with its type. You can quickly delve into a JSON and gain some understanding of it with jq.
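
If jq isn't at hand, roughly the same one-level summary can be sketched in Python. This is only an approximation of what `with_entries(.value |= type)` prints; `keys_with_types` is a name I made up, and the sample document is invented.

```python
import json

def keys_with_types(obj):
    """Map each key of a JSON object to the JSON type name of its value."""
    def type_name(value):
        if value is None:
            return "null"
        if isinstance(value, bool):  # check bool before int: bool is an int subclass
            return "boolean"
        if isinstance(value, (int, float)):
            return "number"
        if isinstance(value, str):
            return "string"
        if isinstance(value, list):
            return "array"
        return "object"
    return {key: type_name(value) for key, value in obj.items()}

doc = json.loads('{"id": 7, "name": "joe", "tags": ["a"], "meta": {"x": null}, "ok": true}')
print(keys_with_types(doc))
```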
JSON Formatter for Chrome works with local files (you have to check the box in the settings to enable it).

It's super handy. Drag and drop json files into a new tab.

Link: https://chrome.google.com/webstore/detail/json-formatter/bcj...

jq . file.json | less
jq . file.json | less

or

curl ... | jq . | less

Yep, all that above + `sed`.
I think Sublime Text should be able to handle fairly large json files without a problem.

If not, I think I would just split data into multiple smaller files. Something like the python code below should work, assuming the file can fit in memory. If not, I assume you can find some json lib that can work in streaming mode and then do the same thing.

    import itertools, json, os

    json_input = '''{
    "foo": 1,
    "bar": 2,
    "baz": 3,
    "bug": 4,
    "thing": 5
    }'''

    entries_per_group = 2

    # Create the output directory if it does not exist yet.
    os.makedirs('sub_files', exist_ok=True)

    main_d = json.loads(json_input)
    items = iter(main_d.items())
    for group_count in itertools.count():
      # Take the next entries_per_group items; an empty dict means we are done.
      sub_d = dict(itertools.islice(items, entries_per_group))
      if not sub_d:
        break
      json_output = json.dumps(sub_d, indent=2)
      with open(os.path.join('sub_files', '{}.json'.format(group_count)), 'w') as f:
        f.write(json_output)
I used to be a fan of JSONView or one of those chrome plugins / extensions. Then I realized that I can simply use the network tab to view the JSON data as it gets loaded.

Here is how it works - keep the network tab ready. When you see the JSON data request, click on the request and hit the "Preview" tab. It gives you data in a collapse / expand format.

Advantages: 1. There is one less plugin that scans all your browsing activity, 2. Slightly extra battery life when just browsing and not developing stuff.

Disadvantage: You need to keep the network tab ready, otherwise you will have to reload the entire page with the network tab open.

EDIT:

My apologies if it wasn't clear. I was talking about the "Developer Tools" option in Chrome, in which there is a "Network" tab. It is available in "Chrome Menu" > "More Tools" > "Developer Tools". Alternatively you can hit Command + Option + I on a Mac, or some equivalent on Linux / Windows to get there.

Usually F12 opens these tools in most browsers (IE, Chrome, Firefox at least) on Linux and Windows, but I do not have a Mac to test OS X. Your tip also applies to Firefox, but I don't know about IE or Edge. :)

I've never really had to handle really large JSON files, as I'm not a big fan of those but for smaller files I tend to be lazy and paste it in jsonlint.com. It's usually just for reference or debugging purposes (like finding the name of a property or some strange value).

If you're familiar with vim, you can do this

  cat unformatted.json | python -mjson.tool | vim -
And then use vim's folding methods to navigate the file (http://vim.wikia.com/wiki/Folding)
I use json.tool a lot too, but usually pipe it to `less`, using the paging and searching.
It's not nice to abuse innocent cats.

    python -mjson.tool < unformatted.json | vim -
If you're not using cat to concatenate things, the shell can probably handle the job.
People like cat because it allows notating the command closer to how they think. How can I pipe while putting the file first?
Well, just do it:

    < unformatted.json python -mjson.tool | vim -
What's the immediate advantage for me in using a less intuitive syntax? I understand cat is meant to concatenate files, so what?
On Windows I use Notepad++, which handles large local JSON files just fine, and has a collapsible tree structure.
A human can't easily understand the structure of a large JSON file - JSON is internally self-describing, so there is no guarantee that the sample of records currently visible in your editor's viewport is representative of the whole.

Instead, use a tool like Schema Guru (https://github.com/snowplow/schema-guru ; disclaimer: we wrote this at Snowplow) to programmatically extract the JSON Schema (http://json-schema.org/) which represents all JSON instances in the file.
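
To give a feel for the idea, here is a much cruder sketch in Python. It is not Schema Guru's algorithm and does not produce JSON Schema; it just walks every value and records which JSON types were seen at each path. The function names and the sample records are invented for the example.

```python
import json
from collections import defaultdict

def type_name(value):
    """Return the JSON type name of a decoded value."""
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    return "array" if isinstance(value, list) else "object"

def collect_paths(value, path, seen):
    """Record the type at this path, then recurse into children."""
    seen[path].add(type_name(value))
    if isinstance(value, dict):
        for key, child in value.items():
            collect_paths(child, path + "." + key, seen)
    elif isinstance(value, list):
        for child in value:
            collect_paths(child, path + "[]", seen)

def summarize(document):
    """Return {path: sorted list of type names seen at that path}."""
    seen = defaultdict(set)
    collect_paths(document, "$", seen)
    return {path: sorted(types) for path, types in seen.items()}

records = json.loads('[{"id": 1, "name": "joe"}, {"id": "2", "name": "fred", "tags": ["x"]}]')
for path, types in sorted(summarize(records).items()):
    print(path, types)
```

Because every record is visited, a field that is a number in one record and a string in another shows up with both types, which a glance at one screenful could easily miss.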

I wrote a tool some time ago. It's not a viewer. It just shows the structure. https://github.com/ilyash/show-struct
For very large files of JSON arrays, like 1M rows and hundreds of MB, try http://www.jsondata.ninja
I have a question related to this which nobody has really touched on yet.

Imagine you have a json structure like this:

{"records" : [ {"name":"joe", ...lots of other fields in each record}, {"name":"fred", ...lots of other fields}, ... thousands of records ]}

Now a lot of people in this thread have mentioned tools for collapsing the json, but the trouble I have with this is that you can't browse the record names without opening each one and looking at all the fields. It would be nice if there was a tool that took a few key identifiers (name, id) and "bubbled them up" so that in a collapsed view you would see something like:

{"records":[ {...click to expand...}, // name=joe {...click to expand...}, // name=fred ]}

I have the same problem when working with XML as well (though in XML sometimes the ID's are attributes of the parent which mitigates the problem, but other times the ID would be nested in a child element of the structure more like you would have in json). I even found it to be enough of a problem that I wrote my own XML editor to solve this issue.

Of course one issue with this is that there's no clear standard as to which fields of an object represent "ID" information which would be important/useful to "bubble up" to the next level when collapsing. It would have to be something user-configurable (though having some sensible defaults like looking for "name" and "id" keys would work in a lot of cases). In the XML world, there's probably something to do with Schemas that would help with this problem, and fancy editors which understand your data using a schema, though some of the editors I looked at which went into that level of detail seemed like way overkill for what I wanted to do.
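
To make the idea concrete, here is a small Python sketch of the "bubble up" behaviour as plain text output rather than an interactive tree. The `ID_KEYS` defaults and the `collapsed_lines` helper are assumptions of mine, not features of any existing tool.

```python
import json

ID_KEYS = ("name", "id")  # assumed defaults; meant to be user-configurable

def collapsed_lines(records, id_keys=ID_KEYS):
    """Render each record collapsed, with its identifying fields kept visible."""
    lines = []
    for record in records:
        labels = [f"{key}={record[key]}" for key in id_keys
                  if isinstance(record, dict) and key in record]
        suffix = "  // " + ", ".join(labels) if labels else ""
        lines.append("{...}" + suffix)
    return lines

doc = json.loads('{"records": [{"name": "joe", "age": 30}, {"name": "fred", "id": 2}, {"age": 5}]}')
for line in collapsed_lines(doc["records"]):
    print(line)
```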

So essentially my question here is whether this concept of "collapse child, but keep important identifying information of the collapsed child visible" exists in any JSON tools. Is this a thing that has a name/buzzword associated with it that I don't know about? Or is it purely a personal quirk of mine which nobody else cares about?

That data structure is basically a CSV file. If I were you, I would convert the JSON to a CSV file, load it into Excel, use a pivot table to dive into your data, and convert it back if needed.
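
A minimal Python sketch of the JSON-to-CSV step, assuming the records are flat (nested values would need flattening first). The `records_to_csv` helper and the sample data are invented for illustration.

```python
import csv
import io
import json

def records_to_csv(records):
    """Render a list of flat JSON objects as CSV text, one column per key seen."""
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    out = io.StringIO()
    # restval fills in columns that a given record does not have.
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()

doc = json.loads('{"records": [{"name": "joe", "age": 30}, {"name": "fred", "city": "Oslo"}]}')
print(records_to_csv(doc["records"]))
```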

A pivot table will do exactly what you want and more. If you hate Excel, you could load it into a database and do a GROUP BY query. I suspect most JSON with that structure was auto-generated from CSV files or a database anyway.
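
The database route might look something like this in Python with an in-memory SQLite database. The table layout, the `load_records` helper and the `city` column are assumptions made up for the example.

```python
import json
import sqlite3

def load_records(records):
    """Load flat records into an in-memory SQLite table and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (name TEXT, city TEXT)")
    conn.executemany(
        "INSERT INTO records (name, city) VALUES (?, ?)",
        [(r.get("name"), r.get("city")) for r in records],
    )
    return conn

doc = json.loads(
    '{"records": [{"name": "joe", "city": "Oslo"},'
    ' {"name": "fred", "city": "Oslo"}, {"name": "ann", "city": "Rome"}]}'
)
conn = load_records(doc["records"])
rows = conn.execute(
    "SELECT city, COUNT(*) FROM records GROUP BY city ORDER BY city"
).fetchall()
print(rows)
```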

Another option is to just convert that JSON to another JSON keyed by name with Lodash then inspect it.

var anotherJSON=_.keyBy(oldJSON, 'name');
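
For anyone not using Lodash, a rough Python equivalent, assuming the input is the list of records itself (the `records` array from the example above, not the wrapping object). `key_by` is a hypothetical helper; as with a plain dict, a later record with the same name replaces an earlier one.

```python
def key_by(records, key):
    """Index a list of objects by one of their fields."""
    return {record[key]: record for record in records}

records = [{"name": "joe", "age": 30}, {"name": "fred", "age": 41}]
by_name = key_by(records, "name")
print(sorted(by_name))
```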

Chrome console allows you to inspect an object and expand on the properties visually.

For the most common cases, I just copy and paste the string into the chrome console if I need to inspect some random JSON file really quick.

If it is too big to copy and paste, pipe it into a html file as a global variable and view the object in the console.

You can also just open a node repl and load the JSON into memory. node lets you auto complete property names by pressing tab, this is great for inspecting some unknown object.

Whether you are looking at JSON with a bit of expected structure or otherwise, parsing something that big is not humanly possible in a timely fashion. Throw it into a db to analyse it.
There are a few viewers/editors listed at http://softwarerecs.stackexchange.com/questions/18839/json-v...

JSONedit should work well with files up to a few MB; with larger files the loading time may not be acceptable. Side note: for editors that generate a structured view, file size may not be a precise metric; the number of elements might work better.

jq
This is something I use quite frequently when dealing with JSON. https://www.getpostman.com
The Python REPL and the JSON lib. Probably not the most efficient if you're starting out but if you're familiar with Python it can be quite effective.
This. I use the Ruby REPL, but same principle.

You have to be a little careful about not accidentally printing out the whole thing to stdout, but after it's loaded into memory you can check keys at a given level, drill into the substructure, etc.
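
In Python terms, that kind of careful poking around could look like the sketch below. The `peek` helper is something I made up to avoid dumping a huge value to the terminal; in a real session you would load the data from your file with `json.load` rather than from the inline string used here.

```python
import json

def peek(value, max_items=5):
    """Summarize a value without dumping it: type, size, and a few keys."""
    if isinstance(value, dict):
        keys = list(value)[:max_items]
        return f"object with {len(value)} keys, e.g. {keys}"
    if isinstance(value, list):
        return f"array with {len(value)} items"
    return repr(value)[:80]

data = json.loads('{"records": [{"name": "joe"}, {"name": "fred"}], "version": 3}')
print(peek(data))                # top-level keys only
print(peek(data["records"]))     # size of the array, not its contents
print(peek(data["records"][0]))  # drill into one element
```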

Generally I'll copy it to the clipboard (usually `pbcopy < file.json`), fire up Chrome, Cmd+Alt+J, then paste into the console. Instant, attractive, collapsible rendering of the content. Usually when I do this I'm interested in performing some sort of transformation on the data, and this way I can prototype it without any further steps.
I use IntelliJ for that. It parses the JSON and adds code folding automatically. I've used it with fairly large files.
I'd suggest either use your browser with a good JSON viewing extension or a code editor.

> I want to be able to easily collapse and expand elements

That feature is often called "code folding"; many programming-oriented text editors do it.

print_r(json_decode(file_get_contents("http://url.com/"), TRUE));
What about collapsing/expanding elements interactively?
For extremely large CSVs I use EmEditor (Windows / paid). It also works with JSON, but I've never had to handle large JSON files.
You could try the JSONView chrome extension.
I'm already using JSONView Chrome extension and it's great but it doesn't work with local files. Nor can I copy and paste JSON.
Why not just throw the file(s) up on a box somewhere so you can use it, or host it locally, can you not use locally hosted links with it or something?
On macOS I've had some success with "Cocoa JSON Editor.app". It takes a while to load files, but they are presented in an outline view you can search and browse.
http://kmkeen.com/jshon/ or python -mjson piped to $editor
vim
IntelliJ scratch files
jq with less/more as pagers...
Notepad++ has a great JSON viewer plugin and handles huge files
>but it freezes on files larger than 1 MB

That's a pretty large bug. Considering what JSON is supposed to be used for, passing data between two programs/processes, files larger than 1 MB should have been accounted for.

Does it freeze then crash, freeze until the OS takes care of it, or freeze temporarily then resume?

It freezes temporarily then resumes after a few minutes.
vi (5 gigs)