by vanderZwan 3070 days ago
Using loompy in Jupyter is what you would do when analysing your own data. You would connect to a loom file with the loompy library, then extract the data you're interested in, apply whatever algorithms you need, and plot the results. The difference is that this is all code of your own. Loom is just the storage format for the data in this context.

The viewer is a specialised application: it has a server and client. The server extracts (meta)data requested by the client from a loom file, and serves it as JSON. The client then uses this metadata to generate plots. The off-line viewer is actually just running that server locally and opening it on localhost:8003.

That makes it better for sharing raw data on-line: most of the time, people do not need the full dataset of 27k+ genes; they're only interested in a dozen or so. The viewer makes it easy to access just those.
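
To make the "only a dozen genes" point concrete, here is a minimal sketch in plain Python. The in-memory `expression` dict and the `genes_as_json` helper are hypothetical stand-ins for illustration only; this is not the viewer's actual server code or its JSON schema.

```python
import json

# Hypothetical stand-in for a loom file: gene name -> expression per cell.
# A real dataset would have 27k+ entries here.
expression = {
    "Actb": [5.0, 3.0, 8.0],
    "Gapdh": [2.0, 2.0, 1.0],
    "Sox2": [0.0, 1.0, 0.0],
}


def genes_as_json(dataset, requested):
    """Return only the requested genes as a JSON string.

    Genes that are not present in the dataset are skipped.
    """
    subset = {gene: dataset[gene] for gene in requested if gene in dataset}
    return json.dumps(subset)


# The client asks for a few genes and receives only those rows.
payload = genes_as_json(expression, ["Sox2", "Actb", "NotAGene"])
```

The response size then scales with the number of genes requested rather than with the size of the file.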

Hosting your own viewer is quite simple:

    # this also installs the loom CLI
    pip install loom-viewer

    # start the server
    loom --dataset-path [DATASET_PATH] --server --port [PORT_NUMBER]
(Well, you probably want to use something like a supervisor script for that, which is what we do, but you get the idea)
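
For what "something like a supervisor script" could look like, here is a rough sketch in Python. It only uses the command line quoted above; the `loom_command` and `supervise` helpers, the restart limit, and the delay are all assumptions for illustration, not the script we actually run.

```python
import subprocess
import time


def loom_command(dataset_path, port):
    """Build the server command line shown above as an argument list."""
    return ["loom", "--dataset-path", str(dataset_path),
            "--server", "--port", str(port)]


def supervise(command, max_restarts=3, delay=1.0):
    """Run `command`, restarting it whenever it exits with an error.

    Stops after a clean exit (code 0) or after `max_restarts` restarts.
    Returns the number of times the command was started.
    """
    starts = 0
    while starts <= max_restarts:
        starts += 1
        exit_code = subprocess.call(command)
        if exit_code == 0:
            break  # clean shutdown, do not restart
        time.sleep(delay)  # brief pause before the next attempt
    return starts
```

Calling `supervise(loom_command(path, port))` with your own dataset path and port would keep the server coming back after a crash. A real deployment would more likely lean on an existing process manager than on a hand-rolled loop like this.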

We don't use a database; instead the server looks for loom files in a dataset folder like this:

    [DATASET_PATH]\[PROJECT_FOLDER]\[LOOM FILE]
That means that sharing a loom file is as simple as copying it to the right folder.
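
A folder layout like that can be scanned with a few lines of standard-library Python. The sketch below is an illustration of the idea, not the viewer's real discovery code; `find_loom_files` is a hypothetical helper, and it assumes loom files carry the `.loom` extension.

```python
from pathlib import Path


def find_loom_files(dataset_path):
    """Map each project folder name to the sorted loom file names inside it.

    Only files exactly one level below a project folder are considered;
    project folders without any loom files are left out.
    """
    projects = {}
    for project_dir in sorted(Path(dataset_path).iterdir()):
        if not project_dir.is_dir():
            continue  # ignore stray files directly under the dataset path
        files = sorted(p.name for p in project_dir.glob("*.loom"))
        if files:
            projects[project_dir.name] = files
    return projects
```

With this approach, a newly copied file shows up the next time the folder is scanned, with no registration step.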

This is probably not web-scale or really safe or anything, but we're talking small labs sharing data with other labs - the risks are different. These viewers will be accessed by a few biologists. Using files in a folder structure keeps it simple enough to set up for the less tech-savvy.

In theory, a third work-flow is also possible: having Jupyter open in one tab and manipulating the loom file from there, and the viewer in another.

There are three blocking issues for that, however:

- the stale cache problem I mentioned in the other comment,

- single writer/multiple reader support,

- the server needs to be an isolated sub-process due to gevent monkeypatching messing with Jupyter
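
On the last point, the reason a sub-process helps is that monkeypatching only affects the interpreter it runs in. The toy sketch below shows that principle with a patched `json.dumps` standing in for gevent's patching; it is an illustration under that assumption, not how the viewer launches its server.

```python
import json
import subprocess
import sys

# Child process: overwrite a standard-library function, similar in spirit
# to a monkeypatching library swapping in its own implementations.
CHILD_CODE = (
    "import json; "
    "json.dumps = lambda obj: 'patched'; "
    "print(json.dumps({}))"
)

result = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,
    text=True,
    check=True,
)

child_output = result.stdout.strip()  # produced by the patched function
parent_output = json.dumps({})        # the parent's json module is untouched
```

The child sees its patched function while the parent (think: the Jupyter kernel) keeps the original, which is the isolation the third bullet asks for.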

The main issue here is that the dev team is one person, so... this might take some time.