Hacker News new | ask | show | jobs
by westurner 2296 days ago
> This sounds like the sort of thing that would be useful outside of data science.

The instruction/operation costing or the computational essay/notebook environment setup?

Ethereum ("gas") and EOS have per-instruction costing. SingularityNET is a marketplace for AI solutions hosted on a blockchain, where you pay for AI/ML services with the SingularityNET AGI token. E.g. GridCoin and CureCoin compensate compute resource donations with their own tokens; which also have a floating exchange rate.

TLJH: "The Littlest JupyterHub" describes how to setup multi-user JupyterHub with e.g. Docker spawners that isolate workloads running with shared resources like GPUs and TPUs: http://tljh.jupyter.org/en/latest/

"Zero to BinderHub" describes how to setup BinderHub on a k8s cluster: https://binderhub.readthedocs.io/en/latest/zero-to-binderhub...

1 comments

The notebook/procedure thing. Like, doesn't everybody everywhere operate on a basis of mixed manual/automated procedures, where it needs to fluidly transition from one to another, yet be controlled and recorded and verified and structured?
REES is one solution to reproducibility of the computational environment.

> BinderHub ( https://mybinder.org/ ) creates docker containers from {git repos, Zenodo, FigShare,} and launches them in free cloud instances also running JupyterLab by building containers with repo2docker (with REES (Reproducible Execution Environment Specification)). This means that all I have to do is add an environment.yml to my git repo in order to get Binder support so that people can just click on the badge in the README to launch JupyterLab with all of the dependencies installed.

> REES supports a number of dependency specifications: requirements.txt, Pipfile.lock, environment.yml, aptSources, postBuild. With an environment.yml, I can install the necessary CPython/PyPy version and everything else.

REES: https://repo2docker.readthedocs.io/en/latest/specification.h...

REES configuration files: https://repo2docker.readthedocs.io/en/latest/config_files.ht...

Storing a container built with repo2docker in a container registry is one way to increase the likelihood that it'll be possible to run the same analysis pipeline with the same data and get the same results years later.

...

Pachyderm ( https://pachyderm.io/platform/ ) does Data Versioning, Data Pipelines (with commands that each run in a container), and Data Lineage (~ "data provenance"). What other platforms are there for versioning data and recording data provenance?

...

Recording manual procedures is an area where we've somewhat departed from the "write in a lab notebook with a pen" practice. CoCalc records all (collaborative) inputs to the notebook with a timeslider for review.

In practice, people use notebooks for displaying generated charts, manual exploratory analyses (which does introduce bias), for demonstrating APIs, and for teaching.

Is JupyterLab an ideal IDE? Nope, not by a longshot. nbdev makes it easier to write a function in a notebook, sync it to a module, edit it with a more complete data-science IDE (like RStudio, VSCode, Spyder, etc), and then copy it back into the notebook. https://github.com/fastai/nbdev

> What other platforms are there for versioning data and recording data provenance?

Quilt also versions data and data pipelines: https://medium.com/pytorch/how-to-iterate-faster-in-machine-...

https://github.com/quiltdata/quilt (Python)