Hacker News
by manually 2296 days ago
Next:

- Autosuggest database tables to use

- Automatically reserve parallel computing resources

- Autodetect data health issues and auto fix them

- Autodetect concept drift and auto fix it

- Auto engineer features and interactions

- Autodetect leakage and fix it

- Autodetect unfairness and auto fix it

- Autocreate more weakly-labelled training data

- Autocreate descriptive statistics and model eval stats

- Autocreate monitoring

- Autocreate regulatory reports

- Autocreate a data infra pipeline

- Autocreate a prediction serving endpoint

- Auto set up a meeting with relevant stakeholders on Google Calendar

- Auto deploy on Google Cloud

- Automatically buy carbon offset

- Auto fire your in-house data scientists

5 comments

Would be funny, but most of those things are already in AutoML Tables, including the carbon offset.

https://cloud.google.com/automl-tables

> Would be funny, but most of those things are already in AutoML Tables, including the carbon offset.

GCP datacenters are 100% offset with PPAs (power purchase agreements). Are you referring to different functionality for costing AutoML instructions in terms of carbon?
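
A minimal sketch of what costing a job in terms of carbon could look like, assuming the usual back-of-the-envelope formula: energy = average power × hours × PUE, then emissions = energy × grid carbon intensity, optionally reduced by a hypothetical fraction of energy matched by clean-energy purchases such as PPAs. `ComputeJob`, `energy_kwh`, `emissions_kg`, and every number below are illustrative placeholders, not figures or APIs from Google Cloud or any AutoML product.

```python
from dataclasses import dataclass


@dataclass
class ComputeJob:
    """A hypothetical training job; all figures are illustrative placeholders."""
    avg_power_kw: float   # average draw of the hosts/accelerators, in kW
    hours: float          # wall-clock duration of the job
    pue: float = 1.1      # datacenter Power Usage Effectiveness (assumed)


def energy_kwh(job: ComputeJob) -> float:
    # Facility energy = IT energy scaled up by PUE (cooling, power delivery, ...).
    return job.avg_power_kw * job.hours * job.pue


def emissions_kg(job: ComputeJob, grid_kg_per_kwh: float,
                 matched_fraction: float = 0.0) -> float:
    """Location-based estimate, optionally reduced by the fraction of energy
    matched by clean-energy purchases (e.g. PPAs). matched_fraction is an
    assumption supplied by the caller, not a figure reported by any provider."""
    if not 0.0 <= matched_fraction <= 1.0:
        raise ValueError("matched_fraction must be in [0, 1]")
    return energy_kwh(job) * grid_kg_per_kwh * (1.0 - matched_fraction)


job = ComputeJob(avg_power_kw=2.0, hours=10.0, pue=1.1)
print(round(energy_kwh(job), 2))                      # 22.0
print(round(emissions_kg(job, 0.4), 2))               # 8.8
print(emissions_kg(job, 0.4, matched_fraction=1.0))   # 0.0
```

With `matched_fraction=1.0` the estimate drops to zero, which is roughly what "100% offset" means in accounting terms; the physical energy use is unchanged.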

...

I'd add:

- Set up a Jupyter Notebook environment

> Jupyter Notebooks are one of the most popular development tools for data scientists. They enable you to create interactive, shareable notebooks with code snippets and markdown for explanations. Without leaving Google Cloud's hosted notebook environment, AI Platform Notebooks, you can leverage the power of AutoML technology.

> There are several benefits of using AutoML technology from a notebook. Each step and setting can be codified so that it runs the same every time by everyone. Also, it's common, even with AutoML, to need to manipulate the source data before training the model with it. By using a notebook, you can use common tools like pandas and numpy to preprocess the data in the same workflow. Finally, you have the option of creating a model with another framework, and ensemble that together with the AutoML model, for potentially better results.

https://cloud.google.com/blog/products/ai-machine-learning/u...
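
The blog's last point, ensembling a model from another framework with the AutoML model, can be as simple as averaging predicted probabilities. A minimal sketch, assuming both models have already scored the same rows in the same order; `weighted_average_ensemble` and the scores are hypothetical, and in practice the weights would be tuned on held-out data rather than picked by hand:

```python
from typing import Sequence


def weighted_average_ensemble(
    predictions: Sequence[Sequence[float]], weights: Sequence[float]
) -> list[float]:
    """Blend per-row probabilities from several models.

    predictions[m][i] is model m's predicted probability for row i.
    weights[m] is how much we trust model m (normalized here to sum to 1).
    """
    if len(predictions) != len(weights) or not predictions:
        raise ValueError("need one weight per model and at least one model")
    n_rows = len(predictions[0])
    if any(len(p) != n_rows for p in predictions):
        raise ValueError("all models must score the same rows")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    norm = [w / total for w in weights]
    return [sum(w * p[i] for w, p in zip(norm, predictions)) for i in range(n_rows)]


# Hypothetical scores: one list exported from an AutoML model, one from a
# hand-built model trained in the same notebook.
automl_scores = [0.9, 0.2, 0.6]
custom_scores = [0.7, 0.4, 0.6]
blended = weighted_average_ensemble([automl_scores, custom_scores], weights=[3, 1])
print([round(x, 3) for x in blended])  # [0.85, 0.25, 0.6]
```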

This sounds like the sort of thing that would be useful outside of data science. That raises the question of whether it needs to be generalized, or redone differently for different specializations, which in turn seems like the sort of question that is tricky to answer with AI.

> This sounds like the sort of thing that would be useful outside of data science.

The instruction/operation costing or the computational essay/notebook environment setup?

Ethereum ("gas") and EOS have per-instruction costing. SingularityNET is a marketplace for AI solutions hosted on a blockchain, where you pay for AI/ML services with the SingularityNET AGI token. GridCoin and CureCoin, for example, compensate donations of compute resources with their own tokens, which also have floating exchange rates.
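
To make "per-instruction costing" concrete, here is a toy stack machine that charges gas before each instruction and aborts once the limit would be exceeded. The opcodes and the `GAS_COST` table are made up for illustration and do not match Ethereum's (or EOS's) actual fee schedules; Ethereum also still charges for the gas consumed by a failed transaction, which this sketch doesn't model.

```python
# Hypothetical cost schedule: these numbers are illustrative, NOT Ethereum's.
GAS_COST = {"PUSH": 3, "ADD": 3, "MUL": 5, "STORE": 20}


class OutOfGas(Exception):
    pass


def run(program, gas_limit):
    """Execute a toy stack program, charging gas before each instruction.

    Returns (stack, storage, gas_used). Raises OutOfGas if the next
    instruction would exceed gas_limit; nothing is returned in that case,
    so this run's storage writes are simply discarded.
    """
    stack, storage, gas_used = [], {}, 0
    for op, *args in program:
        cost = GAS_COST[op]
        if gas_used + cost > gas_limit:
            raise OutOfGas(f"{op} needs {cost}, only {gas_limit - gas_used} left")
        gas_used += cost
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "STORE":
            storage[args[0]] = stack.pop()
    return stack, storage, gas_used


# (2 + 3) * 4 stored under "x": 3 + 3 + 3 + 3 + 5 + 20 = 37 gas.
prog = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("STORE", "x")]
print(run(prog, gas_limit=100))  # ([], {'x': 20}, 37)
```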

TLJH: "The Littlest JupyterHub" describes how to set up a multi-user JupyterHub with, e.g., Docker spawners that isolate workloads running on shared resources like GPUs and TPUs: http://tljh.jupyter.org/en/latest/

"Zero to BinderHub" describes how to set up BinderHub on a Kubernetes (k8s) cluster: https://binderhub.readthedocs.io/en/latest/zero-to-binderhub...

The notebook/procedure thing. Like, doesn't everybody everywhere operate on a mix of manual and automated procedures, which need to transition fluidly from one to the other yet still be controlled, recorded, verified, and structured?

REES is one solution for making the computational environment reproducible.

> BinderHub ( https://mybinder.org/ ) builds Docker containers from {git repos, Zenodo, FigShare,} with repo2docker (with REES (Reproducible Execution Environment Specification)) and launches them, also running JupyterLab, in free cloud instances. This means that all I have to do is add an environment.yml to my git repo in order to get Binder support, so that people can just click on the badge in the README to launch JupyterLab with all of the dependencies installed.

> REES supports a number of dependency specifications: requirements.txt, Pipfile.lock, environment.yml, apt.txt, postBuild. With an environment.yml, I can install the necessary CPython/PyPy version and everything else.

REES: https://repo2docker.readthedocs.io/en/latest/specification.h...

REES configuration files: https://repo2docker.readthedocs.io/en/latest/config_files.ht...

Storing a container built with repo2docker in a container registry is one way to increase the likelihood that it'll be possible to run the same analysis pipeline with the same data and get the same results years later.

...

Pachyderm ( https://pachyderm.io/platform/ ) does Data Versioning, Data Pipelines (with commands that each run in a container), and Data Lineage (~ "data provenance"). What other platforms are there for versioning data and recording data provenance?
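
The core idea behind data versioning plus lineage can be sketched in a few lines: address every dataset version by a hash of its contents, and record which step and which input hashes produced each output. This is a toy under those assumptions (`DataRepo`, `run_step`, and `provenance` are hypothetical names), not Pachyderm's API or storage model:

```python
import hashlib
import json
from dataclasses import dataclass, field


def content_hash(data: bytes) -> str:
    """Identify a dataset version by the SHA-256 of its bytes."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class DataRepo:
    """Toy versioned store: blobs keyed by hash, plus a lineage record per output."""
    blobs: dict = field(default_factory=dict)
    lineage: dict = field(default_factory=dict)

    def commit(self, data: bytes) -> str:
        h = content_hash(data)
        self.blobs[h] = data  # identical content dedupes to the same key
        return h

    def run_step(self, step_name: str, fn, *input_hashes: str) -> str:
        """Apply fn to the input blobs, commit the output, and record its provenance."""
        output = fn(*(self.blobs[h] for h in input_hashes))
        out_hash = self.commit(output)
        self.lineage[out_hash] = {"step": step_name, "inputs": list(input_hashes)}
        return out_hash

    def provenance(self, h: str) -> list:
        """Walk lineage back to raw (never-derived) inputs, depth-first."""
        record = self.lineage.get(h)
        if record is None:
            return [{"raw": h}]
        chain = [{"output": h, **record}]
        for parent in record["inputs"]:
            chain.extend(self.provenance(parent))
        return chain


repo = DataRepo()
raw = repo.commit(b"a,b\n1,2\n3,4\n")
clean = repo.run_step("uppercase_header", lambda d: d.replace(b"a,b", b"A,B", 1), raw)
print(json.dumps(repo.provenance(clean), indent=1))
```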

...

Recording manual procedures is an area where we've somewhat departed from the "write in a lab notebook with a pen" practice. CoCalc records all (collaborative) inputs to the notebook with a timeslider for review.
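
A timeslider like that boils down to an append-only log of inputs that can be replayed up to any moment. A minimal sketch, assuming each event stores the full new contents of one cell (a real system would more likely store diffs); `Edit`, `EditLog`, and the cell IDs are hypothetical and say nothing about CoCalc's internal data model:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Edit:
    t: float      # timestamp in seconds (any monotonic clock works for the sketch)
    author: str
    cell: str     # hypothetical cell identifier
    source: str   # full new contents of the cell after this edit


class EditLog:
    """Append-only record of notebook inputs, replayable to any moment."""

    def __init__(self):
        self._events: list[Edit] = []

    def record(self, edit: Edit) -> None:
        if self._events and edit.t < self._events[-1].t:
            raise ValueError("events must be appended in time order")
        self._events.append(edit)

    def state_at(self, t: float) -> dict[str, str]:
        """Replay every edit with timestamp <= t; later edits overwrite earlier ones."""
        cells: dict[str, str] = {}
        for e in self._events:
            if e.t > t:
                break
            cells[e.cell] = e.source
        return cells

    def authors_of(self, cell: str) -> list[str]:
        return sorted({e.author for e in self._events if e.cell == cell})


log = EditLog()
log.record(Edit(1.0, "alice", "c1", "df = load()"))
log.record(Edit(2.0, "bob", "c1", "df = load().dropna()"))
log.record(Edit(3.0, "alice", "c2", "df.plot()"))
print(log.state_at(1.5))      # {'c1': 'df = load()'}
print(log.state_at(3.0))      # {'c1': 'df = load().dropna()', 'c2': 'df.plot()'}
print(log.authors_of("c1"))   # ['alice', 'bob']
```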

In practice, people use notebooks for displaying generated charts, manual exploratory analyses (which do introduce bias), for demonstrating APIs, and for teaching.

Is JupyterLab an ideal IDE? Nope, not by a long shot. nbdev makes it easier to write a function in a notebook, sync it to a module, edit it with a more complete data-science IDE (like RStudio, VS Code, Spyder, etc.), and then copy it back into the notebook. https://github.com/fastai/nbdev
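
The notebook-to-module half of that workflow can be illustrated with the standard notebook JSON layout alone. This is a toy, not nbdev's implementation: `export_cells` is a hypothetical helper, it assumes code cells are flagged with a first-line marker comment (the `#| export` default is modeled on nbdev's directive style, but treat it as an assumption), and it does not attempt the reverse sync back into the notebook.

```python
import json


def export_cells(notebook_json: str, marker: str = "#| export") -> str:
    """Concatenate code cells whose first line is `marker` into module source.

    Works on the nbformat v4 JSON layout, where each cell's "source" is either
    a single string or a list of line strings.
    """
    nb = json.loads(notebook_json)
    chunks = []
    for cell in nb.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        src = cell.get("source", "")
        if isinstance(src, list):
            src = "".join(src)
        lines = src.splitlines()
        if lines and lines[0].strip() == marker:
            # Drop the marker line; keep the rest of the cell verbatim.
            chunks.append("\n".join(lines[1:]).rstrip() + "\n")
    # Two blank lines between exported cells, as PEP 8 suggests for top-level defs.
    return "\n\n".join(chunks)


nb = json.dumps({
    "nbformat": 4, "nbformat_minor": 5, "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": "# Scratch"},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": ["#| export\n", "def double(x):\n", "    return 2 * x\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [], "execution_count": None,
         "source": "double(21)"},
    ],
})
module_src = export_cells(nb)
print(module_src)

# Quick check that the exported source is valid, importable Python.
namespace = {}
exec(module_src, namespace)
print(namespace["double"](21))  # 42
```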

I guess this is a job-safety type comment?

> Autodetect data health issues and auto fix them

Funny you say that, cos my company is actually developing something along those lines.

Poor data scientists. Now whose heads will roll when things go wrong and companies lose billions?

In the days when Sussman was a novice, Minsky once came to him as he sat hacking at the PDP-6.

“What are you doing?”, asked Minsky.

“I am training a randomly wired neural net to play Tic-Tac-Toe” Sussman replied.

“Why is the net wired randomly?”, asked Minsky.

“I do not want it to have any preconceptions of how to play”, Sussman said.

Minsky then shut his eyes.

“Why do you close your eyes?”, Sussman asked his teacher.

“So that the room will be empty.”

At that moment, Sussman was enlightened.

With all due respect to Minsky, I find this Zen-style story a little silly. If Minsky wants to say something informative, why doesn't he use formal concepts like Jeffreys priors, mixing time, high-dimensional varieties, minimum description length, entropy, etc.? Is that style of storytelling a projection from a high-dimensional mind to a zero-dimensional dumb style space? Is it a PCA reduction from ideas to clichés? I apologize in advance for being harsh, but I am entitled to speak from my heart, and I reiterate my appreciation for Minsky's work.

It would be nice to use more informative language when giving advice. If this story were tagged as "popular story for dummies", I would feel we were making real progress.

Just one of Minsky's great ideas related to reinforcement learning is the credit assignment problem: "How do you distribute credit for success among the many decisions that may have been involved in producing it?", from "Steps Toward Artificial Intelligence" (Minsky, 1961). "All of the methods we discuss in this book are, in a sense, directed toward solving this problem."
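
One classic (but not the only) answer to the credit assignment problem is temporal discounting: credit each decision with the rewards that follow it, shrinking later rewards by a factor γ per step, so G_t = r_t + γ·G_{t+1}. A minimal sketch of that recurrence, applied to a tic-tac-toe-like episode where only the final move is rewarded; `discounted_returns` is a hypothetical helper, γ = 0.9 is an arbitrary choice, and this is not a method proposed in Minsky's paper:

```python
def discounted_returns(rewards, gamma=0.9):
    """Credit for each decision t: G_t = r_t + gamma * G_{t+1}.

    A decision is credited with every reward that follows it, with later
    rewards discounted more heavily. Computed backwards in one pass.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must be in [0, 1]")
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Tic-tac-toe style episode: no reward until the final (winning) move.
episode_rewards = [0, 0, 0, 1]
print([round(g, 3) for g in discounted_returns(episode_rewards, gamma=0.9)])  # [0.729, 0.81, 0.9, 1.0]
```

Earlier moves still get some credit for the win, just less of it, which is the whole point of the exercise.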

That book is linked from HN (1) and has just one comment, so I think NDNS ("no dumb nerd stories") will never become popular.

(1) https://news.ycombinator.com/item?id=10972522

More from (2): Minsky in 1951 built the world's first “randomly wired neural network learning machine,” called the stochastic neural-analog reinforcement computer (SNARC).

(2) https://www.geek.com/blurb/marvin-minsky-ai-has-been-brain-d...

A fair paper: "Exploring Randomly Wired Neural Networks for Image Recognition."
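
For a concrete picture of "random wiring", here is a sketch of the simplest generator of that kind: an Erdős–Rényi graph where each pair of nodes is connected with probability p, made acyclic by always directing edges from the lower-numbered node to the higher one. The paper itself explores more generators and builds full networks around the graphs, so treat this as an illustration of the idea under those assumptions, not the paper's code; `random_dag` and `sources_and_sinks` are hypothetical helpers.

```python
import random


def random_dag(n_nodes, p, seed=None):
    """Erdős–Rényi-style random wiring turned into a DAG.

    Each pair (i, j) with i < j gets an edge i -> j with probability p.
    Directing every edge from lower to higher index guarantees no cycles.
    """
    rng = random.Random(seed)
    return [(i, j) for i in range(n_nodes) for j in range(i + 1, n_nodes)
            if rng.random() < p]


def sources_and_sinks(n_nodes, edges):
    """Nodes with no inputs / no outputs. A network built on the graph would
    typically feed the sources from a shared input and merge the sinks."""
    has_in = {j for _, j in edges}
    has_out = {i for i, _ in edges}
    sources = [v for v in range(n_nodes) if v not in has_in]
    sinks = [v for v in range(n_nodes) if v not in has_out]
    return sources, sinks


edges = random_dag(8, p=0.3, seed=0)
print(edges)
print(sources_and_sinks(8, edges))
```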

Was Sussman on the verge of envisioning deep learning? If so, then the room has in fact disappeared!

Is this an argument in favor of unjustified, arbitrary magic-constant priors?

A sufficiently large amount of random data contains all the magic constants you could want.

Yeah, but cryptographic hashes have some entropy.
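
On that last point: one way to get "magic constants" that are arbitrary but visibly not cherry-picked is to derive them from a cryptographic hash of a public label, the so-called "nothing up my sleeve" approach. A minimal sketch, assuming SHA-256 and a made-up label; `constant_from_label` is a hypothetical helper, and while it makes a constant reproducible and hard to tune in secret, it doesn't make an arbitrary prior any more justified:

```python
import hashlib


def constant_from_label(label: str, bits: int = 64) -> int:
    """Derive a reproducible 'magic' constant from a public label.

    Anyone can recompute it, so the designer couldn't have quietly picked a
    value with hidden properties.
    """
    if not 1 <= bits <= 256:
        raise ValueError("SHA-256 yields at most 256 bits")
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    # Keep the top `bits` bits of the 256-bit digest.
    return int.from_bytes(digest, "big") >> (256 - bits)


# Hypothetical label; any published string works.
print(hex(constant_from_label("my-project/prior-seed/v1")))
```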

I miss The Codeless Code. Wish someone would take up that mantle.

- Auto-negotiate proper metrics to use with stakeholders.