by gradstudent 3191 days ago
I have a PhD in AI and I'm having trouble figuring out exactly what's being announced here. Like, check this out:

> Building on advanced research in program synthesis (PROSE) and data cleaning, we have created a data wrangling experience (Figure 1) that drastically reduces the time that data scientists have to spend in transforming data for machine learning.

Huh?

> Figure 1: AI-powered data wrangling in the workbench learns from examples and automatically synthesizes code for data transformations using program synthesis technology.

What?

> Models can be containerized in Docker and deployed to network edge devices, allowing models to score closer to the event and in real-time. Local docker deployments can be used for debugging, while for scaled out production serving of AI, these containers can be managed with Kubernetes, using Azure Container Services.

English, Microsoft. Do you speak it?

8 comments

Do you have enterprise experience as well? I have an ML Ph.D. and 20 years of enterprise AI, and from my perspective these do make sense.
I've seen this demo at a meetup and it's actually quite cool. You use a graphical interface to connect boxes of data transformations, play with the settings, and it generates a Python script that performs the exact operations you specified on the data in question.

What you're describing are tools for data engineering - sanitizing real-world datasets to be fed into models. Researchers do not have to deal with this task as much because they work with well-defined datasets to provide fair comparisons of the algorithms they develop. Industry is the opposite - the algorithms are usually formulaic and well-defined, but the data itself is not.

Honestly, 'data wrangling' is indeed an unfortunately large part of ML work, so I'm interested. I have wondered how tractable it would be to build a data transformer that simply shapes and filters the data based on examples and a few constraints. It remains to be seen how good a job this one does.
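To make the "transform by example" idea concrete, here is a toy sketch of my own. It is not how PROSE works internally: it brute-forces a tiny made-up DSL of string operations and returns the first program consistent with every input/output example. The operation names, the delimiter list, and the sample strings are all assumptions for illustration.

```python
from itertools import product


def make_ops():
    """Build a tiny, hypothetical DSL of named string -> string operations."""
    ops = {
        "strip": str.strip,
        "lower": str.lower,
        "upper": str.upper,
        "title": str.title,
    }
    # Parameterised ops: split on a delimiter and pick one field.
    for delim in [" ", ",", "-", "@", "/"]:
        for idx in [0, 1, -1]:
            def pick(s, d=delim, i=idx):
                parts = s.split(d)
                # Return None when the field is missing so the candidate fails.
                return parts[i] if -len(parts) <= i < len(parts) else None
            ops[f"split({delim!r})[{idx}]"] = pick
    return ops


OPS = make_ops()


def run(program, text):
    """Apply each named operation in order; None means the program failed."""
    for name in program:
        if text is None:
            return None
        text = OPS[name](text)
    return text


def synthesize(examples, max_len=2):
    """Return the shortest op sequence that reproduces every example, or None."""
    for length in range(1, max_len + 1):
        for program in product(OPS, repeat=length):
            if all(run(program, src) == dst for src, dst in examples):
                return program
    return None


# Two examples are enough to pin down "take the part after the @".
examples = [
    ("ada.lovelace@example.com", "example.com"),
    ("alan@bletchley.uk", "bletchley.uk"),
]
program = synthesize(examples)
print(program, "->", run(program, "grace@navy.mil"))
```

Exhaustive search like this blows up quickly as the DSL grows, which is presumably where the actual research effort goes: pruning the search and ranking the many programs that fit the same few examples.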
For deploying ML models in Docker containers, have a look at this tutorial: https://github.com/Azure/ACS-Deployment-Tutorial
Getting data into a reasonable format is (unfortunately!) a huge part of the machine learning process; that was the motivation for tools like pandas (in Python) and dplyr (for R). Joseph is describing a machine-learning-enabled way to automate some of that data cleaning, which is pretty cool.

Check out the PROSE SDK (including an interactive playground) here. I particularly like its ability to extract JSON to something resembling a dataframe: https://microsoft.github.io/prose/documentation/extraction-j...
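For anyone unsure what "JSON to something resembling a dataframe" means in practice, here is a hand-written sketch of the target shape using only the Python standard library. To be clear, this is not the PROSE API and nothing is learned from examples here; the `flatten` and `to_table` helpers and the sample records are my own inventions.

```python
import json


def flatten(record, prefix=""):
    """Flatten nested dicts into one dict with dotted column names."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            # Lists and scalars are kept as-is in a single cell.
            flat[name] = value
    return flat


def to_table(json_text):
    """Turn a JSON array of objects into (columns, rows), filling gaps with None."""
    rows = [flatten(record) for record in json.loads(json_text)]
    columns = []
    for row in rows:
        for col in row:
            if col not in columns:  # keep first-seen column order
                columns.append(col)
    return columns, [[row.get(col) for col in columns] for row in rows]


# Made-up sample data with a nested object and a missing field.
sample = """
[
  {"id": 1, "user": {"name": "Ada", "city": "London"}},
  {"id": 2, "user": {"name": "Alan"}, "tags": ["x"]}
]
"""
columns, table = to_table(sample)
print(columns)
for row in table:
    print(row)
```

The interesting part of the by-example approach is that you would not write `flatten` yourself; you would mark up a few cells you want and let the tool infer the extraction.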

Here is a link to the PROSE SDK on GitHub: https://github.com/Microsoft/prose

This will probably be more illuminating than the quoted sentence. :)

"We have added links to Azure in Excel." "Finance types married to Bloomberg terminals, rejoice, for you may now feel secure in Excel once again."

Well, that's my tongue-in-cheek summary of the whole thing.

I'm reasonably certain that a lot of this is sales jargon. They'll point enterprise CIOs with degrees in music to this site and they'll see a lot of buzzwords and buy.