by dellinspiron 2236 days ago
I think people in the comments are completely missing the point of this work. As I understand it, and take this with a large grain of salt because I haven't read the paper, the idea of Jukebox is to take a certain style of music by a certain musician and have the algorithm sing, karaoke-style, the lyrics that are listed in the examples to the tune of that music. Think of it as a really jazzy version of Google text-to-speech. The lyrics are not written by this algorithm; it's just singing, in the style of Sinatra or Lady Gaga, some words that have been prewritten. It's fun to listen to and really amazing to watch it read the lyrics and decide where to put emphasis and where not to - dragging out certain words and letting others be mumbled. Comparing this to something like IBM's rendition of "Bicycle Built for Two" showcases how utterly mind-blowing this work is!

Finally, can we stop treating every single piece of work by neural networks as a "failure" because it isn't GAI? Just because it doesn't "say something about the human experience" doesn't make it bad engineering. It's hilarious how as soon as there's some new AI work done everyone starts wailing, "where's the humanity!"

3 comments

> It's hilarious how as soon as there's some new AI work done everyone starts wailing, "where's the humanity!"

Lay-people think AI refers to ALife.

Most of the talking heads would be immediately satisfied—giving none of these complaints—if they were shown an "AI" program that responds to stimuli by entering emotional states, and which learns to associate stimuli with the emotional states it has been in in the past, such that those stimuli will then become triggers for those states, and for memories associated with those states.

Such an agent wouldn't even need to use ML techniques, necessarily. It'd just need to be a high-concept tamagotchi that can respond to operant conditioning. That would already be an advance over the state of the art.
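
To make the "high-concept tamagotchi" idea concrete, here is a minimal sketch in Python with no ML involved. Everything in it (the `ToyPet` class, the stimulus names, the `threshold` parameter) is made up for illustration, and it only covers the associative part: a stimulus that keeps showing up while the pet is in some emotional state eventually becomes a trigger for that state on its own.

```python
from collections import defaultdict


class ToyPet:
    """Hypothetical toy agent: stimuli push it into emotional states, and
    stimuli that co-occur with a state become learned triggers for it."""

    # Assumed hard-wired reactions (stimulus -> emotional state).
    INNATE = {"food": "happy", "loud_noise": "scared"}

    def __init__(self, threshold=3):
        self.state = "calm"
        self.threshold = threshold  # co-occurrences needed before a trigger forms
        # association[stimulus][state] counts how often the stimulus was
        # present while the pet ended up in that state.
        self.association = defaultdict(lambda: defaultdict(int))

    def perceive(self, stimuli):
        """Process one list of simultaneous stimuli and return the new state."""
        new_state = "calm"
        for s in stimuli:
            if s in self.INNATE:  # innate reactions take priority
                new_state = self.INNATE[s]
                break
        else:
            # No innate trigger present: fall back to learned associations.
            for s in stimuli:
                learned = self.association[s]
                if learned:
                    best = max(learned, key=learned.get)
                    if learned[best] >= self.threshold:
                        new_state = best
                        break
        # Learning step: everything present gets tied to the resulting state.
        if new_state != "calm":
            for s in stimuli:
                self.association[s][new_state] += 1
        self.state = new_state
        return new_state
```

Ring a bell while feeding it a few times (`pet.perceive(["bell", "food"])`) and `pet.perceive(["bell"])` alone will start returning `"happy"`. A real version would also need decay, competing drives, and memories attached to states, none of which is modeled here.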

But, AFAIK, nobody's really working on ALife in the sense of "making an individual agent with a complex-enough internal model that it can statefully respond to you the way a pet does." ALife is only really studied at the very low level (C. elegans connectome simulation) or the very high level (sociological/economic simulations using simple goal-driven agents); nobody's really working in the space "in between." (Except for the people trying to make chat bots seem friendlier, but they're mostly trying to fake it, rather than creating actual persistence-of-memory.)

I wonder why nobody's interested in medium-scale ALife research these days? It used to be a hot topic, back when it was conflated with robotics under the banner of "embodied cognition."

So basically, most talking heads would be better off playing The Sims. They'll have agents there that enter emotional states in response to stimuli. Even though it's just a fuzzy state machine.

Now, is A[rtificial] Life the correct term to use here? I feel it isn't - I'd expect ALife to be more concerned with implementing simulacra of bacteria or worms in silico, not with reasoning or emotions.

ALife is fundamentally concerned with research into the kinds of control systems that govern how organic life responds to stimuli, how those systems plan in order to maintain long-term homeostasis, how they select goals, how they allocate attention, etc.

One might say that ALife is to an event loop as AI is to a one-time query-response. AI can evaluate, but you need an ALife system in order to "think" in a continuous way.
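
A rough Python sketch of that contrast, with invented names (`evaluate`, `LoopAgent`, `heat_stress`) and an arbitrary 30-degree cutoff: the first function is a stateless query-response, while the second keeps internal state across ticks, so the same input can produce different decisions depending on history.

```python
def evaluate(temperature):
    """One-time query-response: the same input always gives the same answer."""
    return "seek_shade" if temperature > 30 else "rest"


class LoopAgent:
    """Event-loop style: each tick reads a stimulus and updates persistent state."""

    def __init__(self):
        self.heat_stress = 0  # internal state carried from tick to tick

    def tick(self, temperature):
        if temperature > 30:
            self.heat_stress += 1
        else:
            self.heat_stress = max(0, self.heat_stress - 1)
        # The decision depends on accumulated history, not just this input.
        return "seek_shade" if self.heat_stress >= 2 else "rest"


def run(agent, readings):
    """Drive the agent through a sequence of readings, one tick per reading."""
    return [agent.tick(t) for t in readings]
```

Here `evaluate(35)` is always `"seek_shade"`, whereas `run(LoopAgent(), [35, 35])` gives `"rest"` on the first tick and `"seek_shade"` only on the second, once the stress has built up.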

There's really no sense in which an ALife researcher cares about recreating a full-fidelity model of biology in silico; the point is to specifically study the thinking and decision process of real agents, and figure out how to model those, in a way that the model makes the same series of decisions the real agent does in the same situations (and, therefore, must also be keeping and updating analogous internal state to the kind the real agent keeps.)

Some of those models are attempts to recreate real brains/nervous systems, but these models aren't fundamentally biological. A "low level" connectome simulation doesn't contain any model of cellular inflammatory response, cellular waste and its clearance, etc. It's basically just a brain-as-actor-model with neurons as stateful processes and electrochemical signals as messages.
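
As a toy illustration of "neurons as stateful processes and electrochemical signals as messages", here is a sketch in Python. It is not how any actual connectome simulator works; the `Neuron` class, the weights, the firing threshold, and the single shared mailbox are all assumptions chosen to keep it short.

```python
from collections import deque


class Neuron:
    """A stateful 'actor': it accumulates incoming signals and fires at a threshold."""

    def __init__(self, name, threshold=1.0):
        self.name = name
        self.threshold = threshold
        self.potential = 0.0
        self.targets = []  # outgoing connections as (neuron, weight) pairs

    def connect(self, other, weight):
        self.targets.append((other, weight))

    def receive(self, signal):
        """Handle one message. Returns the outgoing messages if the neuron
        fires, or None if it stays below threshold."""
        self.potential += signal
        if self.potential >= self.threshold:
            self.potential = 0.0  # reset after firing
            return list(self.targets)
        return None


def simulate(initial_messages, max_steps=100):
    """Deliver messages one at a time from a shared mailbox; return the
    names of the neurons that fired, in order."""
    mailbox = deque(initial_messages)
    fired = []
    steps = 0
    while mailbox and steps < max_steps:
        neuron, signal = mailbox.popleft()
        outgoing = neuron.receive(signal)
        if outgoing is not None:
            fired.append(neuron.name)
            mailbox.extend(outgoing)
        steps += 1
    return fired
```

Wire a sensor to an interneuron with weight 0.6 and the interneuron to a motor neuron with weight 1.0, and it takes two sensor firings before the signal reaches the motor neuron. Note what is absent: no inflammation, no waste clearance, nothing below the level of "message in, state update, maybe message out".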

An ALife researcher cares about biology below the level of intracellular pharmacodynamics (sodium channels et al.) about as much as a race-car-chassis engineer cares about physics below the level of fluid dynamics. They don't need to go any lower, because they've found an encapsulating abstraction that makes all the predictions they're interested in making, without needing any lower-level information.

You're critically misunderstanding this: it is not "singing along", it's generating the music and voice. Conditioning on lyrics is optional, and done "unaligned", e.g. by arbitrarily encoding the lyrics and passing them as additional input.

Indeed, the extent of generation is obvious in the ‘continuation’ mode on any track that is rather familiar to the listener (ahem, Rick Astley). Besides, in the full sample browser there are tracks without lyrics.

At the risk of sounding crazy, I think this is a pretty big milestone towards some semblance of AGI (or at the very least ASI that writes songs). The fact that neural networks are even capable of producing such outputs (even when cherry-picked) is surprising. In a sense, showing off the promise behind this kind of technology provides a cohesive vision of what this could be in its final form, which in turn inspires people to work on it and fix the pending issues.

Just think about how GANs were viewed when they were first published. The common sentiment was that they were an interesting "research contribution" that could never live up to the hype. However, the promise behind them inspired people to continue to work on them, and now we're able to produce realistic human faces that humans can't tell are fake.