by 0_____0 928 days ago
It won't replace high-end talent; I don't think models can replicate the nuance for a long time. However, the entire low-to-mid end of the market is going to get nuked from low earth orbit.
4 comments

It won’t replace it, but it’s very likely to supplant it, just about destroying the segment by reducing demand by being good enough and so much cheaper, especially as people get more used to it.

Typesetting. Music engraving. Bookbinding. The quality of all these fields has been materially harmed by advancements.

Computer typesetting has, by and large, been a significant regression, though the gap has largely been made up now if you make the right choices.

Published music scores used to be set by experts. Now they’re set by novices using software that is mechanical in method and generally quite insipid. Most are atrocious compared to the old masters, and mediocre at best compared to the typical published scores from a hundred years ago; and very few popular scores are really good (… and if they are, there’s a reasonably high chance they’ve used GNU LilyPond, which has focused on this problem). But the barrier for entry is so much lower, and people have got used to the inferior results, so I don’t know if anyone engraves music the old way, and even people that know better largely just shrug and make do with the new. Like with computer typesetting, there is hope because things have slowly improved. But most will continue to be mediocre.

Books used to be bound with cold glue. It takes time to set, but the results are very good, supple and long-lasting. Then along came hot-melt glue, and it’s just so much friendlier for cheap manufacturing because books are finished within a few minutes instead of a day or two, that I don’t think anyone produces books the old way any more, even though the results are abysmal in comparison (compare the binding and reading experience of a paperback from the ’40s or ’50s with one from the turn of the century; no one after tasting the old will desire the new; for he says, the old is good). But they’re just (barely) good enough. Unlike the other two, I don’t think there’s any hope here—the regressive advancement crowded out the superior but dearer option so that no place was found for it.

You can still get relatively good published music scores from a few of the old German shops (Schirmer, Henle, etc.), but they are very expensive. They are a joy to use when playing, though, since the music is very clearly laid out and page turns are in the perfect place, etc. Finale and Sibelius are controllable enough that you can use them to do fantastic layout, but many people either do not understand how to make a score readable or don't care enough.
That, and what GP describes, is what I see as the overall trend of the market to hollow out the middle. It's not just about technology (though it plays a big role), but about all the optimization that comes from competitive pressure: materials, processes, business models, marketing.

What seems to universally happen is, the market bifurcates - one part is in a race to the bottom, the other (much smaller) aims for super premium tier (overpriced quality), because only those two positions are sustainable, once the race-to-the-bottom side drags all the economies of scale with it. So as a consumer, you get to choose between cheap low-quality garbage that's barely fit for purpose, and rare, super-expensive, professional/elite high-end products. There is no option for "good value for reasonable price".

This has been happening to everything - software, furniture, construction, electronics, vehicles, food, you name it.

I'm using AI for training videos for my startup. Never going back to voice actors outside of primary marketing videos. The sheer convenience of the write/listen/tweak cycle on scripts is insane. In minutes you can do a voiceover which would previously have taken hours of work plus days of delay.

Sure, the final result sounds slightly robotic. 99% of people wouldn't care, and you can get more training videos done, faster, for a fraction of the cost.

[Edit] And I'll add that the difference between 6 months ago and today is noticeable. I imagine every 6 months we can just re-download updated voiceovers, and every 6 months they will sound just slightly more polished.

I wonder which will happen first - AI evolves to work well at the high-end, or high-end humans retire and there’s nobody left in the low-to-mid end to fill their shoes…
Given the modern trend of on-screen actors doing voice work, I think there will be a supply of talent for at least a few more generations.
It will absolutely replace high-end talent. Anything that a human can do will be able to be done 10x better by a model -- especially in such a narrow and well defined domain.
Did you hear the output examples? Yeah, I think not. I mean, definitely on the way, but there's no way if you need quality acting in your dub that you're going with this.
These are models specially tuned and sized for near real-time, instant translation. It would be naive to think that there aren't technical creatives building and training models tuned for expressiveness and nuance in a more controlled environment.
Maybe not in the current state of the model, but judging by the rate of improvement we’re all seeing, it’s just a matter of time (and data+compute+research obv).
That's what they gave us plebs. To think they don't have a superior one they can sell...
i think the key word is will.

a few more years of improvements if they happen could be disruptive