by nopriorarrests 2005 days ago
>recognize voices. All these things were not possible 5 years ago.

FTR: https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking

Dragon Systems released NaturallySpeaking 1.0 as their first continuous dictation product in 1997. <...> As of 2012 LG Smart TVs include voice recognition feature powered by the same speech engine as Dragon NaturallySpeaking.


Yeah I played with dragon in 97 and it was awful - it didn’t work at all, completely unusable.

Today voice transcription is a solved problem and while their engine might be the same in name - I’d be surprised if the approach isn’t totally different than what they were doing in 97, either that or the LG tv voice transcription probably doesn’t work as well as everyone else’s.

The deep learning revolution and the applications we’ve seen since 2015 are a major step forward and something truly different. People pretending otherwise are just acting cynical in some attempt to project intelligence or seem wise, it doesn’t work.

Of course it was awful. It was 1997.

But you can't claim that something "wasn't possible 5 years ago" if, 7 years ago, said feature was included in an inexpensive consumer product (LG TV).

I'm not acting cynical, but it's tiresome for me to see people claim that 20-30 years ago we were all living in caves and catching bugs with wooden sticks, and now boom, ML!

Regarding "something truly different", well, my personal computing / mobile experience has not changed that much since 2015. Honestly speaking, progress from 1995 to 2000 felt much more impressive and 'truly different'. I mean, think of it, during this timeframe we went from DOOM via V.34 modems to amazon.com and ordering pizza online.

It matters if it actually works and if it works broadly or just for limited use cases. As I said in the original comment, I think that you will see many applications in the near future. The last 5 years have been focused on research (comparable to 1990-1995), only now are we getting ready for commercial applications.

Yeah - entire classes of problems went from unsolvable to solved. Some of that is in the consumer space and some of it is not.

I feel like an AGI could accidentally wipe out half of humanity and there would still be people commenting on HN about how the exact same technology already existed in a roomba seven years ago.

Honest question, no snark --- which consumer space problems were solved, if I don't play Go and don't have an FB account to recognize me in group photos (both of these statements are true)?

A couple quick things I can think of:

- Voice transcription

- Tesla Autopilot

- Facial recognition (photo sorting on iPhones, better photos)

- Better graphical performance on Nvidia cards (https://developer.nvidia.com/dlss), also better compression for streaming.

- Much better translation

- Colorizing and repairing old photos

- Visual recognition allowing better search of images

I’m sure there are some I left out. I think we’ll see a lot more interesting applications (particularly around tooling) in the next few years.

https://medium.com/@karpathy/software-2-0-a64152b37c35

Outside of the consumer space, there are also things that hint at more generalizable intelligence.

Check out GPT-3’s performance on arithmetic tasks in the original paper (https://arxiv.org/abs/2005.14165)

Pages: 21-23, 63

This shows some generality: the best way to accurately predict an arithmetic answer is to deduce how the mathematical rules work. The paper shows some evidence of that, and that’s just from a relatively dumb predict-what-comes-next model.

It’s hard to predict timelines for this kind of thing, and people are notoriously bad at it. In 2010, nobody would have predicted the results we’re seeing today. What would you expect to see in the years leading up to AGI? Does what we’re seeing look like failure?

https://deepmind.com/blog/article/muzero-mastering-go-chess-...

First, nobody is claiming that people were living in caves before ML. I understand you're exaggerating for effect -- but that's the same thing the parent comment is doing when they say something "wasn't possible" 5 years ago. They don't mean that it was literally impossible, they mean that it was sufficiently bad that a typical consumer would be unlikely to use it back then -- whereas now the quality has improved to the point where these things are ubiquitous.

Similarly, both Amazon [1] and online pizza ordering [2] existed before 1995. They were just not commonly used.

[1] https://en.wikipedia.org/wiki/Amazon_(company)

[2] https://www.zakon.org/robert/internet/timeline/index.html

>they mean that it was uncommon for a typical consumer to experience it back then.

Siri from Apple was launched in 2011, as some other commenter noted below. Also, "On June 14, 2011, Google announced at its Inside Google Search event that it would start to roll out Voice Search on Google.com during the coming days".

If it does not count as 'typical consumer to experience it', well, I do not know what counts then.

9 years ago, mind you, not 5. And I think that 5 years ago voice recognition was already more or less good. In those 4 years both Apple and Google acquired large enough datasets to learn from, after the initial launch of their products in 2011.

What we are still struggling with is processing of fuzzy queries, something along the lines of 'Siri tell me which restaurant in my area serves the most delicious sushi according to yelp reviews and also allows takeout', but this is not a voice recognition problem (though a typical consumer may think it is).

> ‘Siri tell me which restaurant in my area serves the most delicious sushi according to yelp reviews and also allows takeout’

Siri stumbles at way less complex queries than that. Every year or so I retry using it, and give up due to the error rate. Something with 99% accuracy that is 10x slower is apparently preferable for me.

My experience has been very different. I use an Alexa purely to control lights and set alarms, and have enough misses at just those tasks that I don't consider it particularly good at them.

I'd take a literal clapper that hooked into smartbulbs over it at this point.

Live captions for English video or audio are nice, but they still don't work for music (not even rap), and they don't work for other languages. They might work in a lab setting, but not on currently available phones.

Along the same vein, Siri was launched in 2011.