| HN Mirror

	by pesenti 4194 days ago
	Some context on the new services. They are built on technology that comes from IBM Research and was moved into the Watson group in 2014. Some, like speech, have been in development for more than 50 years. None of these technologies overlap with the Watson Jeopardy stack (except for the Watson voice). We will release that stack later this year as a series of services allowing you to build a full Q&A/dialog application. All the Watson services are still in beta but will start going GA very soon (first one next month). If you have any questions, please fire up, the Watson team is ready to answer.

8 comments

pgeorgi 4194 days ago

> allowing you to build a full Q&A/dialog application.

> If you have any questions, please fire up, the Watson team is ready to answer.

So that's what you built Watson for :-)

pesenti 4194 days ago

We'd love to do an automated AMA. We are not yet there but if the community provided some training data I believe it's within reach. Give us a couple of years!

josu 4194 days ago

Is the Watson platform dependent on the hardware, or do you keep updating the hardware that it runs on?

pesenti 4194 days ago

Given that our strategy is to expose most Watson technology as cloud services, we will keep updating the hardware underneath in a way that is seamless to the user. We try to leverage the Power architecture as much as possible.

josu 4194 days ago

Thanks, that's what I had assumed; however, seeing the hardware behind Watson on Jeopardy threw me off [1]. I'm guessing that was just the first stage.

[1] http://www.kurzweilai.net/images/IBM-Watson.jpg

picheny 4194 days ago

Yeah it was. At the risk of waxing about old history, when we demoed the first large vocabulary speech recognition system back in 1984 it ran on a bunch of IBM mainframes. Within two years it was running on a PC with some special purpose cards. Today much more powerful recognizers run locally on smartphones. We have always found we can shrink something down once we solve the basic problem, and it is important not to let computational limitations prevent you from seeing the best solution.

Caligula 4194 days ago

I find it terribly confusing. It does not explain what instances are. Do I need an instance to access some of the services?

I just want to access some services via API from my own servers. I think the documentation is not that good; there should be curl examples at least, for instance for STT or TTS.

Does the STT have speaker identification or does it output text in one stream?

I tried to access: https://gateway-s.watsonplatform.net:8443/speech-to-text-bet...

I used my Bluemix login/password. It did not work. Are there other API credentials that are needed?

jsstylos 4194 days ago

Yes, the API credentials for the service are different from the Bluemix login. To get the API credentials, you have to create a service through Bluemix, bind it to a Bluemix application and get the credentials from the VCAP_SERVICES of that Bluemix app. There's a getting started page describing these steps at http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercl... (We hope to make this process simpler in the future!)
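
As a rough illustration of the last step, here is a minimal Python sketch of reading service credentials out of VCAP_SERVICES from inside a bound app. It assumes the usual Cloud Foundry layout (a JSON object mapping a service label to a list of instances, each with a "credentials" object). The helper name get_service_credentials, the label "speech_to_text", and the credential field names in the usage note are illustrative assumptions, not taken from the Watson documentation.

```python
import json
import os


def get_service_credentials(service_label, environ=os.environ):
    """Return the credentials dict of the first bound instance of a service.

    Assumes VCAP_SERVICES holds JSON shaped like:
    {"<label>": [{"name": "...", "credentials": {...}}]}
    """
    raw = environ.get("VCAP_SERVICES")
    if not raw:
        raise KeyError("VCAP_SERVICES is not set; is the code running in a bound app?")

    services = json.loads(raw)
    instances = services.get(service_label)
    if not instances:
        raise KeyError("no bound service with label %r" % service_label)

    # Each bound instance carries its own credentials object.
    return instances[0]["credentials"]
```

Inside the app this would be called as get_service_credentials("speech_to_text"), with the label replaced by whatever key actually appears in your VCAP_SERVICES; the returned dict would then typically carry the service URL and the API username and password, which are separate from the Bluemix login.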

Caligula 4194 days ago

Also, in terms of usage, if we are using STT, how many simultaneous jobs can run on one Watson app/service? Or is it a 1-1 ratio?

Caligula 4194 days ago

Thanks. Also, does the STT have speaker identification/diarization or does it output all merged text in one stream?

germanattanasio 4194 days ago

The code that is being used in the demo: https://speech-to-text-demo.mybluemix.net

is in the watson-developer-cloud organization on GitHub: https://github.com/watson-developer-cloud/speech-to-text-nod...

In fact, all the samples have the code there.

vaibhava72 4194 days ago

At present it does not do diarization; all text is output in one stream.

jcfrei 4194 days ago

Do any of the Watson services allow for feedback to train them?

keelyw 4194 days ago

Yes, all services include a feedback API, and the demos also include a mechanism for providing feedback. As an example, see the 4th paragraph in this doc, which also includes a link to the API docs: http://ibm.co/1yNfztF And here's a link to the demo; see the "Give us feedback" link: http://bit.ly/1EJllDF

picheny 4194 days ago

We want feedback on all our services. If you are speaking about using data to update the service, I know the speech services do not yet have this capability.

frik 4194 days ago

Do you plan to open source some of your stuff (voice recognition, speech synthesis, gazetteers, UIMA related code)?

Watson Jeopardy itself is built on top of an Apache open source stack (Apache UIMA and Hadoop): http://en.wikipedia.org/wiki/UIMA

picheny 4194 days ago

Honestly, we have not gotten that far yet, at least on the speech technology side. Good discussion to have.

ubercore 4194 days ago

Are you working on any audio (non-speech) analysis services? I have no particular use case in mind, but it's an area I'm always interested in!

picheny 4194 days ago

We have worked on audio analytics in the past for things such as outdoor sound detection and vehicle identification. We are currently focusing on speech-based analytics such as language ID and affect recognition. The statistical methodologies we are using for speech are easily extended to such domains. We hope that by putting out these initial speech services we will get feedback from the community about related problems, and we welcome your suggestions.

JimBlizz 4189 days ago

You might want to check out Echonest's API - http://developer.echonest.com/

kastnerkyle 4194 days ago

What techniques are being used for text to speech? Is it something deep learning related or more standard HMM synthesis? Any paper references?

cypher543 4194 days ago

According to the documentation[1], it's a concatenative synthesizer using decision trees for prosody modeling and PSOLA for output.

[1]: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercl...
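
To make the "concatenative" part concrete, here is a toy Python sketch of joining two recorded units with a windowed overlap-add crossfade. This is only the generic overlap-add idea that PSOLA-style output builds on; it is not pitch-synchronous and is not IBM's implementation. The function names and the raised-cosine fade are assumptions chosen for illustration.

```python
import math


def crossfade_weights(n):
    """Return (fade_out, fade_in) raised-cosine weights that sum to 1 per sample."""
    fade_in = [0.5 - 0.5 * math.cos(math.pi * (i + 0.5) / n) for i in range(n)]
    fade_out = [1.0 - w for w in fade_in]
    return fade_out, fade_in


def overlap_add(left, right, overlap):
    """Join two sample lists, blending the last `overlap` samples of `left`
    with the first `overlap` samples of `right`."""
    if overlap <= 0 or overlap > min(len(left), len(right)):
        raise ValueError("overlap must be between 1 and the shorter unit length")

    fade_out, fade_in = crossfade_weights(overlap)
    start = len(left) - overlap

    # Weighted sum over the overlapping region.
    blended = [
        left[start + i] * fade_out[i] + right[i] * fade_in[i]
        for i in range(overlap)
    ]
    return left[:start] + blended + right[overlap:]
```

Because the two weights sum to 1 at every sample, joining two units of equal constant amplitude leaves the amplitude unchanged across the seam. A real PSOLA system would additionally place the windows at pitch marks so that pitch and duration can be modified.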

kastnerkyle 4194 days ago

Thanks! I am working in this area and have some ideas for deep learning type methods which move away from concatenative synthesis. It will be nice to compare to what they are using.

picheny 4193 days ago

We did some work on applying NNs to prosody prediction; see Fernandez, Raul, et al. "Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks." Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). 2014.

woodson 4193 days ago

This paper (from ICASSP2013) may be of interest to you: https://static.googleusercontent.com/media/research.google.c...

devniel 4194 days ago

Great, I'm waiting for it; actually, I can't do much with the preloaded domain on the Q&A service.

Q6T46nT668w6i3m 4194 days ago

Text-to-speech is very impressive. Thanks!
