by pesenti 4147 days ago
Some context on the new services. They are built on technology that comes from IBM Research and was moved into the Watson group in 2014. Some, like speech, have been in development for more than 50 years. None of these technologies overlap with the Watson Jeopardy stack (except for the Watson voice). We will release that stack later this year as a series of services allowing you to build a full Q&A/dialog application.

All the Watson services are still in beta but will start going GA very soon (first one next month). If you have any questions, please fire up, the Watson team is ready to answer.

8 comments

> allowing you to build a full Q&A/dialog application.

> If you have any questions, please fire up, the Watson team is ready to answer.

So that's what you built Watson for :-)

We'd love to do an automated AMA. We are not yet there but if the community provided some training data I believe it's within reach. Give us a couple of years!
Is the Watson platform dependent on the hardware, or do you keep updating the hardware that it runs on?
Given that our strategy is to expose most Watson technology as cloud services, we will keep updating the hardware underneath in a way that is seamless to the user. We try to leverage the Power architecture as much as possible.
Thanks, that's what I had assumed, however seeing the hardware behind Watson on Jeopardy threw me off [1]. I'm guessing that that was just the first stage.

[1] http://www.kurzweilai.net/images/IBM-Watson.jpg

Yeah it was. At the risk of waxing about old history, when we demoed the first large vocabulary speech recognition system back in 1984 it ran on a bunch of IBM mainframes. Within two years it was running on a PC with some special purpose cards. Today much more powerful recognizers run locally on smartphones. We have always found we can shrink something down once we solve the basic problem, and it is important not to let computational limitations prevent you from seeing the best solution.
I find it terribly confusing. It does not explain what instances are. Do I need an instance to access some of the services?

I just want to access some services via API from my own servers. I think the documentation is not that good; there should be curl examples at least, for instance for STT or TTS.

Does the STT have speaker identification or does it output text in one stream?

I tried to access: https://gateway-s.watsonplatform.net:8443/speech-to-text-bet...

I used my Bluemix login/password. It did not work. Are there other API credentials that are needed?

Yes, the API credentials for the service are different from the Bluemix login. To get the API credentials, you have to create a service through Bluemix, bind it to a Bluemix application and get the credentials from the VCAP_SERVICES of that Bluemix app. There's a getting started page describing these steps at http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercl... (We hope to make this process simpler in the future!)
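For anyone following the steps above, here is a minimal sketch of pulling the service credentials out of VCAP_SERVICES once the service is bound to an app. It assumes the usual Cloud Foundry layout (a JSON object mapping a service label to a list of bound instances, each with a "credentials" object). The label prefix "speech_to_text", the field names "url", "username" and "password", and the sample values are all assumptions for illustration, so check what your own app actually reports.

```python
import json
import os


def find_service_credentials(vcap_json, label_prefix):
    """Return the credentials dict of the first bound service whose
    label starts with label_prefix, or None if nothing matches."""
    services = json.loads(vcap_json)
    for label, instances in services.items():
        if not label.startswith(label_prefix):
            continue
        for instance in instances:
            creds = instance.get("credentials")
            if creds:
                return creds
    return None


# Hypothetical sample, used only when VCAP_SERVICES is not set.
# The label, field names and values are illustrative assumptions.
SAMPLE_VCAP = json.dumps({
    "speech_to_text": [
        {
            "name": "my-stt-service",
            "credentials": {
                "url": "https://example.invalid/speech-to-text/api",
                "username": "example-user",
                "password": "example-password",
            },
        }
    ]
})

if __name__ == "__main__":
    vcap = os.environ.get("VCAP_SERVICES", SAMPLE_VCAP)
    creds = find_service_credentials(vcap, "speech_to_text")
    if creds is None:
        print("no matching service bound")
    else:
        # Only the URL is printed; the username and password are the
        # service credentials, not your Bluemix login.
        print(creds.get("url"))
```

Matching on a label prefix rather than an exact name is just a convenience in case the label carries a suffix; an exact match works equally well if you know the label.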
Also, in terms of usage, if we are using STT, how many simultaneous jobs can run on one Watson app/service? Or is it a 1-1 ratio?
Thanks. Also, does the STT have speaker identification/diarization or does it output all merged text in one stream?
The code that is being used in the demo (https://speech-to-text-demo.mybluemix.net) is in the watson-developer-cloud organization on GitHub: https://github.com/watson-developer-cloud/speech-to-text-nod...

In fact, all the samples have the code there.

At present it does not do diarization; all text is output in one stream.
Do any of the Watson services allow for feedback to train them?
Yes, all services include a feedback API, and the demos also include a mechanism for providing feedback. As an example, see the 4th paragraph in this doc, which also includes a link to the API docs: http://ibm.co/1yNfztF. And here's a link to the demo, see the "Give us feedback" link: http://bit.ly/1EJllDF
We want feedback on all our services. If you are speaking about using data to update the service, I know the speech services do not yet have this capability.
Do you plan to open source some of your stuff (voice recognition, speech synthesis, gazetteers, UIMA related code)?

Watson Jeopardy itself is built on top of an Apache open source stack (Apache UIMA and Hadoop): http://en.wikipedia.org/wiki/UIMA

Honestly, we have not gotten that far yet, at least on the speech technology side. Good discussion to have.
Are you working on any audio (non-speech) analysis services? I have no particular usecase in mind, but it's an area I'm always interested in!
We have worked on audio analytics in the past for things such as outdoor sound detection and vehicle identification. We are currently focusing on speech-based analytics such as language ID and affect recognition. The statistical methodologies we are using for speech are easily extended to such domains. We hope that by putting out these initial speech services we will get feedback from the community about related problems, and we welcome your suggestions.
You might want to check out Echonest's API - http://developer.echonest.com/
What techniques are being used for text to speech? Is it something deep learning related or more standard HMM synthesis? Any paper references?
According to the documentation[1], it's a concatenative synthesizer using decision trees for prosody modeling and PSOLA for output.

[1]: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercl...

Thanks! I am working in this area and have some ideas for deep learning type methods which move away from concatenative synthesis. It will be nice to compare to what they are using.
We did some work on applying NNs to prosody prediction; see Fernandez, Raul, et al. "Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks." Proceedings of the Annual Conference of International Speech Communication Association (INTERSPEECH). 2014.
This paper (from ICASSP2013) may be of interest to you: https://static.googleusercontent.com/media/research.google.c...
Great, I'm waiting for it. Actually, I can't do much with the preloaded domain on the Q&A service.
Text-to-speech is very impressive. Thanks!