

by dangrsmind 7096 days ago

I'd say I'm a little skeptical.

The first question I'd have is how fast they can parse video. The second is how much it costs to do it.

It seems you would need recognition to run much faster than real time to offer a realistic web video search capability (see, for example, http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=599600), and you would certainly need a lot of hardware to do this at scale for millions of video clips.
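One way to make the hardware question concrete is a back-of-envelope calculation using a real-time factor (RTF): processing time divided by media duration. The sketch below is a hypothetical helper, not anything Blinkx has described, and it assumes one stream per core and perfect parallelism across clips.

```python
import math


def machines_needed(media_hours: float, real_time_factor: float,
                    cores_per_machine: int, window_hours: float) -> int:
    """Machines needed to run recognition over `media_hours` of media
    within `window_hours` of wall-clock time.

    real_time_factor: core-hours of processing per hour of media
    (1.0 = real time, 0.1 = ten times faster than real time).
    Assumes one stream per core and perfect parallelism across clips.
    """
    core_hours = media_hours * real_time_factor
    return math.ceil(core_hours / (cores_per_machine * window_hours))


# Hypothetical inputs: 1,000,000 clips of 3 minutes each = 50,000 hours,
# 4-core machines, and a 24-hour window to get through the backlog.
print(machines_needed(50_000, 1.0, 4, 24))  # 521
print(machines_needed(50_000, 0.1, 4, 24))  # 53
```

The inputs are illustrative only; the point is that the machine count scales linearly with the RTF, so each order of magnitude of speedup over real time cuts the hardware bill by roughly the same factor.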

See also: http://www.newmediamusings.com/blog/2005/09/blinkx_a_citize.html


jwp 7095 days ago

The first link you cited is spot on. The authors are from the University of Cambridge and work on HTK <http://htk.eng.cam.ac.uk/>.

That paper is 10 years old. As I'm sure you can imagine, there have been improvements in the field since then. To be completely honest, I don't stay on top of search applied to speech, but the keyword you want is "Spoken Document Retrieval" (SDR). Ciprian Chelba and TJ Hazen do cool stuff in this area; they are giving a tutorial on SDR at ICASSP this year.

An aside: both of these approaches use the fact that when you process speech, you essentially form a graph of words (or phonemes), where paths through the graph represent possible transcriptions. Since the graph is a denser, richer thing to search than a single transcript, and we've got graph algorithms sitting around, there are neat tricks you can do to build a search engine index for speech...
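As a rough illustration of why the graph (a word lattice) is richer to index than a transcript, the sketch below runs forward-backward over a small lattice to get each word's expected count across all paths, then builds an inverted index from those soft counts. This is a generic technique, not necessarily what Chelba and Hazen or Blinkx do; the arc format, the node-ordering convention, and the weights in the example are all hypothetical.

```python
from collections import defaultdict

# Hypothetical arc format: (from_node, to_node, word, weight). Node ids are in
# topological order (every arc goes from a lower id to a higher id); node 0 is
# the start and the largest id is the end. Weights are unnormalized path scores.
Arc = tuple[int, int, str, float]


def arc_posteriors(arcs: list[Arc]) -> dict[str, float]:
    """Expected count of each word over all lattice paths (forward-backward)."""
    n = max(max(a[0], a[1]) for a in arcs) + 1
    alpha = [0.0] * n  # total weight of paths from start to each node
    beta = [0.0] * n   # total weight of paths from each node to end
    alpha[0] = 1.0
    beta[n - 1] = 1.0
    # Forward: arcs sorted by start node, so alpha[s] is complete before use.
    for s, e, _, w in sorted(arcs, key=lambda a: a[0]):
        alpha[e] += alpha[s] * w
    # Backward: arcs sorted by end node descending, so beta[e] is complete.
    for s, e, _, w in sorted(arcs, key=lambda a: a[1], reverse=True):
        beta[s] += w * beta[e]
    total = alpha[n - 1]
    counts: dict[str, float] = defaultdict(float)
    for s, e, word, w in arcs:
        counts[word] += alpha[s] * w * beta[e] / total
    return dict(counts)


def build_index(lattices: dict[str, list[Arc]]) -> dict[str, list[tuple[str, float]]]:
    """Inverted index: word -> [(doc_id, expected count)], best first."""
    index: dict[str, list[tuple[str, float]]] = defaultdict(list)
    for doc_id, arcs in lattices.items():
        for word, count in arc_posteriors(arcs).items():
            index[word].append((doc_id, count))
    for postings in index.values():
        postings.sort(key=lambda p: p[1], reverse=True)
    return dict(index)


# "recognize speech" vs. "wreck a nice beach" as competing paths.
clip1 = [(0, 1, "recognize", 0.6), (1, 5, "speech", 1.0),
         (0, 2, "wreck", 0.4), (2, 3, "a", 1.0), (3, 4, "nice", 1.0),
         (4, 5, "beach", 1.0)]
clip2 = [(0, 1, "speech", 1.0)]
index = build_index({"clip1": clip1, "clip2": clip2})
print(index["speech"])  # [('clip2', 1.0), ('clip1', 0.6)]
print(index["beach"])   # [('clip1', 0.4)]
```

Note that "beach" is still retrievable from clip1 even though it lies only on the lower-scoring path; a one-best transcript would have dropped it entirely.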

I've recently been reading some interesting work that uses locality-sensitive hashing to search audio. The Google speech people are presenting a lot of it at ICASSP this year. See this post for more, and chase the links in their papers for even more: <http://googleresearch.blogspot.com/2007/02/hear-here-sample-of-audio-processing.html>
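For readers unfamiliar with locality-sensitive hashing, here is a minimal sketch of one common LSH family, random-hyperplane hashing for cosine similarity. It is not necessarily the scheme used in the Google papers. It assumes each audio snippet has already been reduced to a fixed-length feature vector; how those features are computed is out of scope here, and the class name and parameters are hypothetical.

```python
import random
from collections import defaultdict


class HyperplaneLSH:
    """Random-hyperplane LSH for cosine similarity over fixed-length vectors."""

    def __init__(self, dim: int, n_bits: int, n_tables: int, seed: int = 0):
        rng = random.Random(seed)
        # One set of n_bits random Gaussian hyperplanes per hash table.
        self.planes = [[[rng.gauss(0.0, 1.0) for _ in range(dim)]
                        for _ in range(n_bits)]
                       for _ in range(n_tables)]
        self.tables = [defaultdict(set) for _ in range(n_tables)]

    @staticmethod
    def _signature(planes, vec):
        # Each bit records which side of one hyperplane the vector falls on.
        return tuple(sum(p * v for p, v in zip(plane, vec)) >= 0
                     for plane in planes)

    def add(self, key, vec):
        for planes, table in zip(self.planes, self.tables):
            table[self._signature(planes, vec)].add(key)

    def candidates(self, vec):
        # Union of keys sharing a bucket with `vec` in any table; these would
        # then be re-ranked with an exact similarity computation.
        found = set()
        for planes, table in zip(self.planes, self.tables):
            found |= table.get(self._signature(planes, vec), set())
        return found


lsh = HyperplaneLSH(dim=8, n_bits=6, n_tables=4, seed=42)
a = [1.0, -0.5, 0.25, 2.0, 0.0, -1.0, 0.5, 1.5]
lsh.add("clip-a", a)
print("clip-a" in lsh.candidates([2.0 * x for x in a]))  # True
```

Vectors pointing in similar directions tend to land on the same side of most hyperplanes, so a query only has to be compared against the few items sharing its buckets rather than the whole collection. More bits per table make buckets more selective; more tables recover near neighbors that one table happens to split.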


dangrsmind 7095 days ago

Thanks for the information and links. My background is in video and image processing (originally multiple target tracking, sensor management, and sensor fusion), though now I work in biometrics and video analytics. I follow the point about processing the information into a graph.

Your point about Google raises one of the obvious questions about this company: if Google is doing leading-edge research in this field, it seems unlikely they would need to buy a "video search destination" site employing lesser technologies, unless it gets really, really big (e.g., YouTube). They might be interested in some deep technology, but my impression from the reading I've done and the links you've posted is that Blinkx is using standard, well-known techniques to achieve its results.

FWIW: I was applying Markov modeling to areas such as mission planning and modeling integrated air defense networks almost twenty years ago. We didn't call them HMMs, but some very similar ideas were employed.


jwp 7095 days ago

Hmm, perhaps we should talk. Email me at e40.32313371@bloglines.com if you're interested.
