HN Mirror



	by chimtim 3393 days ago
	what is the "video" bit here? This is just running image recognition on a bunch of frames.

2 comments

ramramanathan 3393 days ago

We are talking about our underlying tech at the Next conference: https://goo.gl/3ihXth. We are clearly using frame-level annotations, but we also have additional models that aggregate visual and other information to provide aggregate entities at the shot level or video level. (PM at Google)
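To illustrate the kind of frame-to-shot aggregation described above, here is a minimal sketch. It is not Google's implementation: the function name `aggregate_shot_labels`, the `min_fraction` threshold, and the simple voting rule are all assumptions made up for illustration, and shot boundaries are taken as given rather than detected.

```python
from collections import defaultdict

def aggregate_shot_labels(frame_labels, shot_boundaries, min_fraction=0.5):
    """Roll per-frame labels up to per-shot labels (hypothetical helper).

    frame_labels: list of sets, one set of labels per frame.
    shot_boundaries: list of (start, end) frame indices, end exclusive.
    A label is kept for a shot if it appears in at least `min_fraction`
    of that shot's frames (an assumed rule, chosen for simplicity).
    """
    shots = []
    for start, end in shot_boundaries:
        counts = defaultdict(int)
        for labels in frame_labels[start:end]:
            for label in labels:
                counts[label] += 1
        n = end - start
        kept = sorted(
            label for label, c in counts.items()
            if n > 0 and c / n >= min_fraction
        )
        shots.append(kept)
    return shots

# Six frames split into two shots of three frames each.
frames = [{"tiger"}, {"tiger", "grass"}, {"tiger"},
          {"car"}, {"car", "road"}, {"car", "road"}]
shots = [(0, 3), (3, 6)]
print(aggregate_shot_labels(frames, shots))
```

Voting over a shot drops labels that flicker in a single frame ("grass" here) while keeping ones that persist, which is one plausible reason to report entities at the shot level rather than per frame.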


catshirt 3393 days ago

how do you know the implementation details?

it would be completely naive to implement it that way, considering video adds an entirely new attribute over images, which of course is "time".

I don't know shit about ML, talking out of my ass here, but I'd be surprised if the algorithms didn't account for changes over time or canonical entity recognition (is this the same boat that was in the last image?).


chimtim 3393 days ago

The linked press release shows an animal is detected -- tiger etc. It does not say tiger running or hunting, which is where the time component would have been used.


catshirt 3393 days ago

the press release says:

> nouns such as “dog,” “flower” or “human” or verbs such as “run,” “swim” or “fly”

that out of the way... i suspect you wouldn't need video to detect those things...

and the screenshot you're referring to is a specific application of the API... not a kitchen sink:

> It can even provide contextual understanding of when those entities appear; for example, searching for “Tiger” would find all precise shots containing tigers across a video collection in Google Cloud Storage.


timc3 3393 days ago

I have seen it detect that a car is drifting.
