| HN Mirror


by userhacker 1334 days ago

For revoldiv.com we have profiled many GPUs; the best one is the 4090. We do a lot of intelligent chunking, detect word boundaries, and run the model in parallel on multiple GPUs, which gets us about 40 to 50 seconds for an hour-long audio. Without that, expect about 7 minutes for an hour-long audio on a Tesla T4.
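To make the chunk-and-parallelize idea concrete, here is a minimal sketch, not revoldiv's actual pipeline. It assumes you already have a list of silence timestamps (gaps between words) from some detector, cuts the audio only at those points, and fans the chunks out to workers. The names `plan_chunks`, `transcribe_chunk`, and `transcribe_parallel` are hypothetical, and `transcribe_chunk` is a placeholder where a real worker would run the speech model on its own GPU.

```python
from concurrent.futures import ThreadPoolExecutor


def plan_chunks(silences, total_s, target_s=30.0):
    """Split [0, total_s] into spans of roughly target_s seconds.

    Cuts are only placed at timestamps in `silences`, which are assumed
    to fall between words, so no word is split across two chunks.
    """
    spans, start = [], 0.0
    while total_s - start > target_s:
        ideal = start + target_s
        # Only silences after the current start can serve as a cut point.
        candidates = [s for s in silences if start < s < total_s]
        if not candidates:
            break  # no usable gap left: keep the remainder as one chunk
        cut = min(candidates, key=lambda s: abs(s - ideal))
        spans.append((start, cut))
        start = cut
    spans.append((start, total_s))
    return spans


def transcribe_chunk(span):
    # Placeholder (assumption): a real worker would run the model here.
    start, end = span
    return f"[{start:.1f}-{end:.1f}]"


def transcribe_parallel(spans, workers=2):
    # pool.map returns results in input order, so the transcript
    # pieces can be concatenated directly.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transcribe_chunk, spans))


if __name__ == "__main__":
    gaps = [9.5, 28.7, 31.2, 58.0, 61.5, 95.0]
    spans = plan_chunks(gaps, total_s=100.0, target_s=30.0)
    print(spans)
    print(transcribe_parallel(spans))
```

Cutting at the silence nearest the target length keeps chunks similar in size, which matters when several GPUs work in parallel: the slowest chunk sets the total time.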

  on tesla-t4-30gb-memory-8vcpu google cloud
   on tiny and tiny.en
    for 10-minute = 30 seconds
   on medium
    for 10-minute = 1 min 30 sec
    for 60-minute = 7 mins
   on large
    for 60-minute = 13 mins
  on NVIDIA GeForce RTX 4090
   on tiny
    for 10-minute = 5.5 seconds
    for 60-minute = 35 seconds
   on base
    for 10-minute = 7 seconds
    for 60-minute = 50 seconds
   on small
    for 10-minute = 14 seconds
    for 60-minute = 1 min 35 sec
   on medium
    for 10-minute = 26 seconds
    for 60-minute = 3 mins
   on large
    for 10-minute = 40 seconds
    for 60-minute = 3 min 54 sec
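For anyone comparing these numbers, a small sketch that converts the 60-minute timings above into "times faster than real time" and into a T4-vs-4090 ratio. The wall-clock values are copied from the list above; the dictionary and function names are just illustrative.

```python
AUDIO_S = 60 * 60  # one hour of audio, in seconds

# Wall-clock seconds to transcribe 60 minutes, taken from the list above.
RTX_4090 = {"tiny": 35, "base": 50, "small": 95, "medium": 180, "large": 234}
TESLA_T4 = {"medium": 7 * 60, "large": 13 * 60}


def realtime_speed(audio_s, wall_s):
    """How many seconds of audio are processed per second of wall time."""
    return audio_s / wall_s


if __name__ == "__main__":
    for model, wall in RTX_4090.items():
        print(f"4090 {model}: {realtime_speed(AUDIO_S, wall):.1f}x real time")
    for model, wall in TESLA_T4.items():
        ratio = wall / RTX_4090[model]
        print(f"T4 {model}: {realtime_speed(AUDIO_S, wall):.1f}x real time, "
              f"{ratio:.2f}x slower than the 4090")
```

These ratios only cover the model sizes that appear in both lists (medium and large).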

1 comments

getcrunk 1334 days ago

That's crazy. 35 seconds for 60 min on base with a 4090. Wow! Thanks for the info! Also, btw, I mentioned on another thread that I was getting an error. And are you planning on offering this as a paid API?


userhacker 1334 days ago

Can you send me the audio that caused it? You can email me at team AT revoldiv .com. If there is a lot of interest, yes, we can provide it as an API service. Our service has some niceties like word-level timestamps, paragraph separation, sound detection, etc. For now it is a free service that you can use as much as you want.
