by userhacker
1288 days ago
For revoldiv.com we have profiled many GPUs, and the best one is the 4090. We do a lot of intelligent chunking, detect word boundaries, and run the model in parallel on multiple GPUs, which gets us to about 40 to 50 seconds for an hour-long audio. Without that, expect about 7 minutes for an hour-long audio on a Tesla T4.
on tesla-t4-30gb-memory-8vcpu (Google Cloud)
on tiny and tiny.en
for 10 minutes = 30 seconds
on medium
for 10 minutes = 1 min 30 sec
for 60 minutes = 7 min
on large
for 60 minutes = 13 min
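To put the T4 timings above in perspective, here is a small sketch that converts them into "times faster than real time". The `speedup` helper and the dictionary layout are just illustrative names; the numbers are the ones listed above.

```python
def speedup(audio_minutes: float, wall_seconds: float) -> float:
    """How many seconds of audio are transcribed per second of wall time."""
    return audio_minutes * 60 / wall_seconds

# (model, audio length in minutes) -> wall-clock seconds on the T4, from the list above
t4_timings = {
    ("tiny / tiny.en", 10): 30,
    ("medium", 10): 90,
    ("medium", 60): 7 * 60,
    ("large", 60): 13 * 60,
}

for (model, minutes), seconds in t4_timings.items():
    print(f"{model:>14} {minutes:>3} min audio: {speedup(minutes, seconds):.1f}x real time")
```

So on the T4, tiny runs at roughly 20x real time and large at roughly 4.6x.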
on NVIDIA GeForce RTX 4090
on tiny
for 10 minutes = 5.5 seconds
for 60 minutes = 35 seconds
on base
for 10 minutes = 7 seconds
for 60 minutes = 50 seconds
on small
for 10 minutes = 14 seconds
for 60 minutes = 1 min 35 sec
on medium
for 10 minutes = 26 seconds
for 60 minutes = 3 min
on large
for 10 minutes = 40 seconds
for 60 minutes = 3 min 54 sec
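For anyone wondering what "chunk at word boundaries and run in parallel" could look like, here is a rough sketch of the general idea. It is not revoldiv's actual code. It assumes you already have a list of candidate cut points in seconds (for example from a silence detector), and `plan_chunks` and `assign_round_robin` are hypothetical helper names. The transcription call itself is left out.

```python
from typing import List, Tuple

Chunk = Tuple[float, float]  # (start_sec, end_sec)

def plan_chunks(boundaries: List[float], duration: float,
                max_len: float = 30.0) -> List[Chunk]:
    """Split [0, duration] into chunks of at most max_len seconds,
    cutting at the latest boundary that still fits in each chunk."""
    cuts = sorted(b for b in boundaries if 0.0 < b < duration)
    chunks: List[Chunk] = []
    start, i = 0.0, 0
    while duration - start > max_len:
        limit = start + max_len
        best = None
        # Walk forward over boundaries that fall inside the current window.
        while i < len(cuts) and cuts[i] <= limit:
            best = cuts[i]
            i += 1
        # Assumption: if no boundary fits, fall back to a hard cut at the limit.
        end = best if best is not None else limit
        chunks.append((start, end))
        start = end
    chunks.append((start, duration))
    return chunks

def assign_round_robin(chunks: List[Chunk], n_workers: int) -> List[List[Chunk]]:
    """Spread chunks over n_workers (e.g. one worker per GPU)."""
    buckets: List[List[Chunk]] = [[] for _ in range(n_workers)]
    for idx, chunk in enumerate(chunks):
        buckets[idx % n_workers].append(chunk)
    return buckets

# Example: 90 seconds of audio with made-up boundary timestamps, two workers.
chunks = plan_chunks([12.0, 28.5, 41.0, 59.0, 75.0], duration=90.0)
per_gpu = assign_round_robin(chunks, n_workers=2)
print(chunks)
print(per_gpu)
```

Each worker would then transcribe its own chunks and the results get stitched back together in start-time order. Cutting at boundaries rather than at fixed offsets is what avoids splitting a word between two chunks.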