by cwooley
127 days ago
Interesting methodology. How much of this translates to the newer speech-to-speech models (like GPT-4o realtime) where there's no separate STT step? Seems like Phase 1 (Transcription Analysis) becomes less relevant when the model is processing audio natively. Does that make injection harder or just different?