| HN Mirror

This works better because it gives a secondary set of conditions for which the decoder (generating text) is conditioning its generation. Assume instead of their demo you are doing Speech2Text for Oncologists. Out of the Box Whisper is terrible because the words are new and rare, especially in YouTube videos. If you just run ASR through it and run NER, it will generate regular words over cancer names. Instead, if you condition generation on topical entities the generation space is constrained and performance will improve. Especially when you can tell the model what all the drug names are because you have a list (https://www.cancerresearchuk.org/about-cancer/treatment/drug...)