by vkkhare
1828 days ago
How do you train newer models, then? From what I read, you use public datasets to train your models, but what about the future? Wouldn't you need some kind of data collection mechanism? GPT-2 and GPT-3 are great, but the datasets they were trained on will soon get old.
There are public sources of search data, updated regularly, that we can use for transfer learning on top of large-scale language models like GPT-3. With this sort of data (phrases mapped to intents), transfer learning works well without needing massive datasets.
Having said that, the app tracks the intents and topic profiles of searches (not the search itself, just a label such as FoodPlaceSearchIntent) and whether the execution likely gave a good result, based on signals such as whether the search was likely repeated or rephrased (again, without recording the actual search). The models learn from that. We're also adding signals such as anonymized upvotes/downvotes.
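To make the idea concrete, here is a rough sketch of what intent-level feedback tracking could look like. It is only an illustration, not the app's actual code: the class names, the `record` method, and the rule that a repeat or rephrase counts as a poor result are all assumptions. The point is that only the intent label and an outcome flag are kept, never the query text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class IntentStats:
    """Outcome counts for one intent label (hypothetical structure)."""
    good: int = 0
    bad: int = 0

    @property
    def success_rate(self) -> float:
        total = self.good + self.bad
        return self.good / total if total else 0.0


class IntentFeedbackLog:
    """Keeps per-intent outcome counts; the raw query text is never stored."""

    def __init__(self) -> None:
        self.stats = defaultdict(IntentStats)

    def record(self, intent: str, repeated_or_rephrased: bool) -> None:
        # Assumption: a repeated or rephrased search suggests a poor result.
        entry = self.stats[intent]
        if repeated_or_rephrased:
            entry.bad += 1
        else:
            entry.good += 1


log = IntentFeedbackLog()
log.record("FoodPlaceSearchIntent", repeated_or_rephrased=False)
log.record("FoodPlaceSearchIntent", repeated_or_rephrased=True)
log.record("FoodPlaceSearchIntent", repeated_or_rephrased=False)
rate = log.stats["FoodPlaceSearchIntent"].success_rate
```

Aggregates like `rate` could then serve as a training or evaluation signal per intent, without any individual search being retained.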
Approaches like differential privacy are something we want to pursue more in the future. We are still in the very early days!
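For readers unfamiliar with differential privacy, a minimal sketch of one standard building block, the Laplace mechanism, applied to a per-intent count. This is not something the app does today; the function names are hypothetical, and the sketch assumes each user contributes at most one event to the count (sensitivity 1).

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random) -> float:
    # Assumption: sensitivity is 1, i.e. one user changes the count by at most 1.
    # Smaller epsilon means more noise and stronger privacy.
    return true_count + laplace_noise(1.0 / epsilon, rng)


rng = random.Random(0)
noisy = private_count(100, epsilon=1.0, rng=rng)
```

Each released count is noisy, but the noise averages out over many intents or reporting periods, so aggregate trends remain usable.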