by streetcat1
2509 days ago
Thanks. I do assume that the data formats differ (though I also assume they are all some sort of text file with known fields and types). But once you have set up the dataset definition and defined the schema, can't the rest be based on neural architecture search? Moreover, isn't there a state-of-the-art architecture for each task, e.g. Seq2Seq for machine translation? Can't you just use that as a baseline and let the NAS engine search the hyperparameters, etc.?
Most of our problems don't cleanly map to existing NLP tasks, and state of the art often isn't as high as you think for many tasks. Take machine translation as it relates to a beta feature we're building: it lets you ask questions of arbitrary single tables (kind of like wiki-tables), but we don't know the schemas in advance or the questions users may ask about them. Beyond the lack of quality annotated data (which we often don't have: the cold-start problem), we need to do more than simple model tuning. It requires building custom architectures.
But even for known tasks, state-of-the-art models often don't reproduce their benchmark results on real-world data. Even setting aside data quality issues (another huge challenge for us), in question answering the training data rarely captures the distribution of natural language in the wild. People phrase questions differently and use language that doesn't match the content in our knowledge base.
I could go on, but the short answer is that it's not as straightforward as you think. Even at Google scale, machine learning is not solved. For everyone else, with less data and domain-specific use cases, it's even harder.