by elliptic
3938 days ago
Can someone explain the following comments, for someone with some knowledge of ML but none of NLP?
"First, it's really much better to use Averaged Perceptron, or some other method which can be trained in an error-driven way. You don't want to do batch learning. Batch learning makes it difficult to train from negative examples effectively, and this makes a very big difference to accuracy"
I thought that it was typical for suitably regularized batch methods to modestly outperform or at least match (in terms of accuracy) online methods, whose main advantage is their speed.
|
The reason is that what we're really doing here is predicting a structure (a parse tree), but we've encoded the problem as a series of local steps. Think of it this way: we want to navigate to a goal, and we do that by predicting a series of local actions.
Try stepping through the decision process.[1] This should give you a feel for the local decisions, and how they build the larger structure.
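If you'd rather see the local decisions in code, here is a minimal sketch. It assumes a simplified arc-standard-style transition system with three actions (SHIFT, LEFT, RIGHT), which is not necessarily the exact action set used in the demo; `apply_actions` and the example sentence are made up for illustration.

```python
def apply_actions(words, actions):
    """Apply a sequence of local actions and return each word's head index.

    Hypothetical simplified transition system (an assumption, not the demo's):
      SHIFT: move the next buffer word onto the stack
      LEFT:  second-from-top of the stack becomes a child of the top
      RIGHT: top of the stack becomes a child of the word below it
    The root keeps a head of None.
    """
    stack, buffer = [], list(range(len(words)))
    heads = [None] * len(words)
    for action in actions:
        if action == "SHIFT":
            stack.append(buffer.pop(0))
        elif action == "LEFT":
            child = stack.pop(-2)
            heads[child] = stack[-1]
        elif action == "RIGHT":
            child = stack.pop()
            heads[child] = stack[-1]
        else:
            raise ValueError(f"unknown action: {action}")
    return heads


words = ["They", "ate", "pizza"]
actions = ["SHIFT", "SHIFT", "LEFT", "SHIFT", "RIGHT"]
heads = apply_actions(words, actions)
print(heads)
```

Each action only looks at the top of the stack and the front of the buffer, but the sequence as a whole determines the tree. That's the sense in which the structure is encoded as local steps.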
If we use an online learner, we can take advantage of an analytic method, introduced in 2012, for calculating the global loss of a local action (the "dynamic oracle"), and use it to do imitation learning.
Specifically, during training we generate examples with the parser itself and label them with this "dynamic oracle". With a large batch size, those examples are generated by a model that's "out of date".
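To make the training loop concrete, here is a toy sketch built on the "navigate to a goal" analogy rather than a real parser. Everything in it is an assumption for illustration: the 1-D world, the single feature, and `oracle_costs`, which stands in for a dynamic oracle by scoring every action from whatever state the model has actually reached. The update is a plain perceptron update (weight averaging is left out to keep it short).

```python
from collections import defaultdict

ACTIONS = ("LEFT", "RIGHT")


def features(pos, goal):
    # Toy feature: which side of the current position the goal is on.
    return ["goal_is_right" if goal > pos else "goal_is_left"]


def step(pos, action):
    return pos + 1 if action == "RIGHT" else pos - 1


def oracle_costs(pos, goal):
    # Stand-in for a dynamic oracle: an action's cost is how much farther
    # from the goal it leaves us than the best available action would.
    # It is defined for any state, including ones reached by mistakes.
    dist = {a: abs(goal - step(pos, a)) for a in ACTIONS}
    best = min(dist.values())
    return {a: dist[a] - best for a in ACTIONS}


def predict(weights, feats):
    scores = {a: sum(weights[(f, a)] for f in feats) for a in ACTIONS}
    return max(ACTIONS, key=lambda a: scores[a])


def train(episodes, max_steps=20):
    weights = defaultdict(float)
    mistakes = 0
    for start, goal in episodes:
        pos = start
        for _ in range(max_steps):
            if pos == goal:
                break
            feats = features(pos, goal)
            guess = predict(weights, feats)
            costs = oracle_costs(pos, goal)
            if costs[guess] > 0:
                # Error-driven update, applied immediately, so the next
                # example is generated by the updated model.
                best = min(ACTIONS, key=lambda a: costs[a])
                for f in feats:
                    weights[(f, best)] += 1.0
                    weights[(f, guess)] -= 1.0
                mistakes += 1
            # Follow the model's own action, even when it was wrong.
            pos = step(pos, guess)
    return weights, mistakes


weights, mistakes = train([(0, 3), (5, 1), (2, 6)])
print(mistakes, predict(weights, features(0, 10)))
```

The two things to notice are that the states come from the model's own predictions, and that the weights change as soon as a mistake is made. With a big batch you'd be collecting all of those states from weights that haven't yet seen any of the corrections.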
[1] http://spacy.io/displacy/?manual=Shift%20words%20onto%20the%....