by jean- 2792 days ago
Absolutely. If your only input is the subject line, then you're dealing with a single-sentence classification task. You'd need to take the "class label" vector from the top layer of BERT (labelled "C" in Fig 2b of the paper) and then feed that to your own classifier.

For the experiments in the paper they actually fine-tuned BERT on the downstream task, but I reckon you'd get acceptable performance by just keeping it fixed and using its outputs as features for a shallow classifier.
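
Here's a rough sketch of the fixed-features approach, not a definitive implementation. It assumes the Hugging Face `transformers` library with PyTorch and the `bert-base-uncased` checkpoint, none of which are mentioned above, and it assumes binary labels (0/1). The function names `extract_cls_features`, `train_logistic_regression` and `predict` are made up for illustration. The shallow classifier is a plain-Python logistic regression just to keep the example dependency-free; any off-the-shelf classifier would do.

```python
import math


def extract_cls_features(subjects, model_name="bert-base-uncased"):
    """Return one fixed-size vector per subject line from a frozen BERT.

    Assumption: Hugging Face `transformers` and PyTorch are installed.
    They are imported lazily so the classifier below works without them.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # keep BERT fixed: no dropout, no weight updates

    batch = tokenizer(subjects, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**batch)
    # Position 0 is the [CLS] token; its top-layer hidden state plays the
    # role of the "C" vector described above.
    return outputs.last_hidden_state[:, 0, :].tolist()


def _sigmoid(z):
    # Numerically stable form that avoids overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(features, labels, epochs=200, lr=0.1):
    """Fit a binary logistic regression with per-example gradient steps."""
    weights, bias = [0.0] * len(features[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            err = _sigmoid(z) - y  # gradient of the log loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def predict(model, x):
    """Return 1 if the linear score is positive, else 0."""
    weights, bias = model
    return 1 if bias + sum(w * xi for w, xi in zip(weights, x)) > 0 else 0
```

Usage would be something like `model = train_logistic_regression(extract_cls_features(subjects), labels)` followed by `predict(model, extract_cls_features([new_subject])[0])`. For a large mailbox you'd want to run the feature extraction in batches rather than in one call.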