|
|
|
|
|
by jonathanstray
2561 days ago
|
|
Heh. It’s my work, so maybe I can clarify the goals. This is meant to be a proof of concept. I took a week and was able to show that relatively simple deep learning techniques are capable of generalizing over unseen form types with high accuracy. I also showed that tokens-plus-geometry is a viable format, and that hand-crafted feature engineering is still necessary (and still used in SOTA approaches). I also believe that preparing and cleaning this data set, and bringing a challenging investigative journalism problem to the attention of other researchers, would be valuable even if I hadn’t done any work on this baseline solution. This is a problem that journalists currently expend a huge amount of time and money on, which reduces the effectiveness of transparency around political ad spending information. |
|