Thanks! Comparing news outlets is on the radar, once I've improved the tool based on feedback from this Show HN.

My backend uses spaCy for NER/NLP, textacy [1] for extracting subject-verb-object triples, and coreferee [2] for coreference resolution. I also built a custom spaCy model, trained on over 700 crash-related news articles that I manually annotated in Label Studio [3]. This custom model identifies counterfactuals, framing (e.g. thematic elements), and a category called CARLIKE, because vehicles can be referred to in so many ways (year-make-model, year-model, color-model, generic terms like pickup/sedan/truck, nicknames like Chevy instead of Chevrolet, etc.).

Coreference resolution was probably my favorite part of the NLP analyzer. For example: "A woman was injured after being struck by a vehicle. She was walking on Washington Street when the incident took place." Because the tool knows that "She" refers to "woman", it can now identify the woman as a pedestrian. Implementing coreference resolution felt magical: the tool now picks up many issues it missed when it only looked at individual sentences.

I'll be writing an in-depth article about my implementation and journey, and I'll be sure to shoot you an email so we can chat some time.

[1] https://github.com/chartbeat-labs/textacy
[2] https://github.com/msg-systems/coreferee
[3] https://labelstud.io/
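To illustrate why coreference lets a sentence-level rule reach further, here is a minimal plain-Python sketch. It is not the tool's actual code: `Mention`, `Triple`, `PEDESTRIAN_VERBS`, and `label_pedestrians` are hypothetical, and the triples and coreference chains are hand-written stand-ins for what textacy and coreferee would produce from the example sentences.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical verb lemmas that suggest someone was on foot (illustrative, not exhaustive).
PEDESTRIAN_VERBS = {"walk", "jog", "cross", "run"}


@dataclass(frozen=True)
class Mention:
    sent: int  # sentence index within the article
    text: str  # surface form, e.g. "woman" or "She"


@dataclass(frozen=True)
class Triple:
    subject: Mention
    verb: str  # verb lemma
    obj: Optional[str] = None


def label_pedestrians(triples, chains):
    """Return the mentions labelled as pedestrians.

    A mention is labelled if it is the subject of a pedestrian verb, or if it
    shares a coreference chain with such a subject.
    """
    # Sentence-level pass: only direct subjects of pedestrian verbs.
    direct = {t.subject for t in triples if t.verb in PEDESTRIAN_VERBS}
    labelled = set(direct)
    # Coreference pass: spread the label to every mention in a matching chain.
    for chain in chains:
        if direct & set(chain):
            labelled.update(chain)
    return labelled


woman = Mention(0, "woman")
she = Mention(1, "She")
triples = [
    Triple(woman, "injure"),  # "A woman was injured ..."
    Triple(she, "walk"),      # "She was walking on Washington Street ..."
]
chains = [[woman, she]]  # hand-written; a coreference resolver would supply this

print(sorted(m.text for m in label_pedestrians(triples, [])))      # ['She']
print(sorted(m.text for m in label_pedestrians(triples, chains)))  # ['She', 'woman']
```

Without chains, only "She" is labelled, so a per-sentence analysis never connects the pedestrian to the injured woman; with the chain, the label propagates back to "woman".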