by plaidfuji
557 days ago
I’m not sure I would call this a failure... more just something you tried out of curiosity and abandoned. Happens to literally everyone. “Failed” to me would imply there was something fundamentally broken about the approach or the dataset, or that the unrealized result had an actual negative impact. It’s very hard to finish long-running side projects that aren’t generating income or attention, or aren’t driven by some quasi-pathological obsession. The fact you even blogged about it and made the HN front page qualifies as a success in my book.

> If I would have finished the project, this dataset would then have been released and used for a number of analyses using Python.

Nothing stopping you from releasing the raw dataset and calling it a success!

> Back then, I would have trained a specialised model (or used a pretrained specialised model) but since LLMs made so much progress during the runtime of this project from 2020-Q1 to 2024-Q4, I would now rather consider a foundational model wrapped as an AI agent instead; for example, I would try to find a foundation model to do the job of for example finding the right link on the Tagesschau website, which was by far the most draining part of the whole project.

I actually just started (and subsequently abandoned, er, paused) my own news analysis side project leveraging LLMs for consolidation/aggregation... and yeah, the web scraping part is still the worst. And I’ve had the same thought that feeding raw HTML to the LLM might now be an easier way of parsing web objects. The problem is that most sites are wise to scraping efforts, so it’s not so much a matter of finding the right element as bypassing the weird click-through screens, tricking the site into thinking you’re on a real browser, etc.
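For what it’s worth, here’s a rough sketch of the “let the LLM find the right link” idea, assuming Python and assuming the actual model call happens elsewhere. Instead of pasting the whole raw HTML into the prompt, it first pulls out the anchors with the standard-library `html.parser` and hands the model a short numbered list to pick from, then maps the reply back to an href. `LinkCollector`, `build_link_prompt`, `pick_link` and the prompt wording are all hypothetical, and none of this helps with the click-through screens or bot detection.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a href=...> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []      # list of (href, text) tuples in document order
        self._href = None    # href of the <a> we are currently inside, if any
        self._text = []      # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so nested tags and line breaks don't leak in.
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def build_link_prompt(html, goal):
    """Return (prompt, links): a compact numbered link list for an LLM (hypothetical format)."""
    parser = LinkCollector()
    parser.feed(html)
    lines = [f"{i}: {text} -> {href}" for i, (href, text) in enumerate(parser.links)]
    prompt = (
        f"Task: {goal}\n"
        "Reply with the number of the single best matching link.\n"
        + "\n".join(lines)
    )
    return prompt, parser.links


def pick_link(links, reply):
    """Map the model's reply to an href; None if it contains no valid index."""
    match = re.search(r"\d+", reply)
    if match is None:
        return None
    index = int(match.group())
    return links[index][0] if index < len(links) else None
```

Keeping the model’s job down to “pick a number” makes bad answers easy to detect (you just get `None`), which matters more than parsing elegance when the site layout keeps changing under you.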
|
The piece reads to me like a direct and honest confrontation with failure. It means the author thinks they can do better and is working to identify unhelpful subconscious patterns and overcome them.
Personally, I found the author's laser focus on "data science projects" intriguing. I have a tendency to go meta immediately, which biases me toward eliding detail; however, even if overly narrow, the author's focus does end up precipitating out concrete, actionable hypotheses for improvement.
Bravo, IMHO.