Hacker News
by dagurp 1068 days ago
These days I just let ChatGPT generate a script that scrapes a site and spits out an RSS file. Then I run it with cron.
2 comments

I’m guessing they paste a portion of the website’s source then tell ChatGPT to generate a script that can generate an RSS feed from that site.
Yeah, I just copy the HTML that's relevant. There's some manual work involved, but it doesn't take a lot of time.
Aren't you limited by the cutoff date of the content the model was trained on?
1. The script is generated by the LLM.

2. The user runs the script that does the scraping.

These are temporally separate actions, so the model's training cutoff doesn't limit what the script can fetch.

Fine, but it's subject to HTML selector brittleness, no? Oh, maybe you resubmit the raw HTML when you need to?
Here's how I do it.

1. Tell ChatGPT to create a Python script that scrapes example.com and generates an RSS file.

2. Paste a snippet of the HTML and tell it to modify the script to use that.

3. Do some minor tweaks myself to fix the date format.
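
For anyone curious what the end result of those three steps might look like, here's a rough stdlib-only sketch. It's not the actual script from this thread: the markup in `SAMPLE_HTML`, the `ArticleParser` class, and the `to_rfc822` and `build_rss` helpers are all made up for illustration, and it assumes each post sits in an `<article>` with an `<a>` link and a `<time datetime="YYYY-MM-DD">` tag. The date conversion stands in for the "fix the date format" tweak, since RSS expects RFC 822 style dates in `pubDate`.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html.parser import HTMLParser
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical snippet standing in for the HTML pasted into ChatGPT.
SAMPLE_HTML = """
<article><a href="/posts/1">First post</a><time datetime="2024-01-02"></time></article>
<article><a href="/posts/2">Second post</a><time datetime="2024-01-05"></time></article>
"""


class ArticleParser(HTMLParser):
    """Collects title, link, and date from markup shaped like SAMPLE_HTML."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._current = None   # dict for the <article> being read, if any
        self._in_link = False  # True while inside an <a> within an <article>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "article":
            self._current = {"title": "", "link": "", "date": ""}
        elif self._current is not None and tag == "a":
            self._current["link"] = attrs.get("href", "")
            self._in_link = True
        elif self._current is not None and tag == "time":
            self._current["date"] = attrs.get("datetime", "")

    def handle_data(self, data):
        if self._in_link and self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
        elif tag == "article" and self._current is not None:
            self._current["title"] = self._current["title"].strip()
            self.items.append(self._current)
            self._current = None


def to_rfc822(iso_date):
    """Convert 'YYYY-MM-DD' to the RFC 822 style date used by pubDate (UTC assumed)."""
    dt = datetime.strptime(iso_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return format_datetime(dt)


def build_rss(html, site_url, feed_title):
    """Parse the HTML and return an RSS 2.0 document as a string."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = site_url
    ET.SubElement(channel, "description").text = f"Scraped feed for {site_url}"
    for entry in parser.items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        # Relative hrefs are resolved against the site URL.
        ET.SubElement(item, "link").text = urljoin(site_url, entry["link"])
        ET.SubElement(item, "pubDate").text = to_rfc822(entry["date"])
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    # A real script would download the page first, e.g. with urllib.request.urlopen.
    print(build_rss(SAMPLE_HTML, "https://example.com/", "Example feed"))
```

Since this version prints the feed to stdout, a cron entry would redirect it into a file, something like `0 * * * * python3 /path/to/scrape.py > /path/to/feed.xml` (both paths are placeholders). If the site changes its markup the parser will simply find no items, so it's worth eyeballing the output now and then.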