by dopidopHN 1068 days ago
Aren't you limited by the cutoff date of the content the model was trained on?

1. the script is generated by the LLM

2. the user runs the script that does the scraping

these are temporally separate actions

Fine, but it’s subject to HTML selector brittleness, no? Oh, maybe you submit the raw HTML when you need it?
Here's how I do it.

1. Tell ChatGPT to create a Python script that scrapes example.com and generates an RSS file.

2. Paste a snippet of the HTML and tell it to modify the script to use that.

3. I do some minor tweaks myself to fix the date format.
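
For illustration, here's roughly what a script from those three steps might look like. This is only a sketch using the Python standard library: the sample HTML, the `a.post` markup, the `data-date` attribute, and the names `PostParser` and `build_rss` are all made up for the example, not what any real site or generated script uses. A real run would fetch the page (e.g. with `urllib.request.urlopen`) instead of using a hard-coded string.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Hypothetical snippet standing in for the HTML pasted in step 2.
SAMPLE_HTML = """
<article><a class="post" href="/a" data-date="2023-05-01">First post</a></article>
<article><a class="post" href="/b" data-date="2023-05-02">Second post</a></article>
"""


class PostParser(HTMLParser):
    """Collects title, href and date from <a class="post"> tags (assumed markup)."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._current = None  # the post being read, if we are inside a matching <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("class") == "post":
            self._current = {
                "href": attrs.get("href", ""),
                "date": attrs.get("data-date", ""),
                "title": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["title"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.posts.append(self._current)
            self._current = None


def build_rss(html, base_url="https://example.com"):
    """Turns the scraped posts into an RSS 2.0 document returned as a string."""
    parser = PostParser()
    parser.feed(html)

    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = "example.com feed"
    ET.SubElement(channel, "link").text = base_url
    ET.SubElement(channel, "description").text = "Generated from scraped HTML"

    for post in parser.posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post["title"].strip()
        ET.SubElement(item, "link").text = base_url + post["href"]
        # Step 3, the date fix: RSS wants RFC 822 style dates, not ISO dates.
        when = datetime.strptime(post["date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
        ET.SubElement(item, "pubDate").text = format_datetime(when)

    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss(SAMPLE_HTML))
```

The brittle part is `handle_starttag`: if the site changes its markup, that's where you paste a fresh snippet and regenerate.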