by PaulHoule
570 days ago
Sounds like it would be straightforward to do, particularly because a weekly cadence gives stories time to collect votes and comments, which would give you some stats for selection. How are you categorizing articles?
I'm scraping the posts every 30 minutes and categorizing each one by whether it is just a link to another page (a) or has content of its own (b).
a => open the link and scrape the content there.
b => the post itself is scraped from the start; any links in it are opened and scraped too, if possible.
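To make the a/b split concrete, here is a rough sketch of how it could look. The `Post` fields, `categorize`, and `links_to_scrape` are hypothetical names made up for illustration, not my actual scraper, and the URL regex is deliberately crude.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Crude URL matcher, good enough for a sketch (assumption, not a full parser).
URL_RE = re.compile(r"https?://[^\s<>\"')]+")


@dataclass
class Post:
    # Hypothetical shape of a scraped submission.
    title: str
    url: Optional[str] = None   # outbound link, if any
    text: Optional[str] = None  # body text of the post, if any


def categorize(post: Post) -> str:
    """Return 'b' if the post has content of its own, else 'a' (link only)."""
    if post.text and post.text.strip():
        return "b"
    return "a"


def links_to_scrape(post: Post) -> List[str]:
    """List the URLs that should be opened and scraped for this post."""
    links = [post.url] if post.url else []
    if categorize(post) == "b":
        # For (b) the body is already scraped; also follow links found in it.
        links.extend(u for u in URL_RE.findall(post.text) if u not in links)
    return links
```

For example, `links_to_scrape(Post("Ask: ideas?", text="see https://example.com/x for context"))` returns the one link found in the body, while a link-only post just returns its own URL.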
This effectively gives me an "enriched" database, so each week I can use the "extra" data to do a "semantic search" such as: submissions that talk about Beauty, Spain, and Beauty in Spain, or other combinations of the topics (RAG: https://help.openai.com/en/articles/8868588-retrieval-augmen...).
The problem with doing this only once per week is that content in a niche topic that didn't get enough upvotes gets lost. Still, I want to factor in the "weight" of upvotes and comments.
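One way I could imagine balancing the two is to blend the semantic similarity with a log-damped engagement score, so popular posts get a boost without drowning out niche ones. This is only a sketch under assumptions: `engagement`, `rank`, the 0.5 comment weight, and the `alpha` mix are all invented here, and the similarity values are assumed to come from whatever search step is already in place, scaled to 0..1.

```python
import math
from typing import List, Tuple

# (title, similarity in [0, 1], upvotes, comments)
Candidate = Tuple[str, float, int, int]


def engagement(upvotes: int, comments: int) -> float:
    """Log-damped engagement so a 500-point post is not 250x a 2-point one."""
    return math.log1p(upvotes) + 0.5 * math.log1p(comments)


def rank(candidates: List[Candidate], alpha: float = 0.7) -> List[str]:
    """Order titles by alpha * similarity + (1 - alpha) * normalized engagement."""
    if not candidates:
        return []
    # Normalize engagement by the best post in this batch (fall back to 1.0
    # when every post has zero votes and comments).
    max_eng = max(engagement(u, c) for _, _, u, c in candidates) or 1.0
    scored = [
        (alpha * sim + (1 - alpha) * engagement(u, c) / max_eng, title)
        for title, sim, u, c in candidates
    ]
    return [title for _, title in sorted(scored, reverse=True)]
```

With `alpha` close to 1 the ranking follows topic relevance, so a niche post with few upvotes can still surface; with `alpha` close to 0 it turns into a plain popularity ranking.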
What do you think?