by Apreche
876 days ago
Most people using machine learning to make search engines are replacing the search paradigm with a prompt-and-answer format. I think there’s an easier way: train an ML model to tell legit web sites apart from garbage ones. It’s just a binary classification. A site is either blocked or it isn’t.

Legit web sites are ones created by actual humans with actual content. Few to no ads. No malware, phishing, or other security threats. No content farms or SEO sites. No sites generated by other ML models. No paywalls, no pop-ups, no other annoyances. Just real web sites.

You’re going to need a bunch of smart and trustworthy humans to spend hours and hours doing this classification by hand, but a model can multiply the effectiveness of their efforts.

If the model works, then yes, you can make a very simple search engine. You just tell all the web crawlers to check the model, and only add a site to the index if the model says it is a good site and not a garbage one.
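A rough sketch of the crawler-side gate described above, in Python, just to make the idea concrete. Everything here is hypothetical: the `Page` fields, `toy_classifier`, and `crawl_and_index` are made-up names, and `toy_classifier` is a crude heuristic standing in for a model that would actually be trained on the human-labeled sites. It is not a real ML classifier.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Page:
    url: str
    text: str
    ad_count: int  # hypothetical feature the crawler would extract


# A classifier is any function mapping a page to True (legit) or False (garbage).
Classifier = Callable[[Page], bool]


def toy_classifier(page: Page) -> bool:
    """Hypothetical stand-in for a trained binary model: treats pages with
    many ads or very little text as garbage."""
    return page.ad_count <= 2 and len(page.text.split()) >= 50


@dataclass
class Index:
    docs: dict = field(default_factory=dict)  # url -> page text


def crawl_and_index(pages: list, is_legit: Classifier, index: Index) -> list:
    """Add only the pages the classifier accepts; return the blocked URLs."""
    blocked = []
    for page in pages:
        if is_legit(page):
            index.docs[page.url] = page.text
        else:
            blocked.append(page.url)
    return blocked


if __name__ == "__main__":
    pages = [
        Page("https://example.org/essay", "word " * 60, ad_count=0),
        Page("https://spam.example/top10", "buy now", ad_count=12),
    ]
    index = Index()
    print(crawl_and_index(pages, toy_classifier, index))
```

The point of passing the classifier in as a plain function is that the crawler doesn’t care what’s behind it. Swap the toy heuristic for a real trained model and the indexing logic stays the same.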
Also, tools that attempt to detect ML-generated content tend not to work, and they will only become less effective over time as LLMs improve.