Hacker News
by ilaksh 1083 days ago
Ok. I think I understand the assignment. I don't believe you need to fine tune any model.

You can probably just use the OpenAI ChatGPT model and ask it something like:

"Does this user's tweet say anything negative about the government of ______ or contradict any of these official party statements? __________"

You can probably just ask Falcon or Llama the same thing without any training. But if you decide you have to do the fine tuning then try with my link above using the A100 GPU nodes.
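
For what it's worth, a rough sketch of that zero-shot approach might look like the following. Everything named here is an assumption for illustration: `ask_model` is a hypothetical stand-in for whatever sends a prompt to ChatGPT, Falcon, or Llama and returns the reply text, and the yes/no parsing is deliberately naive.

```python
from typing import Callable, Sequence

# Prompt wording follows the question above; the blanks become parameters.
PROMPT_TEMPLATE = (
    "Does this user's tweet say anything negative about the government of "
    "{government} or contradict any of these official party statements?\n"
    "Statements:\n{statements}\n"
    "Tweet: {tweet}\n"
    "Answer with a single word: yes or no."
)


def build_prompt(tweet: str, government: str, statements: Sequence[str]) -> str:
    """Fill the template with one tweet and the list of statements."""
    bullet_list = "\n".join(f"- {s}" for s in statements)
    return PROMPT_TEMPLATE.format(
        government=government, statements=bullet_list, tweet=tweet
    )


def classify(
    tweet: str,
    government: str,
    statements: Sequence[str],
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
) -> bool:
    """Return True if the model's reply starts with 'yes' (naive parsing)."""
    reply = ask_model(build_prompt(tweet, government, statements))
    return reply.strip().lower().startswith("yes")
```

You would pass your own function as `ask_model`, so the same code works with a hosted API or a local model. Real replies are messier than a clean yes/no, so the parsing would likely need to be more forgiving.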

I think the whole thing is nonsense though. Because whoever the arbiter of truth is always has an agenda and often makes mistakes.

2 comments

Thanks! Btw, a link to resources would still be appreciated if I need to apply the knowledge to personal projects in the future.
If the assignment is to classify the type of misinformation (assuming each tweet is misinformation) then it’s essentially topic modeling which is very doable without fine tuning as well.
This is what I was thinking about using an LLM for:

1. As a feature extractor. For example, given the text of misinformation agents, what are the characteristics (C1, C2, C3, etc.)? Then, do these characteristics appear in the new texts? Assign a label accordingly.
2. Give the LLM a description of how these agents usually behave and ask whether the new ones behave similarly. If so, label them accordingly. (It may also be possible to pass graph data in a graph-less way.)
3. Use the extracted information to enhance topology-driven classification.
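
A minimal sketch of the first idea (LLM as feature extractor), under some assumptions: `ask_model` is a hypothetical callable that sends a prompt to the model and returns its reply, the characteristics are supplied by hand, and the "at least N matches" labeling rule is just a placeholder for whatever rule or downstream classifier you would actually use.

```python
from typing import Callable, Dict, Sequence


def extract_features(
    text: str,
    characteristics: Sequence[str],
    ask_model: Callable[[str], str],  # hypothetical: prompt in, reply text out
) -> Dict[str, bool]:
    """Ask the model, one characteristic at a time, whether the text shows it."""
    features = {}
    for characteristic in characteristics:
        prompt = (
            f"Characteristic: {characteristic}\n"
            f"Text: {text}\n"
            "Does the text show this characteristic? Answer yes or no."
        )
        reply = ask_model(prompt)
        features[characteristic] = reply.strip().lower().startswith("yes")
    return features


def label_from_features(features: Dict[str, bool], min_matches: int = 2) -> str:
    """Placeholder rule: label as 'similar' if enough characteristics match."""
    matches = sum(features.values())
    return "similar" if matches >= min_matches else "not similar"
```

The boolean features could also be joined with graph-derived features and fed to a separate classifier, which is roughly what the third idea describes.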
Those might work to some extent but keep in mind the model doesn’t have access to outside information, and it’s going to be nearly impossible to build a social graph given Twitter API limits.

IMO the easiest way to fine-tune your model would be to use something like BERT embeddings fine-tuned with triplet loss, i.e. train on (example, positive, negative) triplets so the model minimizes the distance between similar examples and maximizes it between dissimilar ones.
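
To make the triplet loss concrete, here is a small pure-Python sketch of the usual margin formulation, loss = max(0, d(anchor, positive) - d(anchor, negative) + margin), where the anchor is the "example" in the triplet. It assumes Euclidean distance and plain lists of floats standing in for embeddings. In actual fine-tuning the vectors would come from BERT and the loss would be backpropagated with a deep learning framework (PyTorch ships this as `torch.nn.TripletMarginLoss`).

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,
) -> float:
    """Zero once the positive is closer to the anchor than the negative by `margin`."""
    return max(
        0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin
    )


# Positive is near the anchor and negative is far away: no penalty.
print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))  # 0.0
# Roles swapped: the loss is positive, pushing the embeddings to rearrange.
print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))  # 5.0
```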

Very interesting! Thank you for the idea. I will try to figure out how to do that.