The thing I worry about with this kind of thing is that, without very careful UX integration, people take a false negative as proof that everything is OK, rather than treating the result as just one of many signals.
Have you tested it? I suggest testing it with several screenshots: literal scam text, a normal conversation (not a scam), and a borderline, doubtful conversation. The AI is pretty good at identifying them, and it reports a confidence score and risk level.