> Our tests gave models the vulnerable function directly, often with contextual hints (e.g., "consider wraparound behavior").