It only looks like p-hacking if you go in believing that the results cannot possibly be correct, and even then Hanlon's razor would make me reluctant to accuse them of p-hacking.
If an effect appears only in a subgroup that we have no explanation for, that's a smell: it suggests different subgroups were tried until one turned up a "statistically significant" result by chance.
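To put a rough number on the "try subgroups until one is significant" concern, here is a minimal Python sketch. It assumes the subgroups are independent and non-overlapping, and that each is tested with a two-sided z-test at alpha = 0.05 with a known unit variance. These are simplifications for illustration, and the function names are made up. With no real effect anywhere, the chance that at least one of k subgroups comes out "significant" is 1 - (1 - alpha)^k. The simulation checks that figure by brute force.

```python
import math
import random


def familywise_error_rate(alpha: float, k: int) -> float:
    """Probability of at least one false positive among k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_subgroup_search(n_trials: int = 1000, k: int = 20, n_per_arm: int = 30,
                             alpha: float = 0.05, seed: int = 0) -> float:
    """Fraction of null 'studies' in which at least one of k subgroups looks significant.

    Assumption: every subgroup's treatment and control arms are drawn from the same
    N(0, 1) distribution, so any 'significant' result is a false positive.
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_arm)  # std. error of the difference in means (known variance 1)
    studies_with_hit = 0
    for _ in range(n_trials):
        for _ in range(k):
            treated = sum(rng.gauss(0, 1) for _ in range(n_per_arm)) / n_per_arm
            control = sum(rng.gauss(0, 1) for _ in range(n_per_arm)) / n_per_arm
            if two_sided_p((treated - control) / se) < alpha:
                studies_with_hit += 1
                break  # stop at the first "significant" subgroup, as a p-hacker would
    return studies_with_hit / n_trials


if __name__ == "__main__":
    print(f"analytic:  {familywise_error_rate(0.05, 20):.3f}")
    print(f"simulated: {simulate_subgroup_search():.3f}")
```

With 20 subgroups the analytic figure is about 0.64. So even with no effect at all, slicing the data 20 ways will usually produce at least one "significant" subgroup. Real subgroups cut from a single study overlap and are correlated, so the inflation is typically smaller than this independent-test figure, but it still grows with the number of cuts tried.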