I think it’s better we do know because that should lead to refinements in licences (eg a GPLv4 that explicitly allows humans to write original works conceptually based off content read but not derived from code from; while explicitly disallowing code use for machine learning except where the published work is also licensed GPL).