I thought the GIL was not held while foreign (native) code executes in Python (at least, that was one of the reasons given for why the GIL isn't a big deal in practice).
No, it must be released explicitly. The GIL must be held to call almost any part of the Python runtime (the main exceptions are the functions that acquire the GIL itself and the low-level raw memory allocator).
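Here is a rough illustration, assuming CPython. `time.sleep` is implemented in C and explicitly releases the GIL while it blocks, so several threads sleeping at the same time overlap instead of queueing up. A C extension that never releases the GIL (via `Py_BEGIN_ALLOW_THREADS` / `Py_END_ALLOW_THREADS`) would serialize those threads instead. ctypes makes the same distinction: functions called through `ctypes.CDLL` release the GIL, while functions called through `ctypes.PyDLL` do not. The thread count, the sleep duration, and the `timed_sleepers` helper are arbitrary choices made for this sketch.

```python
import threading
import time

N_THREADS = 4
SLEEP_S = 0.2


def worker() -> None:
    # time.sleep is C code that releases the GIL while it blocks,
    # so the other threads are free to run (or sleep) at the same time.
    time.sleep(SLEEP_S)


def timed_sleepers() -> float:
    """Run N_THREADS sleeping threads and return the wall-clock time taken."""
    threads = [threading.Thread(target=worker) for _ in range(N_THREADS)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = timed_sleepers()
    # Expect roughly SLEEP_S rather than N_THREADS * SLEEP_S, because the sleeps overlap.
    print(f"{N_THREADS} threads finished in {elapsed:.2f}s")
```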
I'm fairly certain you're incorrect. With the GIL you don't have to lock shared memory, because only one thread runs at a time. For example, a shared data structure can't be changed while another thread is reading or writing it, because only one thread is actually running.
You are entirely mistaken, unless all you care about are the basic built-in dict/list and a few other built-in data structures, AND each thread only stores OR reads data (i.e., never reads a value and then stores a new value based on it), in a SINGLE container (i.e., you never care about consistent state between two different objects).
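A minimal sketch of the read-then-store case, assuming CPython. A single dict read and a single dict store are each atomic under the GIL, but the pair is not: the interpreter is free to switch threads between them. The `threading.Barrier` here is artificial; it stands in for such a switch so the lost update happens every time instead of only occasionally. The names `lost_update_demo` and `deposit` are hypothetical.

```python
import threading


def lost_update_demo() -> int:
    """Two threads each 'add 1' to a shared dict entry; return the final value."""
    balance = {"value": 0}           # one built-in dict shared by both threads
    barrier = threading.Barrier(2)   # forces both reads to happen before either store

    def deposit() -> None:
        current = balance["value"]        # read: atomic on its own
        barrier.wait()                    # simulate a thread switch between read and store
        balance["value"] = current + 1    # store: also atomic, but based on a stale read

    threads = [threading.Thread(target=deposit) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return balance["value"]


if __name__ == "__main__":
    print(lost_update_demo())  # prints 1, not 2: one deposit was lost
```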
In my experience this is almost never the case. Moreover, this type of synchronization is trivial to accomplish with relatively little performance sacrifice.
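For the single-container case, the fix is usually just a `threading.Lock` held across the whole read-modify-write. A sketch with hypothetical names (`Counter`, `locked_increments`); the thread and iteration counts are arbitrary.

```python
import threading


class Counter:
    """Hypothetical shared counter whose read-modify-write is guarded by one lock."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._data = {"value": 0}

    def increment(self) -> None:
        # Holding the lock across both the read and the store makes the
        # whole update atomic with respect to other threads using this lock.
        with self._lock:
            self._data["value"] = self._data["value"] + 1

    def value(self) -> int:
        with self._lock:
            return self._data["value"]


def locked_increments(n_threads: int = 8, per_thread: int = 10_000) -> int:
    counter = Counter()

    def worker() -> None:
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value()


if __name__ == "__main__":
    print(locked_increments())  # prints 80000
```

The lock is uncontended most of the time, so in a typical workload its cost is small next to the rest of the Python code running in each thread.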
What is much harder is getting more complex logic to work correctly and performantly when it interacts with several different data structures, in anything more involved than a single saturated loop.
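As a sketch of the multi-structure problem, here is a hypothetical `Inventory` whose stock is split across two dicts, with the invariant that `warehouse[item] + shelf[item]` never changes. Per-dict atomicity cannot protect that invariant: a reader could see an item already removed from one dict but not yet added to the other. This version uses one coarse lock over both dicts, which is the simplest correct choice. Finer-grained locks can perform better under contention, but then every code path must acquire them in a consistent order to avoid deadlock, and that is where the real complexity starts. All names here (`Inventory`, `restock`, `unstock`, `stress`) are illustrative.

```python
import threading


class Inventory:
    """Stock split across two dicts; one lock guards BOTH to keep totals consistent."""

    def __init__(self, items: dict[str, int]) -> None:
        self._lock = threading.Lock()
        self.warehouse = dict(items)
        self.shelf = {name: 0 for name in items}

    def restock(self, item: str, qty: int) -> bool:
        # Move qty from warehouse to shelf as one atomic step.
        with self._lock:
            if self.warehouse[item] < qty:
                return False
            self.warehouse[item] -= qty
            self.shelf[item] += qty
            return True

    def unstock(self, item: str, qty: int) -> bool:
        # Move qty from shelf back to warehouse as one atomic step.
        with self._lock:
            if self.shelf[item] < qty:
                return False
            self.shelf[item] -= qty
            self.warehouse[item] += qty
            return True

    def total(self, item: str) -> int:
        # Reading both dicts under the same lock gives a consistent snapshot.
        with self._lock:
            return self.warehouse[item] + self.shelf[item]


def stress(n_workers: int = 4, rounds: int = 2_000) -> set[int]:
    """Move stock back and forth while a checker samples the total."""
    inv = Inventory({"widget": 100})
    seen: set[int] = set()
    done = threading.Event()

    def mover() -> None:
        for _ in range(rounds):
            inv.restock("widget", 3)
            inv.unstock("widget", 3)

    def checker() -> None:
        while not done.is_set():
            seen.add(inv.total("widget"))

    watch = threading.Thread(target=checker)
    workers = [threading.Thread(target=mover) for _ in range(n_workers)]
    watch.start()
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    done.set()
    watch.join()
    seen.add(inv.total("widget"))
    return seen


if __name__ == "__main__":
    print(stress())  # prints {100}: no sample ever observed a half-finished move
```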