I'd be surprised if this question had an offhand answer. It doesn't sound like something whose scalability is predictable enough for back-of-the-envelope calculations.
Yes, good guess! That's the size we have after deduplication across projects at https://www.softwareheritage.org/. We archive all the source code we can find, and we'd like to support some sort of full-text search on it at some point, so Glean looks interesting.