

by vegabook 4055 days ago

First of all, I don't think the print statement is a small issue. I bet 5% of my code is print statements. I love the print statement's unfussy "can do" mentality. Hey, the whole reason I got into Python was because it was utterly barebones. print "hello world". That's it. You can almost feel the fun a young GvR might have had making it.

Then all this u"xx" stuff on strings. What's that about? I don't care about unicode. 256 ascii characters is fine for me. If I need Unicode I can do it, but I don't need it by default.

Iterators instead of ranges. Fine. Why again? To save memory? I've got 32 billion bytes. I don't need it. It's complicating something to please the computer scientists and it's messing with my mind which has far bigger problems to solve, so it's unpragmatic. If I see htop moaning about memory, I can easily change my code.
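For readers wondering what actually changed here, a minimal sketch, assuming CPython 3: `range` is a lazy sequence rather than a prebuilt list, so it still supports `len()` and indexing, and a list is one `list()` call away when you want one. The variable names are only for illustration.

```python
import sys

# A billion-element range is only a small object; no list is built.
lazy = range(10**9)
print(sys.getsizeof(lazy) < 1000)  # True on CPython: size does not grow with length
print(len(lazy))                   # 1000000000
print(lazy[123456789])             # 123456789, computed on demand

# When a real list is wanted (the Python 2 behaviour), ask for it explicitly.
materialized = list(range(5))
print(materialized)                # [0, 1, 2, 3, 4]
```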

Generally, I just like the 2.0 attitude. It's carefree. It just works. Three is trying too hard. Python is just a tool for me. My really hard problems are my daily battles with nonconvex optimization of vast data sets.

3 comments

tedmiston 4055 days ago

> Then all this u"xx" stuff on strings. What's that about? I don't care about unicode. 256 ascii characters is fine for me. If I need Unicode I can do it, but I don't need it by default.

I felt this way before I started spending all of my time on web apps. It's reading user input data from some random public source, like Twitter, that forces it upon you. Then it quickly became best practice to "unicode all the things". I think of it as analogous to how we always store timestamps in UTC.


jMyles 4055 days ago

One difference, though, is that time enjoys a certain natural and intrinsic consensus. For example, we all agree that observable time always flows forward at the same rate.

OTOH: Which characters do and don't belong in unicode and in what order? I don't fucking know. :-)


KMag 4055 days ago

> OTOH: Which characters do and don't belong in unicode and in what order? I don't fucking know. :-)

Should we use decimalized time or time based on the Babylonian base 60/12 system? Both have clear advantages. I don't fucking know. :-)

The world has standardized on Unicode, which (as a collection of expanding standards) defines the set of valid code points and their order. There's still some debate as to UTF-8 vs. UTF-16LE (and perhaps UTF-16 w/BOM and UTF-32) encodings, but Unicode has clearly won. It's not perfect, but it's silly to pretend Unicode hasn't won.

Source: I used to work as an engineer on the content converter portion of Google's indexing system, which took the world's web pages, PDFs, etc. and converted them into a unified format (the text portion of which is encoded as UTF-8) for the rest of the indexing system. Sure, we saw some percentage of EUC-KR, GB2312, Big5, and Win CP1252 text, but Unicode has clearly won and UTF-8 and UTF-16LE are steadily replacing all other encodings.
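To make the distinction between Unicode code points and their encodings concrete, here is a small sketch in Python 3. The choice of the euro sign is arbitrary; the codec names are the standard library's. The same single code point comes out as a different number of bytes under each encoding, and the `utf-16` and `utf-32` codecs prepend a byte order mark.

```python
euro = "\u20ac"  # EURO SIGN: one code point, U+20AC

print(hex(ord(euro)))  # 0x20ac

# Encode the same code point several ways and record the byte counts.
sizes = {}
for name in ("utf-8", "utf-16-le", "utf-16", "utf-32"):
    data = euro.encode(name)
    sizes[name] = len(data)
    # Whatever the byte layout, decoding recovers the same code point.
    assert data.decode(name) == euro

print(sizes)  # {'utf-8': 3, 'utf-16-le': 2, 'utf-16': 4, 'utf-32': 8}
```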


delluminatus 4055 days ago

I think you're confused about the strings. Python 3 strings are Unicode by default. The u"xx" syntax is only needed in Python 2.x. In Python 3.x it's only supported for backwards compatibility.
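A quick check of that claim, as a sketch assuming Python 3.3 or later (where the `u` prefix is accepted): the prefixed and unprefixed literals are the same type and compare equal, while a `b` literal is a separate type.

```python
plain = "xx"
prefixed = u"xx"   # accepted in Python 3.3+, but changes nothing
raw = b"xx"        # bytes are a distinct type

print(type(plain) is type(prefixed) is str)  # True
print(plain == prefixed)                     # True
print(type(raw).__name__)                    # bytes
print(plain == raw)                          # False: str and bytes never compare equal
```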


vegabook 4055 days ago

Right, thanks for clarifying. I wasn't confused until Python 3 made it confusing. EDIT: Ah, I remember. In one of my attempts to move to 3 I wrote a few modules which worked with strings, then I had to stick "u"s everywhere when I passed them to my Python 2 code base.


KMag 4055 days ago

> Then all this u"xx" stuff on strings. What's that about? I don't care about unicode. 256 ascii characters is fine for me. If I need Unicode I can do it, but I don't need it by default.

But plenty of people were writing code that conflated bytes and strings and broke in subtle ways on systems with non-English locales. Maybe you don't notice the bugs, so you don't care, but that doesn't mean that it's not a major flaw in Python 2. Here in Asia, plenty of people may be cursing your silently buggy Python 2 unicode-naive code.
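A minimal sketch of the kind of bug being described, written for Python 3. The sample word and the wrongly guessed codec are assumptions chosen for illustration: decoding UTF-8 bytes with the wrong encoding raises no error, it just produces different text, whereas Python 3 refuses outright to concatenate bytes and str.

```python
text = "caf\u00e9"                # 'cafe' with an acute accent: 4 code points
utf8_bytes = text.encode("utf-8")
print(len(text), len(utf8_bytes)) # 4 5

# Guessing the wrong encoding "just works" and silently yields mojibake.
wrong = utf8_bytes.decode("latin-1")
print(len(wrong))                 # 5: the accented letter became two characters
print(wrong == text)              # False

# Python 3 does not guess: mixing bytes and str is an immediate error.
try:
    utf8_bytes + text
    mixed_error = None
except TypeError as exc:
    mixed_error = type(exc).__name__
print(mixed_error)                # TypeError
```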

It's similar to a pair of time-handling functions in a domain-specific language I use at work sometimes. The lazy way of converting Times to Strings (arguably reasonably) uses the current process's time zone to render the Time. Likewise, Time's constructor uses the current process's time zone when constructing a Time from a String. Someone decided to write a pair of functions that shift a given time forward and backward by the current process's time zone's UTC offset on the day in question. The one function gives you a Time that, if rendered using lazy String formatting, (usually) gives you the same String you would have gotten if you had used a function that took a time zone as an argument and passed UTC as the time zone. The inverse function allows you to take a String representation of a UTC time, pass it to Time's constructor without specifying a time zone, and then shift that time so that you (usually) get the same Time you would have gotten had you specified the correct time zone in Time's constructor.

This sounds absolutely insane, but it mostly works, except in corner cases that cross DST changes. (Note that 2014-03-09 02:31 America/New_York just doesn't exist, but 2014-03-09 02:31 UTC is a perfectly sensible time.) We now have a jslint-like program that issues a warning if you use either of these functions.

Freely guessing at the correct conversion back and forth from byte arrays to code points is very much like this insanity of mostly-working functions to shift times to make their conversions to and from Strings mostly work without having to specify time zones in the Time or String constructors.
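The nonexistent wall-clock time mentioned above can be checked with a rough sketch in Python. This is not the DSL from the comment: the function `ny_wall_clock` is hypothetical, and instead of consulting a time zone database it hard-codes the single 2014 spring-forward transition for America/New_York (02:00 EST, i.e. 07:00 UTC), which is an assumption made to keep the example self-contained.

```python
from datetime import datetime, timedelta, timezone

# Assumption: only the 2014-03-09 transition is modelled, by hand.
SPRING_FORWARD_UTC = datetime(2014, 3, 9, 7, 0, tzinfo=timezone.utc)
EST = timedelta(hours=-5)
EDT = timedelta(hours=-4)

def ny_wall_clock(utc_dt):
    """Naive New York wall-clock time for a UTC instant (valid around March 2014 only)."""
    offset = EST if utc_dt < SPRING_FORWARD_UTC else EDT
    return (utc_dt + offset).replace(tzinfo=None)

before = ny_wall_clock(SPRING_FORWARD_UTC - timedelta(minutes=1))
after = ny_wall_clock(SPRING_FORWARD_UTC)
print(before)  # 2014-03-09 01:59:00
print(after)   # 2014-03-09 03:00:00

# Walk every minute of the UTC day and collect the wall-clock times produced.
day_start = datetime(2014, 3, 9, 0, 0, tzinfo=timezone.utc)
seen = {ny_wall_clock(day_start + timedelta(minutes=m)) for m in range(24 * 60)}

missing = datetime(2014, 3, 9, 2, 31)
print(missing in seen)  # False: no UTC instant renders as 02:31 local time
```

Any scheme that shifts a UTC time by "the offset on that day" has to pick one of the two offsets for this date, so times near the gap come out an hour wrong.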

Similarly, if your language silently converted April 74th 2014 to June 13th 2014, it would "just work" for 99.9% of use cases, but it would also hide some bugs. People would complain if the language's designers changed the language's next major version to stop accepting such nonsensical dates. Silently converting back and forth between bytes and code points is more subtle, but similarly insane in hiding bugs that 99.9% of the time would go unnoticed.
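As a sketch of the two behaviours being contrasted, in Python: the standard `datetime.date` constructor rejects the out-of-range day, while `lenient_date` below is a hypothetical "do what I probably meant" constructor, written only for illustration, that rolls the overflow into later months.

```python
from datetime import date, timedelta

# Strict behaviour: the standard library refuses the nonsensical date.
try:
    date(2014, 4, 74)
    strict_result = "accepted"
except ValueError:
    strict_result = "rejected"
print(strict_result)  # rejected

def lenient_date(year, month, day):
    """Hypothetical lenient constructor: days past month end roll forward."""
    return date(year, month, 1) + timedelta(days=day - 1)

rolled = lenient_date(2014, 4, 74)
print(rolled)  # 2014-06-13
```

The lenient version never fails, which is exactly why a typo such as 74 for 14 would pass through it unnoticed.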

Sure, silently ignoring locale conversions of times and strings mostly just works, but that doesn't mean it's not terribly broken. PHP rightly gets a lot of criticism for often following the philosophy of "do what I probably meant, and it will just work 99.9% of the time and cause insane bugs 0.1% of the time".
