HN Mirror


	by zahlman 304 days ago
	Such languages do not have strings. Definitionally a string is a sequence of characters, and more than 256 characters exist. A byte sequence is just an encoding; if you are working with that encoding directly and have to do the interpretation yourself, you are not using a string. But if you do want a sequence of bytes for whatever reason, you can trivially obtain that in any version of Python.
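
To make the distinction concrete, here is a minimal sketch, assuming Python 3 (the sample text is made up for illustration). A `str` holds code points, and `bytes` is one particular encoding of them:

```python
# Illustrative sample text, not taken from the thread.
text = "na\u00efve \u2603"      # str: a sequence of Unicode code points
data = text.encode("utf-8")     # bytes: one specific encoding of that text

print(len(text))   # 7 code points
print(len(data))   # 10 bytes, because two characters need more than one byte

# Decoding with the same encoding recovers the original text.
round_trip = data.decode("utf-8")
print(round_trip == text)
```

The lengths differ because the encoding, not the string, decides how many bytes each character takes.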

capitainenemo 304 days ago

My personal experience with python3 (and repeated interactions with about a dozen python programmers, including core contributors) is that python3 does not let you trivially work with streams of bytes, especially if you need to do character set conversions, since a tiny python2 script that I have used for decades for conversion of character streams in terminals has proved to be repeatedly unportable to python3. The last attempt was much larger and still failed; they thought they could probably do it, but it would require far more code and was not worth their effort.

I'll probably just use rust for that script if python2 ever gets dropped by my distro. Reminds me of https://gregoryszorc.com/blog/2020/01/13/mercurial%27s-journ...

zahlman 304 days ago

> a tiny python2 script that I have used for decades for conversion of character streams in terminals has proved to be repeatedly unportable to python3.

Show me.

capitainenemo 304 days ago

Heh. It always starts this way... then they confidently send me something that breaks on testing it, then half a dozen more iterations, then "python2 is doing the wrong thing" or, "I could get this working but it isn't worth the effort" but sure, let's do this one more time. Could be they were all missing something obvious - wouldn't know, I avoid python personally, apart from when necessary like with LLM glue. https://pastebin.com/j4Lzb5q1

This is a script created by someone on #nethack a long time ago. It works great with other things as well like old BBS games. It was intended to transparently rewrite single byte encodings to multibyte with an optional conversion array.
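
The idea described above can be sketched roughly as follows. This is not the pastebin script: the function name, the `OVERRIDES` table, and the choice of `cp437` as the source code page are all assumptions made for illustration.

```python
# Hypothetical override table: raw input byte -> replacement text.
OVERRIDES = {b"\xff": "X"}


def translate_byte(raw, overrides=OVERRIDES, codepage="cp437"):
    """Return the UTF-8 encoding of one byte from a single-byte code page."""
    text = overrides.get(raw)
    if text is None:
        # No override: interpret the byte through the (assumed) code page.
        text = raw.decode(codepage)
    return text.encode("utf-8")


print(translate_byte(b"A"))      # ASCII passes through as one byte
print(translate_byte(b"\xdb"))   # a block character becomes three bytes
print(translate_byte(b"\xff"))   # the override wins over the code page
```

One input byte can therefore become one, two, or three output bytes, which is the "single byte to multibyte" rewrite.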

zahlman 303 days ago

> then they confidently send me something that breaks on testing it, then half a dozen more iterations, then "python2 is doing the wrong thing" or, "I could get this working but it isn't worth the effort"

It almost works as-is in my testing. (By the way, there's a typo in the usage message.) Here is my test process:

  #!/usr/bin/env python3
  import random, sys, time
  
  
  def out(b):
      # ASCII 0..7 for the second digit of the color code in the escape sequence
      color = random.randint(48, 55)
      sys.stdout.buffer.write(bytes([27, 91, 51, color, 109, b]))
      sys.stdout.flush()
  
  
  for i in range(32, 256):
      out(i)
      time.sleep(random.random()/5)
  
  
  while True:
      out(random.randint(32, 255))
      time.sleep(0.1)

I suppressed random output of C0 control characters to avoid messing up my terminal, but I added a test that basic ANSI escape sequences can work through this.

(My initial version of this didn't flush the output, which mistakenly led me to try a bunch of unnecessary things in the main script.)

After fixing the `print` calls, the only thing I was forced to change (although I would write the code differently overall) is the output step:

  # sys.stdout.write(out.encode("UTF-8"))
  sys.stdout.buffer.write(out.encode("UTF-8"))
  sys.stdout.flush()
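
The reason for this change can be shown in isolation. The sketch below assumes Python 3 and uses an in-memory stand-in for `sys.stdout`; `write_utf8` and `fake_stdout` are illustrative names, not part of the script.

```python
import io


def write_utf8(stream, text):
    """Write text as UTF-8 bytes to the binary layer under a text stream."""
    stream.buffer.write(text.encode("utf-8"))
    stream.flush()


raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="ascii")  # stand-in for sys.stdout

rejected = False
try:
    # The text layer only accepts str in Python 3.
    fake_stdout.write("\u2588".encode("utf-8"))
except TypeError:
    rejected = True

write_utf8(fake_stdout, "\u2588")
print(rejected, raw.getvalue())
```

Writing through `.buffer` bypasses the stream's own text encoding, so the bytes arrive exactly as encoded.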

I've tried this out locally (in gnome-terminal) with no issue. (I also compared to the original; I have a local build of 2.7 and adjusted the shebang appropriately.)

There's a warning that `bufsize=1` no longer actually means a byte buffer of size 1 for reading (instead it's magically interpreted as a request for line buffering), but this didn't cause a failure when I tried it. (And setting the size to e.g. `2` didn't break things, either.)
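
If byte-at-a-time reads are the goal, one option is to ask for an unbuffered pipe with `bufsize=0`. This is a sketch, not a claim that the script needs it; `read_child_bytes` and the child command are made up for illustration.

```python
import sys
from subprocess import PIPE, Popen


def read_child_bytes(argv):
    """Run argv and return its stdout as a list of single-byte reads."""
    chunks = []
    # bufsize=0 requests an unbuffered binary pipe.
    with Popen(argv, stdout=PIPE, bufsize=0) as handle:
        while True:
            chunk = handle.stdout.read(1)
            if chunk == b"":      # end of output is an empty bytes object
                break
            chunks.append(chunk)
    return chunks


# Hypothetical child process that emits two raw bytes.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(bytes([65, 219]))"]
print(read_child_bytes(child))
```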

I also tried having my test process read from standard input; the handling of ctrl-C and ctrl-D seems to be a bit different (and in general, setting up a Python process to read unbuffered bytes from stdin isn't the most fun thing), but I generally couldn't find any issues here, either. Which is to say, the problems there are in the test process, not in `ibmfilter`. The input is still forwarded to, and readable from, the test process via the `Popen` object. And any problems of this sort are definitely still fixable, as demonstrated by the fact that `curses` is still in the standard library.
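
For the reading side, one way to get bytes without Python's own buffering is to read the file descriptor directly. A minimal sketch, with `read_bytes_unbuffered` as a hypothetical helper name:

```python
import os


def read_bytes_unbuffered(fd):
    """Yield single bytes from file descriptor fd until end of input."""
    while True:
        chunk = os.read(fd, 1)   # bypasses Python-level buffering
        if chunk == b"":
            break
        yield chunk


# Demonstration on a pipe, so no terminal is required.
reader, writer = os.pipe()
os.write(writer, b"\x1b[A")      # the bytes of an "arrow up" escape sequence
os.close(writer)
received = list(read_bytes_unbuffered(reader))
os.close(reader)
print(received)
```

For real keyboard input the descriptor would be `sys.stdin.fileno()`. A terminal in its default mode still hands over input a line at a time, so something like `tty.setcbreak` from the standard library would also be needed there.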

Of course, keys in the `special` mapping need to be defined as bytes literals now. Although that could trivially be adapted if you insist.

capitainenemo 303 days ago

Sorry, I'm not a python guy, do you have a script you'd like me to run against python3? Just toss me a pastebin link, and ideally the version of python3 to run, since half the python3 scripts on my system seem to require a different version of python3 from the other half and a variety of isolated sets of python libs in virtual environments (heck, pip even warns you not to try installing libs globally so everyone can use the same set these days). I'd rather not try to follow a set of suggestions and then be told I did it wrong.

As for the typo, yep. But then, I've left this script essentially untouched for a couple of decades since I was given it.

zahlman 303 days ago

> do you have a script you'd like me to run against python3? Just toss me a pastebin link, and ideally the version of python3 to run

Here's a diff:

  diff --git a/ibmfilter b/ibmfilter
  index 245d32c..2633335 100755
  --- a/ibmfilter
  +++ b/ibmfilter
  @@ -1,6 +1,5 @@
  -#!/usr/bin/python2 -tt
  -# vim:set fileencoding=utf-8
  - 
  +#!/usr/bin/python3
  +
   from subprocess import *
   import sys 
   import os, select
  @@ -10,8 +9,8 @@ special = {
   }
    
   if len(sys.argv) < 2:
  -    print "usage: ibmfilter [command]"
  -    print "Runs command in a subshell and translates its output from ibm473 codepage to UTF-8."
  +    print("usage: ibmfilter [command]")
  +    print("Runs command in a subshell and translates its output from ibm473 codepage to UTF-8.")
       sys.exit(0)
    
   handle = Popen(sys.argv[1:], stdout=PIPE, bufsize=1)
  @@ -26,8 +25,10 @@ while buf != '':
           os.kill(handle.pid)
           os.system('reset')
           raise Exception("Timed out while waiting for stdout to be writeable...")
  -    sys.stdout.write(out.encode("UTF-8"))
  - 
  +    sys.stdout.buffer.write(out.encode("UTF-8"))
  +    sys.stdout.flush()
  +
       buf = handle.stdout.read(1)
    
   handle.wait()

I have already tested it, and it works fine as far as I can tell on every version from at least 3.3 through 3.13 inclusive. There's really nothing version-specific here, except the warning I mentioned, which was introduced in 3.8. If you encounter a problem, some more sophisticated diagnostics would be needed, and honestly I'm not actually sure where to start with that. (Although I'm mildly impressed that you still have access to a 2.7 interpreter in /usr/bin without breaking anything else.)

If you want to add overrides, you must use bytes literals for the keys. That looks like:

  b'\xff': 'X'
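
A short sketch of why the key type matters, assuming Python 3 (the `special` table here is a made-up one-entry example):

```python
special = {b"\xff": "X"}    # hypothetical override; the key is a bytes literal

raw = b"\xff"               # what a read from a binary pipe returns
hit = special.get(raw)      # found: bytes key matches bytes input
miss = special.get("\xff")  # not found: a str key never matches bytes

# bytes and str never compare equal, even when both are empty.
empty_differs = (b"" != "")

print(hit, miss, empty_differs)
```

The same rule applies to any end-of-input check. The hunk header in the diff shows `while buf != '':`, which stays true for every bytes value, including `b''`. Whether the full script relies on that comparison to exit is not visible in the diff, but `b''` would be the value to compare against.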

> (heck, pip even warns you not to try installing libs globally so everyone can use same set these days)

Some Python programs have mutually incompatible dependencies, and you can't really have two versions of the same dependency loaded in the same runtime. This has always been a problem; you're just looking at the current iteration of pip trying to cooperate with Linux distros to help you not break your system as a result.
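
As a rough sketch of the isolation this refers to, the standard library's `venv` module can create one environment per program, each with its own set of installed packages. The helper name and directory layout below are assumptions for illustration.

```python
import os
import venv


def make_env(parent, name):
    """Create an isolated environment at parent/name and return its path."""
    path = os.path.join(parent, name)
    # with_pip=False keeps the sketch quick; pass True to bootstrap pip too.
    venv.create(path, with_pip=False)
    return path
```

Calling `make_env` twice with different names gives two environments that can hold different versions of the same dependency without touching the system-wide packages.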

"Using the same set" is not actually desirable for development.
