by pyre 4270 days ago
> I have neither the experience, nor the inclination for internationalization of software.

Localizing all of a program's UI text is one thing. Making sure that your program doesn't blow up when it encounters UTF-8 is another. Nowadays if your program chokes on UTF-8, I think it's safe to just consider it broken.

In any case, looks like this is really where the issue may lie:

  # for non-English characters
  def getRealLengh(str):
      length = len(str)
      for s in str:
          if ord(s) > 256:
              length += 1
      return length
and:

          for val in shn.row_values(n):
              try: val = val.replace('\n',' ')
              except: pass
              val = isinstance(val,  basestring) and val.strip() or str(val).strip()
              line += val + ' ' * (30 - getRealLengh(val))
          vim.current.buffer.append(line)
in accounting for the fixed-width layout of non-ASCII characters.
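
For comparison, here is a minimal sketch of one way to estimate display width, assuming Python 3 and the standard `unicodedata` module. It is not the plugin's code: `display_width` and `pad_cell` are hypothetical names, and treating only East Asian Wide/Fullwidth characters as two columns is a simplification that a real terminal may not follow exactly. The quoted check `ord(s) > 256` counts every code point above 256 as double width, which would over-count Cyrillic or Polish letters, for example.

```python
import unicodedata


def display_width(text):
    """Approximate how many terminal columns `text` occupies."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks attach to the previous character: no extra column.
            continue
        # 'W' (Wide) and 'F' (Fullwidth) characters usually take two columns.
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


def pad_cell(text, columns=30):
    """Right-pad `text` with spaces so it fills `columns` display columns."""
    return text + " " * max(0, columns - display_width(text))
```

With this approach a CJK cell gets fewer padding spaces than an ASCII cell of the same character count, while accented Latin and Cyrillic text is padded like ASCII.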

Are UTF-8 encoded Excel documents actually common? Do they even exist? I thought Excel used CP 1252 on English Windows and the corresponding code pages on other language versions?
There are two types of formats generally recognized as XLS: Excel 5.0/95 "BIFF5" and Excel 97-2003 "BIFF8". The former uses a language-specific codepage like 1252 and the latter can use a language-specific codepage or the more general 1200 (UTF16LE).

Here is the master list of codepages used by Excel: https://github.com/SheetJS/js-codepage/blob/master/excel.csv (disclaimer: I built this as part of the in-browser XLS parser https://github.com/SheetJS/js-xls)

I'm pretty sure that xlrd decodes it all to unicode() in Python, so that should be a moot point. You would only need to worry about passing it as utf-8 to Vim at that point.
How would it save a document containing multiple languages, then?
Excel 97-2003 (XLS) actually uses UTF16LE in that case, not UTF8. Excel 2007+ XLSB exclusively uses UTF16LE -- there is no way to force it to use a codepage.
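
To illustrate why a mixed-language string needs something like UTF16LE rather than a single codepage, here is a small sketch using only Python 3's built-in codecs. It says nothing about how Excel or xlrd lay out strings internally; `can_encode` is a hypothetical helper and the sample strings are made up.

```python
def can_encode(text, codec):
    """Return True if every character in `text` is representable in `codec`."""
    try:
        text.encode(codec)
    except UnicodeEncodeError:
        return False
    return True


latin_only = "café"
mixed = "café 日本語"

# A Western European codepage handles the Latin text, one byte per character.
print(can_encode(latin_only, "cp1252"))
print(len(latin_only.encode("cp1252")))

# The same codepage has no slots for the Japanese characters.
print(can_encode(mixed, "cp1252"))

# UTF-16LE covers both scripts, two bytes per character for these samples.
print(can_encode(mixed, "utf-16-le"))
print(mixed.encode("utf-16-le").decode("utf-16-le") == mixed)
```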
Interesting definition of broken. It seems to work perfectly for me and the creator.
> Interesting definition of broken.

Consider this: A medical device that people's lives depend on. It fails in only 1 in 1,000 cases, but each failure causes a death. Many people could state, "it works for me, can't be broken!" On the other hand, the families of the dead could argue that it is broken. Who is right?

Obviously this isn't such an extreme case. No one has their life depending on a Vim plugin, but it illustrates a point. "Works for me" doesn't necessarily imply "isn't broken."

I think the point is that for many people building free software for fun, "works for me" is all that matters. Testing use cases that you know you will never encounter is not interesting or challenging (at least in this case), but it takes time. You're not making a product, and nobody is relying on you. Why bother?

The author of this plugin isn't trying to make a spreadsheet competitor, they just released it publicly because other people might find it useful or interesting.

I think that we can still consider it broken without demanding that the author 'do it better.' People make broken things all of the time.
We can consider it incomplete, or disagree with the design choices, but broken means it doesn't do what it's supposed to do. Here, it does everything the author intended it to do, and everything the description says it does. When you say it's broken, it sounds at least to me like you're imposing your own requirements on a project you have nothing to do with. It's like calling Microsoft Office broken because it can't handle Open Office files. It's not broken, it just doesn't have all the features you would have included.
It's totally cool to release some software that has some bugs. It's also sad to program in an environment that isn't Unicode-friendly.
This is an imperfect analogy. The medical device is not being given away for free.
The analogy may well be imperfect. This particular distinction doesn't seem all that valid, though.