[rescue] UTF-8 [was T5220 update]
jp at celestrion.net
Tue Oct 31 13:21:20 CDT 2017
On Tue, 31 Oct 2017, Mouse wrote:
> Most character encodings degrade poorly when you throw away any
> significant fraction of the data.
Indeed, most of them fare far worse.
> I'm perpetually depressed by the number of people who seem to think it's
> reasonable for them to generate UTF-8 all the time and that it's
> everyone else's duty to handle it the way they intend,
I can't speak for Lionel's MUA, but I'd lay good money on it being
up-front about the message encoding in the MIME headers. I'd lay blame on
the software that sanitized the content without running it through iconv.
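
As a rough illustration of what "running it through iconv" buys you, here
is a minimal Python sketch. It does not call iconv(1) itself; Python's
built-in codecs stand in for it. The function name `transcode` and its
parameters are made up for this example, and the declared charset is
assumed to have already been read from the MIME Content-Type header.

```python
def transcode(payload: bytes, declared_charset: str, target: str = "ascii") -> bytes:
    """Convert payload from its declared charset to the target charset.

    Hypothetical helper: characters the target cannot represent become
    "?" instead of being mangled byte by byte.
    """
    # Decode using the charset the sender declared, not a guess.
    text = payload.decode(declared_charset)
    # Re-encode for the destination, substituting "?" where needed.
    return text.encode(target, errors="replace")


if __name__ == "__main__":
    utf8_bytes = "café".encode("utf-8")
    print(transcode(utf8_bytes, "utf-8", "latin-1"))
    print(transcode(utf8_bytes, "utf-8", "ascii"))
```

A sanitizer that instead strips or rewrites individual bytes above 0x7F
breaks multi-byte sequences apart, which is the kind of damage being
complained about here.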
> as if Unicode were some kind of God-given One True Character Set and
> UTF-8 its One True Encoding.
UTF-8 is a reasonable compromise in a world of mutually-incompatible human
scripts. It'd be Really Nice if the characters were the same width, but
that means weighing lots of 0-bytes in text versus freezing out anyone
whose languages aren't expressible in the 8-bit Latin encodings. The old
school of bickering code-pages can remind us how that goes.
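
To put rough numbers on that width trade-off, the sketch below compares
UTF-8 against a fixed-width encoding (UTF-32, big-endian, no BOM) for a
few sample strings. The sample strings and the helper name
`encoded_sizes` are my own choices for illustration.

```python
def encoded_sizes(text: str) -> dict:
    """Return byte counts for text in UTF-8 and fixed-width UTF-32."""
    utf8 = text.encode("utf-8")
    utf32 = text.encode("utf-32-be")  # big-endian variant, so no BOM is added
    return {
        "utf-8": len(utf8),
        "utf-32": len(utf32),
        "utf-32 zero bytes": utf32.count(0),
    }


if __name__ == "__main__":
    for sample in ("rescue", "café", "日本語"):
        print(sample, encoded_sizes(sample))
```

For plain ASCII text the fixed-width form is four times the size and
three quarters of it is zero bytes; for the CJK sample the gap narrows to
12 bytes versus 9.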
For all their faults, UTF-8 and Unicode are _FAR_ better than their
predecessors. There are plenty of email threads at my day job that
would be inexpressible in the older encodings because of the ways that
Big5, Shift-JIS, and CP-1252 collide.
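
A small sketch of that collision, using Python's bundled codecs: the same
two bytes decode to three different strings depending on which legacy
encoding the reader assumes. The byte pair 0xA4 0xA4 is just a convenient
example I picked, and `decode_all` is a made-up helper name.

```python
LEGACY_CODECS = ("big5", "shift_jis", "cp1252")


def decode_all(raw: bytes) -> dict:
    """Decode the same bytes under each legacy encoding."""
    return {name: raw.decode(name) for name in LEGACY_CODECS}


if __name__ == "__main__":
    raw = b"\xa4\xa4"
    for name, text in decode_all(raw).items():
        # Show both the glyphs and the code points each codec produced.
        print(name, text, [hex(ord(ch)) for ch in text])
```

Without an out-of-band label there is no way to tell which reading was
intended, and no single legacy encoding can carry all three scripts in
one message.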
Now, if the Unicode corporate folks could keep their politics and "Emoji"
out of it, that'd sure be nice.
> This is bad enough anywhere, but especially surprising on lists which
> are, like this one, populated with people who routinely use hardware and
> software older than a year or two.
Thompson and Pike were presenting talks on UTF-8 in the early-to-mid
1990s. Even my crufty HP-UX 11.0 boxes have UTF-8 support (although not
my 10.20 daily driver, unfortunately). Basic support should be a solved
problem.