email charsets again, Encode.pm

Hello again,

I'm the guy who wants a utf-8 interface and iso-8859-2 emails. :-P

So, I dug into 2.0beta4 now (who says it's ever too late?) and I see Encode got a facelift. Still, I believe it isn't what's planned; to me it looks misdesigned.

I am attaching a _preliminary_ module to handle "any charsets". The internal working is simple: everything is handled as UTF-8 internally, since it can represent everything, and things get converted on entry and exit.

Every input to the object has to be either utf-8 already (perl's internal format) or something else, in which case the object should be told what encoding it is in, unless you count on the default. The object recodes it to utf8 and stores it.

Every output can be in any encoding: either utf-8 (in which case there is no recoding) or anything else, in which case the internal utf-8 representation gets recoded.

Coding errors are handled differently than in OTRS:

1) If nobody screws up anything (e.g. the default encoding is the same as the non-utf8 input encoding), nothing can go wrong :)
2) If the input encoding was screwed up, output _will_ be generated without error messages in the apache log :) using perlQQ encoding (which is basically \x{NNNN} for every broken char). This could be changed to HTML/XML-style encoding in the code if anyone likes it better (both look ugly anyway).
3) check_get() can be used to check for brokenness.

The code doesn't contain anything for OTRS: no checking of Encode ability, no handling of file I/O :utf8 handles, simply because I don't have the time. If Martin/the OTRS guys find it useful, feel free to use it, incorporate it into Encode.pm or whatever.

* * *

The main reason is my old patch, which uses the EmailCharsetForced setting to create 8859-2 emails while using a utf-8 interface. I still need it, and my universal encode object makes it pretty easy to drop unicode messages in, get 8859-2 (or whatever) out, and email it.
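The convert-on-entry, convert-on-exit idea described above can be sketched roughly as follows. This is only an illustration in Python, not the attached Perl module: the class name, the constructor parameters and the `get()` method are assumptions made for the example, and only `check_get()` is a name taken from the description. Python's `backslashreplace` error handler stands in for the perlQQ fallback; its escapes look slightly different (`\xNN` or `\uNNNN` instead of `\x{NNNN}`).

```python
class UniString:
    """Illustrative sketch: keep text as Unicode internally, recode at the edges."""

    def __init__(self, data, encoding=None, default_encoding="iso-8859-1"):
        # default_encoding is a hypothetical stand-in for a configured default.
        if isinstance(data, str):
            # Already Unicode: store as-is, no recoding needed.
            self._text = data
        else:
            # Raw bytes: decode using the declared encoding, or the default.
            # Undecodable bytes become visible \xNN escapes instead of errors.
            self._text = data.decode(encoding or default_encoding,
                                     errors="backslashreplace")

    def get(self, encoding="utf-8"):
        # Output in any charset; characters the target charset cannot
        # represent are emitted as escapes rather than raising an error.
        return self._text.encode(encoding, errors="backslashreplace")

    def check_get(self, encoding="utf-8"):
        # True if the stored text can be encoded cleanly in `encoding`.
        try:
            self._text.encode(encoding, errors="strict")
        except UnicodeEncodeError:
            return False
        return True


# Unicode message in, iso-8859-2 bytes out (e.g. for an outgoing email body).
body = UniString("\u00e1rv\u00edzt\u0171r\u0151")
latin2_bytes = body.get("iso-8859-2")
fits_latin1 = body.check_get("iso-8859-1")
```

The point of the design is that callers never need to know which charset the data arrived in; they only name the charset they want out, and can call the check first if escaped output would be unacceptable.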
I don't think I'll have time to rewrite Encode.pm right now :(

Peter

--
Now using M2, Opera's e-mail client: http://www.opera.com/m2/

On Mon, 18 Jul 2005 15:15:28 +0200, Peter Gervai wrote:
> Main reason is my old patch which uses EmailCharsetForced setting to create 8859-2 emails while using utf-8 interface.
A pretty similar patch is attached for beta4.

Header encoding (which was reported somewhere in bugzilla) seems to be screwed up due to a weirdness in OTRS (it seems to give an iso-8859-x encoded string to MIME::Words) and in MIME::Words itself (it seems to ignore the source encoding, and to ignore some utf8 sequences in the encoding process). My patch does not touch that, so it stays broken: I didn't have the time to figure out what changes the utf-8 encoded subject line to 8859-1 or -2 for MIME::Words, which then screws it up based on the wrong encoding.

The patch uses the global config EmailForcedCharset, plus the already posted UniString (renamed to the more proper UniStringEnc) in the Kernel root (to make it easier for me to upgrade otrs :); it could be moved anywhere later.

Works For Me(tm). Th-th-th-that's all folks!

Peter