Re: [dev] Unicode sorting

From the perlop documentation (http://perldoc.perl.org/perlop.html):
Binary "cmp" returns -1, 0, or 1 depending on whether the left argument is stringwise less than, equal to, or greater than the right argument.
"lt", "le", "ge", "gt" and "cmp" use the collation (sort) order specified by the current locale if a legacy use locale (but not use locale ':not_characters' ) is in effect. See perllocale. Do not mix these with Unicode, only with legacy binary encodings. The standard Unicode::Collate and Unicode::Collate::Locale modules offer much more powerful solutions to collation issues.
What I understand from this is that if a "use locale" is in effect, "cmp" will sort binary strings according to the current locale. Otherwise it compares the strings code point by code point, with no locale involved. Either way it won't do Unicode sorting; for that, you need Unicode::Collate.
Our main concern was that the default sort separates accented letters from their base letters (e.g., 'a' and 'á', 'u' and 'ü'), which is really confusing for our users, even though every line of our system's 'locale' output reads 'es_ES.UTF-8'.
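To make the difference concrete, here is a minimal sketch comparing the built-in sort with Unicode::Collate. The word list is invented for illustration and has nothing to do with OTRS data; the collator uses the module's default settings.

```perl
use strict;
use warnings;
use utf8;                      # the string literals below contain non-ASCII characters
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical sample data, not taken from OTRS.
my @names = ('zorro', 'ánimo', 'uva', 'árbol', 'animal', 'último');

# The built-in sort compares code points, so every word that starts with
# an accented letter (U+00E1, U+00FA) ends up after 'zorro'.
my @by_codepoint = sort @names;

# Unicode::Collate applies the Unicode Collation Algorithm, which keeps
# accented letters next to their base letters.
my $collator = Unicode::Collate->new;
my @collated = $collator->sort(@names);

print "cmp:     @by_codepoint\n";   # animal uva zorro ánimo árbol último
print "collate: @collated\n";       # animal ánimo árbol último uva zorro
```

Note that this does not depend on the system locale at all, which matches what we saw: setting 'es_ES.UTF-8' made no difference to the plain sort.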
Kind regards,
Juan Clavero
-----Original Message-----
Date: Mon, 05 Nov 2012 15:03:30 +0100
From: Martin Gruner
Hi all,
Since OTRS 3.1, OTRS works internally with Unicode strings but leaves character sorting to Perl's basic 'cmp', which is not really Unicode-aware.
As explained in the Perl Unicode Cookbook [http://www.perl.com/pub/2012/06/perlunicook-case--and-accent-insensitive-sor...], I think it would be best to use Unicode::Collate's sort.
I've done this in Layout.pm -> _BuildSelectionDataRefCreate and it has really improved usability for our users (agents and customers both). This is the diff against the OTRS Layout.pm (v 1.381.2.11 2012/06/22), in case anyone wants to use it:
16a17,22
> #####################
> ## Unicode Sorting ##
> #####################
> use Unicode::Collate;
> #####################
>
4667a4674,4684
> #####################
> ## Unicode Sorting ##
> #####################
> ## Unicode Sorting: added sorting by Unicode::Collate
> my $Collate = Unicode::Collate->new(level => 1);
> # Level 1 ignores case and diacritics
> # Level 2 adds diacritic comparisons to the ordering algorithm.
> # Level 3 adds case ordering.
> # Level 4 adds a tiebreaking comparison of probably more detail than most people will ever care to know.
> # Level 4 is default
>
4677c4694,4695
< @SortKeys = sort( keys %{ $Param{Data} } );
---
> ## Unicode Sorting ##
> @SortKeys = $Collate->sort( keys %{ $Param{Data} } );
4686c4704,4705
< @SortKeys = sort { $SortHash{$a} cmp $SortHash{$b} } ( keys %SortHash );
---
> ## Unicode Sorting ##
> @SortKeys = sort { $Collate->cmp($SortHash{$a}, $SortHash{$b}) } ( keys %SortHash );
4696c4715,4716
< push @SortKeys, sort { $List{$a} cmp $List{$b} } ( keys %List );
---
> ## Unicode Sorting ##
> push @SortKeys, sort { $Collate->cmp($List{$a}, $List{$b}) } ( keys %List );
4702a4723
> ## Unicode Sorting ##
4704c4725
< = sort { $Param{Data}->{$a} cmp $Param{Data}->{$b} } ( keys %{ $Param{Data} } );
---
> = sort { $Collate->cmp($Param{Data}->{$a}, $Param{Data}->{$b}) } ( keys %{ $Param{Data} } );
4706a4728
> #####################
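For anyone who wants to try the collation levels outside of OTRS first, the following standalone sketch shows what levels 1 and 2 do and how the value-based sort blocks in the diff behave. The %Data hash is a hypothetical stand-in for $Param{Data}, and the variable names $Level1 and $Level2 are mine, not part of the patch.

```perl
use strict;
use warnings;
use utf8;                      # the string literals below contain non-ASCII characters
use Unicode::Collate;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical stand-in for $Param{Data}: key => label shown to the user.
my %Data = (
    1 => 'Último',
    2 => 'animal',
    3 => 'Árbol',
    4 => 'zorro',
);

my $Level1 = Unicode::Collate->new( level => 1 );
my $Level2 = Unicode::Collate->new( level => 2 );

# Level 1 compares base letters only: case and accents are ignored.
my $SameAtLevel1 = $Level1->cmp( 'a', 'Á' );    # 0
# Level 2 also looks at diacritics, so the unaccented form sorts first.
my $AccentAtLevel2 = $Level2->cmp( 'a', 'á' );  # -1

print "level 1, 'a' vs 'Á': $SameAtLevel1\n";
print "level 2, 'a' vs 'á': $AccentAtLevel2\n";

# Sort the keys by their values, as the patched sort blocks do.
my @SortKeys = sort { $Level1->cmp( $Data{$a}, $Data{$b} ) } keys %Data;
print join( ', ', map { $Data{$_} } @SortKeys ), "\n";   # animal, Árbol, Último, zorro
```

One thing to keep in mind with level 1: entries that differ only by case or accent compare as equal, so their relative order depends on the order in which the hash returns its keys. If that matters, a higher level or a secondary comparison on the key may be worth considering.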
Kind regards, Juan Clavero
--
Martin Gruner
Senior Developer R&D
OTRS AG