Author Topic: Unicode issues in Perl  (Read 333 times)

rockstar1234

  • Newbie
  • *
  • Karma: +0/-0
  • Posts: 45
Unicode issues in Perl
« on: February 20, 2011, 04:07:23 AM »
Unicode is supposed to make character handling easy, but living with a legacy system that uses multibyte encodings complicates things. This article looks at the real world of Perl Unicode integration and exposes a set of problems that also occur in other languages and systems.
 
Windows stores filenames in Unicode, encoded in UTF-16.
Every Windows installation is fully Unicode-capable but keeps compatibility with non-Unicode applications through a setting called the “System Locale” in Windows 2000 and “Language for non-Unicode programs” in later Windows versions such as XP.
The setting is found in the Advanced tab of the Regional and Language Options dialog in the Control Panel.




cashcars

  • Full Member
  • ***
  • Karma: +0/-0
  • Posts: 196
Re: Unicode issues in Perl
« Reply #1 on: January 28, 2012, 10:06:26 PM »
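Imagine two simple variables with Unicode text in them, and you print those variables to standard output. What could be easier? Yet, assuming the file is saved as UTF-8, the script below prints a “Wide character in print” warning and the second line comes out garbled: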

    #!/usr/bin/perl

    my $ustring1 = "Hello \x{263A}!\n"; 
    my $ustring2 = <DATA>;

    print "$ustring1$ustring2";
    __DATA__
    Hello ☺!
You could apparently fix the output by avoiding concatenation, although the “Wide character in print” warning remains:

    #!/usr/bin/perl

    my $ustring1 = "Hello \x{263A}!\n"; 
    my $ustring2 = <DATA>;

    print $ustring1, $ustring2;
    __DATA__
    Hello ☺!
There is a distinction between bytes and characters. Characters are Unicode characters. One character may be represented by several bytes when stored, printed, or sent over a network, depending on the encoding used. UTF-8 is just one of the ways to represent Unicode data.
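
To make the byte/character distinction concrete, here is a small sketch using the core Encode module. The variable names are only for illustration; it takes one character (U+263A, the smiley from the scripts above) and counts how many bytes it needs in two different encodings:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(encode);

# One Unicode character: WHITE SMILING FACE (U+263A).
my $char_string = "\x{263A}";

# The same character serialized as bytes in two different encodings.
my $utf8_bytes  = encode('UTF-8',    $char_string);
my $utf16_bytes = encode('UTF-16LE', $char_string);

# length() counts characters in a character string
# and bytes in an encoded byte string.
printf "characters:     %d\n", length $char_string;   # 1
printf "UTF-8 bytes:    %d\n", length $utf8_bytes;    # 3
printf "UTF-16LE bytes: %d\n", length $utf16_bytes;   # 2
```

The character count stays at one, while the byte count depends entirely on which encoding was chosen.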

Perl has a “utf8” flag for every scalar value, which may be “on” or “off”. When the flag is on, perl treats the value as a string of Unicode characters.
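
If you want to peek at that flag, the built-in utf8::is_utf8 function reports it. This is only a diagnostic sketch (the flag is an internal detail and the variable names are made up for the example), but it shows which literals end up with the flag on:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Without "use utf8", a plain ASCII literal is stored with the flag off.
my $plain = "Hello";

# A literal containing a code point above 255 cannot be stored as
# single bytes, so perl stores it with the flag on.
my $wide = "Hello \x{263A}";

my $plain_flag = utf8::is_utf8($plain) ? "on" : "off";
my $wide_flag  = utf8::is_utf8($wide)  ? "on" : "off";

print "plain: utf8 flag is $plain_flag\n";   # off
print "wide:  utf8 flag is $wide_flag\n";    # on
```

utf8::is_utf8 is always available, so no extra module needs to be loaded for this check.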

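If you take a string with the utf8 flag off and concatenate it with a string that has the utf8 flag on, perl converts the first one to Unicode, by default treating each of its bytes as a single Latin-1 (ISO-8859-1) character. That is why UTF-8 bytes read from <DATA> come out garbled after concatenation: each byte of a multibyte sequence becomes a separate character.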
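
Here is a rough sketch of that conversion and of one common way around it: decode incoming bytes into characters before mixing them with other character strings, and encode on output. It assumes the input bytes are UTF-8; the variable names are hypothetical, and the bytes are written out by hand to stand in for a line read from <DATA>.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Encode characters as UTF-8 when printing, so no
# "Wide character in print" warning is raised.
binmode STDOUT, ':encoding(UTF-8)';

my $chars = "\x{263A}";        # character string, utf8 flag on
my $bytes = "\xE2\x98\xBA";    # the same smiley as raw UTF-8 bytes, flag off

# Naive concatenation: the three bytes become three separate characters.
my $mixed = $chars . $bytes;

# Decoding first turns the three bytes into the single character U+263A.
my $fixed = $chars . decode('UTF-8', $bytes);

printf "mixed: %d characters\n", length $mixed;   # 4
printf "fixed: %d characters\n", length $fixed;   # 2
print  "fixed string: $fixed\n";
```

For input read from a filehandle, an ':encoding(UTF-8)' layer on that handle does the decoding for you instead of an explicit decode call.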