Perl.com

Perl Unicode Cookbook: Match Unicode Properties in Regex

May 16, 2012 by Tom Christiansen

unicode

℞ 25: Match Unicode properties in regex with \p, \P

Every Unicode codepoint has one or more properties, indicating the rules which apply to that codepoint. Perl’s regex engine is aware of these properties; use the \p{} metacharacter sequence to…

Read it
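
As a rough illustration of the \p{} and \P{} sequences that ℞ 25 introduces, the following sketch counts characters by property in a sample string. The sample string and variable names are assumptions made for this example and are not taken from the recipe itself.

```perl
use v5.14;
use strict;
use warnings;

# Hypothetical sample: "café", three Greek letters, "42", and an
# Arabic-Indic digit, written with escapes so the source stays ASCII.
my $text = "caf\x{E9} \x{3B1}\x{3B2}\x{3B3} 42 \x{664}";

# The "= () =" idiom counts matches by forcing list context.
my $letters     = () = $text =~ /\p{L}/g;      # letters in any script
my $greek       = () = $text =~ /\p{Greek}/g;  # Greek-script characters
my $digits      = () = $text =~ /\p{Nd}/g;     # decimal digits in any script
my $non_letters = () = $text =~ /\P{L}/g;      # \P{} negates the property

say "letters=$letters greek=$greek digits=$digits non-letters=$non_letters";
# prints "letters=7 greek=3 digits=3 non-letters=6"
```

Note that \p{Nd} matches the Arabic-Indic digit as well as the ASCII digits, which is the kind of behavior the property-based classes are meant to expose.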

Perl Unicode Cookbook: Disable Unicode-awareness in Builtin Character Classes

May 14, 2012 by Tom Christiansen

unicode

℞ 24: Disabling Unicode-awareness in builtin charclasses

Many regex tutorials gloss over the fact that builtin character classes include far more than ASCII characters. In particular, classes such as “word character” (\w), “word boundary” (\b), “whitespace” (\s), and “digit” (\d)…

Read it

Perl Unicode Cookbook: Get Character Categories

May 11, 2012 by Tom Christiansen

unicode

℞ 23: Get character category

Unicode is a set of characters and a list of rules and properties applied to those characters. The Unicode Character Database collects those properties. The core module Unicode::UCD provides access to these properties. These general…

Read it
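
A minimal sketch of looking up a general category with the core module Unicode::UCD, as ℞ 23 describes. The choice of U+03A3 (GREEK CAPITAL LETTER SIGMA) as the example codepoint is an assumption made for illustration.

```perl
use v5.14;
use strict;
use warnings;
use Unicode::UCD qw(charinfo general_categories);

# charinfo() returns a hash reference describing one codepoint.
my $cat = charinfo(0x3A3)->{category};    # short category code, "Lu"

# general_categories() maps short codes to their long names.
my $name = general_categories()->{$cat};  # "UppercaseLetter"

say "U+03A3 is in category $cat ($name)";
```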

Perl Unicode Cookbook: Match Unicode Linebreak Sequence

May 10, 2012 by Tom Christiansen

unicode

℞ 22: Match Unicode linebreak sequence in regex

Unicode defines several characters as providing vertical whitespace, like the carriage return or newline characters. Unicode also gathers several characters under the banner of a linebreak sequence. A Unicode linebreak matches the…

Read it

Perl Unicode Cookbook: Case-insensitive Comparisons

May 9, 2012 by Tom Christiansen

unicode

℞ 21: Unicode case-insensitive comparisons

Unicode is more than an expanded character set. Unicode is a set of rules about how characters behave and a set of properties about each character. Comparing strings for equivalence often requires normalizing them to…

Read it

Perl Unicode Cookbook: Unicode Casing

May 8, 2012 by Tom Christiansen

unicode

℞ 20: Unicode casing

Unicode casing is very different from ASCII casing. Some of the complexity of Unicode comes about because Unicode characters may change dramatically when changing from upper to lower case and back. For example, the Greek language…

Read it

Perl Unicode Cookbook: Specify a File's Encoding

May 4, 2012 by Tom Christiansen

unicode

℞ 19: Open file with specific encoding

While setting the default Unicode encoding for IO is sensible, sometimes the default encoding is not correct. In this case, specify the encoding for a filehandle manually in the mode option to open…

Read it

Perl Unicode Cookbook: Make All I/O Default to UTF-8

May 3, 2012 by Tom Christiansen

unicode

℞ 18: Make all I/O and args default to utf8

The core rule of Unicode handling in Perl is “always encode and decode at the edges of your program”. If you’ve configured everything such that all incoming and outgoing data…

Read it

Perl Unicode Cookbook: Make File I/O Default to UTF-8

May 1, 2012 by Tom Christiansen

unicode

℞ 17: Make file I/O default to utf8

If you’ve ever had the misfortune of seeing the Unicode warning “wide character in print”, you may have realized that something forgot to set the appropriate Unicode-capable encoding on a filehandle somewhere…

Read it

Perl Unicode Cookbook: Decode Standard Filehandles as Locale Encoding

Apr 30, 2012 by Tom Christiansen

unicode

℞ 16: Declare STD{IN,OUT,ERR} to be in locale encoding

Always convert to and from your desired encoding at the edges of your programs. This includes the standard filehandles STDIN, STDOUT, and STDERR. While it may be most common for modern…

Read it

Perl Unicode Cookbook: Decode Standard Filehandles as UTF-8

Apr 27, 2012 by Tom Christiansen

unicode

℞ 15: Declare STD{IN,OUT,ERR} to be UTF-8

Always convert to and from your desired encoding at the edges of your programs. This includes the standard filehandles STDIN, STDOUT, and STDERR. As documented in perldoc perlrun, the PERL_UNICODE environment variable or…

Read it

Perl Unicode Cookbook: Decode @ARGV as Local Encoding

Apr 26, 2012 by Tom Christiansen

unicode

℞ 14: Decode program arguments as locale encoding

While it may be most common in modern operating systems for your command-line arguments to be encoded as UTF-8, @ARGV may use other encodings. If you have configured your system with a…

Read it

Perl Unicode Cookbook: Decode @ARGV as UTF-8

Apr 24, 2012 by Tom Christiansen

unicode

℞ 13: Decode program arguments as utf8

While the standard Perl Unicode preamble makes Perl’s filehandles use UTF-8 encoding by default, filehandles aren’t the only sources and sinks of data. The command-line arguments to your programs, available through @ARGV, may…

Read it

Perl Unicode Cookbook: Explicit encode/decode

Apr 23, 2012 by Tom Christiansen

unicode

℞ 12: Explicit encode/decode

While the standard Perl Unicode preamble makes Perl’s filehandles use UTF-8 encoding by default, filehandles aren’t the only sources and sinks of data. On rare occasions, such as a database read, you may be given encoded…

Read it

Perl Unicode Cookbook: Names of CJK Codepoints

Apr 20, 2012 by Tom Christiansen

unicode

℞ 11: Names of CJK codepoints

CJK refers to Chinese, Japanese, and Korean. In the context of Unicode, it usually refers to the Han ideographs used in the modern Chinese and Japanese writing systems. As you can expect, pictorial languages…

Read it

Perl Unicode Cookbook: Custom Named Characters

Apr 19, 2012 by Tom Christiansen

unicode

℞ 10: Custom named characters

As several other recipes demonstrate, the charnames pragma offers considerable power to use and manipulate Unicode characters by their names. Its :alias option allows you to give your own lexically scoped nicknames to existing characters,…

Read it

Perl Unicode Cookbook: Unicode Named Character Sequences

Apr 17, 2012 by Tom Christiansen

unicode

℞ 9: Unicode named sequences

Unicode includes the feature of named character sequences, which combine multiple Unicode characters behind a single name. The charnames pragma allows the use of these named sequences in literals, just as it allows the use…

Read it

Perl Unicode Cookbook: Unicode Named Characters

Apr 16, 2012 by Tom Christiansen

unicode

℞ 8: Unicode named characters

Use the \N{charname} notation to get the character by that name for use in interpolated literals (double-quoted strings and regexes). In v5.16, there is an implicit use charnames qw(:full :short); but prior to v5.16, you…

Read it
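
The \N{charname} notation from ℞ 8 can be sketched as follows. The pragma is loaded explicitly here so the example does not depend on the implicit load in v5.16; the particular characters and variable names are assumptions chosen for illustration.

```perl
use v5.14;
use strict;
use warnings;
use charnames qw(:full :short);   # explicit, so this also runs before v5.16

# :full lets you spell out the complete Unicode character name.
my $small = "\N{GREEK SMALL LETTER SIGMA}";

# :short accepts a script:name form; a capitalized name selects the capital letter.
my $capital = "\N{greek:Sigma}";

# \N{} also works inside a regex.
my $matched = "\x{3C3}" =~ /\N{GREEK SMALL LETTER SIGMA}/ ? 1 : 0;

printf "small=U+%04X capital=U+%04X matched=%d\n",
    ord $small, ord $capital, $matched;
# prints "small=U+03C3 capital=U+03A3 matched=1"
```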

Perl Unicode Cookbook: Get Character Number by Name

Apr 13, 2012 by Tom Christiansen

unicode

℞ 7: Get character number by name

Unicode allows you to refer to characters by number or by name. Computers don’t care, but humans do. When you have a character name, you can translate it to its number with the…

Read it

Perl Unicode Cookbook: Get Character Names by Number

Apr 12, 2012 by Tom Christiansen

unicode

℞ 6: Get character name by number

Unicode allows you to refer to characters by number or by name. Computers don’t care, but humans do. When you have a character number, you can translate it to its name with the…

Read it


License

This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.

