Perl Unicode Cookbook: Custom Character Properties

℞ 26: Custom character properties

The previous recipe, Match Unicode Properties in Regex, explained that every Unicode character has one or more properties, specified by the Unicode Consortium. You may extend these rules by defining your own properties for Perl to use.

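A custom property is a subroutine whose name begins with In or Is and which returns a string in a special format: one entry per line, where each entry is a hexadecimal code point, a whitespace-separated range of code points, or an existing property prefixed with +, -, ! or &. The “User-Defined Character Properties” section of perldoc perlunicode describes this format in more detail.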
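As a minimal sketch of that format, the following property combines two code point ranges with a single code point. The names Is_Snake and is_snake_case are hypothetical and chosen only for illustration; they do not come from perlunicode.

```perl
use strict;
use warnings;

# Hypothetical property (illustrative name): characters allowed in a
# lowercase snake_case word. Each line of the returned string is one entry.
sub Is_Snake {
    return "30\t39\n"    # range U+0030..U+0039: digits 0-9
         . "61\t7A\n"    # range U+0061..U+007A: letters a-z
         . "5F\n";       # single code point U+005F: underscore
}

# Hypothetical helper: true if the whole string consists of Is_Snake characters.
sub is_snake_case {
    my ($s) = @_;
    return $s =~ /\A\p{Is_Snake}+\z/ ? 1 : 0;
}

print is_snake_case("snake_case_42") ? "match\n" : "no match\n";   # match
print is_snake_case("CamelCase")     ? "match\n" : "no match\n";   # no match
```

The negated form \P{Is_Snake} works the same way as for built-in properties, so the same subroutine also serves to find the first character that is not allowed.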

To define your own custom character properties at compile time for use in regexes:

 # using private-use characters
 sub In_Tengwar { "E000\tE07F\n" }

 if (/\p{In_Tengwar}/) { ... }

 # blending existing properties
 sub Is_GraecoRoman_Title {<<'END_OF_SET'}
 +utf8::IsLatin
 +utf8::IsGreek
 &utf8::IsTitle
 END_OF_SET

 if (/\p{Is_GraecoRoman_Title}/) { ... }
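
To try both properties against concrete characters, a small script along these lines may help. The classify helper and the choice of sample code points are assumptions made for this sketch, not part of the recipe: U+E000 is a private-use character, U+01C5 and U+1F88 are Latin and Greek titlecase letters, and U+0041 is a plain uppercase A.

```perl
use strict;
use warnings;

# The two property definitions from the recipe.
sub In_Tengwar { "E000\tE07F\n" }

sub Is_GraecoRoman_Title {<<'END_OF_SET'}
+utf8::IsLatin
+utf8::IsGreek
&utf8::IsTitle
END_OF_SET

# Hypothetical helper: list the custom properties that the first
# character of the given string belongs to.
sub classify {
    my ($char) = @_;
    my @hits;
    push @hits, "In_Tengwar"           if $char =~ /\A\p{In_Tengwar}/;
    push @hits, "Is_GraecoRoman_Title" if $char =~ /\A\p{Is_GraecoRoman_Title}/;
    return @hits;
}

# Report the properties matched by each sample code point.
for my $cp (0xE000, 0x01C5, 0x1F88, 0x0041) {
    my @hits = classify(chr $cp);
    printf "U+%04X: %s\n", $cp, @hits ? join(", ", @hits) : "none";
}
```

Because the & line intersects everything accumulated so far with the titlecase letters, the uppercase A should match neither property: it is Latin but not titlecase.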

Previous: ℞ 25: Match Unicode Properties in Regex

Series Index: The Standard Preamble

Next: ℞ 27: Unicode Normalization
