| author | Albrecht Schlosser <albrechts.fltk@online.de> | 2016-03-23 14:02:25 +0000 |
|---|---|---|
| committer | Albrecht Schlosser <albrechts.fltk@online.de> | 2016-03-23 14:02:25 +0000 |
| commit | 979740ce91525cc301a9173731d6aaf3004a6c88 | |
| tree | d312c7f444b758163bf3e1b13a7bf6da28d13b26 | |
| parent | 8d89d760fa2d01b7f6e38067793d1b480817ae78 | |
Enable definition of Unicode conv. options on compiler command line.
Three documented pre-processor variables can now be defined on the
compiler command line to avoid editing the FLTK src code. The default
values still apply unchanged.
Port of branch-1.3, svn r11404.
git-svn-id: file:///fltk/svn/fltk/branches/branch-1.3-porting@11406 ea41ed52-d2ee-0310-a9c1-e6b18d33e121
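
The guard pattern this commit introduces can be sketched in isolation. The block below is a minimal illustration, not FLTK code: the macro names and default values are taken from the patch, while `unicode_option_bits()` is a hypothetical helper added only to make the effective settings observable.

```c
/* Same guard pattern as in the patch: a -D definition given on the
   compiler command line wins, otherwise the default value applies. */
#ifndef ERRORS_TO_ISO8859_1
# define ERRORS_TO_ISO8859_1 1
#endif
#ifndef ERRORS_TO_CP1252
# define ERRORS_TO_CP1252 1
#endif
#ifndef STRICT_RFC3629
# define STRICT_RFC3629 0
#endif

/* Hypothetical helper (not part of FLTK): packs the three settings
   into one value. Bit 0 = ERRORS_TO_ISO8859_1, bit 1 = ERRORS_TO_CP1252,
   bit 2 = STRICT_RFC3629. */
int unicode_option_bits(void) {
  return ((ERRORS_TO_ISO8859_1) != 0)
       | (((ERRORS_TO_CP1252) != 0) << 1)
       | (((STRICT_RFC3629) != 0) << 2);
}
```

Compiled without extra flags, the helper returns 3 (both error-mapping options on, strict mode off). Adding, for example, `-DSTRICT_RFC3629=1` to the compile command changes the result to 7 without touching the source.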

| mode | file | lines changed |
|---|---|---|
| -rw-r--r-- | documentation/src/unicode.dox | 12 |
| -rw-r--r-- | src/fl_utf.c | 14 |

2 files changed, 18 insertions, 8 deletions
diff --git a/documentation/src/unicode.dox b/documentation/src/unicode.dox
index 818d22b49..ecd9074bd 100644
--- a/documentation/src/unicode.dox
+++ b/documentation/src/unicode.dox
@@ -191,14 +191,14 @@ the following limitations:

 \section unicode_illegals Illegal Unicode and UTF-8 sequences

-Three pre-processor variables are defined in the source code that
+Three pre-processor variables are defined in the source code [1] that
 determine how %fl_utf8decode() handles illegal UTF-8 sequences:

 - if ERRORS_TO_CP1252 is set to 1 (the default), %fl_utf8decode() will
   assume that a byte sequence starting with a byte in the range 0x80
-  to 0x9f represents a Microsoft CP1252 character, and will instead
-  return the value of an equivalent UCS character. Otherwise, it
-  will be processed as an illegal byte value as described below.
+  to 0x9f represents a Microsoft CP1252 character, and will return
+  the value of an equivalent UCS character. Otherwise, it will be
+  processed as an illegal byte value as described below.

 - if STRICT_RFC3629 is set to 1 (not the default!) then UTF-8
   sequences that correspond to illegal UCS values are treated as
@@ -210,6 +210,10 @@ determine how %fl_utf8decode() handles illegal UTF-8 sequences:
   byte value is returned unchanged, otherwise 0xFFFD, the Unicode
   REPLACEMENT CHARACTER, is returned instead.

+[1] Since FLTK 1.3.4 you may set these three pre-processor variables on
+    your compile command line with -D"variable=value" (value: 0 or 1)
+    to avoid editing the source code.
+
 %fl_utf8encode() is less strict, and only generates the UTF-8
 sequence for 0xFFFD, the Unicode REPLACEMENT CHARACTER, if it is
 asked to encode a UCS value above U+10FFFF.
diff --git a/src/fl_utf.c b/src/fl_utf.c
index bfb5b4f02..7fd5686e3 100644
--- a/src/fl_utf.c
+++ b/src/fl_utf.c
@@ -73,7 +73,9 @@
   to completely ignore character sets in your code because virtually
   everything is either ISO-8859-1 or UTF-8.
 */
-#define ERRORS_TO_ISO8859_1 1
+#ifndef ERRORS_TO_ISO8859_1
+# define ERRORS_TO_ISO8859_1 1
+#endif

 /*!Set to 1 to turn bad UTF-8 bytes in the 0x80-0x9f range into the
   Unicode index for Microsoft's CP1252 character set. You should
@@ -81,7 +83,9 @@
   available text (such as all web pages) are correctly converted
   to Unicode.
 */
-#define ERRORS_TO_CP1252 1
+#ifndef ERRORS_TO_CP1252
+# define ERRORS_TO_CP1252 1
+#endif

 /*!A number of Unicode code points are in fact illegal and should not
   be produced by a UTF-8 converter. Turn this on will replace the
@@ -89,7 +93,9 @@
   arbitrary 16-bit data to UTF-8 and then back is not an identity,
   which will probably break a lot of software.
 */
-#define STRICT_RFC3629 0
+#ifndef STRICT_RFC3629
+# define STRICT_RFC3629 0
+#endif

 #if ERRORS_TO_CP1252
 /* Codes 0x80..0x9f from the Microsoft CP1252 character set, translated
@@ -109,7 +115,7 @@ static unsigned short cp1252[32] = {
     (adding \e len to \e p will point at the next character).

     If \p p points at an illegal UTF-8 encoding, including one that
-    would go past \e end, or where a code is uses more bytes than
+    would go past \e end, or where a code uses more bytes than
     necessary, then *(unsigned char*)p is translated as though it is
     in the Microsoft CP1252 character set and \e len is set to 1.
     Treating errors this way allows this to decode almost any
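
To make the effect of two of these options concrete, here is a minimal sketch of the fallback the documentation describes for a single illegal byte. It is not the code of `fl_utf8decode()`: the function name `fallback_for_bad_byte()` and the table name `cp1252_sketch` are hypothetical, and the table is the standard CP1252 mapping for 0x80..0x9f, written out here on the assumption that it matches what `src/fl_utf.c` uses.

```c
/* Defaults as in the patch; override with e.g. -DERRORS_TO_CP1252=0. */
#ifndef ERRORS_TO_ISO8859_1
# define ERRORS_TO_ISO8859_1 1
#endif
#ifndef ERRORS_TO_CP1252
# define ERRORS_TO_CP1252 1
#endif

#if ERRORS_TO_CP1252
/* Standard CP1252 code points for bytes 0x80..0x9f (assumed to match
   the cp1252[] table in src/fl_utf.c). */
static const unsigned short cp1252_sketch[32] = {
  0x20ac, 0x0081, 0x201a, 0x0192, 0x201e, 0x2026, 0x2020, 0x2021,
  0x02c6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008d, 0x017d, 0x008f,
  0x0090, 0x2018, 0x2019, 0x201c, 0x201d, 0x2022, 0x2013, 0x2014,
  0x02dc, 0x2122, 0x0161, 0x203a, 0x0153, 0x009d, 0x017e, 0x0178
};
#endif

/* Hypothetical helper: the value reported for one byte that does not
   start a legal UTF-8 sequence, following the documented rules. */
unsigned fallback_for_bad_byte(unsigned char b) {
#if ERRORS_TO_CP1252
  /* Bytes 0x80..0x9f are read as Microsoft CP1252 characters. */
  if (b >= 0x80 && b < 0xa0) return cp1252_sketch[b - 0x80];
#endif
#if ERRORS_TO_ISO8859_1
  /* The illegal byte value is returned unchanged. */
  return b;
#else
  /* Otherwise the Unicode REPLACEMENT CHARACTER is returned. */
  (void)b;
  return 0xfffd;
#endif
}
```

With the default settings, byte 0x80 maps to U+20AC (the euro sign) and a stray 0xC0 is returned as 0xC0. Building with `-DERRORS_TO_CP1252=0 -DERRORS_TO_ISO8859_1=0` makes the sketch return 0xFFFD for both.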
