Fix for PHP bug 35050
First patch version - zend_operators.c.diff.gz
Second patch version (after comments by Marcus Boerger) - zend_operators.c.v2.diff.gz
The ascii_tolower() function was written by Björn JACKE for the Lynx browser. I contacted Björn, and he confirmed that the code can be used under any license.
> You wrote patch for Lynx
> http://j3e.de/linux/patches/lynx2-8-5dev16-localefix-bj.diff
>
> Lynx is licensed under GPL. Can I use your ascii_tolower() function in PHP
> under PHP license? http://www.opensource.org/licenses/php.php

yes, you can use that patch under PHP license or any other license you want.

Cheers
Bjoern
The patch is written for PHP 5.2.5-dev. It should also work with 6.0-dev and with PHP 5.2.1 or later, that is, any version with sources dated after 2006-12-05.
Please note that although I know when zend_tolower() was introduced in zend_operators.c, I don't know about all of the changes Stas made in that commit. If you want to fix an older PHP version, you will have to track down every tolower() call and replace it with zend_ascii_tolower().
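The replacement can be pictured as swapping the per-character call inside a lowercase-copy loop. This is a minimal sketch, not code from zend_operators.c. The function names `lower_copy_locale`, `lower_copy_ascii` and `ascii_lower` are hypothetical, and `ascii_lower` only stands in for a zend_ascii_tolower()-style rule, whose real signature is not shown here. It assumes an ASCII execution character set.

```c
#include <ctype.h>
#include <stddef.h>

/* Before: a lowercase-copy loop built on tolower(). Its result depends on
 * the current LC_CTYPE locale, e.g. how a Turkish locale lowercases 'I'. */
static void lower_copy_locale(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] = (char)tolower((unsigned char)src[i]);
    }
}

/* Hypothetical ASCII-only rule: folds 'A'..'Z' and leaves every other byte,
 * including 8-bit ones, unchanged regardless of locale. */
static int ascii_lower(int c)
{
    return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c;
}

/* After: the same loop with tolower() replaced by the ASCII-only rule. */
static void lower_copy_ascii(char *dst, const char *src, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        dst[i] = (char)ascii_lower((unsigned char)src[i]);
    }
}
```

In the default "C" locale both loops give the same result. They differ only when LC_CTYPE maps ASCII letters differently, which is the case the replacement is meant to eliminate.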
According to the PHP developers, the patch breaks other locales. I am not aware of any broken features and don't understand how it could break things. On Windows, zend_tolower() already behaves the same way as the patched version: there PHP uses the locale-unaware _tolower_l() function instead of the locale-aware tolower(). The patch might break things if you have class and method names containing 8-bit characters and expect those characters to be case-insensitive. Such code is unportable anyway, because it depends on a specific system locale.
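To see whether a particular LC_CTYPE locale changes how 'I' is lowercased, which is the root of this bug, a small check like the one below could be used. It is a sketch, and the function name is hypothetical. On a system with a single-byte Turkish locale such as tr_TR.ISO8859-9 installed, tolower('I') would typically return the dotless ı (0xFD) rather than 'i'. The actual result depends on the platform's locale data.

```c
#include <ctype.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper. Returns 1 if tolower('I') == 'i' under the given
 * LC_CTYPE locale, 0 if the locale folds 'I' to something else, and -1 if
 * the locale is not installed. The previous LC_CTYPE is restored. */
static int ctype_folds_I_to_i(const char *locale)
{
    const char *cur = setlocale(LC_CTYPE, NULL);
    char *saved = NULL;
    int result;

    /* Copy the current name: setlocale() may overwrite its own buffer. */
    if (cur != NULL) {
        size_t n = strlen(cur) + 1;
        saved = malloc(n);
        if (saved != NULL) {
            memcpy(saved, cur, n);
        }
    }

    if (setlocale(LC_CTYPE, locale) == NULL) {
        result = -1;
    } else {
        result = (tolower('I') == 'i') ? 1 : 0;
    }

    if (saved != NULL) {
        setlocale(LC_CTYPE, saved);
        free(saved);
    }
    return result;
}
```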
If you need a workaround for other PHP versions and can't rebuild PHP, just set the LC_CTYPE locale to C. This also handles the same kind of programming mistake in PHP scripts. Note that if LC_CTYPE is set to C, scripts that use gettext translations must call bind_textdomain_codeset() for every gettext domain they use.
setlocale(LC_ALL, 'tr_TR.UTF-8'); // Turkish settings for every category
setlocale(LC_CTYPE, 'C');         // but plain ASCII character classification and case mapping
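The same workaround can be expressed at the C level. In this sketch, setlocale() with LC_ALL selects the Turkish locale for every category. The second call resets only LC_CTYPE to "C", so tolower() falls back to ASCII rules. The helper name is hypothetical.

```c
#include <ctype.h>
#include <locale.h>

/* Hypothetical helper mirroring the PHP workaround. Returns tolower('I')
 * after the locale has been set up. */
static int apply_turkish_with_c_ctype(void)
{
    /* If tr_TR.UTF-8 is not installed this returns NULL and the previous
     * locale stays in effect; the LC_CTYPE reset below still applies. */
    (void)setlocale(LC_ALL, "tr_TR.UTF-8");

    /* Reset only the character-classification category to "C". */
    setlocale(LC_CTYPE, "C");

    return tolower('I');
}
```

Only LC_CTYPE is changed. Where the Turkish locale exists, categories such as LC_MESSAGES and LC_TIME keep their Turkish settings.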
make test - unpatched, patched, difference.
Zend/bench.php - unpatched and patched.
turkish.php tests:
C locale - unpatched and patched
tr_TR.ISO8859-9 locale - unpatched and patched
tr_TR.UTF-8 locale - unpatched and patched