PHP 5.3.6 Hacking

Written by Mujianto akhmadi pratama on Saturday, October 20, 2012


It took me a while, but here's a new toy. Today I'm publishing my own PHP fork, based on the PHP 5.3.6 code base, with a few changes that make a developer's everyday life more bearable. It includes some of the patches I published about 3 years ago, my defcon extension and my infusion extension, plus a good bunch of extra gimmicks.
In the MySQL landscape, the server is forked again and again, and every fork results in a separate project, such as Drizzle, MariaDB, OurDelta or the Percona server. I don't want to maintain my own PHP version, but it's fun to improve PHP's behaviour with faster development and faster execution in mind.
Okay, get the source from GitHub and see what has changed so far.

Performance improvements

Hardcoded strlen() and count()

strlen() is both used very often and very slow. Various PHP performance guides point out that isset() is much faster for determining whether a string has at least a certain length. If you want to check the exact length, you end up with something like this: strlen($str) == 32 -> isset($str[31]) && !isset($str[32]). This is very ugly and hard to read. I added a new opcode for count() and strlen(), which makes these calls up to 10 times faster. A strlen() on a constant string, like strlen("foo"), is folded to the constant 3 at compile time, which is cool because more verbose code is no longer a problem.
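
For reference, the isset() workaround mentioned above can be wrapped in a small helper on stock PHP. This is only a sketch; the function name has_exact_length() is made up for illustration and is not part of the fork.

```php
<?php
// Hypothetical helper (not part of the fork): true when $str is exactly
// $len bytes long, using only isset() on string offsets.
function has_exact_length($str, $len)
{
    if ($len === 0) {
        return $str === '';
    }
    // The last expected offset must exist, the one after it must not.
    return isset($str[$len - 1]) && !isset($str[$len]);
}

var_dump(has_exact_length(md5('x'), 32)); // bool(true), md5() yields 32 hex characters
```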

Hardcoded constants

The constants true, false and null are also used very often. Unfortunately, every use of one of these constants invokes a constant lookup. That is not a real problem, since constant lookups are fast, but I nevertheless implemented these constants directly in the parser to avoid the lookups.

Optimized smart strings

PHP makes use of the smart string library, an internal library for dynamically growing strings. I optimized the smart_str_append_long() function to append integer numbers much faster. I've also added a new function, smart_str_append_const(), to append a constant string to a smart string buffer.

Time call optimization

For every time() call in PHP, there is also a time(NULL) call to the kernel, plus a few more for internal handlers. I thought this was optimized away by the SAPI layer, but the SAPI method for using a cached time(0) is implemented very sparsely. So I replaced all time(NULL) and time(0) calls with the internal SAPI handler and also implemented a SAPI hook for CGI/FCGI. As you may know, CGI/FCGI offers no way of getting the time, so I also patched lighttpd and nginx to send the time as RAW_TIME. There is a risk that this optimization breaks your script, because you get an old cached time value if the script runs for more than one second, for example inside a daemon written in PHP. Thus, I've added a new ini variable called use_sapi_time to turn this optional optimization on.
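
Stock PHP already exposes one cached timestamp to scripts, $_SERVER['REQUEST_TIME'], which is recorded when the request starts. The sketch below is not the fork's use_sapi_time mechanism; it only illustrates the trade-off described above, and the helper name cached_request_time() is made up.

```php
<?php
// Hypothetical helper (not part of the fork): prefer the timestamp taken at
// request start and fall back to a fresh time() call if it is missing.
function cached_request_time()
{
    return isset($_SERVER['REQUEST_TIME'])
        ? (int) $_SERVER['REQUEST_TIME']
        : time();
}

$start = cached_request_time();
// In a long-running script this value grows stale, while time() keeps moving.
$age = time() - $start;
```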

strtr() table generation optimization

strtr() creates an internal lookup table to speed up the character replacement. Unfortunately, gcc cannot optimize this table generation away, so I hardcoded the table instead of generating it anew for each call.

Turn off $_REQUEST variable if it's not needed

Registering PHP's superglobals consumes superfluous time. The $_REQUEST variable contains all request-relevant parameters, but I never use it; I access $_POST, $_COOKIE and $_GET directly in an OOP fashion instead. So I added "r" to the ini variable variables_order to make filling the $_REQUEST array optional.

New PHP functions

bool exists(mixed $var[, ...])

exists() is isset()'s little brother that doesn't take null values into account. It just tests for the existence of variables and attributes. My old patch removed the null check from isset(), but I wanted to keep backwards compatibility, so I added a new language construct, exists(), instead.
Example
if (exists($var, $var->attr)) {}
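
To see the null check that exists() leaves out, here is how isset() behaves on stock PHP next to the existing functions that test for presence only. This is a comparison sketch, not the fork's implementation; the sample data is invented.

```php
<?php
// Sample data made up for illustration.
$row = array('name' => 'Robert', 'deleted_at' => null);

var_dump(isset($row['deleted_at']));            // bool(false): the value is null
var_dump(array_key_exists('deleted_at', $row)); // bool(true): the key is present

$obj = new stdClass();
$obj->attr = null;

var_dump(isset($obj->attr));             // bool(false): the value is null
var_dump(property_exists($obj, 'attr')); // bool(true): the property is present
```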

string str_random(int $len[, string $chars="0123...XYZ"]);

Generates a random string very quickly using the underlying operating system.
Example
echo str_random(16);
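
A rough userland stand-in, for readers without the fork, might look like this. It is a sketch under assumptions: the name random_string() and the default alphabet are made up, and mt_rand() is used for simplicity, so it is neither as fast as the native function nor suitable for secrets.

```php
<?php
// Hypothetical stand-in (not the fork's str_random): picks $len characters
// from $chars. The default alphabet here is an assumption for illustration.
function random_string($len, $chars = 'abcdefghijklmnopqrstuvwxyz0123456789')
{
    $max = strlen($chars) - 1;
    $out = '';
    for ($i = 0; $i < $len; $i++) {
        $out .= $chars[mt_rand(0, $max)];
    }
    return $out;
}

echo random_string(16), "\n";
```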

int ob_fwrite(resource $fd[, int $len=0])

Writes the ob buffer to an opened file handle.
Example
ob_start();
$fd = fopen('/cache/site.txt', 'w+');
echo "This goes to the file";
ob_fwrite($fd);
ob_end_clean();
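
On stock PHP the same effect can be approximated with ob_get_contents() and fwrite(). The helper below is a sketch; the name ob_write_to() is made up, and treating $len as a maximum byte count is my assumption about the parameter.

```php
<?php
// Hypothetical equivalent (not the fork's ob_fwrite): writes the current
// output buffer to $fd. A positive $len is assumed to cap the bytes written.
function ob_write_to($fd, $len = 0)
{
    $data = ob_get_contents();
    if ($data === false) {
        return false; // no active output buffer
    }
    if ($len > 0) {
        $data = substr($data, 0, $len);
    }
    return fwrite($fd, $data);
}

$fd = fopen('php://memory', 'w+'); // in-memory stream instead of a real file
ob_start();
echo "This goes to the file";
$written = ob_write_to($fd);
ob_end_clean();

rewind($fd);
$saved = stream_get_contents($fd);
fclose($fd);
```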

mixed timechop(int $time[, mixed $format=2, bool $is_array=false])

Chops a time span into smaller pieces and returns it as a formatted string or as an array. The format is a mixed type: either an integer giving the number of units to return, or a string naming the units you want to get. The unit characters for the selective mode are defined as:
k - decade
y - years
n - months
w - weeks
d - days
h - hours
m - minutes
s - seconds
If the passed value is large enough to look like a Unix timestamp, the function uses the difference between the current time and the passed value instead.
Example
var_dump(timechop(2392383, "ynwdms", true));

int xround(int $num)

Round to the next power of 10. This breaks down 10log(n) / log(10) by using a fast binary search.
Example
echo xround(2344);

double sigfig(double $num, int $figs)

Rounds a number to the given number of significant figures.
Example
echo sigfig(123.34, 4);
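
Assuming sigfig() rounds to a number of significant figures, a userland version could be sketched as follows. The name round_sigfig() is made up, and the fork's native function may treat edge cases differently.

```php
<?php
// Hypothetical userland version (not the fork's sigfig): rounds $num to
// $figs significant figures.
function round_sigfig($num, $figs)
{
    if ($num == 0) {
        return 0.0;
    }
    // Position of the leading digit relative to the decimal point.
    $magnitude = (int) floor(log10(abs($num))) + 1;
    return round($num, $figs - $magnitude);
}

echo round_sigfig(123.34, 4); // 123.3
```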

int sgn(double $num)

Calculates the sign of a number.
Example
echo sgn(-0.23);
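
Without the fork, the sign function is a three-way comparison. A minimal sketch, with the made-up name sign_of():

```php
<?php
// Hypothetical userland version (not the fork's sgn): returns -1, 0 or 1.
function sign_of($num)
{
    if ($num > 0) {
        return 1;
    }
    if ($num < 0) {
        return -1;
    }
    return 0;
}

echo sign_of(-0.23); // -1
```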

string strcut(string $str, int $num[, string $x='...'])

Cuts a string if it's longer than a maximum length and appends a given string. This function doesn't chop words.
Example
echo strcut("This is a very very very very long string which will be truncated", 15);
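
One way to get word-preserving truncation on stock PHP is to cut at the last space inside the limit. This is a sketch under assumptions: the name cut_string() is made up, the limit is counted in bytes, and the fork's exact cutting rule may differ.

```php
<?php
// Hypothetical userland version (not the fork's strcut): keeps at most $max
// bytes, backs up to the previous word boundary, then appends $suffix.
function cut_string($str, $max, $suffix = '...')
{
    if (strlen($str) <= $max) {
        return $str;
    }
    // Look one byte past the limit so a word ending exactly there survives.
    $cut = substr($str, 0, $max + 1);
    $pos = strrpos($cut, ' ');
    if ($pos === false) {
        $cut = substr($str, 0, $max); // a single long word: hard cut
    } else {
        $cut = substr($cut, 0, $pos);
    }
    return rtrim($cut) . $suffix;
}

echo cut_string("This is a very very very very long string which will be truncated", 15);
// prints: This is a very...
```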

string strcal(string $format, string $str[, int $len=-1])

String calibration: checks whether the string matches a given format, written in a simple regexp-like notation.
Example
if (strcal("a-z!", $str)) {}

string strical(string $format, string $str[, int $len=-1])

String calibration that ignores upper and lower case.
Example
if (strical("a-z!", $str)) {}

string strmap(string $str, array $replace)

Brings a simple template parser to PHP. The idea comes from C#'s printf()-like formatting.
Example
echo strmap("This is {first} and {second}", ["first" => "X",
"second" => "Y", "third" => "Z"]);

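The placeholder replacement can be approximated on stock PHP with strtr() and a prepared pair list. A sketch, with the made-up name map_placeholders(); how the fork treats unknown placeholders is not covered here.

```php
<?php
// Hypothetical userland version (not the fork's strmap): replaces {key}
// placeholders with the matching array values in a single pass.
function map_placeholders($str, array $replace)
{
    $pairs = array();
    foreach ($replace as $key => $value) {
        $pairs['{' . $key . '}'] = $value;
    }
    if (!$pairs) {
        return $str; // nothing to replace
    }
    return strtr($str, $pairs);
}

echo map_placeholders(
    "This is {first} and {second}",
    array("first" => "X", "second" => "Y", "third" => "Z")
); // This is X and Y
```
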
int bround(int $num, int $base)

Round to the next multiple of a certain base.
Example
echo bround(283, 5);
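
A userland sketch, assuming "next multiple" means the nearest one. The name round_to_multiple() is made up, and the fork may round in a different direction for values below the midpoint.

```php
<?php
// Hypothetical userland version (not the fork's bround): rounds $num to the
// nearest multiple of $base. Nearest-multiple rounding is an assumption.
function round_to_multiple($num, $base)
{
    return (int) (round($num / $base) * $base);
}

echo round_to_multiple(283, 5); // 285
```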

mixed bound(mixed $num, mixed $min[, mixed $max])

Limits a number to a specified minimum and an optional maximum value.
Example
echo bound(43, 22, 50);
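
The clamping logic is easy to sketch on stock PHP. The name clamp_value() is made up; skipping the upper bound when $max is omitted follows the optional parameter in the signature above.

```php
<?php
// Hypothetical userland version (not the fork's bound): limits $num to the
// range [$min, $max]; the upper bound is skipped when $max is omitted.
function clamp_value($num, $min, $max = null)
{
    if ($num < $min) {
        return $min;
    }
    if ($max !== null && $num > $max) {
        return $max;
    }
    return $num;
}

echo clamp_value(43, 22, 50); // 43
```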

Usability improvements

foreach() for strings

Writing parsers in PHP mostly results in for() + strlen() + substr() constructs. I modified foreach() so that it can loop through strings and yield the characters and their indexes. This is prettier and also much faster than the previous method.
Example
// Simulating str_split($str)
$str = "PHP is cool!";
$arr = [];

foreach ($str as $k => $v) {
	$arr[$k] = $v;
}
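
For comparison, this is the loop needed on stock PHP, where foreach() does not accept strings. It builds the same array as the example above.

```php
<?php
$str = "PHP is cool!";
$arr = array();

// Index-based loop: the construct the foreach() change is meant to replace.
for ($k = 0, $n = strlen($str); $k < $n; $k++) {
    $arr[$k] = $str[$k];
}

var_dump($arr === str_split($str)); // bool(true)
```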

Delete characters with strtr()

Deleting several characters from a string can require multiple str_replace() calls. It's now possible to delete all of the characters at once using strtr().
Example
$demise = strtr("passion", "os", "");
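
On stock PHP the same deletion can be written as a single str_replace() call with an array of search characters:

```php
<?php
// Remove every "o" and "s" in one call by passing the characters as an array.
$demise = str_replace(array('o', 's'), '', 'passion');

echo $demise; // pain
```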

Key implode

I need a list of the keys of an array very often. One way to do this is implode(array_keys($arr)), which is not that fast and doesn't look very nice. implode() now has a new parameter that makes it return the keys instead of the values:
Example
$keys = implode(',', $arr, true);
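
The stock equivalent that this parameter replaces looks like this (the sample array is made up for illustration):

```php
<?php
// Sample data made up for illustration.
$arr = array('id' => 7, 'name' => 'Robert', 'city' => 'n/a');

// Stock PHP: build the key list first, then join it.
$keys = implode(',', array_keys($arr));

echo $keys; // id,name,city
```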

Negative string offsets

What happens if you write $str[-5]? Right, you get a warning and the expression returns null. But why should we give that away? We could use negative string offsets in the same way as positive ones, with the difference that we start at the end of the string. So [0] is the first character of the string, [1] the second, [-1] the last, [-2] the second to last and so on. This is really intuitive, makes the code cleaner and avoids nasty strlen() baubles.
Example
$str[-1] == $str[strlen($str) - 1] && $str[-1] == substr($str, -1, 1)
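
On a PHP build without this change, the same lookup can be sketched with a small helper. The name char_at() is made up, and returning null for out-of-range offsets is my choice, not necessarily the fork's behaviour.

```php
<?php
// Hypothetical helper (not part of the fork): character at $offset, where
// negative offsets count from the end. Returns null when out of range.
function char_at($str, $offset)
{
    $len = strlen($str);
    if ($offset < 0) {
        $offset += $len;
    }
    if ($offset < 0 || $offset >= $len) {
        return null;
    }
    return $str[$offset];
}

var_dump(char_at('cool!', -1) === substr('cool!', -1, 1)); // bool(true)
```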

Binary numbers

In C# you can define binary numbers in a way similar to how you write hexadecimal numbers such as 0x90. With this change you can define binary numbers with a 0b prefix like this: 0b01001. I don't know if this feature is useful in general because, as you may know, there are after all only 10 people who understand binary. But I use bit sets very often, and this is a good and fast way to write them.
Example
0b101 << 1 == 0b1010
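
On a PHP build without the 0b literal, the usual workaround is bindec() and decbin(), at the cost of a function call at runtime:

```php
<?php
// Parse the bit pattern from a string instead of writing a literal.
$flags = bindec('101');

var_dump(($flags << 1) === bindec('1010')); // bool(true)
echo decbin($flags << 1);                   // 1010
```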

Short array

Programmers are lazy folks, and writing array() is really annoying. Here is an attempt to make this handier.
Example
$arr = [1, 2, [5 => "foo", 3.14159], 9];

Better chr() handling

Converting ASCII codes to real characters is possible with the chr() function. Unfortunately, you can only pass one code at a time. Now it's possible to pass a list of ASCII codes, either as an array or as a variable-length parameter list.
Example
"Abc" == chr(65, 98, 99)

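On stock PHP the same string can be built by mapping chr() over the codes. A sketch, with the made-up name chars_from_codes():

```php
<?php
// Hypothetical helper (not part of the fork): converts a list of ASCII
// codes into a string using the stock single-argument chr().
function chars_from_codes(array $codes)
{
    return implode('', array_map('chr', $codes));
}

var_dump(chars_from_codes(array(65, 98, 99)) === "Abc"); // bool(true)
```
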
Microtime default parameter

A very useless default parameter is the one of microtime(). I remember that with PHP 4 everyone used an explode() + subtraction to work around microtime()'s return value. With PHP 5 it became possible to return the time as a double, but this is not the default. I broke API compatibility here and return the µ-time as a double by default.
Example
$time = microtime();
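
On stock PHP 5 the float has to be requested explicitly by passing true, for example when timing a piece of code:

```php
<?php
// Stock PHP: pass true to get the time as a float instead of "msec sec".
$start = microtime(true);

usleep(1000); // placeholder for the work being measured

$elapsed = microtime(true) - $start;
printf("%.6f seconds\n", $elapsed);
```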

UTF-8 and ENT_QUOTES as default

As most web applications should work with UTF-8 to make i18n easier, it is a good idea to make UTF-8 the default. The same is true of ENT_QUOTES. Okay, I must admit that this change is also a product of laziness, because I hate writing ENT_QUOTES, "UTF-8", so this was the last time.
Example
$encoded = htmlspecialchars($ugly);
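
On stock PHP 5.3 the same defaults can be pinned in a thin wrapper. A sketch, with the made-up name html_escape():

```php
<?php
// Hypothetical wrapper (not part of the fork): fixes the flags and charset
// that the fork turns into defaults.
function html_escape($text)
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo html_escape("<a href='x'>\"q\"</a>");
// prints: &lt;a href=&#039;x&#039;&gt;&quot;q&quot;&lt;/a&gt;
```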

Disable include warnings

A really annoying problem is include warnings spamming the logfile when you leave out file_exists() checks. You could add an @ sign in front of the include command, but this forces PHP to be silent for the entire file. PHP now has a new ini directive, ignore_include_warning, which disables include warnings either via ini_set() or globally.

Omitting quoting with json_encode()

Quoting is necessary to satisfy the JSON format. As an extension, it is sometimes nice to define callbacks in a JSON string. I added a new bit-mask constant, JSON_CALLBACK_CHECK, in addition to the undocumented JSON_NUMERIC_CHECK. If the callback-check flag is set, the prefix __cb on a string value marks it as an unquoted callback string.
Example
$json = json_encode(array(
	"func" => "__cbRaise",
	"number" => "1234",
	"native" => 9876,
	"nocb" => "__cb is the beginning, but it isn't a Callback",
	"123" => "text"
), JSON_NUMERIC_CHECK | JSON_CALLBACK_CHECK);

MySQLi/mysqlnd changes

Native type casting turned on by default

I think it's a good idea to turn on native type casting by default. This reduces cache sizes on installations where people don't care about such things, and it also increases execution performance when numbers from the database are heavily involved in calculations.

mysqli_fetch_all() returns associative arrays by default

The MySQLi function mysqli_fetch_all() returns an indexed array by default. The performance benefit of doing so is very low; associative arrays are the better choice with regard to easy and readable code.

MySQLi matched rows

The MySQLi attribute matched_rows and the corresponding procedural mysqli_matched_rows() function return the number of rows matched by the last SQL operation. If you update a table and affected_rows is, say, 5, this doesn't mean that 5 is also the number of rows that matched the WHERE clause. Without this attribute, retrieving that number means running another SELECT COUNT(1) query with the same condition or parsing the mysqli_info() output yourself.
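
The parsing workaround mentioned above can be sketched like this. The helper name matched_rows_from_info() is made up, and the sample string only imitates the "Rows matched: ... Changed: ... Warnings: ..." text that mysqli_info() reports after an UPDATE.

```php
<?php
// Hypothetical helper (not part of the fork): extracts the matched-row count
// from an info string as reported after an UPDATE. Returns null if absent.
function matched_rows_from_info($info)
{
    if (is_string($info) && preg_match('/Rows matched:\s*(\d+)/', $info, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Sample string for illustration; in real code pass mysqli_info($link).
$info = "Rows matched: 5  Changed: 3  Warnings: 0";

echo matched_rows_from_info($info); // 5
```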

mysqli_return($res,[$free=false])

The function mysqli_return() is the equivalent of mysql_result(). The difference from its older counterpart is that the MySQLi version frees its resource after returning the value by default. You can turn off this behaviour, but I wanted a function that can return and free a single value in one go.
Example
public function value($query) {
	$res = mysqli_query($this->db, $query);
	return mysqli_return($res);
}

Tidied up PHP

The PHP fork is rid of the following old and deprecated functionality in order to make the code base smaller and to improve the execution time. This may limit the usage of PHP in some scenarios, but read on to the next section first.
  • Deleted define_syslog_variables
  • Deleted magic quotes
  • Deleted register globals
  • Deleted ASP-tags
  • Deleted short open tags and <?php= is the new <?=
  • Deleted allow_call_time_pass_reference
  • Reduced the default memory limit to 16MB
  • Deleted safe mode
  • Deleted disable functions/classes

New ini file

I've added a new ini file in order to have more control over PHP. It's possible to define and delete constants, declare variables as SUPER and rename and delete functions and classes. The new ini file looks like this:
[Constant]

;;; General Config
string ADMINPASS        = "admin";
string ADMINMAIL        = "robert@xarg.org";

string PASSSALT         = "I'm a very good password salt";
int ONLINE_TIME         = 1200; == session.gc_maxlifetime

;;; DB Config
string DB_USER          = "root";
string DB_PASS          = "";
int CLUSTER_SIZE        = 31; (1 << 5) - 1

;;; Test
float TEST              = "333.32=ddd";
delete PHP_VERSION_ID;
delete CURLOPT_SSLCERTTYPE;

;;; SQL Shorthands
string SQL_USER         = "UID, UName, USex, UPic";
string SQL_NOLOG        = "SET SQL_LOG_BIN = 0";
string SQL_TIMEOUT      = "SET WAIT_TIMEOUT = 3600";
string SQL_DATE         = "'%d.%m.%Y'";
string SQL_TIME         = "'%H:%i'";
string SQL_DATETIME     = "'%d.%m.%Y %H:%i:%s'";

[Variable]
super test;
super time;

[Function]
;rename strlen          = abc;
;delete substr_count;
;delete substr;

[Class]
delete stdClass;
I named the file php-global.conf; its location can be set with a php.ini variable like so:
globals.filename = "/etc/php-global.conf"

Bug fixes

  • chroot() wasn't enabled for fpm, just for the cgi SAPI
  • Sending "private" with the nocache session cache limiter
  • Make "false" printable with print_r instead of an empty string

Some further ideas (not implemented yet)

  • A new preg_replace() modifier to upper- and lowercase strings directly
  • unpack() returns an array even if the array has only one element. Returning a mixed type would save the array handling, both internally and in user space
  • If a single character is used in an arithmetic operation, it COULD be treated as its ASCII code instead of being parsed as a numeric string
  • I wrote my own mysqli_real_escape() function based on the code of libmysqlclient. This function is strongly optimized and therefore faster. Additionally, it does not need a connection handle. I would be glad to see an escaping function that gets the encoding from the local configuration instead of using a database handle.
  • I did not spend time finding out whether APC optimizes $i++ to ++$i if the value is read-only. If not, it would be cool to have such a feature directly in the core to save some time. But maybe this is a better job for an opcode optimizer, which also reduces the number of redundant jumps and so on.

Ready for takeoff

If you like the features of the modified PHP 5.3.6, you can get it from GitHub. I would be glad to hear about further improvements that should be implemented, and also to hear what you think about the changes I made.
