Monday, August 3, 2009

SQL Injection

Many web developers are unaware of how SQL queries can be tampered with, and assume that an SQL query is a trusted command. In reality, a tampered SQL query can circumvent access controls, thereby bypassing standard authentication and authorization checks, and sometimes may even allow access to host operating system level commands.

Direct SQL Command Injection is a technique where an attacker creates or alters existing SQL commands to expose hidden data, to overwrite valuable data, or even to execute dangerous system level commands on the database host. This is accomplished when the application takes user input and combines it with static parameters to build an SQL query. The following examples are, unfortunately, based on true stories.

Owing to the lack of input validation, and because the application connects to the database as a superuser or as a user who can create other users, the attacker may create a superuser in your database.

Example 27-2. Splitting the result set into pages ... and making superusers (PostgreSQL)

<?php
$offset = $argv[0]; // beware, no input validation!
$query = "SELECT id, name FROM products ORDER BY name LIMIT 20 OFFSET $offset;";
$result = pg_query($conn, $query);

?>

Normal users click on the 'next' and 'prev' links, where the $offset is encoded into the URL. The script expects that the incoming $offset is a decimal number. However, what if someone tries to break in by appending a urlencode()'d form of the following to the URL?


0;
insert into pg_shadow(usename,usesysid,usesuper,usecatupd,passwd)
select 'crack', usesysid, 't','t','crack'
from pg_shadow where usename='postgres';
--



If that happened, the script would hand the attacker superuser access. Note that 0; is there to supply a valid offset to the original query and to terminate it.

Note: It is a common technique to force the SQL parser to ignore the rest of the query written by the developer with --, which is the comment sign in SQL.

A feasible way to gain passwords is to abuse your search result pages. The only thing the attacker needs to do is to see whether any submitted variables are used in SQL statements without being handled properly. Such filters are commonly set in a preceding form to customize the WHERE, ORDER BY, LIMIT and OFFSET clauses in SELECT statements. If your database supports the UNION construct, the attacker may try to append an entire query to the original one to list passwords from an arbitrary table. Using encrypted password fields is strongly encouraged.

Example 27-3. Listing out articles ... and some passwords (any database server)

<?php
$query = "SELECT id, name, inserted, size FROM products
WHERE size = '$size'
ORDER BY $order LIMIT $limit, $offset;";
$result = odbc_exec($conn, $query);

?>

The static part of the query can be combined with another SELECT statement which reveals all passwords:


'
union select '1', concat(uname||'-'||passwd) as name, '1971-01-01', '0' from usertable;
--



If this query (playing with the ' and --) were assigned to one of the variables used in $query, the query beast would be awakened.

SQL UPDATE statements are also susceptible to attack. These queries are likewise threatened by chopping and appending an entirely new query to them. But the attacker might also fiddle with the SET clause. In this case some schema information must be possessed to manipulate the query successfully. This can be acquired by examining the form variable names, or simply by brute force. There are not so many naming conventions for fields storing passwords or usernames.

Example 27-4. From resetting a password ... to gaining more privileges (any database server)

<?php
$query = "UPDATE usertable SET pwd='$pwd' WHERE uid='$uid';";
?>

But a malicious user submits the value ' or uid like'%admin%'; -- to $uid to change the admin's password, or simply sets $pwd to "hehehe', admin='yes', trusted=100 " (with a trailing space) to gain more privileges. Then, the query will be twisted:

<?php
// $uid == ' or uid like'%admin%'; --
$query = "UPDATE usertable SET pwd='...' WHERE uid='' or uid like '%admin%'; --";

// $pwd == "hehehe', admin='yes', trusted=100 "
$query = "UPDATE usertable SET pwd='hehehe', admin='yes', trusted=100 WHERE
...;";

?>



A frightening example of how operating system level commands can be accessed on some database hosts.

Example 27-5. Attacking the database host's operating system (MSSQL Server)

<?php
$query = "SELECT * FROM products WHERE id LIKE '%$prod%'";
$result = mssql_query($query);

?>

If an attacker submits the value a%' exec master..xp_cmdshell 'net user test testpass /ADD' -- to $prod, then the $query will be:

<?php
$query = "SELECT * FROM products
WHERE id LIKE '%a%'
exec master..xp_cmdshell 'net user test testpass /ADD'--";
$result = mssql_query($query);

?>


MSSQL Server executes the SQL statements in the batch including a command to add a new user to the local accounts database. If this application were running as sa and the MSSQLSERVER service is running with sufficient privileges, the attacker would now have an account with which to access this machine.

Note: Some of the examples above are tied to a specific database server. This does not mean that a similar attack is impossible against other products. Your database server may be similarly vulnerable in another manner.

Avoidance techniques
You may plead that in most of the examples the attacker must possess some information about the database schema. You are right, but you never know when and how that information can leak out, and if it does, your database may be exposed. If you are using an open source or publicly available database handling package, which may belong to a content management system or forum, intruders can easily obtain a copy of your code. It may also be a security risk if that code is poorly designed.

These attacks are mainly based on exploiting the code not being written with security in mind. Never trust any kind of input, especially that which comes from the client side, even though it comes from a select box, a hidden input field or a cookie. The first example shows that such a blameless query can cause disasters.


Never connect to the database as a superuser or as the database owner. Always use customized users with very limited privileges.

Check if the given input has the expected data type. PHP has a wide range of input validating functions, from the simplest ones found in Variable Functions and in Character Type Functions (e.g. is_numeric(), ctype_digit() respectively) and onwards to the Perl compatible Regular Expressions support.

If the application expects numerical input, consider verifying the data with is_numeric(), silently changing its type using settype(), or using its numeric representation via sprintf().

Example 27-6. A more secure way to compose a query for paging

<?php
settype($offset, 'integer');
$query = "SELECT id, name FROM products ORDER BY name LIMIT 20 OFFSET $offset;";

// please note %d in the format string, using %s would be meaningless
$query = sprintf("SELECT id, name FROM products ORDER BY name LIMIT 20 OFFSET %d;",
$offset);

?>
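As an alternative to silently casting, a script can reject anything that is not a plain run of digits. The following is a minimal sketch of that idea; the function name validated_offset() and the fallback value are assumptions made for illustration and are not part of the original example.

```php
<?php
// Hypothetical helper: return a safe integer offset, or a default when the
// input is not a plain non-negative decimal number.
function validated_offset($input, $default = 0)
{
    // ctype_digit() is true only for non-empty strings made up of digits 0-9
    if (is_string($input) && ctype_digit($input)) {
        return (int) $input;
    }
    return $default;
}

$offset = validated_offset('40');
$query = sprintf(
    "SELECT id, name FROM products ORDER BY name LIMIT 20 OFFSET %d;",
    $offset
);
echo $query, "\n";

// The injection attempt from the first example falls back to the default
var_dump(validated_offset("0; insert into pg_shadow ...")); // int(0)
```

Falling back to a default keeps the page usable; a stricter script might instead stop with an error when the check fails.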



Quote each non-numeric user-supplied value that is passed to the database with the database-specific string escape function (e.g. mysql_real_escape_string(), pg_escape_string(), etc.). If a database-specific string escape mechanism is not available, the addslashes() and str_replace() functions may be useful (depending on database type). See the first example: as it shows, adding quotes to the static part of the query is not enough, and leaves the query easily crackable.

Do not print out any database specific information, especially about the schema, by fair means or foul. See also Error Reporting and Error Handling and Logging Functions.

You may use stored procedures and previously defined cursors to abstract data access so that users do not directly access tables or views, but this solution has other impacts.

Besides these, you benefit from logging queries either within your script or by the database itself, if it supports logging. Obviously, logging cannot prevent a harmful attempt, but it can help to trace back which application has been circumvented. The log is not useful by itself, but through the information it contains. More detail is generally better than less.
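Many database extensions also let you keep user input out of the SQL text entirely by using prepared statements with bound parameters. The sketch below assumes the PDO extension with its SQLite driver is available and uses an in-memory database; the table, rows and function name are made up purely for illustration and are not taken from the examples above.

```php
<?php
// Assumption: PDO and its SQLite driver are installed. The table and data
// are hypothetical and exist only for this demonstration.
$db = new PDO('sqlite::memory:');
$db->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$db->exec("CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, size TEXT)");
$db->exec("INSERT INTO products (name, size) VALUES ('shirt', 'M')");
$db->exec("INSERT INTO products (name, size) VALUES ('hat', 'L')");

// Look up products by size; the value is bound, never pasted into the SQL
function products_by_size(PDO $db, $size)
{
    $stmt = $db->prepare("SELECT name FROM products WHERE size = :size ORDER BY name");
    $stmt->execute(array(':size' => $size));
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}

print_r(products_by_size($db, 'M'));

// The UNION payload is treated as an odd size value and matches nothing
print_r(products_by_size($db, "' union select name from products; --"));
```

Bound parameters only cover values; identifiers such as the ORDER BY column still need to be checked against a fixed list.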

Encrypted Storage Model

SSL/SSH protects data travelling from the client to the server, but it does not protect the persistent data stored in a database. SSL is an on-the-wire protocol.

Once an attacker gains access to your database directly (bypassing the webserver), the stored sensitive data may be exposed or misused, unless the information is protected by the database itself. Encrypting the data is a good way to mitigate this threat, but very few databases offer this type of data encryption.

The easiest way to work around this problem is to first create your own encryption package, and then use it from within your PHP scripts. PHP can assist you in this with several extensions, such as Mcrypt and Mhash, covering a wide variety of encryption algorithms. The script encrypts the data before inserting it into the database, and decrypts it when retrieving. See the references for further examples of how encryption works.
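The encrypt-before-insert, decrypt-after-select flow described above might look like the following sketch. It swaps in the OpenSSL extension (openssl_encrypt() and openssl_decrypt()) rather than Mcrypt, and assumes PHP 7 or later for random_bytes(). The helper names, the constant and the key handling are illustrative assumptions, and the sketch adds no integrity check on the ciphertext.

```php
<?php
// Hypothetical helpers; assume the OpenSSL extension and PHP 7+.
const FIELD_CIPHER = 'aes-256-cbc';

function encrypt_field($plaintext, $key)
{
    // A fresh random IV for every value
    $iv = random_bytes(openssl_cipher_iv_length(FIELD_CIPHER));
    $ciphertext = openssl_encrypt($plaintext, FIELD_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
    // Store the IV alongside the ciphertext; base64 keeps it column-friendly
    return base64_encode($iv . $ciphertext);
}

function decrypt_field($stored, $key)
{
    $raw = base64_decode($stored);
    $ivLength = openssl_cipher_iv_length(FIELD_CIPHER);
    $iv = substr($raw, 0, $ivLength);
    $ciphertext = substr($raw, $ivLength);
    return openssl_decrypt($ciphertext, FIELD_CIPHER, $key, OPENSSL_RAW_DATA, $iv);
}

// In a real application the key would come from protected configuration
$key = random_bytes(32);
$stored = encrypt_field('secret note', $key);
echo decrypt_field($stored, $key), "\n";
```

The value returned by encrypt_field() is what the script would insert into the database column, and decrypt_field() would be applied to the column value after retrieving it.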

In the case of truly hidden data, if its raw representation is not needed (i.e. it is never displayed), hashing may also be taken into consideration. The well-known example of hashing is storing the MD5 hash of a password in a database, instead of the password itself. See also crypt() and md5().

Example 27-1. Using hashed password field

<?php
// storing password hash
$query = sprintf("INSERT INTO users(name,pwd) VALUES('%s','%s');",
    pg_escape_string($username), md5($password));
$result = pg_query($connection, $query);

// querying if user submitted the right password
$query = sprintf("SELECT 1 FROM users WHERE name='%s' AND pwd='%s';",
    pg_escape_string($username), md5($password));
$result = pg_query($connection, $query);

if (pg_num_rows($result) > 0) {
    // double quotes are needed for $username to be interpolated
    echo "Welcome, $username!";
} else {
    echo "Authentication failed for $username.";
}
?>
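On PHP 5.5 and later, the same idea can be sketched with password_hash() and password_verify(), which produce a salted hash and so avoid comparing raw MD5 values. This is only an illustration under that version assumption; the in-memory array stands in for the users table, and the function name check_login() is hypothetical.

```php
<?php
// Hypothetical stand-in for the users table: name => stored hash
$users = array();

// Registration: store a salted hash, never the password itself
$users['alice'] = password_hash('correct horse', PASSWORD_DEFAULT);

// Login: verify the submitted password against the stored hash
function check_login(array $users, $name, $password)
{
    return isset($users[$name]) && password_verify($password, $users[$name]);
}

var_dump(check_login($users, 'alice', 'correct horse')); // bool(true)
var_dump(check_login($users, 'alice', 'wrong guess'));   // bool(false)
```

Because each hash carries its own salt, the comparison has to happen in PHP with password_verify() rather than in the WHERE clause of the query.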

Database Security

Nowadays, databases are cardinal components of any web based application by enabling websites to provide varying dynamic content. Since very sensitive or secret information can be stored in a database, you should strongly consider protecting your databases.

To retrieve or to store any information you need to connect to the database, send a legitimate query, fetch the result, and close the connection. Nowadays, the commonly used query language in this interaction is the Structured Query Language (SQL). See how an attacker can tamper with an SQL query.

As you can surmise, PHP cannot protect your database by itself. The following sections aim to be an introduction into the very basics of how to access and manipulate databases within PHP scripts.

Keep in mind this simple rule: defense in depth. The more places you take action to increase the protection of your database, the lower the probability of an attacker succeeding in exposing or abusing any stored information. Good design of the database schema and the application deals with your greatest fears.

Designing Databases
The first step is always to create the database, unless you want to use one from a third party. When a database is created, it is assigned to an owner, who executed the creation statement. Usually, only the owner (or a superuser) can do anything with the objects in that database, and in order to allow other users to use it, privileges must be granted.

Applications should never connect to the database as its owner or a superuser, because these users can execute any query at will, for example, modifying the schema (e.g. dropping tables) or deleting its entire content.

You may create different database users for every aspect of your application with very limited rights to database objects. Grant only the privileges that are strictly required, and avoid letting the same user interact with the database in different use cases. This means that if intruders gain access to your database using your application's credentials, they can only effect as many changes as your application can.

You are encouraged not to implement all the business logic in the web application (i.e. your script); instead, do it in the database schema using views, triggers or rules. If the system evolves, new clients will need to connect to the database, and you would have to re-implement the logic in each separate database client. Over and above this, triggers can be used to handle fields transparently and automatically, which often provides insight when debugging problems with your application or tracing back transactions.

Magic Quotes

Magic Quotes is a process that automagically escapes incoming data to the PHP script. It's preferred to code with magic quotes off and to instead escape the data at runtime, as needed.

What are Magic Quotes
When on, all ' (single-quote), " (double quote), \ (backslash) and NULL characters are escaped with a backslash automatically. This is identical to what addslashes() does.
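To see the effect, the following small sketch calls addslashes() directly, which the text above says applies the same escaping. The sample string is made up for illustration.

```php
<?php
// Sample input containing a single quote, double quotes and a backslash
$input = 'O\'Reilly said "hi" \\ bye';

$escaped = addslashes($input);
echo $escaped, "\n"; // O\'Reilly said \"hi\" \\ bye

// stripslashes() reverses the escaping
var_dump(stripslashes($escaped) === $input); // bool(true)
```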

There are three magic quote directives:


magic_quotes_gpc

Affects HTTP Request data (GET, POST, and COOKIE). Cannot be set at runtime, and defaults to on in PHP.

See also get_magic_quotes_gpc().

magic_quotes_runtime

If enabled, most functions that return data from an external source, including databases and text files, will have quotes escaped with a backslash. Can be set at runtime, and defaults to off in PHP.

See also set_magic_quotes_runtime() and get_magic_quotes_runtime().

magic_quotes_sybase

If enabled, a single-quote is escaped with a single-quote instead of a backslash. If on, it completely overrides magic_quotes_gpc. Having both directives enabled means only single quotes are escaped as ''. Double quotes, backslashes and NULL's will remain untouched and unescaped.

See also ini_get() for retrieving its value.

Using Register Globals

Perhaps the most controversial change in PHP is when the default value for the PHP directive register_globals went from ON to OFF in PHP 4.2.0. Reliance on this directive was quite common and many people didn't even know it existed and assumed it's just how PHP works. This page will explain how one can write insecure code with this directive but keep in mind that the directive itself isn't insecure but rather it's the misuse of it.

When on, register_globals will inject your scripts with all sorts of variables, like request variables from HTML forms. This coupled with the fact that PHP doesn't require variable initialization means writing insecure code is that much easier. It was a difficult decision, but the PHP community decided to disable this directive by default. When on, people use variables yet really don't know for sure where they come from and can only assume. Internal variables that are defined in the script itself get mixed up with request data sent by users and disabling register_globals changes this. Let's demonstrate with an example misuse of register_globals:

Example 29-1. Example misuse with register_globals = on

<?php
// define $authorized = true only if user is authenticated
if (authenticated_user()) {
$authorized = true;
}

// Because we didn't first initialize $authorized as false, this might be
// defined through register_globals, like from GET auth.php?authorized=1
// So, anyone can be seen as authenticated!
if ($authorized) {
include "/highly/sensitive/data.php";
}
?>



When register_globals = on, our logic above may be compromised. When off, $authorized can't be set via request so it'll be fine, although it really is generally a good programming practice to initialize variables first. For example, in our example above we might have first done $authorized = false. Doing this first means our above code would work with register_globals on or off as users by default would be unauthorized.
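The initialization described above can be sketched as follows. authenticated_user() is the same hypothetical function used in the example; here it is stubbed to return false so the snippet runs on its own.

```php
<?php
// Stub for illustration only: pretend nobody is logged in
function authenticated_user()
{
    return false;
}

// Start from a safe default so a request variable cannot pre-set the value
$authorized = false;

if (authenticated_user()) {
    $authorized = true;
}

if ($authorized) {
    echo "Access granted\n";
} else {
    echo "Access denied\n";
}
```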

Another example is that of sessions. When register_globals = on, we could also use $username in our example below but again you must realize that $username could also come from other means, such as GET (through the URL).

Example 29-2. Example use of sessions with register_globals on or off

<?php
// We wouldn't know where $username came from but do know $_SESSION is
// for session data
if (isset($_SESSION['username'])) {

    echo "Hello {$_SESSION['username']}";

} else {

    echo "Hello Guest<br />";
    echo "Would you like to login?";

}
?>



It's even possible to take preventative measures to warn when forging is being attempted. If you know ahead of time exactly where a variable should be coming from, you can check to see if the submitted data is coming from an inappropriate kind of submission. While it doesn't guarantee that data has not been forged, it does require an attacker to guess the right kind of forging. If you don't care where the request data comes from, you can use $_REQUEST as it contains a mix of GET, POST and COOKIE data. See also the manual section on using variables from outside of PHP.

Example 29-3. Detecting simple variable poisoning

<?php
if (isset($_COOKIE['MAGIC_COOKIE'])) {

// MAGIC_COOKIE comes from a cookie.
// Be sure to validate the cookie data!

} elseif (isset($_GET['MAGIC_COOKIE']) || isset($_POST['MAGIC_COOKIE'])) {

mail("admin@example.com", "Possible breakin attempt", $_SERVER['REMOTE_ADDR']);
echo "Security violation, admin has been alerted.";
exit;

} else {

// MAGIC_COOKIE isn't set through this REQUEST

}
?>



Of course, simply turning off register_globals does not mean your code is secure. For every piece of data that is submitted, it should also be checked in other ways. Always validate your user data and initialize your variables! To check for uninitialized variables you may turn up error_reporting() to show E_NOTICE level errors.

Filesystem Security

PHP is subject to the security built into most server systems with respect to permissions on a file and directory basis. This allows you to control which files in the filesystem may be read. Care should be taken with any files which are world readable to ensure that they are safe for reading by all users who have access to that filesystem.

Since PHP was designed to allow user level access to the filesystem, it's entirely possible to write a PHP script that will allow you to read system files such as /etc/passwd, modify your ethernet connections, send massive printer jobs out, etc. This has some obvious implications, in that you need to ensure that the files that you read from and write to are the appropriate ones.

Consider the following script, where a user indicates that they'd like to delete a file in their home directory. This assumes a situation where a PHP web interface is regularly used for file management, so the Apache user is allowed to delete files in the user home directories.

Example 26-1. Poor variable checking leads to....

<?php
// remove a file from the user's home directory
$username = $_POST['user_submitted_name'];
$homedir = "/home/$username";
$file_to_delete = "$userfile";
unlink ("$homedir/$userfile");
echo "$file_to_delete has been deleted!";
?>

Since the username is postable from a user form, users can submit a username and file belonging to someone else, and delete files. In this case, you'd want to use some other form of authentication. Consider what could happen if the variables submitted were "../etc/" and "passwd". The code would then effectively read:

Example 26-2. ... A filesystem attack

<?php
// removes a file from anywhere on the hard drive that
// the PHP user has access to. If PHP has root access:
$username = "../etc/";
$homedir = "/home/../etc/";
$file_to_delete = "passwd";
unlink ("/home/../etc/passwd");
echo "/home/../etc/passwd has been deleted!";
?>

There are two important measures you should take to prevent these issues.


Only allow limited permissions to the PHP web user binary.

Check all variables which are submitted.

Here is an improved script:

Example 26-3. More secure file name checking

<?php
// removes a file from the hard drive that
// the PHP user has access to.
$username = $_SERVER['REMOTE_USER']; // using an authentication mechanism

$homedir = "/home/$username";

$file_to_delete = basename($userfile); // strip paths
unlink("$homedir/$file_to_delete");

$fp = fopen("/home/logging/filedelete.log", "a"); // append to the deletion log
$logstring = "$username $homedir $file_to_delete\n";
fwrite($fp, $logstring);
fclose($fp);

echo "$file_to_delete has been deleted!";
?>

However, even this is not without its flaws. If your authentication system allowed users to create their own user logins, and a user chose the login "../etc/", the system is once again exposed. For this reason, you may prefer to write a more customized check:

Example 26-4. More secure file name checking

<?php
$username = $_SERVER['REMOTE_USER']; // using an authentication mechanism
$homedir = "/home/$username";

// names must not start with a dot or slash, and must not contain a slash
if (!preg_match('#^[^./][^/]*$#', $userfile)) {
    die('bad filename'); // die, do not process
}

if (!preg_match('#^[^./][^/]*$#', $username)) {
    die('bad username'); // die, do not process
}
//etc...
?>



Depending on your operating system, there are a wide variety of files which you should be concerned about, including device entries (/dev/ or COM1), configuration files (/etc/ files and the .ini files), well known file storage areas (/home/, My Documents), etc. For this reason, it's usually easier to create a policy where you forbid everything except for what you explicitly allow.
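One way to express such an allow-only policy is a fixed list of permitted names. The sketch below is only an illustration: the function name is_allowed_file() and the list of files are hypothetical.

```php
<?php
// Hypothetical allowlist: only these file names may ever be touched
$allowed_files = array('notes.txt', 'todo.txt', 'draft.html');

function is_allowed_file($name, array $allowed)
{
    // Strict comparison avoids surprises from PHP's loose type juggling
    return is_string($name) && in_array($name, $allowed, true);
}

var_dump(is_allowed_file('todo.txt', $allowed_files));      // bool(true)
var_dump(is_allowed_file('../etc/passwd', $allowed_files)); // bool(false)
```

Anything not on the list is refused without the script having to anticipate every dangerous path.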

Security

PHP is a powerful language and the interpreter, whether included in a web server as a module or executed as a separate CGI binary, is able to access files, execute commands and open network connections on the server. These properties make anything run on a web server insecure by default. PHP is designed specifically to be a more secure language for writing CGI programs than Perl or C, and with correct selection of compile-time and runtime configuration options, and proper coding practices, it can give you exactly the combination of freedom and security you need.

As there are many different ways of utilizing PHP, there are many configuration options controlling its behaviour. A large selection of options guarantees you can use PHP for a lot of purposes, but it also means there are combinations of these options and server configurations that result in an insecure setup.

The configuration flexibility of PHP is equally rivalled by the code flexibility. PHP can be used to build complete server applications, with all the power of a shell user, or it can be used for simple server-side includes with little risk in a tightly controlled environment. How you build that environment, and how secure it is, is largely up to the PHP developer.

This chapter starts with some general security advice, explains the different configuration option combinations and the situations in which they can be safely used, and describes different considerations in coding for different levels of security.

Sunday, July 5, 2009


string number_format ( float number [, int decimal_places])

string number_format ( float number, int decimal_places, string decimal_point, string thousands_separator)

number_format() is a remarkably helpful function that takes a minimum of one parameter, the number to format, and returns that same number with grouped thousands. There are two function prototypes for number_format(), as you pass it either one, two, or four parameters: passing one or two fits the first prototype, and passing four fits the second.

So, if you pass number_format() a parameter of "1234567", it will return "1,234,567". By default, number_format() rounds fractions - 1234567.89 becomes 1,234,568. However, you can change this by specifying the second parameter, which is the number of decimal places to include. Parameter three allows you to choose the character to use as your decimal point, and parameter four allows you to choose the character to use as your thousands separator. Here is how it all looks in PHP:
<?php
$num = 12345.6789;
$a = number_format($num);
$b = number_format($num, 3);
$c = number_format($num, 4, ',', '.');
?>

After running that script, $a will be set to 12,346, $b will be set to 12,345.679, and $c will be set to 12.345,6789 (periods used to separate thousands, and commas used for the decimal point, east European-style).

As you can imagine, number_format() is incredibly useful when it comes to formatting money for checkout pages in shopping baskets, although it is useful anywhere you need to represent large numbers - adding a thousand separator invariably makes things easier to read.
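As a rough illustration of the checkout use case, the sketch below formats a few prices with two decimal places. The basket contents and the dollar sign are assumptions made for the example.

```php
<?php
// Hypothetical basket: item name => price
$basket = array('Book' => 1234.5, 'Pen' => 0.99, 'Laptop' => 1999);

foreach ($basket as $item => $price) {
    // Two decimal places, default "." decimal point and "," thousands separator
    echo $item, ': $', number_format($price, 2), "\n";
}

echo 'Total: $', number_format(array_sum($basket), 2), "\n";
```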

Wrapping your lines

string wordwrap ( string source [, int width [, string break [, boolean cut]]])

Although web pages wrap text automatically, there are two situations when you might want to wrap text yourself:

* When printing to a console as opposed to a web page, text does not wrap automatically. Therefore, unless you want your users to scroll around a lot, it is best to wrap text for them.

* When printing to a web page that has been designed to exactly accommodate a certain width of text, allowing browsers to wrap text whenever they want will likely lead to the design getting warped.

In either of these situations, the wordwrap() function comes to your aid. If you pass a sentence of text into wordwrap() with no other parameters, it will return that same string wrapped at the 75-character mark using "\n" for new lines. However, you can pass both the size and new line marker as parameters two and three if you want to, like this:
<?php
$text = "Word wrap will split this text up into smaller lines, which makes for easier reading and neater layout.";
$text = wordwrap($text, 20, "<br />");
print $text;
?>

Running that script will give you the following output:
Word wrap will split
this text up into
smaller lines, which
makes for easier
reading and neater
layout.

As you can see, wordwrap() has used <br />, an HTML line break marker, and split the text at the 20-character mark. Note that wordwrap() always pessimistically wraps words - that is, if you set the second parameter to 20, wordwrap() will always wrap when it hits 20 characters or under - not 21, 22, etc. The only exception to this is if you have words that are individually longer than 20 characters - wordwrap() will not break up a word, and so may return larger chunks than the limit you set.

If you really want your limit to be a hard maximum, you can supply 1 as a fourth parameter, which enables "cut" mode - words over the limit will be cut up if this is enabled. Here is an example of cut mode in action:
<?php
$text = "Micro-organism is a very long word.";
$text = wordwrap($text, 6, "\n", 1);
print $text;
?>

That will output the following:
Micro-
organi
sm is
a very
long
word.

Finding a string within a string

strpos(), and its case-insensitive sibling stripos(), return the index of the first occurrence of a substring within a string. It is easier to explain in code, so here goes:
<?php
$string = "This is a strpos() test";
print strpos($string, "a") . "\n";
?>

That will return 8, because the first character in "This is a strpos() test" that is a lowercase A is at index 8. Remember that PHP considers the first letter of a string to be index 0, which means that the A strpos() found is actually the ninth character.

You can specify whole words in parameter two, which will make strpos() return the first position of that word within the string, for example strpos($string, "test") would return 19 - the index of the first letter in the matched word.

If the substring sent in parameter two is not found in parameter one, strpos() will return false. Consider this script:
<?php
$string = "This is a strpos() test";
$pos = strpos($string, "This");
if ($pos == false) {
print "Not found\n";
} else {
print "Found!\n";
}
?>

If you try executing that, you will find that it outputs "Not found", despite "This" quite clearly being in $string. Is it another case sensitivity problem? Not quite. This time the problem lies in the fact that "This" is the first thing in $string, which means that strpos() will return 0. However, PHP considers 0 to be the same value as false, which means that our if statement cannot tell the difference between "Substring not found" and "Substring found at index 0" - quite a problem!

Luckily, PHP comes to the rescue with the === operator, which, if you recall, means "is identical to": $pos must be equal to false and of the same type as false (boolean). If "This" is found at the start of $string, strpos() will return 0, but it will be of type integer. If we change our if statement to use === rather than ==, PHP will compare both the values and the types of 0 and false, and find that the types do not match: the former is an integer, and the latter is a boolean.

So, the corrected version of the script is this:
<?php
$string = "This is a strpos() test";
$pos = strpos($string, "This");
if ($pos === false) {
print "Not found\n";
} else {
print "Found!\n";
}
?>

Now, consider this next script, which tries to match the "i" in "is":
<?php
$string = "This is a strpos() test";
$pos = strpos($string, "i");
if ($pos === false) {
print "Not found\n";
} else {
print "Found at $pos!\n";
}
?>

The problem there is that strpos() matches the first "i" it comes across, which will be in "This". Fortunately there is a third parameter to strpos() that allows us to specify where to start from. As the "i" in "This" is at index 2, we just need to specify one place after that (3) as the start position for strpos(), and it will report back the next "i" after it. For example:
<?php
$string = "This is a strpos() test";
$pos = strpos($string, "i", 3);
if ($pos === false) {
print "Not found\n";
} else {
print "Found at $pos!\n";
}
?>

This time that will print "Found at 5!", which is the position of the "i" in "is".
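The start-position parameter also makes it possible to walk through every match in turn. The following is a minimal sketch of that idea; the function name find_all_positions() is an assumption made for illustration.

```php
<?php
// Hypothetical helper: collect the index of every occurrence of $needle
function find_all_positions($haystack, $needle)
{
    $positions = array();
    if ($needle === '') {
        return $positions; // nothing sensible to search for
    }
    $start = 0;
    // Compare with !== false so a match at index 0 is not mistaken for failure
    while (($pos = strpos($haystack, $needle, $start)) !== false) {
        $positions[] = $pos;
        $start = $pos + 1; // resume one place after the last match
    }
    return $positions;
}

print_r(find_all_positions("This is a strpos() test", "i")); // indexes 2 and 5
```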

Padding out a string

string str_pad ( string input, int pad_length [, string pad_string [, int pad_type]])

Next up, str_pad() pads a given string (parameter one) out to a total length (parameter two) by adding spaces to its right-hand side. If the requested length is no greater than the length of the string, the string is returned unchanged. For example:
<?php
$string = "Goodbye, Perl!";
$newstring = str_pad($string, 24);
?>

That code would leave "Goodbye, Perl!          " in $newstring, which is the 14-character string from $string followed by 10 spaces, bringing it up to the 24 characters we passed in as parameter two.

str_pad() has an optional third parameter that lets you set the padding character to use, so:
<?php
$string = "Goodbye, Perl!";
$newstring = str_pad($string, 24, 'a');
?>

That would put "Goodbye, Perl!aaaaaaaaaa" into $newstring.

We can extend the function even more by using its optional fourth parameter, which allows us to specify which side we want the padding added to. The fourth parameter is specified as a constant, and you use either STR_PAD_LEFT, STR_PAD_RIGHT (the default), or STR_PAD_BOTH:
<?php
$string = "Goodbye, Perl!";
$a = str_pad($string, 24, '-', STR_PAD_LEFT);
$b = str_pad($string, 24, '-', STR_PAD_RIGHT);
$c = str_pad($string, 24, '-', STR_PAD_BOTH);
?>

That code will set $a to be "----------Goodbye, Perl!", $b to be "Goodbye, Perl!----------", and $c to be "-----Goodbye, Perl!-----", as expected.

Note that HTML collapses consecutive spaces into a single space when rendering. If you want the padding to show up in a web page, you will need to use "&nbsp;", the HTML code for a non-breaking space.
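One possible way to combine the two is to pad with ordinary spaces first and convert them afterwards. This is a small sketch with a made-up label; note that the second argument to str_pad() is the total length of the result.

```php
<?php
// Pad to a total of 8 characters, then make the padding visible in HTML
$label = str_pad("Total", 8);
$html = str_replace(' ', '&nbsp;', $label);

echo $html, "\n"; // Total&nbsp;&nbsp;&nbsp;
```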

Parsing a string into variables

void parse_str ( string input [, array store])

Previously we looked at a handful of the variables set for you inside the superglobal arrays, of which one was QUERY_STRING. If you recall, this is the literal text sent after the question mark in an HTTP GET request, which means that if the page requested was "mypage.php?foo=bar&bar=baz", QUERY_STRING is set to "foo=bar&bar=baz".

The parse_str() function is designed to take a query string like that one and convert it to variables in the same way that PHP does when variables come in. The difference is that variables parsed using parse_str() are converted to global variables, as opposed to elements inside $_GET. So:
<?php
if (isset($foo)) {
    print "Foo is $foo\n";
} else {
    print "Foo is unset\n";
}

parse_str("foo=bar&bar=baz");

if (isset($foo)) {
    print "Foo is $foo\n";
} else {
    print "Foo is unset\n";
}
?>

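That will print out "Foo is unset" followed by "Foo is bar", because the call to parse_str() will set $foo to "bar" and $bar to "baz". You can also pass an array as the second parameter to parse_str(), and it will put the variables into there instead. The one-parameter form was deprecated in PHP 7.2 and removed in PHP 8.0, where the second parameter is required, so the array form is the one to use in new code. That would make the script look like this: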
<?php
$array = array();

if (isset($array['foo'])) {
    print "Foo is {$array['foo']}\n";
} else {
    print "Foo is unset\n";
}

parse_str("foo=bar&bar=baz", $array);

if (isset($array['foo'])) {
    print "Foo is {$array['foo']}\n";
} else {
    print "Foo is unset\n";
}
?>

That script outputs the same as before, except that the variables found in the query string are placed into $array. As you can see, the variable names are used as keys in the array and their values are used as the array values.

Regular expression syntax examples

In order to give you a quick reference to the different patterns and what they will match, here's a comprehensive table of all we've covered. Column one contains example expressions, and column two contains what that expression will match.

Expr            Will match...

foo             the string "foo"
^foo            "foo" at the start of a line
foo$            "foo" at the end of a line
^foo$           "foo" when it is alone on a line
[Ff]oo          "Foo" or "foo"
[abc]           a, b, or c
[^abc]          d, e, f, g, h, etc - everything that is not a, b, or c (^ is "not" inside sets)
[A-Z]           any uppercase letter
[a-z]           any lowercase letter
[A-Za-z]        any letter
[A-Za-z0-9]     any letter or number
[A-Z]+          one or more uppercase letters
[A-Z]*          zero or more uppercase letters
[A-Z]?          zero or one uppercase letters
[A-Z]{3}        3 uppercase letters
[A-Z]{3,}       a minimum of 3 uppercase letters
[A-Z]{1,3}      1-3 uppercase letters
[^0-9]          any non-numeric character
[^0-9A-Za-z]    any symbol (not a number or a letter)
Fo*             F, Fo, Foo, Fooo, Foooo, etc
Fo+             Fo, Foo, Fooo, Foooo, etc
Fo?             F, Fo
.               any character except \n (new line)
\b              a word boundary. E.g. te\b matches the "te" in "late", but not the "te" in "tell".
\B              a non-word boundary. "te\B" matches the "te" in "tell" but not the "te" in "late".
\n              new line character
\s              any whitespace (new line, space, tab, etc)
\S              any non-whitespace character
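
To see a few of these patterns at work, here is a minimal sketch using preg_match(), PHP's Perl-compatible regular expression function. The sample strings and the $tests and $results arrays are made up for illustration. Note that preg_match() expects delimiters (slashes here) around each pattern.

```php
<?php
// Each entry holds a pattern, a subject string, and whether a match is expected.
// The subjects are illustrative only.
$tests = array(
    array('/^foo$/',        'foo',    true),
    array('/^foo$/',        'foobar', false),
    array('/[Ff]oo/',       'Food',   true),
    array('/^[A-Z]{3}$/',   'ABC',    true),
    array('/^[A-Z]{1,3}$/', 'ABCD',   false),
    array('/te\b/',         'late',   true),
    array('/te\b/',         'tell',   false),
    array('/te\B/',         'tell',   true),
);

$results = array();
foreach ($tests as $test) {
    list($pattern, $subject, $expected) = $test;
    // preg_match() returns 1 when the pattern matches and 0 when it does not.
    $matched = (preg_match($pattern, $subject) === 1);
    $results[] = $matched;
    print "$pattern against \"$subject\": " . ($matched ? "match" : "no match") . "\n";
}
```

The patterns are written in single quotes so that PHP passes sequences such as \b through to the regular expression engine untouched.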

Checking whether a function is available

bool function_exists ( string function_name)

If you're working with functions that are not part of the core of PHP, that is, functions that are from an extension that needs to be enabled by users, it's a smart move to use the function_exists() function. This takes a function name as its only parameter, and returns true if that function (either built-in, or one you've defined yourself) is available for use. Note that it only checks whether the function is available, not whether it will work - your system may not be configured properly for some functions.

Author's Note: Many people use function_exists() to find out whether they have an extension available, by calling it on one of that extension's functions. However, this is accomplished much more easily with the extension_loaded() function covered later.
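
As a rough sketch of both checks, the following script tests for a built-in function, a user function, and an extension. The names my_helper() and no_such_function_here are hypothetical, and "pcre" is used only as an example of an extension name.

```php
<?php
// Check for a single function before calling it.
if (function_exists('strtoupper')) {
    $shout = strtoupper('hello');
} else {
    $shout = 'hello';
}
print $shout . "\n";

// A user-defined function is reported as well.
function my_helper() {
    return 42;
}
$has_helper = function_exists('my_helper');
$has_bogus  = function_exists('no_such_function_here'); // hypothetical name

// Check for a whole extension rather than one of its functions.
$has_pcre = extension_loaded('pcre');

var_dump($has_helper, $has_bogus, $has_pcre);
```

Testing for the extension states the intent more directly than picking one of its functions and hoping the name never changes.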

Changing string case

string strtoupper ( string source)

string strtolower ( string source)

string ucfirst ( string source)

string ucwords ( string source)

Strtoupper() is part of a small family of functions that affect the case of characters in strings. Strtoupper() takes one string parameter, and returns that string entirely in uppercase. Other variations include strtolower(), to convert the string to lowercase, ucfirst(), to convert the first letter of the string to uppercase, and ucwords(), to convert the first letter of every word in the string to uppercase. They all take one parameter and return the converted result, so once you learn one you have learnt them all:
<?php
$string = "i like to program in PHP";
$a = strtoupper($string);
$b = strtolower($string);
$c = ucfirst($string);
$d = ucwords($string);
$e = ucwords(strtolower($string));
?>

Each of those variables gets set to a slightly different value: $a becomes "I LIKE TO PROGRAM IN PHP", $b becomes "i like to program in php", $c becomes "I like to program in PHP", $d becomes "I Like To Program In PHP", and $e becomes "I Like To Program In Php".

From that, you should be able to see that in calls such as ucwords(), PHP will not change existing capital letters to lowercase, which is why $d and $e are different - for $e, all the letters are lowercased first, then passed through ucwords() to make PHP into Php.

Trimming whitespace

string trim ( string source [, string charlist])

string ltrim ( string source [, string charlist])

string rtrim ( string source [, string charlist])

Trim() is a function to strip whitespace from both sides of a string, with "whitespace" meaning characters such as spaces, new lines, and tabs. That is, if you have the string " This is a test " and pass it to trim() as its first parameter, it will return the string "This is a test" - the same thing, but with the spaces trimmed off both ends.

You can pass an optional second parameter to trim() if you want, which should be a string specifying the characters you want it to trim. For example, if we were to pass to trim the second parameter " tes" (that starts with a space), it would output "This is a" - the "test" would be trimmed, as well as the spaces. As you can see, trim() is again case sensitive - the T in "This" is left untouched.

Trim() has two minor variant functions, ltrim() and rtrim(), which do the same thing but only trim from the left and right respectively.

Here are some examples:
<?php
$a = trim(" testing ");
$b = trim(" testing ", " teng");
$c = ltrim(" testing ");
?>

$a will result in "testing", $b will result in "sti", and $c will result in "testing " (only the left-hand space is removed) - as expected, because trim() and its variants are simple to use.

Return values

You're allowed to return one and only one value back from functions, and you do this by using the return statement. In our example, we could have used "return 'foo';" or "return 10 + 10;" to pass other values back, but "return 1;" is the simplest, and it is common because 1 is treated as true when it is tested in a condition.

You can return any value you want, as long as it is just one value - it can be an integer, a string, a database connection, etc. The "return" keyword sets up the function return value to be whatever value you use with it, then exits the function immediately. You can also just use "return;", which means "exit without sending a value back."

Consider this script:
<?php
function foo() {
    print "In function";
    return 1;
    print "Leaving function...";
}

print foo();
?>

That will output "In function", followed by "1", and then the script will terminate. The reason we never see "Leaving function..." is because the line "return 1" passes one back then immediately exits - the second print statement in foo() is never reached.

If you want to pass more than one value back, you need to use an array - this is covered soon.

A popular thing to do is to return the value of a conditional statement, e.g.:
return $i > 10;

If $i is indeed greater than 10, the > operator evaluates to true, so it is the same as having "return true", but if $i is less than or equal to ten, it is the same as having "return false". When printed, true is shown as "1" and false as an empty string.
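
Here is a small sketch of returning the result of a comparison, using var_dump() to show exactly what comes back. The function names is_big() and describe() are hypothetical and only serve the illustration.

```php
<?php
// Returns the result of the comparison directly.
function is_big($i) {
    return $i > 10;
}

var_dump(is_big(15));
var_dump(is_big(3));

// Because return exits the function at once, an early return works as a guard:
// the second return is only reached when the condition fails.
function describe($i) {
    if (is_big($i)) {
        return "big";
    }
    return "small";
}

print describe(15) . "\n";
print describe(10) . "\n";
```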

Variable functions

bool is_callable ( mixed function_name [, bool syntax_only [, string callable_name]])

mixed call_user_func ( callback function [, mixed parameter [, mixed ...]])

mixed call_user_func_array ( callback function [, array parameters])

As you have seen already, PHP has variable variables so it is not surprising we have variable functions. This particular piece of clever functionality allows you to write code like this:
<?php
$func = "sqrt";
print $func(49);
?>

PHP sees that you are calling a function using a variable, looks up the value of the variable, then calls the matching function. The code above will therefore print 7 - the square root of 49.

As variable functions are quite unusual and also easy to get wrong, there is a special PHP function, is_callable(), that takes the function name as its first parameter and returns true if that string contains a function name that can be called using a variable function. Thus, our script becomes this:
<?php
$func = "sqrt";
if (is_callable($func)) {
    print $func(49);
}
?>

As an alternative to variable functions, you can use call_user_func() and call_user_func_array(), which take the function to call as their first parameter. The difference between the two is that call_user_func() takes the parameters to pass into the variable function as multiple parameters to itself, whereas call_user_func_array() takes an array of parameters as its second parameter.

This next script demonstrates both of these two performing a functionally similar operation, replacing "monkeys" with "giraffes" in a sentence using str_replace():
<?php
$func = "str_replace";
$output_single = call_user_func($func, "monkeys", "giraffes", "Hundreds and thousands of monkeys\n");
$params = array("monkeys", "giraffes", "Hundreds and thousands of monkeys\n");
$output_array = call_user_func_array($func, $params);
echo $output_single;
echo $output_array;
?>

Although call_user_func() is essentially the same as using a variable function, call_user_func_array() is very helpful for functions that have complex and variable parameter requirements. One popular application for variable functions is to allow other developers using your code to register callbacks - they pass in the name of the function they want your code to call, then you can use call_user_func() to execute that.
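
The callback idea can be sketched roughly as follows. The functions register_callback() and run_callbacks() and the $callbacks array are hypothetical names invented for this example, not part of PHP.

```php
<?php
// Adds a function name to the list, but only if it can actually be called.
function register_callback(&$callbacks, $name) {
    if (!is_callable($name)) {
        return false;
    }
    $callbacks[] = $name;
    return true;
}

// Calls every registered function with the same value and collects the results.
function run_callbacks($callbacks, $value) {
    $results = array();
    foreach ($callbacks as $name) {
        $results[$name] = call_user_func($name, $value);
    }
    return $results;
}

$callbacks = array();
register_callback($callbacks, "strtoupper");
register_callback($callbacks, "strrev");
$rejected = register_callback($callbacks, "no_such_function_here");

$out = run_callbacks($callbacks, "monkeys");
print_r($out);
```

Checking with is_callable() at registration time means a mistyped function name is rejected early, rather than causing an error when the callbacks are run.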

Overriding scope with the GLOBALS array

At some point in your PHP programming career you will want to read a global variable inside a function - I can pretty much guarantee that, because it is a very popular thing to do. Luckily, it is made easy for you by PHP through the $GLOBALS superglobal array, which allows you to access global variables even from within functions. When it comes to the $GLOBALS array it is quite simple: all variables declared in the global scope are in the $GLOBALS array, which you can access anywhere in the script.

To demonstrate this in action, consider the following script:
<?php
function foo() {
    $GLOBALS['bar'] = "wombat";
}

$bar = "baz";
foo();
print $bar;
?>

What do you think that will output this time? If you guessed "wombat", you would be correct - the foo() function literally alters a variable outside of its scope, so that even after it returns control back to the main script, its effect is still felt. You can of course read variables in the same way, like this:
$localbar = $GLOBALS['bar'];

However, that is quite hard on the eyes. PHP allows you to use a special keyword, global, to allow a variable to be accessed locally. For example:
function myfunc() {
    global $foo, $bar, $baz;
    ++$baz;
}

That would allow a function to read and change the global variables $foo, $bar, and $baz. The ++$baz line will increment $baz by 1, and this will be reflected in the global scope also.
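
To compare the two approaches side by side, here is a minimal sketch, assuming it runs as the main script. The variable $counter and the three function names are made up for the example.

```php
<?php
$counter = 0;

// Reads and writes the global $counter through the $GLOBALS array.
function bump_with_globals() {
    $GLOBALS['counter'] = $GLOBALS['counter'] + 1;
}

// Does the same thing using the global keyword.
function bump_with_keyword() {
    global $counter;
    ++$counter;
}

// Without either, $counter here is a separate local variable
// that disappears when the function ends.
function bump_local_only() {
    $counter = 100;
}

bump_with_globals();
bump_with_keyword();
bump_local_only();

print $counter . "\n";
```

Only the first two calls touch the global variable, so the script prints 2.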

Variable scope in functions

As mentioned already, variables declared outside of functions and classes are considered "global" - generally available to the script. However, as functions are independent blocks, their variables are self-contained and do not affect variables in the main script. In the same way, variables from the main script are not implicitly made available inside functions. Take a look at this example script:
<?php
function foo() {
    $bar = "wombat";
}

$bar = "baz";
foo();
print $bar;
?>

Execution of the script starts at the $bar = "baz" line, and then calls the foo() function. Now, as you can see, foo() sets $bar to "wombat", then returns control to the main script where $bar is printed out. Consider for a moment what you think that script will do, taking into account what I have just said regarding variable scope in functions.

There are, overall, three possibilities:

1. The script will print "baz"
2. The script will print "wombat"
3. The script will print nothing

Possibility one would be the case if the $bar variable was set outside of the function, foo() was called and set its own, local version of $bar, which was deleted once the function ended, leaving the original $bar in place.

Possibility two would be the case if the $bar variable was set outside of the function, foo() was called, and changed the global copy of $bar, therefore printing out the new value once control returns to the main script.

Possibility three would be the case if variables are lost in between function calls.

It is quite simple to discount the third possibility - variables declared globally, that is, outside of functions, remain in the global scope, no matter what functions you call.

The second possibility would mean that variables declared globally are automatically made available inside functions, which we know is not the case. Therefore, the first possibility is in fact correct - foo() is called, and, having no knowledge that a $bar variable exists in the global scope, creates a $bar variable in its local scope. Once the function ends, all local scopes are tossed away, leaving the original $bar variable intact.

For many this procedure is second nature, however it does take a little getting used to if you are new to programming, which is why I have gone into so much depth. This explicit level of scope is something you will find is particularly important once you go beyond simple scripts.

Variable parameter counts

int func_num_args ( )

mixed func_get_arg ( int arg_num)

array func_get_args ( )

The printf() function we examined previously is able to take an arbitrary number of parameters. That is, it could take just one parameter, or five, or fifty, or five hundred - it can take as many as are passed into it by the user. This is known as a variable-length parameter list, and it is automatically implemented in your user functions. For example:
<?php
function some_func($a, $b) {
    $j = 1;
}

some_func(1,2,3,4,5,6,7,8);
?>

Here the function some_func() is defined to only take two parameters, $a and $b, but we call it with eight parameters and the script should run without a problem. In that example, 1 will be placed into $a, and 2 will be placed into $b, but what happens to the other parameters?

Coming to your rescue are three functions: func_num_args(), func_get_arg(), and func_get_args(), of which the first and last take no parameters. To get the number of arguments that were passed into your function, call func_num_args() and read its return value. To get the value of an individual parameter, use func_get_arg() and pass in the parameter number you want to retrieve to have its value returned back to you. Finally, func_get_args() returns an array of the parameters that were passed in. Here's an example:
<?php
function some_func($a, $b) {
    for ($i = 0; $i < func_num_args(); ++$i) {
        $param = func_get_arg($i);
        echo "Received parameter $param.\n";
    }
}

function some_other_func($a, $b) {
    $param = func_get_args();
    $param = join(", ", $param);
    echo "Received parameters: $param.\n";
}

some_func(1,2,3,4,5,6,7,8);
some_other_func(1,2,3,4,5,6,7,8);
?>

Using func_num_args() alone you can easily implement function error checking. You can, for example, start off each of your functions by checking to make sure func_num_args() is what you are expecting, and, if not, exit. Once you add func_get_arg() into the mix, however, you should easily be able to create your own functions that work with any number of parameters.
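
As a sketch of that idea, the hypothetical function sum_all() below declares no parameters at all, checks func_num_args() first, and then adds up whatever was passed in. Returning false for an empty call is just one possible choice made for this example.

```php
<?php
// Accepts any number of arguments and returns their total.
function sum_all() {
    if (func_num_args() == 0) {
        return false; // nothing was passed in
    }

    $total = 0;
    foreach (func_get_args() as $value) {
        $total += $value;
    }
    return $total;
}

print sum_all(1, 2, 3) . "\n";
print sum_all(10, 20, 30, 40) . "\n";
var_dump(sum_all());
```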

Functions

Functions, both ones built into PHP and ones you define yourself, make coding much easier - they take away lots of hard work because you can reuse other people's code, and they allow you to keep your scripts shorter and easier to maintain. As PHP 5 includes more than 2,500 functions, you might assume it's a very easy language indeed, but the truth is that each function needs to be used in different ways and so needs to be learnt individually. In this chapter you will learn your first PHP functions, with the most helpful and easy first.

Rather than writing pieces of code time after time whenever you want to execute the same functionality, PHP allows you to encapsulate code into a named function that you can call from elsewhere in your script.

PHP comes with hundreds of predefined functions that perform all manner of tasks, from reading files and manipulating strings up to querying databases and connecting to an IRC server. If you find something is missing, you can add your own functions on a script-by-script basis, and these are called user functions.

In this section we will be covering a variety of the most important basic functions in PHP - more specialised functions can be found spread throughout the book under various sections, and should be looked up using the index.

Topics covered in this chapter are:

* Working with date and time
* Mathematical functions
* String manipulation
* Creating data hashes
* Regular expressions
* Extension handling
* Writing your own functions
* Recursive, variable, and callback functions

Saturday, July 4, 2009

Arrays

An array in PHP is actually an ordered map. A map is a type that maps values to keys. This type is optimized in several ways, so you can use it as a real array, or a list (vector), hashtable (which is an implementation of a map), dictionary, collection, stack, queue and probably more. Because you can have another PHP array as a value, you can also quite easily simulate trees.

Explanation of those data structures is beyond the scope of this manual, but you'll find at least one example for each of them. For more information we refer you to external literature about this broad topic.

Syntax
Specifying with array()
An array can be created by the array() language-construct. It takes a certain number of comma-separated key => value pairs.

array( [key =>] value
, ...
)
// key may be an integer or string
// value may be any value





<?php
$arr = array("foo" => "bar", 12 => true);

echo $arr["foo"]; // bar
echo $arr[12]; // 1
?>



A key may be either an integer or a string. If a key is the standard representation of an integer, it will be interpreted as such (i.e. "8" will be interpreted as 8, while "08" will be interpreted as "08"). Floats in key are truncated to integer. There are no different indexed and associative array types in PHP; there is only one array type, which can both contain integer and string indices.

A value can be of any PHP type.


<?php
$arr = array("somearray" => array(6 => 5, 13 => 9, "a" => 42));

echo $arr["somearray"][6]; // 5
echo $arr["somearray"][13]; // 9
echo $arr["somearray"]["a"]; // 42
?>



If you do not specify a key for a given value, then the maximum of the integer indices is taken, and the new key will be that maximum value + 1. If you specify a key that already has a value assigned to it, that value will be overwritten.


<?php
// This array is the same as ...
array(5 => 43, 32, 56, "b" => 12);

// ...this array
array(5 => 43, 6 => 32, 7 => 56, "b" => 12);
?>




Warning
As of PHP 4.3.0, the index generation behaviour described above has changed. Now, if you append to an array in which the current maximum key is negative, then the next key created will be zero (0). Before, the new index would have been set to the largest existing key + 1, the same as positive indices are.


Using TRUE as a key will evaluate to integer 1 as key. Using FALSE as a key will evaluate to integer 0 as key. Using NULL as a key will evaluate to the empty string. Using the empty string as key will create (or overwrite) a key with the empty string and its value; it is not the same as using empty brackets.

You cannot use arrays or objects as keys. Doing so will result in a warning: Illegal offset type.
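
The key rules described above can be checked with a short script. This is only an illustrative sketch; the array $keys and its values are invented for the example.

```php
<?php
$keys = array();
$keys["8"]   = 'a'; // the string "8" is stored as integer 8
$keys["08"]  = 'b'; // "08" is not a standard integer, so it stays a string
$keys[true]  = 'c'; // TRUE is stored as integer 1
$keys[false] = 'd'; // FALSE is stored as integer 0
$keys[]      = 'e'; // no key given: largest integer key so far (8) plus 1

var_dump(array_keys($keys));
```

var_dump() is used rather than print_r() because it shows the type of each key, which makes the difference between 8 and "08" visible.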

Creating/modifying with square-bracket syntax
You can also modify an existing array by explicitly setting values in it.

This is done by assigning values to the array while specifying the key in brackets. You can also omit the key; in that case, add an empty pair of brackets ("[]") to the variable name.

$arr[key] = value;
$arr[] = value;
// key may be an integer or string
// value may be any value

If $arr doesn't exist yet, it will be created. So this is also an alternative way to specify an array. To change a certain value, just assign a new value to an element specified with its key. If you want to remove a key/value pair, you need to unset() it.


<?php
$arr = array(5 => 1, 12 => 2);

$arr[] = 56;    // This is the same as $arr[13] = 56;
                // at this point of the script

$arr["x"] = 42; // This adds a new element to
                // the array with key "x"

unset($arr[5]); // This removes the element from the array

unset($arr);    // This deletes the whole array
?>



Note: As mentioned above, if you provide the brackets with no key specified, then the maximum of the existing integer indices is taken, and the new key will be that maximum value + 1. If no integer indices exist yet, the key will be 0 (zero). If you specify a key that already has a value assigned to it, that value will be overwritten.



Warning
As of PHP 4.3.0, the index generation behaviour described above has changed. Now, if you append to an array in which the current maximum key is negative, then the next key created will be zero (0). Before, the new index would have been set to the largest existing key + 1, the same as positive indices are.



Note that the maximum integer key used for this need not currently exist in the array. It simply must have existed in the array at some time since the last time the array was re-indexed. The following example illustrates:


<?php
// Create a simple array.
$array = array(1, 2, 3, 4, 5);
print_r($array);

// Now delete every item, but leave the array itself intact:
foreach ($array as $i => $value) {
    unset($array[$i]);
}
print_r($array);

// Append an item (note that the new key is 5, instead of 0 as you
// might expect).
$array[] = 6;
print_r($array);

// Re-index:
$array = array_values($array);
$array[] = 7;
print_r($array);
?>

The above example will output:

Array
(
[0] => 1
[1] => 2
[2] => 3
[3] => 4
[4] => 5
)
Array
(
)
Array
(
[5] => 6
)
Array
(
[0] => 6
[1] => 7
)



Useful functions
There are quite a few useful functions for working with arrays. See the array functions section.

Note: The unset() function allows unsetting keys of an array. Be aware that the array will NOT be reindexed. If you only use "usual integer indices" (starting from zero, increasing by one), you can achieve the reindex effect by using array_values().


<?php
$a = array(1 => 'one', 2 => 'two', 3 => 'three');
unset($a[2]);
/* will produce an array that would have been defined as
   $a = array(1 => 'one', 3 => 'three');
   and NOT
   $a = array(1 => 'one', 2 =>'three');
*/

$b = array_values($a);
// Now $b is array(0 => 'one', 1 =>'three')
?>



The foreach control structure exists specifically for arrays. It provides an easy way to traverse an array.

Array do's and don'ts
Why is $foo[bar] wrong?
You should always use quotes around a string literal array index. For example, use $foo['bar'] and not $foo[bar]. But why is $foo[bar] wrong? You might have seen the following syntax in old scripts:


<?php
$foo[bar] = 'enemy';
echo $foo[bar];
// etc
?>


This is wrong, but it works. Then, why is it wrong? The reason is that this code has an undefined constant (bar) rather than a string ('bar' - notice the quotes), and PHP may in future define constants which, unfortunately for your code, have the same name. It works because PHP automatically converts a bare string (an unquoted string which does not correspond to any known symbol) into a string which contains the bare string. For instance, if there is no defined constant named bar, then PHP will substitute in the string 'bar' and use that. Note that this fallback only exists in older versions: it was deprecated in PHP 7.2 and removed in PHP 8.0, where using an undefined constant throws an Error.

Note: This does not mean to always quote the key. You do not want to quote keys which are constants or variables, as this will prevent PHP from interpreting them.


<?php
error_reporting(E_ALL);
ini_set('display_errors', true);
ini_set('html_errors', false);
// Simple array:
$array = array(1, 2);
$count = count($array);
for ($i = 0; $i < $count; $i++) {
    echo "\nChecking $i: \n";
    echo "Bad: " . $array['$i'] . "\n";
    echo "Good: " . $array[$i] . "\n";
    echo "Bad: {$array['$i']}\n";
    echo "Good: {$array[$i]}\n";
}
?>


The above example will output:

Checking 0:
Notice: Undefined index: $i in /path/to/script.html on line 9
Bad:
Good: 1
Notice: Undefined index: $i in /path/to/script.html on line 11
Bad:
Good: 1

Checking 1:
Notice: Undefined index: $i in /path/to/script.html on line 9
Bad:
Good: 2
Notice: Undefined index: $i in /path/to/script.html on line 11
Bad:
Good: 2


More examples to demonstrate this fact:


<?php
// Let's show all errors
error_reporting(E_ALL);

$arr = array('fruit' => 'apple', 'veggie' => 'carrot');

// Correct
print $arr['fruit']; // apple
print $arr['veggie']; // carrot

// Incorrect. This works but also throws a PHP error of
// level E_NOTICE because of an undefined constant named fruit
//
// Notice: Use of undefined constant fruit - assumed 'fruit' in...
print $arr[fruit]; // apple

// Let's define a constant to demonstrate what's going on. We
// will assign value 'veggie' to a constant named fruit.
define('fruit', 'veggie');

// Notice the difference now
print $arr['fruit']; // apple
print $arr[fruit]; // carrot

// The following is okay as it's inside a string. Constants are not
// looked for within strings so no E_NOTICE error here
print "Hello $arr[fruit]"; // Hello apple

// With one exception, braces surrounding arrays within strings
// allows constants to be looked for
print "Hello {$arr[fruit]}"; // Hello carrot
print "Hello {$arr['fruit']}"; // Hello apple

// This will not work, results in a parse error such as:
// Parse error: parse error, expecting `T_STRING' or `T_VARIABLE' or `T_NUM_STRING'
// This of course applies to using autoglobals in strings as well
print "Hello $arr['fruit']";
print "Hello $_GET['foo']";

// Concatenation is another option
print "Hello " . $arr['fruit']; // Hello apple
?>



When you turn error_reporting() up to show E_NOTICE level errors (such as setting it to E_ALL) then you will see these errors. By default, error_reporting is turned down to not show them.

As stated in the syntax section, there must be an expression between the square brackets ('[' and ']'). That means that you can write things like this:


<?php
echo $arr[somefunc($bar)];
?>


This is an example of using a function return value as the array index. PHP also knows about constants, as you may have seen the E_* ones before.

<?php
$error_descriptions[E_ERROR] = "A fatal error has occurred";
$error_descriptions[E_WARNING] = "PHP issued a warning";
$error_descriptions[E_NOTICE] = "This is just an informal notice";
?>


Note that E_ERROR is also a valid identifier, just like bar in the first example. But the last example is in fact the same as writing:

<?php
$error_descriptions[1] = "A fatal error has occurred";
$error_descriptions[2] = "PHP issued a warning";
$error_descriptions[8] = "This is just an informal notice";
?>


because E_ERROR equals 1, etc.

As we already explained in the above examples, $foo[bar] still works but is wrong. It works, because bar is due to its syntax expected to be a constant expression. However, in this case no constant with the name bar exists. PHP now assumes that you meant bar literally, as the string "bar", but that you forgot to write the quotes.

So why is it bad then?
At some point in the future, the PHP team might want to add another constant or keyword, or you may introduce another constant into your application, and then you get in trouble. For example, you already cannot use the words empty and default this way, since they are special reserved keywords.

Note: To reiterate, inside a double-quoted string, it's valid to not surround array indexes with quotes so "$foo[bar]" is valid. See the above examples for details on why as well as the section on variable parsing in strings.

Converting to array
For any of the types: integer, float, string, boolean and resource, if you convert a value to an array, you get an array with one element (with index 0), which is the scalar value you started with.

If you convert an object to an array, you get the properties (member variables) of that object as the array's elements. The keys are the member variable names.

If you convert a NULL value to an array, you get an empty array.
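
These conversion rules can be tried out with an (array) cast. The following is a small sketch; the class Point and the variable names are hypothetical and exist only for the illustration.

```php
<?php
// Scalars become a one-element array with index 0.
$from_int    = (array) 42;
$from_string = (array) "hello";

// NULL becomes an empty array.
$from_null = (array) null;

// An object's properties become the array's elements,
// keyed by the member variable names.
class Point {
    public $x = 1;
    public $y = 2;
}
$from_object = (array) new Point();

print_r($from_int);
print_r($from_null);
print_r($from_object);
```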

Comparing
It is possible to compare arrays by array_diff() and by Array operators.

Examples
The array type in PHP is very versatile, so here are some examples to show you the full power of arrays.



<?php
// this
$a = array( 'color' => 'red',
'taste' => 'sweet',
'shape' => 'round',
'name' => 'apple',
4 // key will be 0
);

// is completely equivalent with
$a['color'] = 'red';
$a['taste'] = 'sweet';
$a['shape'] = 'round';
$a['name'] = 'apple';
$a[] = 4; // key will be 0

$b[] = 'a';
$b[] = 'b';
$b[] = 'c';
// will result in the array array(0 => 'a' , 1 => 'b' , 2 => 'c'),
// or simply array('a', 'b', 'c')
?>



Example 11-6. Using array()

<?php
// Array as (property-)map
$map = array( 'version' => 4,
'OS' => 'Linux',
'lang' => 'english',
'short_tags' => true
);

// strictly numerical keys
$array = array( 7,
8,
0,
156,
-10
);
// this is the same as array(0 => 7, 1 => 8, ...)

$switching = array( 10, // key = 0
5 => 6,
3 => 7,
'a' => 4,
11, // key = 6 (maximum of integer-indices was 5)
'8' => 2, // key = 8 (integer!)
'02' => 77, // key = '02'
0 => 12 // the value 10 will be overwritten by 12
);

// empty array
$empty = array();
?>

Example 11-7. Collection

<?php
$colors = array('red', 'blue', 'green', 'yellow');

foreach ($colors as $color) {
echo "Do you like $color?\n";
}

?>

The above example will output:

Do you like red?
Do you like blue?
Do you like green?
Do you like yellow?



Changing values of the array directly is possible since PHP 5 by passing them by reference. Prior versions need a workaround:

Example 11-8. Collection

<?php
// PHP 5
foreach ($colors as &$color) {
$color = strtoupper($color);
}
unset($color); /* ensure that following writes to
$color will not modify the last array element */

// Workaround for older versions
foreach ($colors as $key => $color) {
$colors[$key] = strtoupper($color);
}

print_r($colors);
?>

The above example will output:

Array
(
[0] => RED
[1] => BLUE
[2] => GREEN
[3] => YELLOW
)




This example creates a one-based array.

Example 11-9. One-based index

<?php
$firstquarter = array(1 => 'January', 'February', 'March');
print_r($firstquarter);
?>

The above example will output:

Array
(
[1] => January
[2] => February
[3] => March
)




Example 11-10. Filling an array

<?php
// fill an array with all items from a directory
$handle = opendir('.');
while (false !== ($file = readdir($handle))) {
    $files[] = $file;
}
closedir($handle);
?>


Arrays are ordered. You can also change the order using various sorting functions. See the array functions section for more information. You can count the number of items in an array using the count() function.

Example 11-11. Sorting an array

<?php
sort($files);
print_r($files);
?>


Because the value of an array can be anything, it can also be another array. This way you can make recursive and multi-dimensional arrays.

Example 11-12. Recursive and multi-dimensional arrays

<?php
$fruits = array(
    "fruits" => array(
        "a" => "orange",
        "b" => "banana",
        "c" => "apple"
    ),
    "numbers" => array(1, 2, 3, 4, 5, 6),
    "holes" => array(
        "first",
        5 => "second",
        "third"
    )
);

// Some examples to address values in the array above
echo $fruits["holes"][5]; // prints "second"
echo $fruits["fruits"]["a"]; // prints "orange"
unset($fruits["holes"][0]); // remove "first"

// Create a new multi-dimensional array
$juices["apple"]["green"] = "good";
?>


You should be aware that array assignment always involves value copying. It also means that the internal array pointer used by current() and similar functions is reset. You need to use the reference operator to copy an array by reference.


<?php
$arr1 = array(2, 3);
$arr2 = $arr1;
$arr2[] = 4; // $arr2 is changed,
// $arr1 is still array(2, 3)

$arr3 = &$arr1;
$arr3[] = 4; // now $arr1 and $arr3 are the same
?>







Strings

A string is a series of characters. In PHP, a character is the same as a byte, that is, there are exactly 256 different characters possible. This also implies that PHP has no native support of Unicode. See utf8_encode() and utf8_decode() for some Unicode support.

Note: It is no problem for a string to become very large. There is no practical bound to the size of strings imposed by PHP, so there is no reason at all to worry about long strings.

Syntax
A string literal can be specified in three different ways.


single quoted

double quoted

heredoc syntax


Single quoted
The easiest way to specify a simple string is to enclose it in single quotes (the character ').

To specify a literal single quote, you will need to escape it with a backslash (\), like in many other languages. If a backslash needs to occur before a single quote or at the end of the string, you need to double it. Note that if you try to escape any other character, the backslash will also be printed! So usually there is no need to escape the backslash itself.

Note: In PHP 3, a warning will be issued at the E_NOTICE level when this happens.

Note: Unlike the two other syntaxes, variables and escape sequences for special characters will not be expanded when they occur in single quoted strings.


<?php
echo 'this is a simple string';

echo 'You can also have embedded newlines in
strings this way as it is
okay to do';

// Outputs: Arnold once said: "I'll be back"
echo 'Arnold once said: "I\'ll be back"';

// Outputs: You deleted C:\*.*?
echo 'You deleted C:\\*.*?';

// Outputs: You deleted C:\*.*?
echo 'You deleted C:\*.*?';

// Outputs: This will not expand: \n a newline
echo 'This will not expand: \n a newline';

// Outputs: Variables do not $expand $either
echo 'Variables do not $expand $either';
?>



Double quoted
If the string is enclosed in double-quotes ("), PHP understands more escape sequences for special characters:

Table 11-1. Escaped characters

sequence              meaning
\n                    linefeed (LF or 0x0A (10) in ASCII)
\r                    carriage return (CR or 0x0D (13) in ASCII)
\t                    horizontal tab (HT or 0x09 (9) in ASCII)
\\                    backslash
\$                    dollar sign
\"                    double-quote
\[0-7]{1,3}           the sequence of characters matching the regular expression is a character in octal notation
\x[0-9A-Fa-f]{1,2}    the sequence of characters matching the regular expression is a character in hexadecimal notation

Again, if you try to escape any other character, the backslash will be printed too! Before PHP 5.1.1, the backslash in \{$var} was not printed.

But the most important feature of double-quoted strings is the fact that variable names will be expanded. See string parsing for details.
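The escape sequences in the table and the expansion of variables can be seen together in a short sketch. The variable names and values below are illustrative assumptions.

```php
<?php
$price = 5;

// \x41 (hexadecimal) and \101 (octal) both denote the character "A".
$hex   = "\x41";
$octal = "\101";

// \$ keeps the dollar sign literal, while $price is expanded.
$label = "Cost in \$: $price";

// An unrecognised escape such as \q keeps its backslash.
$unknown = "\q";

echo $hex, $octal, "\n", $label, "\n", $unknown, "\n";
```

The last string is two characters long, since the backslash is kept in front of the q.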

Heredoc
Another way to delimit strings is by using heredoc syntax ("<<<"). One should provide an identifier after <<<, then the string, and then the same identifier to close the quotation.

The closing identifier must begin in the first column of the line. Also, the identifier used must follow the same naming rules as any other label in PHP: it must contain only alphanumeric characters and underscores, and must start with a non-digit character or underscore.


Warning
It is very important to note that the line with the closing identifier contains no other characters, except possibly a semicolon (;). In particular, the identifier may not be indented, and there may not be any spaces or tabs before or after the semicolon. It is also important to realize that the first character before the closing identifier must be a newline as defined by your operating system. This is \r on Macintosh, for example. The closing delimiter (possibly followed by a semicolon) must be followed by a newline too.

If this rule is broken and the closing identifier is not "clean" then it's not considered to be a closing identifier and PHP will continue looking for one. If in this case a proper closing identifier is not found then a parse error will result with the line number being at the end of the script.

Heredoc syntax cannot be used to initialize class members. Use other string syntaxes instead.

Example 11-3. Invalid example

<?php
class foo {
    public $bar = <<<EOT
bar
EOT;
}
?>




Heredoc text behaves just like a double-quoted string, without the double-quotes. This means that you do not need to escape quotes in your heredocs, but you can still use the escape codes listed above. Variables are expanded, but the same care must be taken when expressing complex variables inside a heredoc as with strings.

Example 11-4. Heredoc string quoting example

<?php
$str = <<<EOD
Example of string
spanning multiple lines
using heredoc syntax.
EOD;

/* More complex example, with variables. */
class foo
{
    var $foo;
    var $bar;

    function foo()
    {
        $this->foo = 'Foo';
        $this->bar = array('Bar1', 'Bar2', 'Bar3');
    }
}

$foo = new foo();
$name = 'MyName';

echo <<<EOT
My name is "$name". I am printing some $foo->foo.
Now, I am printing some {$foo->bar[1]}.
This should print a capital 'A': \x41
EOT;
?>



Note: Heredoc support was added in PHP 4.

Variable parsing
When a string is specified in double quotes or with heredoc, variables are parsed within it.

There are two types of syntax: a simple one and a complex one. The simple syntax is the most common and convenient. It provides a way to parse a variable, an array value, or an object property.

The complex syntax was introduced in PHP 4, and can be recognised by the curly braces surrounding the expression.

Simple syntax
If a dollar sign ($) is encountered, the parser will greedily take as many tokens as possible to form a valid variable name. Enclose the variable name in curly braces if you want to explicitly specify the end of the name.


<?php
$beer = 'Heineken';
echo "$beer's taste is great"; // works, "'" is an invalid character for varnames
echo "He drank some $beers"; // won't work, 's' is a valid character for varnames
echo "He drank some ${beer}s"; // works
echo "He drank some {$beer}s"; // works
?>


Similarly, you can also have an array index or an object property parsed. With array indices, the closing square bracket (]) marks the end of the index. For object properties the same rules apply as to simple variables, though with object properties there doesn't exist a trick like the one with variables.


<?php
// These examples are specific to using arrays inside of strings.
// When outside of a string, always quote your array string keys
// and do not use {braces} when outside of strings either.

// Let's show all errors
error_reporting(E_ALL);

$fruits = array('strawberry' => 'red', 'banana' => 'yellow');

// Works but note that this works differently outside string-quotes
echo "A banana is $fruits[banana].";

// Works
echo "A banana is {$fruits['banana']}.";

// Works but PHP looks for a constant named banana first
// as described below.
echo "A banana is {$fruits[banana]}.";

// Won't work, use braces. This results in a parse error.
echo "A banana is $fruits['banana'].";

// Works
echo "A banana is " . $fruits['banana'] . ".";

// Works
echo "This square is $square->width meters broad.";

// Won't work. For a solution, see the complex syntax.
echo "This square is $square->width00 centimeters broad.";
?>


For anything more complex, you should use the complex syntax.

Complex (curly) syntax
This isn't called complex because the syntax is complex, but because you can include complex expressions this way.

In fact, you can include any value that is in the namespace in strings with this syntax. You simply write the expression the same way as you would outside the string, and then include it in { and }. Since you can't escape '{', this syntax will only be recognised when the $ is immediately following the {. (Use "{\$" to get a literal "{$"). Some examples to make it clear:


<?php
// Let's show all errors
error_reporting(E_ALL);

$great = 'fantastic';

// Won't work, outputs: This is { fantastic}
echo "This is { $great}";

// Works, outputs: This is fantastic
echo "This is {$great}";
echo "This is ${great}";

// Works
echo "This square is {$square->width}00 centimeters broad.";

// Works
echo "This works: {$arr[4][3]}";

// This is wrong for the same reason as $foo[bar] is wrong
// outside a string. In other words, it will still work but
// because PHP first looks for a constant named foo, it will
// throw an error of level E_NOTICE (undefined constant).
echo "This is wrong: {$arr[foo][3]}";

// Works. When using multi-dimensional arrays, always use
// braces around arrays when inside of strings
echo "This works: {$arr['foo'][3]}";

// Works.
echo "This works: " . $arr['foo'][3];

echo "You can even write {$obj->values[3]->name}";

echo "This is the value of the var named $name: {${$name}}";
?>


String access and modification by character
Characters within strings may be accessed and modified by specifying the zero-based offset of the desired character after the string using square array-brackets, as in $str[42]. For this purpose, think of a string as an array of characters.

Note: They may also be accessed using braces like $str{42} for the same purpose. However, using square array-brackets is preferred.

Example 11-5. Some string examples

<?php
// Get the first character of a string
$str = 'This is a test.';
$first = $str[0];

// Get the third character of a string
$third = $str[2];

// Get the last character of a string.
$str = 'This is still a test.';
$last = $str[strlen($str)-1];

// Modify the last character of a string
$str = 'Look at the sea';
$str[strlen($str)-1] = 'e';

// Alternative method using {}
$third = $str{2};

?>



Useful functions and operators
Strings may be concatenated using the '.' (dot) operator. Note that the '+' (addition) operator will not work for this. Please see String operators for more information.
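A minimal sketch of concatenation follows. It uses the dot operator and its assignment form .= (described in the String operators section); the variable names are illustrative.

```php
<?php
$first  = 'Hello';
$second = 'World';

// The dot operator joins strings into a new string.
$greeting = $first . ', ' . $second . '!';

// The .= operator appends to an existing string in place.
$log = 'start';
$log .= ' -> end';

echo $greeting, "\n", $log, "\n";
```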

There are a lot of useful functions for string modification.

See the string functions section for general functions, and the regular expression functions for advanced find-and-replace functionality (in two flavors: Perl and POSIX extended).

There are also functions for URL-strings, and functions to encrypt/decrypt strings (mcrypt and mhash).

Finally, if you still didn't find what you're looking for, see also the character type functions.

Converting to string
You can convert a value to a string using the (string) cast, or the strval() function. String conversion is automatically done in the scope of an expression for you where a string is needed. This happens when you use the echo() or print() functions, or when you compare a variable value to a string. Reading the manual sections on Types and Type Juggling will make the following clearer. See also settype().

A boolean TRUE value is converted to the string "1", the FALSE value is represented as "" (empty string). This way you can convert back and forth between boolean and string values.

An integer or a floating point number (float) is converted to a string representing the number with its digits (including the exponent part for floating point numbers).

Arrays are always converted to the string "Array", so you cannot dump out the contents of an array with echo() or print() to see what is inside them. To view one element, you'd do something like echo $arr['foo']. See below for tips on dumping/viewing the entire contents.

Objects are always converted to the string "Object". If you would like to print out the member variable values of an object for debugging reasons, read the paragraphs below. If you would like to find out the name of the class an object is an instance of, use get_class(). As of PHP 5, the __toString() method is used if applicable.

Resources are always converted to strings with the structure "Resource id #1" where 1 is the unique number of the resource assigned by PHP during runtime. If you would like to get the type of the resource, use get_resource_type().

NULL is always converted to an empty string.
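The conversions for scalars, NULL and objects with __toString() can be sketched as follows. The Temperature class is a hypothetical example written for this illustration, not something taken from the manual.

```php
<?php
// Scalars and NULL converted with the (string) cast.
$fromTrue  = (string) true;   // "1"
$fromFalse = (string) false;  // ""
$fromInt   = (string) 42;     // "42"
$fromNull  = (string) null;   // ""

// strval() performs the same conversion as the cast.
$viaStrval = strval(3.5);     // "3.5"

// Hypothetical class that defines __toString() (PHP 5 and later).
class Temperature
{
    public $degrees;

    public function __construct($degrees)
    {
        $this->degrees = $degrees;
    }

    public function __toString()
    {
        return $this->degrees . ' C';
    }
}

$fromObject = (string) new Temperature(21);  // "21 C"

var_dump($fromTrue, $fromFalse, $fromInt, $fromNull, $viaStrval, $fromObject);
```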

As you can see above, printing out the arrays, objects or resources does not provide you any useful information about the values themselves. Look at the functions print_r() and var_dump() for better ways to print out values for debugging.

You can also convert PHP values to strings to store them permanently. This method is called serialization, and can be done with the function serialize(). You can also serialize PHP values to XML structures, if you have WDDX support in your PHP setup.
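As a rough sketch of serialization, the snippet below stores an array as a string with serialize() and restores it with unserialize(). The array contents are made up for the example.

```php
<?php
// Illustrative data; the keys and values are assumptions for this sketch.
$settings = array('version' => 4, 'OS' => 'Linux', 'short_tags' => true);

// serialize() produces a string that can be stored in a file or database.
$stored = serialize($settings);

// unserialize() rebuilds the original value from that string.
$restored = unserialize($stored);

var_dump(is_string($stored), $restored === $settings);
```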

String conversion to numbers
When a string is evaluated as a numeric value, the resulting value and type are determined as follows.

The string will evaluate as a float if it contains any of the characters '.', 'e', or 'E'. Otherwise, it will evaluate as an integer.

The value is given by the initial portion of the string. If the string starts with valid numeric data, this will be the value used. Otherwise, the value will be 0 (zero). Valid numeric data is an optional sign, followed by one or more digits (optionally containing a decimal point), followed by an optional exponent. The exponent is an 'e' or 'E' followed by one or more digits.


<?php
$foo = 1 + "10.5"; // $foo is float (11.5)
$foo = 1 + "-1.3e3"; // $foo is float (-1299)
$foo = 1 + "bob-1.3e3"; // $foo is integer (1)
$foo = 1 + "bob3"; // $foo is integer (1)
$foo = 1 + "10 Small Pigs"; // $foo is integer (11)
$foo = 4 + "10.2 Little Piggies"; // $foo is float (14.2)
$foo = "10.0 pigs " + 1; // $foo is float (11)
$foo = "10.0 pigs " + 1.0; // $foo is float (11)
?>


For more information on this conversion, see the Unix manual page for strtod(3).

If you would like to test any of the examples in this section, you can cut and paste the examples and insert the following line to see for yourself what's going on:


<?php
echo "\$foo==$foo; type is " . gettype ($foo) . "<br />\n";
?>



Do not expect to get the code of one character by converting it to integer (as you would do in C for example). Use the functions ord() and chr() to convert between charcodes and characters.
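A short sketch of the difference between casting and the ord() and chr() functions:

```php
<?php
// Casting a character to integer does not give its character code.
$cast = (int) 'A';   // 0, because 'A' is not numeric

// ord() returns the code of the first byte of a string.
$code = ord('A');    // 65

// chr() does the reverse and returns the one-byte string for a code.
$char = chr(65);     // 'A'

var_dump($cast, $code, $char);
```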





Floating point numbers

Floating point numbers (AKA "floats", "doubles" or "real numbers") can be specified using any of the following syntaxes:


<?php
$a = 1.234;
$b = 1.2e3;
$c = 7E-10;
?>


Formally:

LNUM [0-9]+
DNUM ([0-9]*[\.]{LNUM}) | ({LNUM}[\.][0-9]*)
EXPONENT_DNUM ( ({LNUM} | {DNUM}) [eE][+-]? {LNUM})


The size of a float is platform-dependent, although a maximum of ~1.8e308 with a precision of roughly 14 decimal digits is a common value (that's 64 bit IEEE format).


Floating point precision
It is quite usual that simple decimal fractions like 0.1 or 0.7 cannot be converted into their internal binary counterparts without a little loss of precision. This can lead to confusing results: for example, floor((0.1+0.7)*10) will usually return 7 instead of the expected 8 as the result of the internal representation really being something like 7.9999999999....

This is related to the fact that it is impossible to exactly express some fractions in decimal notation with a finite number of digits. For instance, 1/3 in decimal form becomes 0.3333333...

So never trust floating number results to the last digit and never compare floating point numbers for equality. If you really need higher precision, you should use the arbitrary precision math functions or gmp functions instead.
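One common alternative to testing floats for equality is to check whether they differ by less than a small tolerance. The sketch below assumes a tolerance of 0.00001, which is an arbitrary choice for illustration and should be adapted to the precision your calculation needs.

```php
<?php
$a = 0.1 + 0.7;
$b = 0.8;

// Direct comparison is unreliable because of rounding in the binary representation.
$directlyEqual = ($a == $b);

// Compare against a small tolerance instead; the value is an assumption.
$epsilon = 0.00001;
$closeEnough = (abs($a - $b) < $epsilon);

var_dump($directlyEqual, $closeEnough);
```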


Converting to float
For information on when and how strings are converted to floats, see the section titled String conversion to numbers. For values of other types, the conversion is the same as if the value had been converted to integer and then to float. See the Converting to integer section for more information. As of PHP 5, a notice is thrown if you try to convert an object to float.

Integers

An integer is a number of the set Z = {..., -2, -1, 0, 1, 2, ...}.

See also: Arbitrary length integer / GMP, Floating point numbers, and Arbitrary precision / BCMath

Syntax
Integers can be specified in decimal (10-based), hexadecimal (16-based) or octal (8-based) notation, optionally preceded by a sign (- or +).

To use octal notation, precede the number with a 0 (zero). To use hexadecimal notation, precede the number with 0x.

Example 11-1. Integer literals

<?php
$a = 1234; // decimal number
$a = -123; // a negative number
$a = 0123; // octal number (equivalent to 83 decimal)
$a = 0x1A; // hexadecimal number (equivalent to 26 decimal)
?>

Formally the possible structure for integer literals is:


decimal : [1-9][0-9]*
| 0

hexadecimal : 0[xX][0-9a-fA-F]+

octal : 0[0-7]+

integer : [+-]?decimal
| [+-]?hexadecimal
| [+-]?octal



The size of an integer is platform-dependent, although a maximum value of about two billion is the usual value (that's 32 bits signed). PHP does not support unsigned integers.


Warning
If an invalid digit is passed to an octal integer (i.e. 8 or 9), the rest of the number is ignored.

Example 11-2. Octal weirdness

<?php
var_dump(01090); // 010 octal = 8 decimal
?>




Integer overflow
If you specify a number beyond the bounds of the integer type, it will be interpreted as a float instead. Also, if you perform an operation that results in a number beyond the bounds of the integer type, a float will be returned instead.


<?php
$large_number = 2147483647;
var_dump($large_number);
// output: int(2147483647)

$large_number = 2147483648;
var_dump($large_number);
// output: float(2147483648)

// it's true also for hexadecimal specified integers between 2^31 and 2^32-1:
var_dump( 0xffffffff );
// output: float(4294967295)

// this doesn't go for hexadecimal specified integers above 2^32-1:
var_dump( 0x100000000 );
// output: int(2147483647)

$million = 1000000;
$large_number = 50000 * $million;
var_dump($large_number);
// output: float(50000000000)
?>



Warning
Unfortunately, there was a bug in PHP so that this does not always work correctly when there are negative numbers involved. For example: when you do -50000 * $million, the result will be -429496728. However, when both operands are positive there is no problem.

This is solved in PHP 4.1.0.



There is no integer division operator in PHP. 1/2 yields the float 0.5. You can cast the value to an integer to always round it downwards, or you can use the round() function.


<?php
var_dump(25/7); // float(3.5714285714286)
var_dump((int) (25/7)); // int(3)
var_dump(round(25/7)); // float(4)
?>



Converting to integer
To explicitly convert a value to integer, use either the (int) or the (integer) cast. However, in most cases you do not need to use the cast, since a value will be automatically converted if an operator, function or control structure requires an integer argument. You can also convert a value to integer with the function intval().

See also type-juggling.

From booleans
FALSE will yield 0 (zero), and TRUE will yield 1 (one).

From floating point numbers
When converting from float to integer, the number will be rounded towards zero.
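Rounding towards zero means the fractional part is simply dropped, which matters for negative numbers. A brief sketch, with floor() shown only for contrast:

```php
<?php
// Casting truncates the fractional part, moving the value towards zero.
$positive = (int) 3.99;    // 3
$negative = (int) -3.99;   // -3, not -4

// intval() applies the same rule as the cast.
$viaIntval = intval(7.5);  // 7

// floor() rounds down instead and returns a float.
$floored = floor(-3.99);   // -4.0

var_dump($positive, $negative, $viaIntval, $floored);
```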

If the float is beyond the boundaries of integer (usually +/- 2.15e+9 = 2^31), the result is undefined, since the float hasn't got enough precision to give an exact integer result. No warning, not even a notice will be issued in this case!


Warning
Never cast an unknown fraction to integer, as this can sometimes lead to unexpected results.


<?php
echo (int) ( (0.1+0.7) * 10 ); // echoes 7!
?>


For more information, see the warning about float precision.

Booleans

This is the easiest type. A boolean expresses a truth value. It can be either TRUE or FALSE.

Note: The boolean type was introduced in PHP 4.

Syntax
To specify a boolean literal, use either the keyword TRUE or FALSE. Both are case-insensitive.


<?php
$foo = True; // assign the value TRUE to $foo
?>



Usually you use some kind of operator which returns a boolean value, and then pass it on to a control structure.


<?php
// == is an operator which tests
// equality and returns a boolean
if ($action == "show_version") {
    echo "The version is 1.23";
}

// this is not necessary...
if ($show_separators == TRUE) {
    echo "<hr>\n";
}

// ...because you can simply type
if ($show_separators) {
    echo "<hr>\n";
}
?>



Converting to boolean
To explicitly convert a value to boolean, use either the (bool) or the (boolean) cast. However, in most cases you do not need to use the cast, since a value will be automatically converted if an operator, function or control structure requires a boolean argument.

See also Type Juggling.

When converting to boolean, the following values are considered FALSE:


the boolean FALSE itself

the integer 0 (zero)

the float 0.0 (zero)

the empty string, and the string "0"

an array with zero elements

an object with zero member variables (PHP 4 only)

the special type NULL (including unset variables)

SimpleXML objects created from empty tags

Every other value is considered TRUE (including any resource).

Warning
-1 is considered TRUE, like any other non-zero (whether negative or positive) number!



<?php
var_dump((bool) ""); // bool(false)
var_dump((bool) 1); // bool(true)
var_dump((bool) -2); // bool(true)
var_dump((bool) "foo"); // bool(true)
var_dump((bool) 2.3e5); // bool(true)
var_dump((bool) array(12)); // bool(true)
var_dump((bool) array()); // bool(false)
var_dump((bool) "false"); // bool(true)
?>

PHP supports eight primitive types.

Four scalar types:


boolean

integer

float (floating-point number, aka 'double')

string

Two compound types:

array

object

And finally two special types:

resource

NULL

This manual also introduces some pseudo-types for readability reasons:

mixed

number

callback

You may also find some references to the type "double". Consider double the same as float, the two names exist only for historic reasons.

The type of a variable is usually not set by the programmer; rather, it is decided at runtime by PHP depending on the context in which that variable is used.

Note: If you want to check out the type and value of a certain expression, use var_dump().

Note: If you simply want a human-readable representation of the type for debugging, use gettype(). To check for a certain type, do not use gettype(), but use the is_type functions. Some examples:


<?php
$a_bool = TRUE; // a boolean
$a_str = "foo"; // a string
$a_str2 = 'foo'; // a string
$an_int = 12; // an integer

echo gettype($a_bool); // prints out: boolean
echo gettype($a_str); // prints out: string

// If this is an integer, increment it by four
if (is_int($an_int)) {
    $an_int += 4;
}

// If $a_bool is a string, print it out
// (does not print out anything)
if (is_string($a_bool)) {
    echo "String: $a_bool";
}
?>



If you would like to force a variable to be converted to a certain type, you may either cast the variable or use the settype() function on it.
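The two options differ in what they change: a cast produces a converted copy, while settype() converts the variable itself. A minimal sketch, with an illustrative input value:

```php
<?php
$value = '123abc';

// A cast yields a converted copy and leaves the original variable untouched.
$asInt = (int) $value;
$typeAfterCast = gettype($value);   // still "string"

// settype() converts the variable in place and returns TRUE on success.
$changed = settype($value, 'integer');
$typeAfterSettype = gettype($value);

var_dump($asInt, $typeAfterCast, $changed, $value, $typeAfterSettype);
```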

Note that a variable may be evaluated with different values in certain situations, depending on what type it is at the time. For more information, see the section on Type Juggling. Also, you may be interested in viewing the type comparison tables, as they show examples of various type related comparisons.