Sunday, July 5, 2009

Finding a string within a string

strpos() and its case-insensitive sibling stripos() return the index of the first occurrence of a substring within a string. This is easier to explain in code, so here goes:
<?php
$string = "This is a strpos() test";
print strpos($string, "a") . "\n";
?>

That will print 8, because the first lowercase "a" in "This is a strpos() test" is at index 8. Remember that PHP counts string positions from 0, which means that the "a" strpos() found is actually the ninth character.

You can also pass a whole word as parameter two, in which case strpos() returns the position where that word first starts. For example, strpos($string, "test") returns 19, the index of the first letter of the matched word.
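As a quick sketch of the whole-word search and of the difference between strpos() and stripos(), the snippet below runs both against the same string. The variable names are only illustrative, and the false result for the failed case-sensitive search is how strpos() signals "not found".

```php
<?php
$string = "This is a strpos() test";

// Whole-word search: index of the "t" that starts "test".
$wordPos = strpos($string, "test");

// strpos() is case-sensitive, so the upper-case needle is not found.
$caseSensitive = strpos($string, "TEST");

// stripos() ignores case and finds the same word.
$caseInsensitive = stripos($string, "TEST");

var_dump($wordPos);         // int(19)
var_dump($caseSensitive);   // bool(false)
var_dump($caseInsensitive); // int(19)
```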

If the substring sent in parameter two is not found in parameter one, strpos() will return false. Consider this script:
<?php
$string = "This is a strpos() test";
$pos = strpos($string, "This");
if ($pos == false) {
    print "Not found\n";
} else {
    print "Found!\n";
}
?>

If you try executing that, you will find that it outputs "Not found", despite "This" quite clearly being in $string. Is it another case sensitivity problem? Not quite. This time the problem is that "This" is the first thing in $string, which means that strpos() returns 0. However, the == operator compares loosely and treats 0 as equal to false, which means that our if statement cannot tell the difference between "Substring not found" and "Substring found at index 0". Quite a problem!

Luckily, PHP comes to the rescue with the === operator, which, if you recall, means "is identical to": $pos must be equal to false and also be of the same type as false (boolean). If "This" is found in $string, strpos() will return 0, but it will be of type integer. If we change our if statement to use === rather than ==, PHP still finds that 0 and false are loosely equal, but it also checks their types and finds that they do not match: the former is an integer and the latter is a boolean. The comparison therefore fails, and the match at index 0 is no longer mistaken for "not found".
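To see the two comparisons side by side, here is a minimal sketch that runs both operators on the value strpos() returns for a match at the very start of the string. The variable names are only for illustration.

```php
<?php
$pos = strpos("This is a strpos() test", "This"); // match at index 0

$looseResult  = ($pos == false);  // loose comparison: 0 is converted to false
$strictResult = ($pos === false); // strict comparison: integer versus boolean

var_dump($pos);          // int(0)
var_dump($looseResult);  // bool(true)
var_dump($strictResult); // bool(false)
```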

So, the corrected version of the script is this:
<?php
$string = "This is a strpos() test";
$pos = strpos($string, "This");
if ($pos === false) {
    print "Not found\n";
} else {
    print "Found!\n";
}
?>

Now, consider this next script, which tries to match the "i" in "is":
<?php
$string = "This is a strpos() test";
$pos = strpos($string, "i");
if ($pos === false) {
    print "Not found\n";
} else {
    print "Found at $pos!\n";
}
?>

The problem there is that strpos() matches the first "i" it comes across, which is the one in "This", so the script prints "Found at 2!". Fortunately, strpos() takes a third parameter that lets us specify where to start searching. As the "i" in "This" is at index 2, we just need to specify one place after that (3) as the start position, and strpos() will report back the next "i" after it. For example:
<?php
$string = "This is a strpos() test";
$pos = strpos($string, "i", 3);
if ($pos === false) {
    print "Not found\n";
} else {
    print "Found at $pos!\n";
}
?>

This time that will print "Found at 5!", 5 being the position of the "i" in "is".
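The same start-position trick can be repeated in a loop to collect every match rather than just the next one. The sketch below assumes PHP 7 or later (for the type declarations), and findAllPositions() is a hypothetical helper written for this post, not a PHP built-in.

```php
<?php
// Hypothetical helper (not part of PHP): returns every index at which
// $needle occurs in $haystack, restarting strpos() just past each match.
function findAllPositions(string $haystack, string $needle): array
{
    $positions = [];

    // An empty needle has no meaningful position, so return early.
    if ($needle === "") {
        return $positions;
    }

    $offset = 0;
    // Compare with !== false so that a match at index 0 still counts.
    while (($pos = strpos($haystack, $needle, $offset)) !== false) {
        $positions[] = $pos;
        $offset = $pos + 1; // continue one place after the last match
    }

    return $positions;
}

$found = findAllPositions("This is a strpos() test", "i");
print implode(", ", $found) . "\n"; // 2, 5
```

Moving the start position by one place, rather than by the length of the substring, means overlapping matches are reported too; whether that is what you want depends on the search.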
