Working with cross-platform newline characters & user form input
Sep 26, 2011 | Programming
If your site accepts user input, it's a good idea to understand the differences in newline characters between platforms and how to handle them (and it's just good to know in general). Say your site is running on a Unix host and receives comments or posts from Windows clients. Any HTML textarea form input sent from the Windows clients will contain newline characters that do not match those native to the Unix host. Likewise, if your site runs on a Windows host, the occasional Linux or Mac user will be sending non-native newline characters as well.

Here are the newline characters that each major system uses:

Platform                 Line Break
Windows                  \r\n (Carriage Return & Line Feed)
Unix/Linux/Mac OS X      \n (Line Feed)
Mac OS 9 and earlier     \r (Carriage Return)

(See this Wiki and this Stack Overflow post for a run-down on what carriage returns and line feeds are)
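
To see which of these line breaks a piece of input actually contains, a small sketch like the following can help. The function name count_line_endings is made up for this example and is not part of PHP; the only built-in used is substr_count.

```php
<?php
// Hypothetical helper: count how many of each line-break style a string contains.
function count_line_endings($str)
{
    $crlf = substr_count($str, "\r\n");
    // Every "\r\n" pair also contains one "\r" and one "\n", so subtract the
    // pairs to get the number of lone carriage returns and lone line feeds.
    $cr = substr_count($str, "\r") - $crlf;
    $lf = substr_count($str, "\n") - $crlf;

    return array('crlf' => $crlf, 'cr' => $cr, 'lf' => $lf);
}

$counts = count_line_endings("one\r\ntwo\nthree\rfour");
// $counts is array('crlf' => 1, 'cr' => 1, 'lf' => 1)
```

Mixed counts like these are a sign that the input came from more than one platform, or was pasted together from several sources.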

A typical scenario is that you'll want to replace all newline characters with HTML break tags, or strip them out completely. You can't rely on PHP's PHP_EOL constant, as it only holds the line break of the platform PHP is running on, not the one used by the client that sent the input. You have to search for each type of line break individually (or use a regular expression, but that may be slower).
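
As a rough sketch of that scenario, the snippet below searches for each type of line break and either swaps in a break tag or removes the break. The variable names and the sample input are only for illustration.

```php
<?php
// List "\r\n" first so a Windows line break becomes one <br />, not two.
$breaks = array("\r\n", "\r", "\n");

$input = "first\r\nsecond\nthird\rfourth";

// Replace every line break with an HTML break tag.
$withTags = str_replace($breaks, "<br />", $input);
// "first<br />second<br />third<br />fourth"

// Or strip the line breaks out completely.
$stripped = str_replace($breaks, "", $input);
// "firstsecondthirdfourth"
```

PHP's built-in nl2br() is another option when only break tags are needed. It inserts the tag before each line break and leaves the original newline characters in the string.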

The easiest approach is to normalize all input your system receives, so you don't have to worry about your database containing different types of line breaks. The Unix \n is very common and only takes up one character (versus two for Windows \r\n), so it is a good one to normalize to.

Here is an example PHP script that will replace the other two line breaks with the Unix one:

$str = str_replace("\r\n", "\n", $str);
$str = str_replace("\r", "\n", $str);

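Make sure you use double quotes, as PHP does not interpret escape sequences such as \r and \n inside single-quoted strings. Also make sure that you look for "\r\n" first, and then "\r". If you replace "\r" first, every "\r\n" turns into "\n\n" and each Windows line break is doubled.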
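
Both points are easy to check for yourself. The following is a small demonstration using made-up variable names and a two-line sample string:

```php
<?php
// Double quotes: "\r\n" is two characters (carriage return + line feed).
// Single quotes: '\r\n' is four literal characters (backslash, r, backslash, n).
$doubleLen = strlen("\r\n"); // 2
$singleLen = strlen('\r\n'); // 4

$input = "a\r\nb";

// Correct order: "\r\n" first, then "\r".
$right = str_replace("\r", "\n", str_replace("\r\n", "\n", $input));
// "a\nb"

// Wrong order: replacing "\r" first turns "\r\n" into "\n\n",
// so the Windows line break ends up doubled.
$wrong = str_replace("\r\n", "\n", str_replace("\r", "\n", $input));
// "a\n\nb"
```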

You can also do it this way:

$arr = array("\r\n", "\r");
$str = str_replace($arr, "\n", $str);

When str_replace is given an array of search strings, it performs the replacements in the order that they appear in the array, so "\r\n" still has to come before "\r".
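
If you normalize input in more than one place, it may be worth wrapping the replacement in a function. The sketch below does that, and also shows what the regular expression alternative could look like. Both function names are invented for this example and are not part of PHP.

```php
<?php
// Hypothetical helper: convert "\r\n" and "\r" line breaks to "\n".
function normalize_newlines($str)
{
    // "\r\n" is listed first so it is replaced before the lone "\r".
    return str_replace(array("\r\n", "\r"), "\n", $str);
}

// Hypothetical regex version: a carriage return, optionally followed by a
// line feed, becomes a single line feed.
function normalize_newlines_regex($str)
{
    return preg_replace("/\r\n?/", "\n", $str);
}

$mixed = "a\r\nb\rc\nd\r\n\r\ne";

$viaReplace = normalize_newlines($mixed);
$viaRegex   = normalize_newlines_regex($mixed);
// Both are "a\nb\nc\nd\n\ne"
```

The two versions should give the same result. The str_replace version avoids the regular expression engine, which fits the earlier point that a regular expression may be slower.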

Since we're talking about line breaks with user input, you'll probably need a way to prevent users from submitting more than two line breaks in a row. Here is a simple preg_replace you can use after you've converted all line breaks to \n:

$str = preg_replace("!\n{3,}!", "\n\n", $str);

This matches any sequence of three or more newline characters in a row, and replaces them with two newline characters. It won't prevent a user from making lots of newline characters in between lots of short sections of text though.
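
Putting the two steps together, a minimal sketch might look like this. The function name collapse_blank_lines is hypothetical, and the sample strings are only there to show what does and does not get collapsed.

```php
<?php
// Hypothetical helper combining the normalization and collapsing steps.
function collapse_blank_lines($str)
{
    // Normalize first so the pattern only has to deal with "\n".
    $str = str_replace(array("\r\n", "\r"), "\n", $str);

    // Three or more consecutive newlines become exactly two.
    return preg_replace("!\n{3,}!", "\n\n", $str);
}

// Four Windows line breaks in a row are reduced to two "\n" characters.
$collapsed = collapse_blank_lines("a\r\n\r\n\r\n\r\nb");
// "a\n\nb"

// Runs of one or two newlines between short sections are left alone.
$untouched = collapse_blank_lines("a\n\nb\n\nc\n\nd");
// "a\n\nb\n\nc\n\nd"
```

Note that a line containing only spaces interrupts the run of newlines, so input such as "a\n \n \nb" passes through this pattern unchanged.
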
Comments (1)
Ayesh Karunaratne   Feb 19, 2012
Thanks for the cool post. I was testing some huge script without actually making sure that I used correct line endings in the input. Wasted more than 1 hour and finally found here! Thanks a million times!