I need to remove every line except the first from a multi-line string. For example, “line1\nline2\nline3” should be converted to “line1”. At first, I used the following regex:
s=s.replace(/\n.*$/,"");
Unfortunately, only the last line was removed: the result was “line1\nline2”. I had heard about greedy and lazy matching in regular expressions. Was the regex not greedy enough to match the last two lines? I tried the following regular expression instead:
s=s.replace(/\n.*?$/,"");
Unfortunately, the result was the same: only the last line was deleted. In fact, a ? after the quantifier * switches it to lazy mode; without the ?, the quantifier is greedy by default, so the first attempt was already greedy. So why can't the regex match multiple lines in the subject string?
It turns out that the dot character does not match the newline character. When the regular expression engine attempts a match starting at the first newline (the end of line1), .* consumes “line2” and stops at the second newline, which the dot cannot match. The $ then fails, because without the m flag it matches only at the very end of the string, and backtracking does not help, so the attempt is abandoned. The engine moves forward and retries at each later position. Only when it starts at the second newline can .* run to the end of the string and $ succeed, which is why only the last line was matched and replaced.
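To check this behavior, here is a small sketch that runs both attempts on the sample string and prints where the match is actually found. The variable names (greedy, lazy, m) are only for illustration.

```javascript
// Sample input from the example above.
const s = "line1\nline2\nline3";

// Greedy and lazy quantifiers give the same result here, because "."
// never crosses a newline and "$" (without the m flag) matches only at
// the very end of the string.
const greedy = s.replace(/\n.*$/, "");
const lazy = s.replace(/\n.*?$/, "");

// Inspect where the engine finds its match.
const m = s.match(/\n.*$/);

console.log(JSON.stringify(greedy));        // "line1\nline2"
console.log(JSON.stringify(lazy));          // "line1\nline2"
console.log(m.index, JSON.stringify(m[0])); // 11 "\nline3"
```

The match starts at index 11, the second newline, not at index 5 where the first newline sits.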
The correct way to match and replace multiple lines is to use the following regex:
s=s.replace(/\n[\s\S]*$/,"");
Inside [], \s means any whitespace character and \S means any non-whitespace character, so [\s\S] matches any character, including the newline character. The regular expression engine finds a match starting at the first newline character, and all lines after the first line are matched and removed.
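A quick sketch to confirm the fix on the sample string. The second variant is not from my original attempt: it assumes an engine that supports the s (dotAll) flag from ES2018, which lets the dot match newlines too. The variable names are only for illustration.

```javascript
const s = "line1\nline2\nline3";

// A character class that accepts whitespace or non-whitespace matches
// every character, so the match runs from the first newline to the end.
const firstLine = s.replace(/\n[\s\S]*$/, "");

// Alternative (assumes ES2018 support): with the s flag, "." also
// matches newline characters.
const firstLineDotAll = s.replace(/\n.*$/s, "");

// A string without any newline is left untouched.
const single = "line1".replace(/\n[\s\S]*$/, "");

console.log(firstLine);       // line1
console.log(firstLineDotAll); // line1
console.log(single);          // line1
```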
This article is a good tutorial on the matching process and the greedy and lazy modes of JavaScript regular expressions.