sed: do not ignore 'g' modifier when match starts with ^

It is perfectly valid to start a regex with ^ and follow it with other
alternatives, joined with \|, that can match more than once. For example,
the following should print ca, as GNU sed does:
$ echo 'abca' | sed -e 's/^a\|b//g'
ca

busybox before patch:
$ echo 'abca' | busybox sed -e 's/^a\|b//g'
bca

busybox after patch:
$ echo 'abca' | ./busybox sed -e 's/^a\|b//g'
ca

regcomp handles ^ perfectly well, as illustrated by the second 'a' that
did not match in the example, so we can leave the non-repeating behavior
to it where appropriate.
The check was added before sed used regcomp and was required at the
time (f36635cec6da), but it no longer makes sense now.
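
The reasoning above can be sketched as a small standalone loop. This is a
minimal illustration, not the busybox implementation: the helper name
delete_all_matches is hypothetical, and it uses REG_EXTENDED with '^a|b'
instead of the '\|' form so that it does not depend on the GNU-style BRE
extension. It assumes that passing REG_NOTBOL on every call after the first is
enough to stop ^ from matching again.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper, for illustration only (not busybox code):
 * delete every match of `pattern` from `input`, roughly like
 * sed 's/pattern//g'. Returns 0 on success, -1 if regcomp fails. */
static int delete_all_matches(const char *pattern, const char *input,
                              char *out, size_t outsz)
{
	regex_t re;
	regmatch_t m;
	const char *p = input;
	size_t n = 0;
	int eflags = 0;	/* first call: ^ may match at the start */

	assert(outsz > 0);
	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;

	while (*p != '\0' && regexec(&re, p, 1, &m, eflags) == 0) {
		/* Copy the text that precedes the match. */
		for (regoff_t i = 0; i < m.rm_so && n + 1 < outsz; i++)
			out[n++] = p[i];

		if (m.rm_eo > m.rm_so) {
			p += m.rm_eo;	/* skip the matched text */
		} else {
			/* Empty match: keep one character to make progress. */
			p += m.rm_so;
			if (*p == '\0')
				break;
			if (n + 1 < outsz)
				out[n++] = *p;
			p++;
		}
		/* We are no longer at the beginning of the line, so tell
		 * regexec not to let ^ match at the new start position. */
		eflags = REG_NOTBOL;
	}

	/* Copy whatever follows the last match. */
	while (*p != '\0' && n + 1 < outsz)
		out[n++] = *p++;
	out[n] = '\0';

	regfree(&re);
	return 0;
}
```

Calling delete_all_matches("^a|b", "abca", out, sizeof(out)) leaves "ca" in
out: the leading 'a' and the 'b' are removed, while the final 'a' stays
because ^ cannot match there. No special case for a leading ^ is needed in the
caller.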

(tested with glibc and musl libc)

function                                             old     new   delta
add_cmd                                             1189    1176     -13

Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Dominique Martinet 2021-12-21 21:52:29 +09:00 committed by Denys Vlasenko
parent a05a3d5932
commit 4fe954c148
2 changed files with 7 additions and 2 deletions

@@ -435,8 +435,7 @@ static int parse_subst_cmd(sed_cmd_t *sed_cmd, const char *substr)
 		switch (substr[idx]) {
 		/* Replace all occurrences */
 		case 'g':
-			if (match[0] != '^')
-				sed_cmd->which_match = 0;
+			sed_cmd->which_match = 0;
 			break;
 		/* Print pattern space */
 		case 'p':

@@ -399,6 +399,12 @@ testing "sed uses previous regexp" \
 	"" \
 	"q\nw\ne\nr\n"
 
+testing "sed ^ OR not^" \
+	"sed -e 's/^a\|b//g'" \
+	"ca\n" \
+	"" \
+	"abca\n"
+
 # testing "description" "commands" "result" "infile" "stdin"
 
 exit $FAILCOUNT