It is perfectly valid to start a regex with ^ and have other patterns
with \| that can match more than once, e.g. the following example
should print ca, as illustrated with gnu sed:
$ echo 'abca' | sed -e 's/^a\|b//g'
ca
busybox before patch:
$ echo 'abca' | busybox sed -e 's/^a\|b//g'
bca
busybox after patch:
$ echo 'abca' | ./busybox sed -e 's/^a\|b//g'
ca
regcomp handles ^ perfectly well as illustrated with the second 'a' that
did not match in the example, we ca leave the non-repeating to it if
appropriate.
The check had been added before using regcomp and was required at the
time (f36635cec6da) but no longer makes sense now.
(tested with glibc and musl libc)
function old new delta
add_cmd 1189 1176 -13
Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
The '%' character in a format specification may be followed by
one or more flags from the list "+- #0". BusyBox printf didn't
support the '0' flag or allow multiple flags to be provided.
As a result the formats '%0*d' and '%0 d' were considered to be
invalid.
The lack of support for '0' was pointed out by Andrew Snyder on the
musl mailing list:
https://www.openwall.com/lists/musl/2021/12/14/2
function old new delta
printf_main 860 891 +31
.rodata 99281 99282 +1
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 2/0 up/down: 32/0) Total: 32 bytes
Signed-off-by: Ron Yorston <rmy@pobox.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Add some tests which coreutils realpath pass but BusyBox realpath
fails (bar one). Adjust xmalloc_realpath_coreutils() so the tests
pass:
- Expand symbolic links before testing whether the last path component
exists.
- When the link target is a relative path canonicalize it by passing
it through xmalloc_realpath_coreutils() as already happens for
absolute paths.
- Ignore trailing slashes when finding the last path component and
correctly handle the case where the only slash is at the start of
the path. This requires ignoring superfluous leading slashes.
- Undo all changes to the path so error messages from the caller show
the original filename.
function old new delta
xmalloc_realpath_coreutils 214 313 +99
Signed-off-by: Ron Yorston <rmy@pobox.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Allow ISO 8601 style dates to include a timezone offset. Like
the '@' format these dates aren't relative to the user's current
timezone and shouldn't be subject to DST adjustment.
- The implementation uses the strptime() '%z' format specifier.
This an extension which may not be available so the use of
timezones is a configuration option.
- The 'touch' applet has been updated to respect whether DST
adjustment is required, matching 'date'.
function old new delta
parse_datestr 624 730 +106
static.fmt_str 106 136 +30
touch_main 388 392 +4
date_main 818 819 +1
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 4/0 up/down: 141/0) Total: 141 bytes
Signed-off-by: Ron Yorston <rmy@pobox.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
A refactor of the awk printf code in
e2e3802987266c98df0efdf40ad5da4b07df0113
appears to have broken the printf interpretation of two percent signs,
which normally outputs only one percent sign.
The patch below brings busybox awk printf behavior back into alignment
with the pre-e2e380 behavior, the busybox printf util, and other common
(awk and non-awk) printf implementations.
function old new delta
awk_printf 626 672 +46
Signed-off-by: Daniel Thau <danthau at bedrocklinux.org>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
The input buffer is initialised to a reasonable size and extended
if necessary. When this happened the offset into the buffer wasn't
reset to zero so subsequent lines were appended to the long line.
Fix this and add some tests.
function old new delta
rev_main 377 368 -9
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-9) Total: -9 bytes
Signed-off-by: Ron Yorston <rmy@pobox.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
When the '-t DIR' option is used the loop over the remaining
arguments should terminate when a NULL pointer is reached.
function old new delta
mv_main 585 590 +5
cp_main 492 496 +4
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 2/0 up/down: 9/0) Total: 9 bytes
Signed-off-by: Ron Yorston <rmy@pobox.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Testcase:
21 01 01 00 00 00 00 00 e7 01 01 01 ef 00 df b6
00 17 02 10 11 0f ff 00 16 00 00
Unfortunately, the bug is not reliably causing a segfault,
the behavior depends on what's in memory before the buffer.
function old new delta
unpack_lzma_stream 2762 2768 +6
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
The command 'nl -b n' should output no line numbers, just some
spaces as a placeholder followed by the actual file content.
Add tests for line numbering by cat and nl. The correct results
were obtained from coreutils.
function old new delta
print_numbered_lines 152 157 +5
.rodata 182456 182453 -3
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 1/1 up/down: 5/-3) Total: 2 bytes
Signed-off-by: Ron Yorston <rmy@pobox.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Treat the output of printf as binary rather than a null-terminated
string so that NUL characters can be output.
This is considered to be a GNU extension, though it's also available
in mawk and FreeBSD's awk.
function old new delta
evaluate 3487 3504 +17
awk_printf 504 519 +15
------------------------------------------------------------------------------
(add/remove: 0/0 grow/shrink: 2/0 up/down: 32/0) Total: 32 bytes
Signed-off-by: Ron Yorston <rmy@pobox.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
If we have a square, the speedup can be extremely large
(in best case example below, it's ~40000 times faster):
$ time ./busybox_old factor 18446743988964486098
18446743988964486098: 2 3037000493 3037000493
real 0m4.246s
$ time ./busybox factor 18446743988964486098
18446743988964486098: 2 3037000493 3037000493
real 0m0.000s
function old new delta
isqrt_odd - 57 +57
print_w - 36 +36
factorize 218 236 +18
print_h - 7 +7
factorize_numstr 65 72 +7
------------------------------------------------------------------------------
(add/remove: 3/0 grow/shrink: 2/0 up/down: 125/0) Total: 125 bytes
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
This is a recent change in GNU grep as well (after 3.1)
function old new delta
grep_file 1215 1228 +13
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Prior to the patch, both -f and --first-only are in all cases either
no-op or ignored.
Without --tabs, --first-only is the default so specifying it is a no-op.
With --tabs, --all is implied, and --first-only is intended to reset this.
function old new delta
expand_main 690 694 +4
Signed-off-by: Mark Edgar <medgar123@gmail.com>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
From POSIX.1-2008:
The pattern_list's value shall consist of one or more patterns
separated by <newline> characters;
As such, given patterns need to be split at newline characters. Without
doing so, busybox grep will interpret the newline as part of the pattern
which is not in accordance with POSIX.
See also: https://bugs.busybox.net/show_bug.cgi?id=12721
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>
Grep currently special-cased empty pattern file to be the same as
pattern file with one empty line (empty pattern). That does mirror how
GNU grep behaves, except when -x is provided. In that case .* pattern
needs to be used instead.
Signed-off-by: Gray Wolf <wolf@wolfsden.cz>
Signed-off-by: Denys Vlasenko <vda.linux@googlemail.com>