fix: rewrite GNU testsuite harness to run upstream test scripts directly

The previous harness tried to extract sed commands from the GNU test
scripts with regex matching. This produced both false negatives (tests
compared against empty expected output) and false positives, which
inflated test counts and made pass/fail signals unreliable.

The new approach:
- Provides a lightweight shim for the gnulib test framework (init.sh)
  implementing compare_, returns_, skip_, framework_failure_, and all
  require_* functions
- Executes each .sh test script from the GNU testsuite directly,
  injecting our Rust sed binary via PATH
- Uses a clean srcdir containing symlinks to the real test data files
- Adds a per-test timeout (10 s) to catch infinite loops, with SIGTERM
  isolation so timeout signals don't kill the parent script
- Propagates exit codes correctly (0 = pass, 77 = skip,
  99 = framework failure)
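A minimal sketch of what such an init.sh shim might contain. The function names follow gnulib's conventions (Exit, fail_, skip_, framework_failure_, compare_, returns_, path_prepend_, print_ver_), but the bodies are simplified assumptions, not gnulib's code or the code in this commit. path_prepend_ is a no-op on the assumption that the harness has already put the Rust sed first on PATH, and require_en_utf8_locale_ is a hypothetical example of the require_* pattern.

```shell
# Sketch of a gnulib-compatible init.sh shim. Names follow gnulib
# conventions; the bodies are simplified assumptions.
ME_=${0##*/}

warn_ () { printf '%s\n' "$*" 1>&2; }

# Exit with status $1 (gnulib style: the subshell sets $? for any EXIT trap).
Exit () { set +e; (exit "$1"); exit "$1"; }

fail_ () { warn_ "$ME_: failed test: $*"; Exit 1; }
skip_ () { warn_ "$ME_: skipped test: $*"; Exit 77; }
framework_failure_ () { warn_ "$ME_: set-up failure: $*"; Exit 99; }

# No-ops: assume the harness already prepended the Rust sed's directory to PATH.
path_prepend_ () { :; }
print_ver_ () { :; }

# compare_ EXPECTED ACTUAL: succeed iff the files match; show a diff otherwise.
compare_ () { diff -u "$1" "$2" 1>&2; }

# returns_ STATUS CMD...: run CMD and succeed iff it exits with STATUS.
returns_ () {
  exp_exit_=$1; shift
  st_=0
  "$@" || st_=$?
  test "$st_" -eq "$exp_exit_"
}

# Hypothetical require_* stub: skip unless an en_US UTF-8 locale exists.
require_en_utf8_locale_ () {
  locale -a 2>/dev/null | grep -Eqi '^en_US\.utf-?8$' ||
    skip_ "en_US UTF-8 locale not available"
}
```

A test script sourced after this shim sees the same helper names it would under gnulib, so it can run unmodified; skip_ and framework_failure_ turn into exit statuses 77 and 99 for the runner to classify.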
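The timeout and exit-code handling listed above could look roughly like the following POSIX sh sketch. The function run_one and the variables SED_BIN_DIR, SRCDIR, and TEST_TIMEOUT are hypothetical, and GNU coreutils timeout(1) is assumed to be installed. GNU timeout exits with status 124 when the limit is hit and, unless --foreground is given, runs in its own process group, so the SIGTERM it sends reaches the test script and its children rather than the calling script. This is one way to get the isolation described, not necessarily how this harness does it.

```shell
# Sketch of a per-test runner. Hypothetical knobs:
#   SED_BIN_DIR  directory holding the Rust sed (default: Cargo's target/release)
#   SRCDIR       clean srcdir of symlinks to the upstream test data
#   TEST_TIMEOUT per-test limit in seconds (default 10)
run_one () {
  t_=$1
  st_=0
  # Put the Rust sed first on PATH and point srcdir at the clean tree.
  # GNU timeout puts itself in its own process group, so its SIGTERM
  # hits the test and its children, not this script.
  PATH="${SED_BIN_DIR:-$PWD/target/release}:$PATH" srcdir="${SRCDIR:-.}" \
    timeout "${TEST_TIMEOUT:-10}" sh "$t_" >"$t_.log" 2>&1 </dev/null || st_=$?
  # Map gnulib/automake exit-status conventions onto result labels.
  case $st_ in
    0)   echo "PASS: $t_" ;;
    77)  echo "SKIP: $t_" ;;
    99)  echo "ERROR: $t_" ;;
    124) echo "TIMEOUT: $t_" ;;
    *)   echo "FAIL: $t_" ;;
  esac
}
```

Status 99 is labeled ERROR here, the name automake's test driver uses for hard (framework) errors; each test's combined output is kept in a .log file next to the script for debugging.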
Results are now consistent with CI tracking (~12% pass rate), and each
test is clearly categorized as PASS, FAIL, SKIP, or timeout.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>