[ELF] Reject error-prone meta characters in input section description

The lexer is overly permissive. When parsing section patterns in an input
section description that is missing a `)`, we would accept many
nonsensical tokens (e.g. `}`) as patterns, leading to confusion, as in
`*(SORT_BY_ALIGNMENT(SORT_BY_NAME(.text*)) } PROVIDE_HIDDEN(__code_end = .)`
(#81804).

Ideally, the lexer should be stateful so that it can report more errors,
as GNU ld does, and get rid of hacks like `ScriptLexer::maybeSplitExpr`,
but that would require a large rewrite of the lexer. For now, just reject
certain non-wildcard meta characters to detect common mistakes.

Pull Request: https://github.com/llvm/llvm-project/pull/84130
Fangrui Song 2024-03-06 17:19:59 -08:00 committed by GitHub
parent 318bff6811
commit 551e20d190
2 changed files with 37 additions and 11 deletions

@@ -717,9 +717,19 @@ SmallVector<SectionPattern, 0> ScriptParser::readInputSectionsList() {
     StringMatcher SectionMatcher;
     // Break if the next token is ), EXCLUDE_FILE, or SORT*.
-    while (!errorCount() && peek() != ")" && peek() != "EXCLUDE_FILE" &&
-           peekSortKind() == SortSectionPolicy::Default)
+    while (!errorCount() && peekSortKind() == SortSectionPolicy::Default) {
+      StringRef s = peek();
+      if (s == ")" || s == "EXCLUDE_FILE")
+        break;
+      // Detect common mistakes when certain non-wildcard meta characters are
+      // used without a closing ')'.
+      if (!s.empty() && strchr("(){}", s[0])) {
+        skip();
+        setError("section pattern is expected");
+        break;
+      }
       SectionMatcher.addPattern(unquote(next()));
+    }
     if (!SectionMatcher.empty())
       ret.push_back({std::move(excludeFilePat), std::move(SectionMatcher)});
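
The effect of the new loop can be illustrated outside of lld with a small standalone sketch. It is not the lld implementation: `PatternList`, `startsWithRejectedMeta`, and `readPatterns` are hypothetical names, the token stream is assumed to be a pre-split vector of strings, and the `SORT*` check done by `peekSortKind()` is left out. The sketch only shows the order of the checks: a lone `)` or `EXCLUDE_FILE` ends the list normally, while any other token that starts with one of `(){}` is reported as an error.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: the patterns accepted so far plus an error
// message (an empty string means no error was reported).
struct PatternList {
  std::vector<std::string> patterns;
  std::string error;
};

// True if the token starts with one of the meta characters the patch rejects.
inline bool startsWithRejectedMeta(const std::string &tok) {
  return !tok.empty() && std::string("(){}").find(tok[0]) != std::string::npos;
}

// Reads section patterns from a pre-split token list (an assumption of this
// sketch; the real parser pulls tokens from the lexer with peek()/next()).
inline PatternList readPatterns(const std::vector<std::string> &tokens) {
  PatternList out;
  for (const std::string &tok : tokens) {
    // Normal terminators are checked first, so a well-formed list ends here.
    if (tok == ")" || tok == "EXCLUDE_FILE")
      break;
    // A stray meta character most likely means a ')' is missing.
    if (startsWithRejectedMeta(tok)) {
      out.error = "section pattern is expected";
      break;
    }
    out.patterns.push_back(tok);
  }
  return out;
}
```

With this sketch, the token list `.a*`, `{`, `)` yields one accepted pattern and the error message, whereas `.text*`, `.data`, `)` yields two patterns and no error.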

@@ -91,26 +91,42 @@ SECTIONS {
 .text : { *([.]abc .ab[v-y] ) }
 }
-## Test a few non-wildcard meta characters rejected by GNU ld.
+## Test a few non-wildcard characters rejected by GNU ld.
 #--- lbrace.lds
-# RUN: ld.lld -T lbrace.lds a.o -o out
+# RUN: not ld.lld -T lbrace.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LBRACE --match-full-lines --strict-whitespace
+# ERR-LBRACE:{{.*}}: section pattern is expected
+# ERR-LBRACE-NEXT:>>> .text : { *(.a* { ) }
+# ERR-LBRACE-NEXT:>>> ^
 SECTIONS {
 .text : { *(.a* { ) }
 }
-#--- lparen.lds
-## ( is recognized as a section name pattern. Note, ( is rejected by GNU ld.
-# RUN: ld.lld -T lparen.lds a.o -o out
-# RUN: llvm-objdump --section-headers out | FileCheck --check-prefix=SEC-NO %s
+#--- lbrace2.lds
+# RUN: not ld.lld -T lbrace2.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LBRACE2 --match-full-lines --strict-whitespace
+# ERR-LBRACE2:{{.*}}: section pattern is expected
+# ERR-LBRACE2-NEXT:>>> .text : { *(.a*{) }
+# ERR-LBRACE2-NEXT:>>> ^
 SECTIONS {
-.text : { *(.a* ( ) }
+.text : { *(.a*{) }
 }
+#--- lparen.lds
+# RUN: not ld.lld -T lparen.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-LPAREN --match-full-lines --strict-whitespace
+# ERR-LPAREN:{{.*}}: section pattern is expected
+# ERR-LPAREN-NEXT:>>> .text : { *(.a* ( ) }
+# ERR-LPAREN-NEXT:>>> ^
+SECTIONS {
+.text : { *(.a* ( ) }
+}
 #--- rbrace.lds
-# RUN: ld.lld -T rbrace.lds a.o -o out
+# RUN: not ld.lld -T rbrace.lds a.o 2>&1 | FileCheck %s --check-prefix=ERR-RBRACE --match-full-lines --strict-whitespace
+# ERR-RBRACE:{{.*}}: section pattern is expected
+# ERR-RBRACE-NEXT:>>> .text : { *(.a* x = 3; } ) }
+# ERR-RBRACE-NEXT:>>> ^
 SECTIONS {
-.text : { *(.a* } ) }
+.text : { *(.a* x = 3; } ) }
 }
 #--- rparen.lds