Lexer
Basic functionality for tokenizing CMake code into its syntactic constituents.
- class cmake_parser.lexer.Token(kind, value, span, line, column)
Parser token.
Instances of this class are yielded by tokenize() and represent the syntactic primitives of CMake code.
Parameters:
- kind (str) – the token type. This attribute can take one of the string constants COMMENT, RAW, QUOTED, BRACKETED, LPAREN, RPAREN, or SEMICOLON. Two additional token types, UNMATCHED_BRACKET and UNPARSEABLE, can occur in malformed CMake code.
- value (str) – the literal text that comprises the token, without the enclosing quotes or brackets (if applicable).
- span (slice) – the token location in the tokenized string.
- line (int) – the line number where the token begins. Some tokens may span multiple lines.
- column (int) – the column where the token begins.
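To illustrate how these attributes relate to the source string, the following sketch uses a stand-in Token (a NamedTuple with the documented fields, not the library class) and a hand-built token for the command name in foo("bar"). The 1-based line and column values are an assumption made for illustration; the description above does not state the origin.

```python
from typing import NamedTuple


class Token(NamedTuple):
    """Stand-in with the documented fields; not the real cmake_parser class."""
    kind: str
    value: str
    span: slice
    line: int
    column: int


data = 'foo("bar")'

# Hand-built token for the command name. The line/column origin (1-based here)
# is an assumption for illustration only.
tok = Token(kind="RAW", value="foo", span=slice(0, 3), line=1, column=1)

# Because span is a slice, it indexes directly into the tokenized string.
source_text = data[tok.span]
print(tok.kind, repr(source_text))
```

For a RAW token there are no enclosing quotes or brackets, so the text recovered through span matches value. Whether span covers the delimiters of QUOTED or BRACKETED tokens is not stated here and should be checked against the library.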
- cmake_parser.lexer.tokenize(data)
Split CMake code into parser tokens.
Usually, you will not call this function directly but through parse_raw() or parse_tree().
tokenize() splits the input string into meaningful chunks for the parser. It does not yet resolve variable references or split arguments; the dynamic nature of the CMake language requires that this be handled at a later stage.
Parameters:
- data (str) – a string containing CMake code.
Return type:
Generator[Token, None, None]
Returns:
a generator that yields the parser tokens as they occur in the code.
>>> list((t.kind, t.value) for t in tokenize('foo("bar")'))
[('RAW', 'foo'), ('LPAREN', '('), ('QUOTED', 'bar'), ('RPAREN', ')')]
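The token kinds UNMATCHED_BRACKET and UNPARSEABLE signal malformed input, so a caller may want to scan the token stream for them before going further. A minimal sketch, assuming only that each token exposes kind, line, and column attributes; find_malformed is a hypothetical helper, and the sample tokens are hand-built stand-ins rather than real tokenize() output:

```python
from collections import namedtuple

# Stand-in for cmake_parser.lexer.Token with the documented fields.
Token = namedtuple("Token", "kind value span line column")

MALFORMED_KINDS = {"UNMATCHED_BRACKET", "UNPARSEABLE"}


def find_malformed(tokens):
    """Return (line, column, kind) for every token marking malformed code."""
    return [(t.line, t.column, t.kind) for t in tokens if t.kind in MALFORMED_KINDS]


# Hand-built sample stream; the values are illustrative, not real lexer output.
sample = [
    Token("RAW", "foo", slice(0, 3), 1, 1),
    Token("LPAREN", "(", slice(3, 4), 1, 4),
    Token("UNPARSEABLE", "?", slice(4, 5), 1, 5),
]

problems = find_malformed(sample)
print(problems)
```

Because the helper only reads attributes, it should accept the generator returned by tokenize(data) as well.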