Lexer
Basic functionality for tokenizing CMake code into its syntactic constituents.
- class cmake_parser.lexer.Token(kind, value, span, line, column)
Parser token.
Instances of this class are yielded by tokenize() and represent the syntactic primitives of CMake code.
- Parameters:
kind (str) – the token type. This attribute can take one of the string constants COMMENT, RAW, QUOTED, BRACKETED, LPAREN, RPAREN, or SEMICOLON. Two additional token types, UNMATCHED_BRACKET and UNPARSEABLE, can occur in malformed CMake code.
value (str) – the literal text that comprises the token, without the enclosing quotes or brackets (if applicable).
span (slice) – the token location in the tokenized string.
line (int) – the line number where the token begins. Some tokens may span multiple lines.
column (int) – the column where the token begins.
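The difference between value and span can be illustrated with a small sketch. The Token class below is a hypothetical stand-in that only mirrors the documented fields; it is not the library class, and the token is built by hand rather than produced by the lexer. The sketch assumes that the span covers the enclosing quotes and that line and column are 1-based, neither of which is stated above.

```python
from typing import NamedTuple


class Token(NamedTuple):
    # Hypothetical stand-in mirroring the documented fields,
    # not the actual cmake_parser.lexer.Token class.
    kind: str
    value: str
    span: slice
    line: int
    column: int


source = 'foo("bar")'

# Hand-built token for the quoted argument. Assumptions: the span
# includes the quotes, and line/column are 1-based.
quoted = Token(kind="QUOTED", value="bar", span=slice(4, 9), line=1, column=5)

# A slice can index the tokenized string directly.
raw_text = source[quoted.span]
print(raw_text)      # text covered by the span, quotes included
print(quoted.value)  # literal text without the enclosing quotes
```

Because span is a slice, it can be applied to the original string without any unpacking, which is convenient for error messages that quote the offending source text.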
- cmake_parser.lexer.tokenize(data)
Split CMake code into parser tokens.
Usually, you will not call this function directly but through parse_raw() or parse_tree().
tokenize() splits the input string into meaningful chunks for the parser. It does not yet resolve variable references or split arguments; the dynamic nature of the CMake language requires that this be handled at a later stage.
- Parameters:
data (str) – a string containing CMake code.
- Return type:
Generator[Token, None, None]
- Returns:
a generator that yields the parser tokens as they occur in the code.
>>> from cmake_parser.lexer import tokenize
>>> list((t.kind, t.value) for t in tokenize('foo("bar")'))
[('RAW', 'foo'), ('LPAREN', '('), ('QUOTED', 'bar'), ('RPAREN', ')')]
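Code that consumes the token stream directly may want to skip comments and stop at the first malformed token. The following is a minimal sketch of such a filter, not part of the library. Tok and significant_tokens are hypothetical names, the sample stream is written out by hand to match the example output above (plus one invented COMMENT token), and the kind values for comments and malformed tokens are assumed to be the strings "COMMENT", "UNMATCHED_BRACKET", and "UNPARSEABLE".

```python
from typing import Iterable, Iterator, NamedTuple


class Tok(NamedTuple):
    # Hypothetical stand-in carrying only the two fields used here.
    kind: str
    value: str


# Assumed string values of the two token kinds that signal malformed input.
MALFORMED_KINDS = {"UNMATCHED_BRACKET", "UNPARSEABLE"}


def significant_tokens(tokens: Iterable[Tok]) -> Iterator[Tok]:
    """Yield tokens lazily, dropping comments and rejecting malformed input."""
    for tok in tokens:
        if tok.kind in MALFORMED_KINDS:
            raise ValueError(f"malformed CMake input near {tok.value!r}")
        if tok.kind != "COMMENT":
            yield tok


# Hand-written stream standing in for the output of a real lexer run.
stream = [
    Tok("RAW", "foo"),
    Tok("LPAREN", "("),
    Tok("QUOTED", "bar"),
    Tok("RPAREN", ")"),
    Tok("COMMENT", " trailing note"),
]

kinds = [tok.kind for tok in significant_tokens(stream)]
print(kinds)
```

Writing the filter as a generator keeps it lazy, so it composes with a token generator without building an intermediate list.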