Lexer

Basic functionality for tokenizing CMake code into its syntactic constituents.

class cmake_parser.lexer.Token(kind, value, span, line, column)

Parser token.

Instances of this class are yielded by tokenize() and represent the syntactic primitives of CMake code.

Parameters:

kind (str) – the token type. This attribute can take one of the string constants COMMENT, RAW, QUOTED, BRACKETED, LPAREN, RPAREN, or SEMICOLON. Two additional token types, UNMATCHED_BRACKET and UNPARSEABLE, can occur in malformed CMake code.
value (str) – the literal text that comprises the token, without the enclosing quotes or brackets (if applicable).
span (slice) – the token location in the tokenized string.
line (int) – the line number where the token begins. Some tokens may span multiple lines.
column (int) – the column where the token begins.
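The relationship between span, line, and column can be illustrated with a small self-contained sketch. It does not import cmake_parser: SketchToken and locate() are hypothetical stand-ins that only mirror the documented fields. Two assumptions here are not stated above: lines and columns are counted from 1, and the span covers the enclosing quotes even though value omits them.

```python
from dataclasses import dataclass


@dataclass
class SketchToken:
    """Hypothetical stand-in that mirrors the documented Token fields."""
    kind: str
    value: str
    span: slice
    line: int
    column: int


def locate(source, offset):
    """Return (line, column) for a character offset.

    Both numbers are counted from 1, which is an assumption of this sketch.
    """
    line = source.count("\n", 0, offset) + 1
    # rfind returns -1 when no newline precedes the offset,
    # so the first line also yields a 1-based column.
    column = offset - source.rfind("\n", 0, offset)
    return line, column


source = 'set(NAME\n    "hello world")'
start = source.index('"')        # offset of the opening quote
end = source.rindex('"') + 1     # one past the closing quote
line, column = locate(source, start)

# value omits the quotes; span (by assumption) includes them.
token = SketchToken("QUOTED", source[start + 1:end - 1], slice(start, end), line, column)

print(token.value)          # hello world
print(source[token.span])   # "hello world"
print(token.line, token.column)
```

Because span is a slice, indexing the original string with it recovers the raw text of the token without any extra bookkeeping.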

cmake_parser.lexer.tokenize(data)

Split CMake code into parser tokens.

Usually, you will not call this function directly but through parse_raw() or parse_tree(). tokenize() splits the input string into meaningful chunks for the parser. It neither resolves variable references nor splits arguments; because of the dynamic nature of the CMake language, both must be handled at a later stage.

Parameters: data (str) – a string containing CMake code.
Return type: Generator[Token, None, None]
Returns: a generator that yields the parser tokens as they occur in the code.

>>> list((t.kind, t.value) for t in tokenize('foo("bar")'))
[('RAW', 'foo'), ('LPAREN', '('), ('QUOTED', 'bar'), ('RPAREN', ')')]
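As a rough illustration of how a consumer might use the kind field, the sketch below checks that LPAREN and RPAREN tokens are balanced. It works on plain (kind, value) pairs copied from the example output above, so it runs without cmake_parser installed; check_balanced() is a hypothetical helper, not part of the library.

```python
def check_balanced(pairs):
    """Return True if LPAREN and RPAREN kinds are balanced.

    `pairs` is any iterable of (kind, value) tuples.
    """
    depth = 0
    for kind, _value in pairs:
        if kind == "LPAREN":
            depth += 1
        elif kind == "RPAREN":
            depth -= 1
            if depth < 0:
                # A closing parenthesis appeared before its opener.
                return False
    return depth == 0


# The (kind, value) pairs from the tokenize() example above.
pairs = [('RAW', 'foo'), ('LPAREN', '('), ('QUOTED', 'bar'), ('RPAREN', ')')]
balanced = check_balanced(pairs)
print(balanced)  # True
```

With the real library, the same helper could presumably be fed a generator expression such as ((t.kind, t.value) for t in tokenize(data)).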
