Lexer

Basic functionality for tokenizing CMake code into its syntactic constituents.

class cmake_parser.lexer.Token(kind, value, span, line, column)

Parser token.

Instances of this class are yielded by tokenize() and represent the syntactic primitives of CMake code.

Parameters:
  • kind (str) – the token type. This attribute can take one of the string constants COMMENT, RAW, QUOTED, BRACKETED, LPAREN, RPAREN, or SEMICOLON. Two additional token types, UNMATCHED_BRACKET and UNPARSEABLE, can occur in malformed CMake code.

  • value (str) – the literal text that comprises the token, without the enclosing quotes or brackets (if applicable).

  • span (slice) – the token location in the tokenized string.

  • line (int) – the line number where the token begins. Some tokens may span multiple lines.

  • column (int) – the column where the token begins.
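
The relationship between these fields can be sketched with a stand-in object. The named tuple below only mirrors the five field names of Token; it is not the library class. The offsets are picked by hand, and counting lines and columns from 1 is an assumption made for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in with the same field names as cmake_parser.lexer.Token.
FakeToken = namedtuple("FakeToken", "kind value span line column")

source = "project(demo)\n"

# A RAW token for the word "project" at the very start of the string.
# Assumption: line and column are counted from 1.
tok = FakeToken(kind="RAW", value="project", span=slice(0, 7), line=1, column=1)

# A slice object can index the tokenized string directly, so the span
# recovers the token text from the original input.
recovered = source[tok.span]
print(recovered)  # project
```

For a RAW token the recovered text and value coincide. For quoted or bracketed tokens, value omits the enclosing delimiters, so the two may differ.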

cmake_parser.lexer.tokenize(data)

Split CMake code into parser tokens.

Usually, you will not call this function directly but through parse_raw() or parse_tree(). tokenize() splits the input string into meaningful chunks for the parser. It does not yet resolve variable references or split arguments; the dynamic nature of the CMake language requires that this be handled at a later stage.

Parameters:
  • data (str) – a string containing CMake code.

Return type:
  Generator[Token, None, None]

Returns:
  a generator that yields the parser tokens as they occur in the code.

>>> from cmake_parser.lexer import tokenize
>>> [(t.kind, t.value) for t in tokenize('foo("bar")')]
[('RAW', 'foo'), ('LPAREN', '('), ('QUOTED', 'bar'), ('RPAREN', ')')]
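
Because malformed input is reported through the UNMATCHED_BRACKET and UNPARSEABLE token types, a caller may want to scan the token stream for them. The following is a minimal sketch, not part of the library: find_malformed() is a hypothetical helper, the kind values are assumed to be plain strings equal to the constant names (as the output above suggests for RAW and LPAREN), and the sample tokens are hand-written stand-ins rather than real tokenize() output.

```python
from collections import namedtuple

# Assumption: the kind constants are strings equal to their names.
MALFORMED_KINDS = {"UNMATCHED_BRACKET", "UNPARSEABLE"}


def find_malformed(tokens):
    """Return (line, column, kind) for every token that signals malformed input."""
    return [(t.line, t.column, t.kind) for t in tokens if t.kind in MALFORMED_KINDS]


# Hypothetical stand-in with the same field names as Token.
FakeToken = namedtuple("FakeToken", "kind value span line column")

# Hand-written sample stream; not actual lexer output.
sample = [
    FakeToken("RAW", "foo", slice(0, 3), 1, 1),
    FakeToken("LPAREN", "(", slice(3, 4), 1, 4),
    FakeToken("UNPARSEABLE", '"bar', slice(4, 8), 1, 5),
]

problems = find_malformed(sample)
print(problems)  # [(1, 5, 'UNPARSEABLE')]
```

Since the helper only reads the kind, line, and column attributes, it should accept the generator returned by tokenize() as well, for example find_malformed(tokenize(data)).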