Lexical structure

A sequence of Unicode characters is translated into a sequence of tokens. The following rules apply during translation:

  1. Comments are treated as though they are a single space (U+0020).
  2. Spaces are ignored unless they appear between the opening and closing delimiters of a character or string literal. Unicode characters with the “White_Space” property are recognized as spaces.
  3. New-line delimiters are ignored unless they appear between the opening and closing delimiters of a character or string literal. Unicode characters with the “Line_Break” property are recognized as new-line delimiters.

Comments

The character sequence // starts a single-line comment, which terminates immediately before the next new-line delimiter.

The character sequences /* and */ are multiline comment opening and closing delimiters, respectively. A multiline comment opening delimiter starts a comment that terminates immediately after a matching closing delimiter. Each opening delimiter must have a matching closing delimiter. Multiline comments may nest and need not contain any new-line characters.

Note

The character sequence // has no special meaning in a multiline comment. The character sequences /* and */ have no special meaning in a single-line comment. The character sequences // and /* have no special meaning in a string literal. String and character literal delimiters have no special meaning in a comment.
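The comment rules above can be sketched as a small pre-pass that replaces each comment with a single space. This is an illustrative sketch, not the reference lexer: the function name `strip_comments` is hypothetical, `\n` is assumed to be the only new-line delimiter, and only plain double-quoted strings with backslash escapes are recognized (raw strings and character literals are not handled).

```python
def strip_comments(source: str) -> str:
    """Replace every comment in `source` with a single space (U+0020)."""
    out = []
    i, n = 0, len(source)
    while i < n:
        ch = source[i]
        if ch == '"':
            # Copy a plain string literal verbatim: // and /* are inert inside it.
            j = i + 1
            while j < n and source[j] != '"':
                j += 2 if source[j] == "\\" else 1
            j = min(j + 1, n)
            out.append(source[i:j])
            i = j
        elif source.startswith("//", i):
            # Single-line comment: ends immediately before the new-line delimiter.
            while i < n and source[i] != "\n":
                i += 1
            out.append(" ")
        elif source.startswith("/*", i):
            # Multiline comment: track nesting depth until the matching */.
            depth = 0
            while i < n:
                if source.startswith("/*", i):
                    depth += 1
                    i += 2
                elif source.startswith("*/", i):
                    depth -= 1
                    i += 2
                    if depth == 0:
                        break
                else:
                    i += 1
            if depth != 0:
                raise SyntaxError("unterminated multiline comment")
            out.append(" ")
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

Tracking a depth counter rather than searching for the first `*/` is what lets nested comments close correctly, and checking for string literals first keeps `//` inside a string from being treated as a comment.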

Tokens

A token is a terminal symbol of the syntactic grammar. It falls into one of five categories:

| Category | Description | Examples |
|---|---|---|
| Literals | Fixed values in the source code | `42`, `"hello"`, `true` |
| Keywords | Reserved words with special meaning | `if`, `else`, `while` |
| Identifiers | Names given to entities (variables, functions, etc.) | `foo`, `bar`, `count` |
| Operators | Symbols representing computations or logic | `+`, `-`, `*`, `==` |
| Delimiters | Symbols used for grouping and structure | `(`, `)`, `{`, `}` |

Note

The input a << b is translated to a sequence of 4 tokens: identifier, raw-operator, raw-operator, identifier.

Unless otherwise specified, the token recognized at a given lexical position is the one having the longest possible sequence of characters.
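As a rough illustration of the longest-match rule, the sketch below tries every candidate token at the current position and keeps the longest one. The token set here (`OPERATORS`, `IDENT`, `INTEGER`) is a hypothetical subset chosen for the example, not the language's actual grammar, and it deliberately leaves out `<<`, for which the note above specifies a different translation.

```python
import re

# Hypothetical token set, for illustration only.
OPERATORS = ["=", "==", "<", "<=", "+", "+="]
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
INTEGER = re.compile(r"[0-9]+")


def longest_match(source: str, pos: int) -> tuple[str, str]:
    """Return (kind, text) for the longest token starting at `pos`."""
    candidates = []
    for kind, pattern in (("identifier", IDENT), ("integer", INTEGER)):
        m = pattern.match(source, pos)
        if m:
            candidates.append((kind, m.group()))
    for op in OPERATORS:
        if source.startswith(op, pos):
            candidates.append(("operator", op))
    if not candidates:
        raise SyntaxError(f"unexpected character {source[pos]!r} at {pos}")
    # Longest match: prefer the candidate that consumes the most characters.
    return max(candidates, key=lambda c: len(c[1]))


def tokenize(source: str) -> list[tuple[str, str]]:
    tokens = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1  # spaces only separate tokens
            continue
        kind, text = longest_match(source, pos)
        tokens.append((kind, text))
        pos += len(text)
    return tokens
```

With this rule, `a<=b` yields one `<=` operator rather than `<` followed by `=`, and `x+=1` yields a single `+=`.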


Literals

Literals are tokens representing fixed values in the source code. The language supports the following types of literals:

| Literal Type | Description | Standard Examples | Special/Escaped Examples |
|---|---|---|---|
| Integer | Decimal and hexadecimal formats | `123` | `0x1A`, `0x_FF_00` |
| Floating-Point | Like an integer literal but includes a decimal point `.` | `3.14159`, `0.5` | |
| Boolean | Represent truth values | `true`, `false` | |
| Character | Single Unicode scalar value enclosed in single quotes | `'a'`, `'R'` | `'\n'`, `'\t'`, `'\''`, `'\u{1F600}'` |
| String | Sequence of characters enclosed in double quotes | `"hello world"` | `r"C:\Path"`, `r#"He said, "Hello!""#` |
| C-Style String | Null-terminated string literal identified by the `@c` qualifier | `@c"hello"` | `@c"c style string"` |
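To make the integer examples concrete, here is a minimal sketch of how such a literal could be converted to a value. It assumes, based only on the example `0x_FF_00`, that underscores act as digit separators with no effect on the value and that `0x` is the only base prefix. The function name `integer_value` and the accepted pattern are hypothetical, not part of the language definition.

```python
import re

# Assumed shape of an integer literal: decimal digits, or 0x followed by
# hexadecimal digits, with underscores permitted as separators.
INTEGER_LITERAL = re.compile(r"0x[0-9A-Fa-f_]+|[0-9][0-9_]*")


def integer_value(literal: str) -> int:
    """Convert an integer literal such as 123 or 0x_FF_00 to its value."""
    if INTEGER_LITERAL.fullmatch(literal) is None:
        raise ValueError(f"not an integer literal: {literal!r}")
    digits = literal.replace("_", "")  # separators carry no value
    if digits.startswith("0x"):
        return int(digits[2:], 16)  # raises ValueError if no digits remain
    return int(digits, 10)
```

Under these assumptions, `0x1A` and `0x_FF_00` denote the same values as `26` and `65280`.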

Keywords

Keywords are reserved words that have special meaning in the language. They cannot be used as identifiers. The language defines the following keywords:

Strict Keywords
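| Strict Keywords | | | |
|---|---|---|---|
| `as` | `enum` | `match` | `true` |
| `break` | `false` | `mod` | `type` |
| `const` | `fn` | `mut` | `use` |
| `continue` | `for` | `pub` | `where` |
| `if` | `ref` | `while` | `else` |
| `return` | `let` | `struct` | |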

Note: Some keywords might be reserved for future use or macro rules.

Identifiers

An identifier is a name used to identify a variable, function, class, module, or other user-defined item.

Identifiers must follow these rules:

  1. They must begin with a letter (a-z, A-Z) or an underscore _.
  2. Subsequent characters may be letters, digits (0-9), or underscores _.
  3. They cannot match one of the reserved keywords.

| Status | Example | Reason |
|---|---|---|
| Valid | `my_var`, `_private`, `Count1` | Follows all rules. |
| Invalid | `1stPlace` | Cannot start with a digit. |
| Invalid | `my-var` | Hyphens are not allowed (parsed as subtraction). |
| Invalid | `if` | `if` is a reserved keyword. |
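The three identifier rules can be checked mechanically. The sketch below is one possible validator, with the keyword set copied from the strict keywords listed earlier. The names `KEYWORDS`, `IDENTIFIER`, and `is_valid_identifier` are hypothetical helpers introduced for this example.

```python
import re

# Strict keywords as listed in the Keywords section.
KEYWORDS = frozenset({
    "as", "break", "const", "continue", "else", "enum", "false", "fn",
    "for", "if", "let", "match", "mod", "mut", "pub", "ref", "return",
    "struct", "true", "type", "use", "where", "while",
})

# Rule 1: first character is a letter or underscore.
# Rule 2: remaining characters are letters, digits, or underscores.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(name: str) -> bool:
    """Return True if `name` satisfies all three identifier rules."""
    if IDENTIFIER.fullmatch(name) is None:
        return False
    return name not in KEYWORDS  # Rule 3: must not be a reserved keyword
```

Applied to the examples in the table, `my_var`, `_private`, and `Count1` are accepted, while `1stPlace`, `my-var`, and `if` are rejected.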

Operators

Operators are special symbols or combinations of symbols used to perform operations on values or variables.

| Category | Operators | Description |
|---|---|---|
| Arithmetic | `+`, `-`, `*`, `/`, `%`, `++`, `--` | Standard mathematical operations, increment, and decrement. |
| Bitwise | `&`, `\|`, `^`, `<<`, `>>` | Operations on the bit level. |
| Logical | `&&`, `\|\|`, `!` | Boolean logic (AND, OR, NOT). |
| Comparison | `==`, `!=`, `<`, `>`, `<=`, `>=` | Relational equality and ordering. |
| Assignment | `=`, `+=`, `-=`, `*=`, `/=`, etc. | Assigns values, optionally compound. |
| Structural | `.`, `=>` | Member access, matching. |

Delimiters

Delimiters are punctuation characters used to group tokens, separate lists, or define structure.

| Delimiter | Name | Primary Usage |
|---|---|---|
| `( )` | Parentheses | Function calls, grouping expressions, tuples. |
| `{ }` | Braces | Block expressions, struct definitions, matching bodies. |
| `[ ]` | Brackets | Array indexing, slices, array literals. |
| `,` | Comma | Separating arguments, tuple elements, and list items. |
| `;` | Semicolon | Terminating statements. |
| `:` | Colon | Type annotations, struct field initialization, return types. |
| `::` | Double Colon | Path separation (namespaces, modules). |