What are lexemes?

In compiler design, lexemes are the smallest meaningful units of source code: the character sequences that the lexical analyzer (lexer) recognizes and groups together, such as names, operators, and numbers.

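A lexeme is the actual sequence of characters that matches the pattern for a token. Different lexemes, such as two different variable names, can match the same pattern and produce the same kind of token. For example: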

counter = initial_value + 100

This line contains five lexemes (the spaces separate them but are not lexemes themselves):

  1. "counter" (identifier lexeme)
  2. "=" (assignment operator lexeme)
  3. "initial_value" (identifier lexeme)
  4. "+" (addition operator lexeme)
  5. "100" (numeric literal lexeme)
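The split above can be reproduced mechanically. The following is a minimal Python sketch, assuming regular expressions as the token patterns; the token names (NUMBER, IDENTIFIER, ASSIGN, PLUS) and the function name `lexemes` are hypothetical, chosen only for this example. It shows that one pattern, IDENTIFIER, matches two different lexemes, `counter` and `initial_value`.

```python
import re

# Hypothetical token names and patterns, for illustration only.
# At each position, earlier alternatives are tried first.
TOKEN_PATTERNS = [
    ("NUMBER", r"\d+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SKIP", r"\s+"),    # whitespace separates lexemes but is not one
    ("MISMATCH", r"."),  # any other character is an error
]

# One combined regex with a named group per token name.
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_PATTERNS))


def lexemes(source):
    """Return (token_name, lexeme) pairs in the order they occur in source."""
    pairs = []
    for match in MASTER.finditer(source):
        name = match.lastgroup  # the named group (pattern) that matched
        if name == "SKIP":
            continue
        if name == "MISMATCH":
            raise ValueError(f"unexpected character {match.group()!r}")
        pairs.append((name, match.group()))
    return pairs


print(lexemes("counter = initial_value + 100"))
# [('IDENTIFIER', 'counter'), ('ASSIGN', '='), ('IDENTIFIER', 'initial_value'), ('PLUS', '+'), ('NUMBER', '100')]
```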

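The lexical analyzer reads these lexemes and emits a token for each one. A token is usually a pair consisting of a token name (its category) and an optional attribute value, such as the lexeme itself or a pointer to its symbol-table entry. Common token categories and the lexemes that map to them include: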

  1. Keywords: lexemes like "if", "while", "for", "class"
  2. Identifiers: variable names, function names
  3. Operators: +, -, *, /, =, ==, !=
  4. Literals: numeric, string, and character constants such as 100, "hello", 'a'
  5. Delimiters: (, ), {, }, [, ]
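A slightly larger sketch, again in Python with regular expressions, maps lexemes to the categories listed above. The keyword set, the category names, and the `tokenize` function are assumptions for illustration, not any particular language's definition. It shows two common lexer conventions: keywords are first matched by the identifier pattern and then looked up in a reserved-word set, and the word pattern consumes as many characters as it can, so `iffy` is one identifier rather than the keyword `if` followed by `fy`. Because Python's regex alternation takes the first alternative that matches rather than the longest, `==` and `!=` are listed before `=` so a two-character operator is not split.

```python
import re

# Hypothetical keyword set and patterns, mirroring the categories above.
KEYWORDS = {"if", "while", "for", "class"}

TOKEN_SPEC = [
    ("LITERAL", r'\d+|"[^"\n]*"'),        # integers and simple double-quoted strings
    ("WORD", r"[A-Za-z_][A-Za-z0-9_]*"),  # keyword or identifier, decided below
    ("OPERATOR", r"==|!=|[+\-*/=]"),      # two-character operators listed first
    ("DELIMITER", r"[(){}\[\]]"),
    ("SKIP", r"\s+"),                     # whitespace is discarded
    ("MISMATCH", r"."),                   # anything else is a lexical error
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


def tokenize(source):
    """Return a list of (category, lexeme) pairs for source."""
    tokens = []
    for match in MASTER.finditer(source):
        kind, lexeme = match.lastgroup, match.group()
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError(f"unexpected character {lexeme!r} at offset {match.start()}")
        if kind == "WORD":
            # Reserved words share the identifier pattern; a set lookup separates them.
            kind = "KEYWORD" if lexeme in KEYWORDS else "IDENTIFIER"
        tokens.append((kind, lexeme))
    return tokens
```

For instance, `tokenize('if (count != 10) { total = total + 1 }')` starts with `('KEYWORD', 'if')`, `('DELIMITER', '(')`, `('IDENTIFIER', 'count')`, `('OPERATOR', '!=')`, `('LITERAL', '10')`. A real lexer would also need to handle floating-point numbers, escape sequences in strings, comments, and line/column tracking.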