CODASK PARSER - RECURSIVE DESCENT PARSING

Exploring the CODASK Parser: Non-Terminals and Consume Functions

What is Top-Down Recursive Descent Parsing?

Top-down recursive descent parsing is a way to check whether source code follows a programming language’s grammar. It starts from the top-level rule (the entire program) and works down into smaller constructs, like functions, loops, or variable declarations. Each construct is handled by a function that may call other functions, or itself, to process nested structures, reading the token stream one token at a time. CODASK uses this approach to validate its syntax.

Non-Terminals as Functions

In the CODASK language, the grammar is written as rules over non-terminals (like “program,” “statement,” or “expression”). In top-down recursive descent parsing, each non-terminal is implemented as a function in the parser. For example, the “program” non-terminal is handled by parseProgram(), and “expression” is handled by parseExpression(). Each function checks whether the upcoming tokens match its rule and calls the functions for any non-terminals that appear inside that rule. This keeps the parser modular and lets it handle arbitrarily nested code.
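
As a rough illustration of the one-function-per-non-terminal idea, here is a minimal Python sketch. It is not the CODASK implementation: the two-rule toy grammar, the (kind, text) token tuples, and the peek()/expect() helpers are assumptions made for this example. Only the names parseProgram() and parseImportStmt() and the IMPORT STRING : shape come from the function list in this post.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, text); a hypothetical token shape


class ToyParser:
    """Toy grammar (an assumption, not the full CODASK grammar):
         program    -> importStmt*
         importStmt -> IMPORT STRING ':'
    """

    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Token:
        # Return the current token, or an EOF marker past the end.
        if self.pos < len(self.tokens):
            return self.tokens[self.pos]
        return ("EOF", "")

    def expect(self, kind: str) -> Token:
        # Advance past the current token if its kind matches, else fail.
        token = self.peek()
        if token[0] != kind:
            raise SyntaxError(
                f"expected {kind}, found {token[0]} at token {self.pos}"
            )
        self.pos += 1
        return token

    def parseProgram(self) -> List[str]:
        # Non-terminal "program": zero or more imports, then end of input.
        imports = []
        while self.peek()[0] == "IMPORT":
            imports.append(self.parseImportStmt())
        self.expect("EOF")
        return imports

    def parseImportStmt(self) -> str:
        # Non-terminal "importStmt": IMPORT STRING ':'
        self.expect("IMPORT")
        path = self.expect("STRING")[1]
        self.expect("COLON")
        return path


tokens = [("IMPORT", "import"), ("STRING", '"math"'), ("COLON", ":")]
print(ToyParser(tokens).parseProgram())  # prints ['"math"']
```

Each method body reads almost like the grammar rule it implements, which is the main appeal of the technique.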

Role of Consume Functions

The CODASK parser uses special “consume” functions to validate and process tokens during parsing. Each one checks that the current token matches an expected name, type, or keyword. If it does, the parser advances to the next token; if not, the function throws an error. They are essential for enforcing the grammar rules:

  • consume(expectedTokenName): Checks if the current token’s name (e.g., LPAREN, COLON) matches the expected name, advancing if valid or throwing an error.
  • consumeType(expectedTokenType): Verifies the current token’s type (e.g., IDENTIFIER, STRING) matches, advancing or throwing an error.
  • consumeKeyword(keyword): Ensures the current token is a specific keyword (e.g., MAIN, SUPPOSE), advancing if it matches or throwing an error.
  • consumeLiteral(): Confirms the current token is a literal (e.g., INTEGER, STRING), advancing or throwing an error.
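
The four consume functions above could be sketched in Python roughly as follows. The Token fields (name, type, value), the ParseError class, the set of literal types, and the _advanceIf() helper are all assumptions for illustration; the post does not show how CODASK tokens are actually represented. The camelCase method names follow the post rather than Python convention.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Token:
    name: str   # e.g. "LPAREN", "COLON", "MAIN" (assumed field)
    type: str   # e.g. "IDENTIFIER", "KEYWORD", "INTEGER" (assumed field)
    value: str  # the source text of the token


class ParseError(Exception):
    pass


# Assumed set of literal token types; CODASK's real set may differ.
LITERAL_TYPES = {"INTEGER", "FLOAT", "STRING", "BOOLEAN"}


class TokenStream:
    def __init__(self, tokens: List[Token]) -> None:
        self.tokens = tokens
        self.pos = 0

    def currentToken(self) -> Token:
        # Skip comment tokens, then return the token at the cursor.
        while self.pos < len(self.tokens) and self.tokens[self.pos].type == "COMMENT":
            self.pos += 1
        if self.pos >= len(self.tokens):
            raise ParseError("unexpected end of input")
        return self.tokens[self.pos]

    def _advanceIf(self, matches: Callable[[Token], bool], expected: str) -> Token:
        # Shared logic: check the current token, then advance or raise.
        token = self.currentToken()
        if not matches(token):
            raise ParseError(f"expected {expected}, found {token.value!r}")
        self.pos += 1
        return token

    def consume(self, expectedTokenName: str) -> Token:
        return self._advanceIf(lambda t: t.name == expectedTokenName, expectedTokenName)

    def consumeType(self, expectedTokenType: str) -> Token:
        return self._advanceIf(lambda t: t.type == expectedTokenType, expectedTokenType)

    def consumeKeyword(self, keyword: str) -> Token:
        return self._advanceIf(
            lambda t: t.type == "KEYWORD" and t.name == keyword, f"keyword {keyword}"
        )

    def consumeLiteral(self) -> Token:
        return self._advanceIf(lambda t: t.type in LITERAL_TYPES, "a literal")


# An arbitrary token sequence for demonstration, not real CODASK syntax.
stream = TokenStream([
    Token("MAIN", "KEYWORD", "main"),
    Token("COMMENT", "COMMENT", "%entry point%"),
    Token("LPAREN", "OPERATOR", "("),
    Token("INT_LITERAL", "INTEGER", "42"),
])
stream.consumeKeyword("MAIN")
stream.consume("LPAREN")              # the comment token is skipped first
print(stream.consumeLiteral().value)  # prints 42
```

Routing all four functions through one helper keeps the "check, then advance or throw" behavior identical everywhere, so only the matching condition differs.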

Functions in the CODASK Scanner

  • isValidIdentifier(str): Checks if a string is a valid identifier and assigns a unique token name.
  • isValidFloat(str): Validates if a string represents a floating-point number.
  • isValidInt(str): Validates if a string represents an integer.
  • isBooleanLiteral(str): Checks if a string is a boolean literal (true/false).
  • isKeyword(str): Determines if a string is a reserved keyword.
  • isOperator(str): Identifies if a string is a valid operator or punctuation.
  • isString(str): Validates if a string is a properly quoted string literal.
  • isComment(str): Checks if a string is a comment enclosed in % symbols.
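
A few of these scanner checks can be sketched with regular expressions. The exact lexical rules below (unsigned digits for integers, digits on both sides of the decimal point for floats, double-quoted single-line strings) are assumptions; the post only states that comments are enclosed in % symbols and that boolean literals are true/false. The classify() helper is hypothetical and exists only to show the predicates working together.

```python
import re


def isValidInt(text: str) -> bool:
    # Assumption: an integer is one or more ASCII digits, no sign.
    return re.fullmatch(r"[0-9]+", text) is not None


def isValidFloat(text: str) -> bool:
    # Assumption: digits are required on both sides of the decimal point.
    return re.fullmatch(r"[0-9]+\.[0-9]+", text) is not None


def isBooleanLiteral(text: str) -> bool:
    return text in ("true", "false")


def isString(text: str) -> bool:
    # Assumption: double-quoted, single-line, no escape sequences.
    return re.fullmatch(r'"[^"\n]*"', text) is not None


def isComment(text: str) -> bool:
    # Comments are enclosed in % symbols, as described above.
    return re.fullmatch(r"%[^%]*%", text) is not None


def classify(text: str) -> str:
    # Hypothetical helper: try each predicate in turn and name the match.
    checks = [
        ("COMMENT", isComment),
        ("STRING", isString),
        ("BOOLEAN", isBooleanLiteral),
        ("FLOAT", isValidFloat),
        ("INTEGER", isValidInt),
    ]
    for kind, predicate in checks:
        if predicate(text):
            return kind
    return "UNKNOWN"


samples = ["42", "3.14", "true", '"hi"', "%note%"]
print([classify(s) for s in samples])
# prints ['INTEGER', 'FLOAT', 'BOOLEAN', 'STRING', 'COMMENT']
```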

Functions in the CODASK Parser

  • currentToken(): Returns the current token, skipping comments.
  • consume(expectedTokenName): Consumes a token if it matches the expected name, else throws an error.
  • consumeType(expectedTokenType): Consumes a token if it matches the expected type, else throws an error.
  • consumeKeyword(keyword): Consumes a keyword token, ensuring it matches the specified keyword.
  • consumeLiteral(): Consumes a literal token (e.g., INTEGER, STRING).
  • parseProgram(): Parses the entire program, handling imports, function definitions, and the main function.
  • parseImportStmt(): Parses an import statement (IMPORT STRING :).
  • parseMainFunction(): Parses the main function definition.
  • parseParameters(): Parses a list of function parameters.
  • parseParameter(): Parses a single parameter (data type and identifier).
  • parseStatements(): Parses a sequence of statements within a block.
  • parseStatement(): Parses a single statement (e.g., declaration, assignment, control, etc.).
  • parseDataType(): Parses a data type keyword (e.g., INTEGER, STRING).
  • parseDeclaration(isConstant): Parses a variable or constant declaration.
  • parseAssignment(): Parses an assignment statement, including increment/decrement.
  • parseInputStmt(): Parses a read statement for input.
  • parseOutputStmt(): Parses a display statement for output.
  • parseExpression(): Parses an expression, starting with logical OR.
  • parseLogicalOrExpression(): Parses logical OR expressions (||).
  • parseLogicalAndExpression(): Parses logical AND expressions (&&).
  • parseEqualityExpression(): Parses equality expressions (==, !=).
  • parseRelationalExpression(): Parses relational expressions (<, >, <=, >=).
  • parseAdditiveExpression(): Parses additive expressions (+, -).
  • parseMultiplicativeExpression(): Parses multiplicative expressions (*, /, %).
  • parseUnaryExpression(): Parses unary expressions and the operands they apply to (literals, identifiers, function calls, etc.).
  • parseArrayAccess(): Parses array access expressions (e.g., arr[idx]).
  • parseControlStmt(): Parses control statements (SUPPOSE or MATCH).
  • parseSupposeStmt(): Parses a SUPPOSE (if) statement with optional OTHERWISE.
  • parseOtherwiseClause(): Parses an OTHERWISE (else/else if) clause.
  • parseMatchStmt(): Parses a MATCH (switch) statement.
  • parseMatchCases(): Parses cases within a MATCH statement.
  • parseMatchCase(): Parses a single MATCH case (OPTION or ESCAPE).
  • parseLoopStmt(): Parses loop statements (UNTIL, DO, ITERATE).
  • parseUntilLoop(): Parses an UNTIL (while) loop.
  • parseDoUntilLoop(): Parses a DO-UNTIL (do-while) loop.
  • parseIterateLoop(): Parses an ITERATE (for) loop.
  • parseArrayOpStmt(): Parses array operations (push, remove, display).
  • parseFunctionCall(): Parses a function call statement.
  • parseArguments(): Parses function call arguments.
  • parseSendbackStmt(): Parses a SENDBACK (return) statement.
  • parseFunctionDef(): Parses a function definition.
  • parseArray(): Parses array declarations (1D or 2D).
  • parseArrayElements(): Parses elements of a 1D array.
  • parseArray2dElements(): Parses elements of a 2D array.
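
The chain from parseExpression() down to parseUnaryExpression() in the list above encodes operator precedence: each level parses operands by calling the next, tighter-binding level. Below is a minimal Python sketch of that layering. The pre-split string tokens, the tuple-based tree, the _leftAssoc() helper, and the handling of "-", "!" and parentheses in parseUnaryExpression() are assumptions for illustration, not CODASK's actual code.

```python
from typing import Callable, List, Optional, Tuple, Union

Node = Union[str, tuple]  # a leaf token or an (operator, operands...) tuple


class ExprParser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def _leftAssoc(self, operators: Tuple[str, ...], parseOperand: Callable[[], Node]) -> Node:
        # Parse: operand (operator operand)*, grouping to the left.
        left = parseOperand()
        while self.peek() in operators:
            op = self.advance()
            left = (op, left, parseOperand())
        return left

    def parseExpression(self) -> Node:
        return self.parseLogicalOrExpression()

    def parseLogicalOrExpression(self) -> Node:
        return self._leftAssoc(("||",), self.parseLogicalAndExpression)

    def parseLogicalAndExpression(self) -> Node:
        return self._leftAssoc(("&&",), self.parseEqualityExpression)

    def parseEqualityExpression(self) -> Node:
        return self._leftAssoc(("==", "!="), self.parseRelationalExpression)

    def parseRelationalExpression(self) -> Node:
        return self._leftAssoc(("<", ">", "<=", ">="), self.parseAdditiveExpression)

    def parseAdditiveExpression(self) -> Node:
        return self._leftAssoc(("+", "-"), self.parseMultiplicativeExpression)

    def parseMultiplicativeExpression(self) -> Node:
        return self._leftAssoc(("*", "/", "%"), self.parseUnaryExpression)

    def parseUnaryExpression(self) -> Node:
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of expression")
        if token in ("-", "!"):  # assumed prefix operators
            op = self.advance()
            return (op, self.parseUnaryExpression())
        if token == "(":  # a parenthesized sub-expression restarts the chain
            self.advance()
            inner = self.parseExpression()
            if self.peek() != ")":
                raise SyntaxError("expected ')'")
            self.advance()
            return inner
        if not (token[:1].isalnum() or token[:1] in ('"', "_")):
            raise SyntaxError(f"unexpected token {token!r}")
        return self.advance()  # a literal or identifier


tree = ExprParser("1 + 2 * 3 < 10 && ok".split()).parseExpression()
print(tree)
# prints ('&&', ('<', ('+', '1', ('*', '2', '3')), '10'), 'ok')
```

Because parseAdditiveExpression() asks parseMultiplicativeExpression() for its operands, 2 * 3 is grouped before the addition without any explicit precedence table.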

Parsing Technique Used

The CODASK parser uses top-down recursive descent parsing. Each non-terminal in the grammar (like “program” or “statement”) maps to a function that checks the code against that non-terminal’s rules. The “consume” functions enforce those rules at the token level, while the parsing functions handle structure by calling one another recursively. Parsing starts at the top-level non-terminal (program) and works through the tokens produced by the scanner until the whole input has been validated. Because every rule lives in its own function, the parser is modular and its code follows the shape of the grammar closely.

Posted on June 1, 2025 | Explore the CODASK Language
