What is Semantic Analysis in Compiler: Symbol Table, Syntax vs Semantics


Learn what semantic analysis is in compiler design. Explore syntax vs semantics, symbol tables, and how a semantic analyzer written in C++ ensures code correctness.

Recap

So far in our compiler journey, we’ve learned how to tokenize source code and parse it into a structured syntax tree. But there’s a catch — just because code is syntactically correct doesn’t mean it’s valid.

For example:

claim x = y + 2;

This is perfectly valid syntax. But wait — what is y? It’s not declared anywhere! This is where semantic analysis comes in — to check whether the code actually makes sense.

In this post, we’ll explore what semantic analysis is, how to implement it, and why it’s a critical part of any compiler.

 

What Is Semantic Analysis?

Semantic analysis is the phase after parsing that checks whether the code is meaningful. While the parser ensures the code follows the grammar of the language, the semantic analyzer enforces the rules a grammar cannot express: every name is declared before use, types are compatible, and each construct appears in a valid context.

Syntax vs Semantics

  • Syntax = Grammar structure (e.g., claim x = 2 + 3;)
  • Semantics = Meaning (e.g., is x already declared? Are types compatible?)

Analogy:
Take the sentence:
“The dog sings.”

It is grammatically correct (syntax) but semantically odd, because dogs don’t sing!

 

Why Semantic Analysis Matters

Without semantic analysis:

  • You might use variables that were never declared.
  • Functions might be called with wrong arguments.
  • Type mismatches could slip into runtime.

This phase catches these errors at compile time, saving developers from confusing bugs later.

 

Symbol Tables: The Backbone of Semantic Checks

To analyze code meaningfully, we need a way to track all declared identifiers: variables, functions, parameters, etc. This is where symbol tables come in.

What is a Symbol Table?

A symbol table maps each identifier to what the compiler knows about it:

  • Name (e.g., x)
  • Type (e.g., int, text)
  • Scope (local, global)
  • Extra info (e.g., function params, modifiability)

Think of it like a ledger:

┌────────┬───────────┬─────────┐
│  Name  │   Type    │  Scope  │
├────────┼───────────┼─────────┤
│   x    │   int     │  local  │
│   foo  │ function  │ global  │
└────────┴───────────┴─────────┘
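The ledger above could be modeled roughly as follows. This is only a minimal sketch: the `Symbol` struct, the `find()` method, and the use of plain strings for types are assumptions made for illustration, while `has()` and `insert()` mirror the names used later in this post.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical record holding what we know about one identifier.
struct Symbol {
    std::string type;   // e.g. "int", "text", "function"
    bool modifiable;    // example of "extra info"
};

// One table per scope: maps an identifier's name to its Symbol.
class SymbolTable {
    std::unordered_map<std::string, Symbol> symbols;
public:
    bool has(const std::string& name) const {
        return symbols.count(name) != 0;
    }
    // Returns false and leaves the table unchanged if the name is already taken.
    bool insert(const std::string& name, const std::string& type,
                bool modifiable = true) {
        return symbols.emplace(name, Symbol{type, modifiable}).second;
    }
    // Returns nullptr when the name is not in this table.
    const Symbol* find(const std::string& name) const {
        auto it = symbols.find(name);
        return it == symbols.end() ? nullptr : &it->second;
    }
};
```

The scope column of the ledger is not stored in the record here; in this sketch it is implied by which table the symbol lives in.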


Supporting Scopes

When functions or blocks introduce nested scopes, we use a stack of symbol tables:

  • Push a new table when entering a block
  • Pop it when exiting

This makes sure variables declared in a function don’t leak into global scope.
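Here is one way the push/pop idea might look in code. It is a sketch under assumptions: the `ScopeStack` class and its `declare()` and `resolve()` methods are hypothetical names, types are plain strings, and it assumes an inner scope may shadow a name from an outer one.

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical scope stack: each scope maps a name to its type.
class ScopeStack {
    std::vector<std::unordered_map<std::string, std::string>> scopes;
public:
    ScopeStack() { push(); }                 // start with the global scope
    void push() { scopes.emplace_back(); }   // entering a block or function
    void pop() {                             // leaving it; the global scope stays
        if (scopes.size() > 1) scopes.pop_back();
    }
    // Declares in the innermost scope; false means a redeclaration there.
    bool declare(const std::string& name, const std::string& type) {
        return scopes.back().emplace(name, type).second;
    }
    // Searches from the innermost scope outward; nullptr means undeclared.
    const std::string* resolve(const std::string& name) const {
        for (auto scope = scopes.rbegin(); scope != scopes.rend(); ++scope) {
            auto it = scope->find(name);
            if (it != scope->end()) return &it->second;
        }
        return nullptr;
    }
};
```

Because `resolve()` walks outward, a function body can still see globals, while anything declared inside the body disappears as soon as its scope is popped.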

 

Types of Semantic Checks

Let’s walk through the most common semantic checks a compiler performs. We’ll use the Flare language syntax in examples.

1. Undeclared Variables

Trying to use a variable that was never declared is a semantic error.

claim x = y + 2;  // y not declared

Fix: Declare y before using it.

 

2. Redeclarations

You shouldn’t declare the same variable twice in the same scope.

claim x = 2;
claim x = 5;  // x already declared


3. Type Mismatches

Using incompatible types in operations should be caught.

claim x: text = 5 + "hello"; // Adding int and text

Fix: Use consistent types in expressions.


4. Function Call Errors

When calling a function:

  • It must be defined
  • Arguments must match parameter count and types

create add(a: int, b: int): int {
    drop a + b;
}
claim result = add(5);  // Missing second argument


5. Return Statement Validation

Ensure functions return values matching their declared return type.

create greet(): text {
    drop 42; // Returning int instead of text
}


6. Control Flow Errors

Some constructs must be used in proper context.

break; // Not inside a loop
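A common way to catch this is to count how many loops enclose the node being visited. The sketch below assumes a hypothetical `LoopContext` helper; in a real analyzer the counter would more likely live inside the analyzer class and be saved and reset when entering a nested function body.

```cpp
#include <string>
#include <vector>

// Hypothetical helper that tracks how many loops enclose the current node.
class LoopContext {
    int loopDepth = 0;
    std::vector<std::string> errors;
public:
    void enterLoop() { ++loopDepth; }   // call before visiting a loop body
    void exitLoop() { --loopDepth; }    // call after visiting it
    // Call when the traversal reaches a `break` statement.
    void checkBreak() {
        if (loopDepth == 0)
            errors.push_back("'break' used outside of a loop.");
    }
    const std::vector<std::string>& getErrors() const { return errors; }
};
```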

Designing a Semantic Analyzer in C++

To perform semantic checks, we build a SemanticAnalyzer class that traverses the AST and verifies all rules.

We’ll use a visitor pattern where each AST node type has a corresponding visit() overload.

Class Skeleton:

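class SemanticAnalyzer {
    std::vector<SymbolTable> scopes;   // stack of scopes; back() is the innermost
    std::vector<std::string> errors;   // diagnostics collected during traversal
    Type currentFunctionReturnType;    // expected return type of the function being visited

    void enterScope();
    void exitScope();
    SymbolTable& currentScope();
    void reportError(const std::string& message);
public:
    void analyze(std::shared_ptr<ASTNode> root);
    void visit(VarDeclNode*);
    void visit(FuncDeclNode*);
    void visit(FuncCallNode*);
    Type visit(BinaryOpNode*);         // expression visitors return the resolved type
    bool hasErrors() const;
    void printErrors() const;
    // ...
};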

Each visit() method performs validation and updates symbol tables as needed.

 

Step-by-Step Semantic Checks

Let’s look at how each kind of node is checked:

1. Scopes

void SemanticAnalyzer::enterScope() {
    scopes.push_back(SymbolTable());
}
void SemanticAnalyzer::exitScope() {
    scopes.pop_back();
}


2. Variable Declarations

  • Check for redeclaration in the current scope
  • Add to symbol table

void SemanticAnalyzer::visit(VarDeclNode* node) {
    if (currentScope().has(node->name)) {
        reportError("Variable '" + node->name + "' already declared.");
        return;  // keep the first declaration instead of overwriting it
    }
    currentScope().insert(node->name, node->type);
}


3. Expressions and Type Checking

  • Recursively resolve type of subexpressions
  • Compare types for operations

Type SemanticAnalyzer::visit(BinaryOpNode* node) {
    auto lhs = visit(node->left);
    auto rhs = visit(node->right);
    // A subexpression that already failed was reported; don't report it twice.
    if (lhs == Type::Error || rhs == Type::Error) {
        return Type::Error;
    }
    if (lhs != rhs) {
        reportError("Type mismatch in binary expression.");
        return Type::Error;
    }
    return lhs;
}
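To try the recursive idea in isolation, here is a small self-contained sketch over a toy expression tree. The `Expr` struct and the `literal()`, `binary()`, and `typeOf()` helpers are hypothetical stand-ins for the real AST and visitor, and the only rule modeled is "both operands must have the same type".

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class Type { Int, Text, Error };

// Toy expression node: a literal with a known type, or a binary operation.
struct Expr {
    Type literalType = Type::Error;  // used when the node is a literal
    std::unique_ptr<Expr> left;      // both set when the node is a binary op
    std::unique_ptr<Expr> right;
};

std::unique_ptr<Expr> literal(Type t) {
    std::unique_ptr<Expr> e(new Expr());
    e->literalType = t;
    return e;
}

std::unique_ptr<Expr> binary(std::unique_ptr<Expr> l, std::unique_ptr<Expr> r) {
    std::unique_ptr<Expr> e(new Expr());
    e->left = std::move(l);
    e->right = std::move(r);
    return e;
}

// Resolves the type bottom-up, recording one error per mismatch.
Type typeOf(const Expr& e, std::vector<std::string>& errors) {
    if (!e.left) return e.literalType;
    Type lhs = typeOf(*e.left, errors);
    Type rhs = typeOf(*e.right, errors);
    if (lhs == Type::Error || rhs == Type::Error) return Type::Error;  // already reported
    if (lhs != rhs) {
        errors.push_back("Type mismatch in binary expression.");
        return Type::Error;
    }
    return lhs;
}
```

Propagating `Type::Error` without reporting again keeps an expression such as `(5 + "hello") + 1` down to a single diagnostic instead of one per enclosing operator.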


4. Function Calls

  • Check function exists
  • Check argument count and types
  • Handle modifiable parameters (>>)
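The first two checks might be sketched like this. Everything here is an assumption for illustration: the `FunctionSignature` record, the free function `checkCall()`, and the error wording are hypothetical, and the handling of modifiable (>>) parameters is left out because it depends on Flare details not covered here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

enum class Type { Int, Text, Error };

// Hypothetical signature record stored when a function is declared.
struct FunctionSignature {
    std::vector<Type> paramTypes;
    Type returnType;
};

// Returns the call's result type, or Type::Error if the function is unknown.
Type checkCall(const std::unordered_map<std::string, FunctionSignature>& functions,
               const std::string& name,
               const std::vector<Type>& argTypes,
               std::vector<std::string>& errors) {
    auto it = functions.find(name);
    if (it == functions.end()) {
        errors.push_back("Function '" + name + "' is not defined.");
        return Type::Error;
    }
    const FunctionSignature& sig = it->second;
    if (argTypes.size() != sig.paramTypes.size()) {
        errors.push_back("Function '" + name + "' expects " +
                         std::to_string(sig.paramTypes.size()) +
                         " argument(s) but got " +
                         std::to_string(argTypes.size()) + ".");
        return sig.returnType;  // the count is wrong, but the result type is still known
    }
    for (std::size_t i = 0; i < argTypes.size(); ++i) {
        // Skip arguments whose own expression already failed.
        if (argTypes[i] != Type::Error && argTypes[i] != sig.paramTypes[i]) {
            errors.push_back("Argument " + std::to_string(i + 1) + " of '" +
                             name + "' has the wrong type.");
        }
    }
    return sig.returnType;
}
```

Returning the declared return type even after an argument error lets the surrounding expression keep being checked, so one bad call does not hide other mistakes on the same line.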

 

5. Return Type Checks

  • Save expected return type when visiting function
  • Compare actual return value with it
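A minimal sketch of those two steps, pulled out of the analyzer into a hypothetical `ReturnChecker` class so it can be read on its own. The `Type::Void` value and the method names are assumptions; in Flare the statement being checked would be `drop`.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Type { Int, Text, Void, Error };

// Hypothetical slice of an analyzer that only tracks return types.
class ReturnChecker {
    Type currentFunctionReturnType = Type::Void;
    std::vector<std::string> errors;
public:
    // Call when the traversal enters a function declaration.
    void enterFunction(Type declaredReturnType) {
        currentFunctionReturnType = declaredReturnType;
    }
    // Call for each return statement with the type of its expression.
    void checkReturn(Type actual) {
        if (actual == Type::Error) return;  // the expression was already reported
        if (actual != currentFunctionReturnType)
            errors.push_back("Return type does not match the function's declared type.");
    }
    std::size_t errorCount() const { return errors.size(); }
};
```

If the language allows nested function declarations, the saved type would need to be restored when leaving the inner function; this sketch does not handle that case.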


6. Error Handling

We collect all errors in a list:

void SemanticAnalyzer::reportError(const std::string& message) {
    errors.push_back(message);
}

And print them at the end:

if (!errors.empty()) {
    for (const auto& err : errors)
        std::cerr << "Semantic Error: " << err << '\n';
    std::exit(1);  // requires <cstdlib>
}


Testing Semantic Analysis

You should write test cases like:

create main() {
    claim z = y + 5;  // Error: y not declared
    claim x = 10;
    claim x = 20;     // Error: x redeclared
}

The analyzer should print:

Semantic Error: Variable 'y' used before declaration.
Semantic Error: Variable 'x' already declared.


Integration into Compiler Pipeline

Semantic analysis runs after parsing, before code generation.

auto ast = parser.parse();
SemanticAnalyzer sema;
sema.analyze(ast);
if (sema.hasErrors()) {
    sema.printErrors();
    return 1;
}

This ensures only semantically correct ASTs reach the code generation phase.


Conclusion

Semantic analysis gives our compiler a brain. It goes beyond structure and checks if the program truly makes sense. From symbol tables to return type checks, this phase is vital for producing valid, reliable, and meaningful executables.

By building a robust semantic analyzer, we protect the later compiler phases from malformed input and the user from invalid code.


What’s Next?

Now that we have an AST validated for correctness, we’re ready for the final frontier: code generation.

In the next post, we’ll dive into LLVM-based code generation — how to turn an AST into actual machine code that can be executed. We’ll start with generating code for expressions, variables, and functions.

Stay tuned!