
Learn what semantic analysis is in compiler design. Explore syntax vs. semantics, symbol tables, and how a semantic analyzer written in C++ checks code correctness.
Recap
So far in our compiler journey, we’ve learned how to tokenize source code and parse it into a structured syntax tree. But there’s a catch — just because code is syntactically correct doesn’t mean it’s valid.
For example:
claim x = y + 2;
This is perfectly valid syntax. But wait — what is y? It’s not declared anywhere! This is where semantic analysis comes in — to check whether the code actually makes sense.
In this post, we’ll explore what semantic analysis is, how to implement it, and why it’s a critical part of any compiler.
What Is Semantic Analysis?
Semantic analysis is the phase after parsing that checks whether the code is meaningful. While the parser ensures the code follows the grammar of the language, the semantic analyzer ensures it also obeys the language’s static rules, such as declaring names before use and combining only compatible types.
Syntax vs Semantics
- Syntax = Grammar structure (e.g., claim x = 2 + 3;)
- Semantics = Meaning (e.g., is x already declared? Are types compatible?)
Analogy:
Syntax is like forming a sentence:
“The dog sings.”
Grammatically correct (syntax), but semantically odd — dogs don’t sing!
Why Semantic Analysis Matters
Without semantic analysis:
- You might use variables that were never declared.
- Functions might be called with wrong arguments.
- Type mismatches could slip into runtime.
This phase ensures early error detection, saving developers from confusing bugs later.
Symbol Tables: The Backbone of Semantic Checks
To analyze code meaningfully, we need a way to track all declared identifiers: variables, functions, parameters, etc. This is where symbol tables come in.
What is a Symbol Table?
A symbol table is a map that keeps track of:
- Name (e.g., x)
- Type (e.g., int, text)
- Scope (local, global)
- Extra info (e.g., function params, modifiability)
Think of it like a ledger:
┌────────┬───────────┬─────────┐
│ Name │ Type │ Scope │
├────────┼───────────┼─────────┤
│ x │ int │ local │
│ foo │ function │ global │
└────────┴───────────┴─────────┘
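As a rough illustration of the ledger above, a per-scope symbol table could be sketched as a hash map keyed on the identifier name. The Type enum, the Symbol struct, and the find() helper are assumptions made for this example; only has() and insert() appear later in this post. Scope is left implicit here, on the assumption that each scope gets its own table.

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical type tags; the real compiler may model types differently.
enum class Type { Int, Text, Function, Error };

// One row of the ledger: what the analyzer remembers about a name.
struct Symbol {
    std::string name;
    Type type;
    bool modifiable = true;  // an example of "extra info"
};

// A symbol table for a single scope, backed by a hash map keyed on the name.
class SymbolTable {
    std::unordered_map<std::string, Symbol> symbols;

public:
    bool has(const std::string& name) const {
        return symbols.count(name) != 0;
    }

    // Returns false (and leaves the table unchanged) if the name already exists.
    bool insert(const std::string& name, Type type, bool modifiable = true) {
        return symbols.emplace(name, Symbol{name, type, modifiable}).second;
    }

    // Returns a copy of the symbol if present, std::nullopt otherwise.
    std::optional<Symbol> find(const std::string& name) const {
        auto it = symbols.find(name);
        if (it == symbols.end()) return std::nullopt;
        return it->second;
    }
};
```

Having insert() return a bool lets the caller decide how to report a redeclaration instead of burying that policy inside the table.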
Supporting Scopes
When functions or blocks introduce nested scopes, we use a stack of symbol tables:
- Push a new table when entering a block
- Pop it when exiting
This makes sure variables declared in a function don’t leak into global scope.
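The push and pop discipline described above can be sketched as follows. This is a minimal sketch, not the analyzer built later in this post: the ScopeStack class and its declare() and lookup() methods are hypothetical names, and it assumes an inner scope may shadow a name from an outer scope, which the language may or may not allow.

```cpp
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

enum class Type { Int, Text, Error };  // hypothetical type tags

// Each scope maps a name to its type; the vector acts as the stack.
class ScopeStack {
    std::vector<std::unordered_map<std::string, Type>> scopes;

public:
    ScopeStack() { enterScope(); }  // start with the global scope

    void enterScope() { scopes.emplace_back(); }

    // The global scope is never popped, so the stack is never empty.
    void exitScope() {
        if (scopes.size() > 1) scopes.pop_back();
    }

    // Declares a name in the innermost scope; false means redeclaration.
    bool declare(const std::string& name, Type type) {
        return scopes.back().emplace(name, type).second;
    }

    // Searches from the innermost scope outward, so inner names shadow outer ones.
    std::optional<Type> lookup(const std::string& name) const {
        for (auto it = scopes.rbegin(); it != scopes.rend(); ++it) {
            auto found = it->find(name);
            if (found != it->end()) return found->second;
        }
        return std::nullopt;  // not declared in any enclosing scope
    }
};
```

A lookup() that comes back empty is exactly the "undeclared variable" case: the name is not visible in any scope that encloses the point of use.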
Types of Semantic Checks
Let’s walk through the most common semantic checks a compiler performs. We’ll use the Flare language syntax in examples.
1. Undeclared Variables
Trying to use a variable that was never declared is a semantic error.
claim x = y + 2; // y not declared
Fix: Declare y before using it.
2. Redeclarations
You shouldn’t declare the same variable twice in the same scope.
claim x = 2;
claim x = 5; // x already declared
3. Type Mismatches
Using incompatible types in operations should be caught.
claim x: text = 5 + "hello"; // Adding int and text
Fix: Use consistent types in expressions.
4. Function Call Errors
When calling a function:
- It must be defined
- Arguments must match parameter count and types
create add(a: int, b: int): int {
    drop a + b;
}
claim result = add(5); // Missing second argument
5. Return Statement Validation
Ensure functions return values matching their declared return type.
create greet(): text {
    drop 42; // Returning int instead of text
}
6. Control Flow Errors
Some constructs must be used in proper context.
break; // Not inside a loop
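One common way to catch a misplaced break is to count how many loops enclose the statement being visited. The sketch below assumes a hypothetical LoopContext helper; in a real analyzer the counter would more likely be a member of the analyzer itself, and it would need to be reset when entering a nested function body.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: tracks how many loops enclose the current statement.
class LoopContext {
    int loopDepth = 0;
    std::vector<std::string> errors;

public:
    void enterLoop() { ++loopDepth; }

    void exitLoop() {
        if (loopDepth > 0) --loopDepth;
    }

    // Called when the analyzer visits a `break` statement.
    void checkBreak() {
        if (loopDepth == 0) {
            errors.push_back("'break' used outside of a loop.");
        }
    }

    const std::vector<std::string>& getErrors() const { return errors; }
};
```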
Designing a Semantic Analyzer in C++
To perform semantic checks, we build a SemanticAnalyzer class that traverses the AST and verifies all rules.
We’ll use a visitor pattern in which each AST node type has a corresponding visit() method.
Class Skeleton:
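class SemanticAnalyzer {
    std::vector<SymbolTable> scopes;
    std::vector<std::string> errors;
    Type currentFunctionReturnType;
    // Helpers used by the visit() methods below.
    void enterScope();
    void exitScope();
    SymbolTable& currentScope() { return scopes.back(); }
    void reportError(const std::string& message);
public:
    void analyze(std::shared_ptr<ASTNode> root);
    void visit(VarDeclNode*);
    void visit(FuncDeclNode*);
    void visit(FuncCallNode*);
    Type visit(BinaryOpNode*);  // expression visitors return the resolved type
    bool hasErrors() const { return !errors.empty(); }
    void printErrors() const;
    ...
};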
Each visit() method performs validation and updates symbol tables as needed.
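For readers new to the visitor pattern, here is a minimal sketch of the double dispatch that routes a base-class node to the right visit() overload. The accept() method, the BlockNode type, and the NameCollector visitor are assumptions made for illustration, and the sketch uses references where the skeleton above uses pointers.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

struct VarDeclNode;
struct BlockNode;

// Visitor interface: one visit() overload per concrete node type.
struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(VarDeclNode& node) = 0;
    virtual void visit(BlockNode& node) = 0;
};

// Base node: accept() hands control back to the matching visit() overload.
struct ASTNode {
    virtual ~ASTNode() = default;
    virtual void accept(Visitor& v) = 0;
};

struct VarDeclNode : ASTNode {
    std::string name;
    explicit VarDeclNode(std::string n) : name(std::move(n)) {}
    void accept(Visitor& v) override { v.visit(*this); }
};

struct BlockNode : ASTNode {
    std::vector<std::unique_ptr<ASTNode>> statements;
    void accept(Visitor& v) override { v.visit(*this); }
};

// Example visitor that records declared names in traversal order.
struct NameCollector : Visitor {
    std::vector<std::string> names;
    void visit(VarDeclNode& node) override { names.push_back(node.name); }
    void visit(BlockNode& node) override {
        for (auto& stmt : node.statements) stmt->accept(*this);
    }
};
```

The point of accept() is that the static type of a child is often just ASTNode, so overload resolution alone cannot pick the right visit(); the virtual call recovers the concrete type first.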
Step-by-Step Semantic Checks
Let’s look at how each kind of node is checked:
1. Scopes
void SemanticAnalyzer::enterScope() {
    scopes.push_back(SymbolTable());
}
void SemanticAnalyzer::exitScope() {
    scopes.pop_back();
}
2. Variable Declarations
- Check for redeclaration in the current scope
- Add to symbol table
void SemanticAnalyzer::visit(VarDeclNode* node) {
    if (currentScope().has(node->name)) {
        reportError("Variable '" + node->name + "' already declared.");
        return;  // keep the original declaration; don't overwrite it
    }
    currentScope().insert(node->name, node->type);
}
3. Expressions and Type Checking
- Recursively resolve type of subexpressions
- Compare types for operations
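Type SemanticAnalyzer::visit(BinaryOpNode* node) {
    auto lhs = visit(node->left);
    auto rhs = visit(node->right);
    // An operand that already failed has been reported; don't report it again.
    if (lhs == Type::Error || rhs == Type::Error) {
        return Type::Error;
    }
    if (lhs != rhs) {
        reportError("Type mismatch in binary expression.");
        return Type::Error;
    }
    return lhs;
}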
4. Function Calls
- Check function exists
- Check argument count and types
- Handle modifiable parameters (>>)
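The first two checks in this list could look roughly like the sketch below. FunctionSignature and checkCall are hypothetical names, the function table is passed in explicitly to keep the example self-contained, and modifiable parameters (>>) are left out because their exact rules are not covered here.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

enum class Type { Int, Text, Error };  // hypothetical type tags

struct FunctionSignature {
    std::vector<Type> paramTypes;
    Type returnType;
};

// Returns the call's result type, or Type::Error after recording a message.
Type checkCall(const std::unordered_map<std::string, FunctionSignature>& functions,
               const std::string& name,
               const std::vector<Type>& argTypes,
               std::vector<std::string>& errors) {
    auto it = functions.find(name);
    if (it == functions.end()) {
        errors.push_back("Function '" + name + "' is not defined.");
        return Type::Error;
    }
    const FunctionSignature& sig = it->second;
    if (argTypes.size() != sig.paramTypes.size()) {
        errors.push_back("Function '" + name + "' expects " +
                         std::to_string(sig.paramTypes.size()) +
                         " argument(s) but got " +
                         std::to_string(argTypes.size()) + ".");
        return Type::Error;
    }
    for (std::size_t i = 0; i < argTypes.size(); ++i) {
        // Skip arguments that already failed, to avoid cascading errors.
        if (argTypes[i] == Type::Error) continue;
        if (argTypes[i] != sig.paramTypes[i]) {
            errors.push_back("Argument " + std::to_string(i + 1) + " of '" +
                             name + "' has the wrong type.");
        }
    }
    // The declared return type is still known even if an argument was wrong.
    return sig.returnType;
}
```

Returning the declared return type after an argument type error is a design choice: it lets the surrounding expression keep being checked instead of drowning in follow-on errors.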
5. Return Type Checks
- Save expected return type when visiting function
- Compare actual return value with it
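A small sketch of these two steps, assuming a hypothetical ReturnChecker helper that stands in for the relevant part of the analyzer. It reuses the currentFunctionReturnType name from the class skeleton and saves the previous value so that nested function declarations, if the language allows them, would still be handled.

```cpp
#include <string>
#include <vector>

enum class Type { Int, Text, Void, Error };  // hypothetical type tags

class ReturnChecker {
    Type currentFunctionReturnType = Type::Void;
    std::vector<std::string> errors;

public:
    // Called when the analyzer starts visiting a function declaration.
    // Returns the previous value so the caller can restore it afterwards.
    Type enterFunction(Type declaredReturnType) {
        Type previous = currentFunctionReturnType;
        currentFunctionReturnType = declaredReturnType;
        return previous;
    }

    void exitFunction(Type previous) { currentFunctionReturnType = previous; }

    // Called for each `drop` statement with the resolved type of its expression.
    void checkReturn(Type actual) {
        if (actual == Type::Error) return;  // already reported elsewhere
        if (actual != currentFunctionReturnType) {
            errors.push_back("Return value does not match the declared return type.");
        }
    }

    const std::vector<std::string>& getErrors() const { return errors; }
};
```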
6. Error Handling
We collect all errors in a list:
void SemanticAnalyzer::reportError(const std::string& message) {
    errors.push_back(message);
}
And print them at the end:
if (!errors.empty()) {
    for (const auto& err : errors)
        std::cerr << "Semantic Error: " << err << std::endl;
    std::exit(1);  // declared in <cstdlib>
}
Testing Semantic Analysis
You should write test cases like:
create main() {
    claim z = y + 5; // Error: y not declared
    claim x = 10;
    claim x = 20; // Error: x redeclared
}
The analyzer should print:
Semantic Error: Variable 'y' used before declaration.
Semantic Error: Variable 'x' already declared.
Integration into Compiler Pipeline
Semantic analysis runs after parsing, before code generation.
auto ast = parser.parse();
SemanticAnalyzer sema;
sema.analyze(ast);
if (sema.hasErrors()) {
    sema.printErrors();
    return 1;
}
This ensures only semantically correct ASTs reach the code generation phase.
Conclusion
Semantic analysis gives our compiler a brain. It goes beyond structure and checks if the program truly makes sense. From symbol tables to return type checks, this phase is vital for producing valid, reliable, and meaningful executables.
By building a robust semantic analyzer, we protect the later compiler phases from invalid input and give users clear errors before their code ever runs.
What’s Next?
Now that we have an AST validated for correctness, we’re ready for the final frontier: code generation.
In the next post, we’ll dive into LLVM-based code generation — how to turn an AST into actual machine code that can be executed. We’ll start with generating code for expressions, variables, and functions.
Stay tuned!