Design a JSON Parser
What is the JSON Parser Problem?
Design and implement a JSON parser that can parse JSON strings into corresponding data structures. The parser should handle nested objects and arrays, support all JSON data types (strings, numbers, booleans, null, objects, arrays), validate JSON format during parsing, and provide meaningful error messages with position information for invalid JSON.
In this problem, you’ll build a system that takes a raw string and converts it into a structured, language-native representation while ensuring strict grammar adherence.
Problem Overview
Design and implement a JSON parser that converts strings into usable data structures while validating syntax and providing clear error feedback.
Core Requirements
Functional Requirements:
- Data Type Support: Parse strings, numbers (integers, decimals, and exponent notation such as 1e-3), booleans, null, objects, and arrays.
- Deep Nesting: Handle objects and arrays nested to arbitrary depth without stack overflow.
- Strict Validation: Detect syntax errors and throw exceptions with meaningful messages.
- Precise Error Tracking: Report the exact line and column where a syntax error occurred.
- Encoding & Escapes: Correctly handle Unicode characters and escape sequences (e.g., \n, \t, \uXXXX).
- Whitespace Management: Ignore spaces, tabs, and newlines between JSON elements.
- Empty Collections: Support parsing empty objects {} and empty arrays [].
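The escape-handling requirement in the list above could be met with a helper along the following lines. This is a minimal Python sketch, not the reference solution: `decode_string_body`, `SIMPLE_ESCAPES`, and `_read_hex4` are hypothetical names, and the function assumes the caller has already located the opening and closing quotes and passes only the characters between them.

```python
import string

# Single-character escapes defined by the JSON grammar.
SIMPLE_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b',
                  'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}


def _read_hex4(raw, start):
    """Return the code unit written as four hex digits at raw[start:start + 4]."""
    digits = raw[start:start + 4]
    if len(digits) != 4 or any(c not in string.hexdigits for c in digits):
        raise ValueError(f"invalid \\u escape at offset {start - 2}")
    return int(digits, 16)


def decode_string_body(raw):
    """Decode the text between the quotes of a JSON string literal."""
    out, i = [], 0
    while i < len(raw):
        ch = raw[i]
        if ch != '\\':
            if ord(ch) < 0x20:  # JSON forbids raw control characters in strings
                raise ValueError(f"unescaped control character at offset {i}")
            out.append(ch)
            i += 1
            continue
        esc = raw[i + 1:i + 2]  # '' if the backslash is the last character
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == 'u':
            code = _read_hex4(raw, i + 2)
            i += 6
            # A high surrogate followed by an escaped low surrogate encodes
            # one character outside the Basic Multilingual Plane.
            if 0xD800 <= code <= 0xDBFF and raw[i:i + 2] == '\\u':
                low = _read_hex4(raw, i + 2)
                if 0xDC00 <= low <= 0xDFFF:
                    code = 0x10000 + ((code - 0xD800) << 10) + (low - 0xDC00)
                    i += 6
            out.append(chr(code))
        else:
            raise ValueError(f"invalid escape sequence at offset {i}")
    return ''.join(out)
```

For example, `decode_string_body('a\\nb')` turns the two characters `\n` into a real newline. Offsets in the error messages are relative to the string body; a full parser would translate them into document positions.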
Non-Functional Requirements:
- Clean Architecture: Distinct separation between Lexical analysis (Tokenizer) and Syntactic analysis (Parser).
- Parsing Strategy: Use a Recursive Descent approach to handle JSON’s hierarchical nature.
- Efficiency: Single-pass parsing that avoids materializing a full token list, keeping memory usage low for large inputs.
- Extensibility: Easy to add support for custom JSON extensions or output mappings.
- Maintainability: Clear code structure with robust error handling to aid debugging.
- Robustness: Handle edge cases like empty objects {} and arrays [].
What’s Expected?
1. System Architecture
The parser is divided into three primary layers: the Tokenizer, the Parser, and the Value Model.
2. Key Classes to Design
classDiagram
class JsonValue {
<<interface>>
+getType()
+asObject()
+asArray()
}
class JsonObject {
-Map members
+get(key)
+add(key, value)
}
class JsonArray {
-List elements
+get(index)
+add(value)
}
class Tokenizer {
-String input
-int pos
+nextToken() Token
}
class JsonParser {
-Tokenizer tokenizer
+parse(json) JsonValue
-parseObject() JsonObject
-parseArray() JsonArray
}
JsonValue <|-- JsonObject
JsonValue <|-- JsonArray
JsonValue <|-- JsonPrimitive
JsonParser --> Tokenizer
JsonParser --> JsonValue
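The value-model part of the diagram might translate into Python roughly as follows. This is a sketch under assumptions: the diagram references `JsonPrimitive` without listing its members, so the single `value` field and the type names returned by `get_type()` are guesses made for illustration, and method names are converted to Python's snake_case.

```python
from abc import ABC, abstractmethod


class JsonValue(ABC):
    """Common interface for every node in the parsed tree."""

    @abstractmethod
    def get_type(self):
        ...

    def as_object(self):
        raise TypeError(f"{self.get_type()} is not an object")

    def as_array(self):
        raise TypeError(f"{self.get_type()} is not an array")


class JsonPrimitive(JsonValue):
    """Leaf node: string, number, boolean, or null (fields are assumed)."""

    def __init__(self, value):
        self.value = value

    def get_type(self):
        if self.value is None:
            return "null"
        if isinstance(self.value, bool):  # check bool before int: bool is an int subclass
            return "boolean"
        if isinstance(self.value, (int, float)):
            return "number"
        return "string"


class JsonObject(JsonValue):
    def __init__(self):
        self._members = {}

    def get_type(self):
        return "object"

    def as_object(self):
        return self

    def add(self, key, value):
        self._members[key] = value

    def get(self, key):
        return self._members[key]


class JsonArray(JsonValue):
    def __init__(self):
        self._elements = []

    def get_type(self):
        return "array"

    def as_array(self):
        return self

    def add(self, value):
        self._elements.append(value)

    def get(self, index):
        return self._elements[index]
```

Because containers and leaves share the `JsonValue` interface, client code can walk the tree uniformly, for example `root.get("tags").as_array().get(0)`.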
System Flow
Parsing Flow
Key Design Challenges
1. Handling Recursion
JSON is inherently recursive. An object can contain an array, which contains an object, and so on.
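Solution: Implement Recursive Descent Parsing. The parseValue() method calls parseObject() or parseArray(), which in turn call parseValue() for their children, so each rule of the JSON grammar maps directly to one function. Because every nesting level consumes call-stack frames, track the current depth and reject input beyond a fixed limit (or switch to an explicit stack) so that deeply nested input raises a parse error instead of overflowing the stack.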
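The mutual recursion between the value, object, and array rules can be sketched in Python as below. Several simplifications are assumptions made to keep the example short: it reads characters directly instead of going through a separate tokenizer, it returns plain `dict`/`list` values rather than a value-model class hierarchy, it rejects strings containing escape sequences, and `MiniParser`, `ParseError`, and `MAX_DEPTH` are hypothetical names. The depth cap is one possible way to fail cleanly on very deep nesting.

```python
import re

NUMBER = re.compile(r'-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?')
STRING = re.compile(r'"([^"\\\x00-\x1f]*)"')  # escape-free strings only
WS = re.compile(r'[ \t\n\r]*')
LITERALS = (('true', True), ('false', False), ('null', None))

class ParseError(ValueError):
    pass

class MiniParser:
    MAX_DEPTH = 100  # assumption: arbitrary cap, well below Python's recursion limit

    def __init__(self, text):
        self.text, self.pos, self.depth = text, 0, 0

    def parse(self):
        value = self.parse_value()
        if self._peek():
            raise ParseError(f"trailing data at offset {self.pos}")
        return value

    def parse_value(self):
        ch = self._peek()
        if ch in ('{', '['):
            return self._nested(self.parse_object if ch == '{' else self.parse_array)
        if ch == '"':
            return self._scan(STRING, "string").group(1)
        for word, value in LITERALS:
            if self.text.startswith(word, self.pos):
                self.pos += len(word)
                return value
        literal = self._scan(NUMBER, "value").group()
        return float(literal) if any(c in literal for c in '.eE') else int(literal)

    def parse_object(self):
        self._expect('{')
        members = {}
        if self._peek() == '}':  # empty object
            self.pos += 1
            return members
        while True:
            key = self._scan(STRING, "string key").group(1)
            self._expect(':')
            members[key] = self.parse_value()  # recurse into the child value
            if self._peek() != ',':
                self._expect('}')
                return members
            self.pos += 1

    def parse_array(self):
        self._expect('[')
        elements = []
        if self._peek() == ']':  # empty array
            self.pos += 1
            return elements
        while True:
            elements.append(self.parse_value())  # recurse into the child value
            if self._peek() != ',':
                self._expect(']')
                return elements
            self.pos += 1

    def _nested(self, rule):
        self.depth += 1
        if self.depth > self.MAX_DEPTH:
            raise ParseError(f"nesting deeper than {self.MAX_DEPTH} levels")
        try:
            return rule()
        finally:
            self.depth -= 1

    def _peek(self):
        """Skip whitespace, then return the next character ('' at end of input)."""
        self.pos = WS.match(self.text, self.pos).end()
        return self.text[self.pos:self.pos + 1]

    def _expect(self, ch):
        if self._peek() != ch:
            raise ParseError(f"expected {ch!r} at offset {self.pos}")
        self.pos += 1

    def _scan(self, pattern, what):
        self._peek()
        match = pattern.match(self.text, self.pos)
        if match is None:
            raise ParseError(f"expected {what} at offset {self.pos}")
        self.pos = match.end()
        return match
```

Calling `MiniParser('{"a": [1, 2.5, {"b": null}], "ok": true}').parse()` walks object, array, and object again through the same three functions, while input such as `[1,]` is rejected with a `ParseError`.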
2. Efficient Tokenization
Reading the entire input into a list of tokens can be memory-intensive for huge files.
Solution: Use a Lazy Tokenizer (Lexer). Instead of pre-tokenizing the whole input, the parser calls nextToken() only when it needs the next token. This keeps memory usage low and lets parsing stop at the first syntax error without scanning the rest of the input.
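A pull-based tokenizer of this kind might look like the following Python sketch. The `Token` fields, the token kind names, and the regular expression are assumptions chosen for illustration; only the on-demand `next_token()` call mirrors the design described above.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    kind: str   # 'PUNCT', 'STRING', 'NUMBER', 'LITERAL', or 'EOF'
    text: str   # the raw characters of the token
    pos: int    # offset of the token's first character


TOKEN_RE = re.compile(r'''
    (?P<PUNCT>[{}\[\]:,])
  | (?P<STRING>"(?:[^"\\\x00-\x1f]|\\["\\/bfnrt]|\\u[0-9a-fA-F]{4})*")
  | (?P<NUMBER>-?(?:0|[1-9][0-9]*)(?:\.[0-9]+)?(?:[eE][+-]?[0-9]+)?)
  | (?P<LITERAL>true|false|null)
''', re.VERBOSE)


class Tokenizer:
    def __init__(self, text):
        self._text = text
        self._pos = 0

    def next_token(self):
        """Scan and return exactly one token; nothing is tokenized ahead of time."""
        text = self._text
        while self._pos < len(text) and text[self._pos] in ' \t\n\r':
            self._pos += 1
        if self._pos >= len(text):
            return Token('EOF', '', self._pos)
        match = TOKEN_RE.match(text, self._pos)
        if match is None:
            raise ValueError(
                f"unexpected character {text[self._pos]!r} at offset {self._pos}")
        self._pos = match.end()
        return Token(match.lastgroup, match.group(), match.start())
```

With the input `{"a": 1, @`, the first five calls succeed and only the sixth raises, which shows that the invalid `@` is not examined until the parser actually asks for that token.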
3. Error Reporting
Simply saying “Syntax Error” is frustrating for users.
Solution: Maintain line and column counters in the Tokenizer. When the Parser encounters an unexpected token, it can throw a ParseException that includes these coordinates, making it easy to find the bug in the JSON input.
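One way to keep those counters is to update them every time a character is consumed, as in this Python sketch. The `Cursor` class and the `skip_until_unexpected` demo function are hypothetical names introduced for illustration; a real tokenizer would fold this bookkeeping into its own scanning loop.

```python
class ParseException(Exception):
    """Syntax error that carries 1-based line and column coordinates."""

    def __init__(self, message, line, column):
        super().__init__(f"{message} at line {line}, column {column}")
        self.line = line
        self.column = column


class Cursor:
    """Character reader that tracks the position of the next unread character."""

    def __init__(self, text):
        self.text = text
        self.pos = 0
        self.line = 1
        self.column = 1

    def peek(self):
        return self.text[self.pos:self.pos + 1]  # '' at end of input

    def advance(self):
        ch = self.text[self.pos]
        self.pos += 1
        if ch == '\n':          # a newline starts the next line at column 1
            self.line += 1
            self.column = 1
        else:
            self.column += 1
        return ch

    def error(self, message):
        return ParseException(message, self.line, self.column)


def skip_until_unexpected(cursor, allowed):
    """Demo: consume allowed characters and raise at the first other one."""
    while cursor.peek() and cursor.peek() in allowed:
        cursor.advance()
    if cursor.peek():
        raise cursor.error(f"unexpected character {cursor.peek()!r}")
```

Scanning the three-line input `'[1,\n 2,\n @]'` with digits, brackets, commas, spaces, and newlines allowed raises `unexpected character '@' at line 3, column 2`.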
What You’ll Learn
By solving this problem, you’ll master:
- ✅ Lexical Analysis - Building a state machine to scan tokens.
- ✅ Recursive Descent - Converting grammar rules into code.
- ✅ Composite Pattern - Managing hierarchical tree structures.
- ✅ Error Handling - Implementing precise diagnostic systems.
- ✅ String Manipulation - Handling escapes, Unicode, and whitespace.