Technical Requirements for C++ Database System Developer
- SQL parser and lexer
- Query analyzer and semantic checker
- Query optimizer (cost-based, rule-based)
- Query execution engine
- B+ tree and LSM-tree implementations
- Buffer pool management
- Page management and eviction strategies
- Write-Ahead Logging (WAL)
- Scan Operator: Full table scan, index scan
- Aggregation Operator: GROUP BY, DISTINCT, aggregate functions
- Join Operator: Nested loop, hash join, merge join
- Sort Operator: External sorting for large datasets
- Predicate Pushdown: Push filters down toward the scans to reduce data movement
- Constant Folding: Evaluate constant expressions once at query planning time, before execution
- Join Reordering: Find optimal join order
- Index Selection: Choose best index for query
- Strong Strict Two-Phase Locking (rigorous 2PL): All locks, read and write, held until the transaction commits or aborts
- Strict Two-Phase Locking: Write (exclusive) locks held until the transaction commits or aborts
- Snapshot Isolation: Each transaction reads from a consistent snapshot of the data
- Version chain management
- Garbage collection of old versions
- Undo Log: For rollback and MVCC
- Redo Log: For crash recovery
- Batch Write: Group commit, flushing the log records of several transactions in one I/O for performance
- Checkpointing
- ARIES recovery algorithm
- Point-in-time recovery
- ZSTD: Facebook's Zstandard compression
- Snappy: Google's fast compression
- Dictionary compression
- Page-level compression
- Logical Replication: Row-based or statement-based
- Physical Replication: Block-level streaming replication
- Synchronous vs asynchronous replication
- Multi-master replication
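
To make the constant folding item in the list above more concrete, here is a minimal sketch of a bottom-up folding pass over a tiny expression tree. The `Expr` node type and the `Lit`, `Col`, `Bin`, and `Fold` helpers are hypothetical names chosen for illustration, not taken from any particular engine. It assumes integer arithmetic only and ignores overflow, NULL handling, and type checking, all of which a real optimizer would have to address.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical expression node for illustration: an integer literal,
// a column reference, or a binary Add/Mul over two sub-expressions.
struct Expr {
    enum class Kind { Literal, Column, Add, Mul };
    Kind kind = Kind::Literal;
    long value = 0;                  // meaningful when kind == Literal
    int column = -1;                 // meaningful when kind == Column
    std::unique_ptr<Expr> lhs, rhs;  // meaningful when kind is Add or Mul
};
using ExprPtr = std::unique_ptr<Expr>;

ExprPtr Lit(long v) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Literal;
    e->value = v;
    return e;
}

ExprPtr Col(int index) {
    auto e = std::make_unique<Expr>();
    e->kind = Expr::Kind::Column;
    e->column = index;
    return e;
}

ExprPtr Bin(Expr::Kind op, ExprPtr l, ExprPtr r) {
    assert(op == Expr::Kind::Add || op == Expr::Kind::Mul);
    assert(l && r);
    auto e = std::make_unique<Expr>();
    e->kind = op;
    e->lhs = std::move(l);
    e->rhs = std::move(r);
    return e;
}

// Fold bottom-up: once both children of an operator have been reduced to
// literals, replace the operator node with the computed literal.
ExprPtr Fold(ExprPtr e) {
    if (e->kind == Expr::Kind::Literal || e->kind == Expr::Kind::Column) {
        return e;  // leaves cannot be simplified further
    }
    e->lhs = Fold(std::move(e->lhs));
    e->rhs = Fold(std::move(e->rhs));
    if (e->lhs->kind == Expr::Kind::Literal &&
        e->rhs->kind == Expr::Kind::Literal) {
        const long l = e->lhs->value;
        const long r = e->rhs->value;
        return Lit(e->kind == Expr::Kind::Add ? l + r : l * r);
    }
    return e;  // at least one side depends on a column, keep the operator
}
```

For example, folding `col0 + (2 * 3)` leaves the addition in place but replaces the right-hand subtree with the literal `6`, so the multiplication is not repeated for every row at execution time.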
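An index is a data structure that a database builds over one or more table columns, typically keeping their values in sorted order.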
- B+ Tree: Default balanced tree, efficient for range queries
- Hash Index: O(1) average-case point lookup, not suitable for range queries because keys are not stored in order
- AVL Tree: Self-balancing binary search tree
- Red-Black Tree: Balanced BST; as a binary tree it grows tall on large datasets, so performance degrades when each node visit may cost a separate page read
- Multiple column indexing
- Leftmost prefix matching
- Index intersection
- Modern C++ (C++11/14/17/20)
- Template metaprogramming for type-safe operators
- Lock-free data structures for high concurrency
- Memory pools for buffer management
- Zero-copy I/O techniques
- Cache-friendly data structures
- SIMD vectorization for data processing
- NUMA-aware memory allocation
- Async I/O (io_uring)
- RocksDB/LevelDB for storage engine reference
- PostgreSQL/MySQL internals study
- jemalloc/TCMalloc for memory allocation
- Protocol Buffers for internal communication
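
The leftmost prefix matching item in the index list above can be illustrated with a small sketch. It assumes a composite index on two integer columns `(a, b)` and uses `std::map` as a stand-in for an ordered index; a real engine would use a B+ tree over pages. The names `CompositeIndex`, `ScanResult`, `LookupByA`, and `LookupByB` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <map>
#include <utility>
#include <vector>

// Hypothetical composite index on columns (a, b) mapping to a row id.
// Keys are ordered by a first, then by b.
using CompositeIndex = std::map<std::pair<int, int>, int>;

struct ScanResult {
    std::vector<int> rows;    // matching row ids, in index order
    std::size_t visited = 0;  // index entries examined inside the loop
};

// Predicate on the leftmost column: all matching keys are contiguous,
// so one seek to the first candidate followed by a short forward scan is enough.
ScanResult LookupByA(const CompositeIndex& idx, int a) {
    ScanResult r;
    auto it = idx.lower_bound({a, std::numeric_limits<int>::min()});
    for (; it != idx.end() && it->first.first == a; ++it) {
        ++r.visited;
        r.rows.push_back(it->second);
    }
    return r;
}

// Predicate on b alone: matching keys are scattered across the index,
// so the ordering does not help and every entry has to be examined.
ScanResult LookupByB(const CompositeIndex& idx, int b) {
    ScanResult r;
    for (const auto& entry : idx) {
        ++r.visited;
        if (entry.first.second == b) {
            r.rows.push_back(entry.second);
        }
    }
    assert(r.visited == idx.size());  // a full scan touches every entry
    return r;
}
```

With five entries in the index, a lookup on `a` examines only the entries whose leading column matches, while a lookup on `b` alone examines all five. This is why a query that skips the leading column of a composite index generally cannot use that index for a seek.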