The following is a comprehensive, detailed lecture summary of MIT 6.006: Binary Trees, Part 1.
🌳 Binary Trees: Data Structures & Algorithms
Lecture: 6. Binary Trees, Part 1
Source: MIT OpenCourseWare
Goal: Implement a data structure that supports fast (logarithmic) operations for both Sequences (insert/delete in middle) and Sets (predecessor/successor queries).
- Motivation: The "Best of All Worlds"
Existing data structures have trade-offs. We want a single structure that performs efficiently across the board.
| Data Structure | Indexing get_at(i) | Insert/Delete (Ends) | Insert/Delete (Middle) | Find (Key) | Find Prev/Next |
|---|---|---|---|---|---|
| Array | O(1) | O(n) (shift) | O(n) | O(n) | O(n) |
| Sorted Array | O(1) | O(n) | O(n) | O(\log n) | O(\log n) |
| Linked List | O(n) | O(1) | O(1)* | O(n) | O(n) |
| Hash Table | N/A | O(1) | N/A | O(1) | O(n) |
| Binary Tree | O(\log n)** | O(\log n) | O(\log n) | O(\log n) | O(\log n) |
*If we already have a pointer to the location.
**Requires a balanced tree (Topic of Part 2).
The Goal: Achieve O(h) time for all operations, where h is the height of the tree. In a balanced tree, h = O(\log n).
- Anatomy of a Binary Tree
A Rooted Binary Tree is a set of nodes where each node has pointers to a parent, a left child, and a right child.
Node Structure
class BinaryNode:
def init(self, item):
self.item = item
self.parent = None
self.left = None
self.right = None
Key Definitions
-
Root: The unique node with no parent.
-
Leaf: A node with no children.
-
Subtree(x): The tree consisting of node x and all its descendants.
-
Depth(x): The number of edges in the path from the root down to x.
-
Height(x): The number of edges in the longest path from x down to a leaf.
- Note: The height of the tree is the height of the root.
Visualizing Depth vs. Height
graph TD
A((A)) --> B((B))
A --> C((C))
B --> D((D))
B --> E((E))
D --> F((F))
style A fill:#f9f,stroke:#333,stroke-width:2px
style F fill:#ff9,stroke:#333,stroke-width:2px
%% Annotations
A ---|Depth: 0
Height: 3| A_Label[Root]
F ---|Depth: 3
Height: 0| F_Label[Leaf]
- Traversal Order (In-Order)
The binary tree implicitly maintains a linear ordering of items called the Traversal Order (or In-Order Traversal).
Recursive Definition
For any node x, the order is:
-
All nodes in x.left (recursively)
-
x itself
-
All nodes in x.right (recursively)
Visualization
In the diagram below, the traversal order is: F → D → B → E → A → C
graph TD
A((A)) --> B((B))
A --> C((C))
B --> D((D))
B --> E((E))
D --> F((F))
linkStyle 0,1,2,3,4 stroke-width:2px;
-
Left Subtree Rule: Everything in the left subtree comes before the parent.
-
Right Subtree Rule: Everything in the right subtree comes after the parent.
- Tree Navigation Operations
All operations run in O(h) time.
4.1 Subtree First / Last
Find the first (or last) node in the traversal order within a specific subtree.
- Algorithm: To find the first, go left until you can't. To find the last, go right until you can't.
def subtree_first(node):
if node is None: return None
while node.left is not None:
node = node.left
return node
def subtree_last(node):
if node is None: return None
while node.right is not None:
node = node.right
return node
4.2 Successor / Predecessor
Find the node that immediately follows (or precedes) a given node in the global traversal order.
Logic for Successor(node):
- Case 1: Right child exists.
The successor is the first node in the right subtree.
\to subtree_first(node.right)
- Case 2: No right child.
The successor is an ancestor. Walk up the tree until you travel up a left branch (i.e., you are the left child of your parent). That parent is the successor.
def successor(node):
# Case 1: Right subtree exists
if node.right is not None:
return subtree_first(node.right)
# Case 2: No right subtree, walk up
while node.parent is not None and node is node.parent.right:
node = node.parent
return node.parent # Returns None if node is the last in the tree
- Dynamic Operations (Insert & Delete)
These operations modify the tree structure while maintaining the traversal order.
5.1 Insert After
Insert a new node new immediately after an existing node node in the traversal order.
- Case 1: node has no right child.
Make new the right child of node.
- Case 2: node has a right child.
Find the successor of node. The successor is guaranteed to have no left child (by definition of subtree_first). Make new the left child of the successor.
graph TD
subgraph Case 1: No Right Child
A((A)) -.->|Insert B after A| B((B))
end
subgraph Case 2: Right Child Exists
X((X)) --> Y((Y))
Y --> Z((Z))
Z -.->|Insert New after X| N((New))
note[Z is Successor of X]
end
def insert_after(node, new_node):
if node.right is None:
node.right = new_node
new_node.parent = node
else:
succ = successor(node) # In this case, succ is subtree_first(node.right)
succ.left = new_node
new_node.parent = succ
5.2 Delete
Remove a node from the tree.
-
Case 1: Node is a leaf.
Simply detach it from its parent.
-
Case 2: Node is not a leaf.
We cannot simply delete it (it would disconnect the tree).
- Find the node's predecessor (or successor).
- Swap the items (content) of the node and its predecessor.
- Now the node to delete is in a lower position (the physical node that used to be the predecessor).
- Recursively delete the node from its new position until it becomes a leaf or easily removable.
Why swap with Predecessor?
If a node has a left child, its predecessor is in its left subtree (specifically, subtree_last(left)). Moving the item down preserves the traversal order except for the local swap, which we immediately fix by deleting the swapped position.
def delete(node):
if node.left is None and node.right is None:
if node.parent:
if node.parent.left is node: node.parent.left = None
else: node.parent.right = None
return
Prefer swapping with predecessor if left child exists
if node.left is not None:
pred = predecessor(node)
# Swap items
node.item, pred.item = pred.item, node.item
delete(pred)
else:
succ = successor(node)
node.item, succ.item = succ.item, node.item
delete(succ)
- Applications: Sets vs. Sequences
The binary tree is a container. How we map data to the Traversal Order determines the data structure type.
Set (Binary Search Tree - BST)
- Mapping: Traversal Order = Sorted Order by Key.
- Invariant: For every node x, all keys in the left subtree are \le x.key, and all keys in the right subtree are \ge x.key.
- Find(k): Start at root. If k < root.key, go left. If k > root.key, go right.
- Time: O(h)
Sequence Interface
- Mapping: Traversal Order = Sequence Order (0, 1, 2, \dots, n).
- Structure: The tree structure is purely determined by the history of insertions.
- Get_At(i): Requires "subtree size" augmentation (covered in future lectures) to find the i-th node in O(h). Without augmentation, iterating to index i is O(i).
- Summary & Complexity
All operations discussed today run in time proportional to the height of the tree.
| Operation | Complexity | Description |
|---|---|---|
| subtree_first/last | O(h) | Walk down to leaf. |
| successor/predecessor | O(h) | Walk up/down (worst case root-to-leaf). |
| insert_after/before | O(h) | Uses successor + constant pointer changes. |
| delete | O(h) | Swaps item down to leaf level. |
| find (BST) | O(h) | Standard binary search path. |
The Problem:
In the worst case (e.g., inserting sorted data into a raw BST), the tree becomes a linked list where h = n.
- Next Lecture: We will maintain Balanced Binary Trees (AVL Trees) to guarantee h = O(\log n).