My favorite Data Structure: The binary search tree

{"protected":false,"rendered":"\n

Ever wondered how computers organize information? Data structures are the building blocks that store and manage data efficiently.

\n\n\n\n

Like filing cabinets for your computer, they decide how you can find, add, or remove information. Understanding data structures is key for developers, as they all have pros and cons, and choosing the right one can make or break your solution!

\n\n\n\n\n\n\n\n

Simple data structures

\n\n\n\n

One of the simplest data structures is the array. It is simply a list of elements, one after the other, arranged in a given space of memory, each with an index we can use to find an element quickly.

\n\n\n

\n            ┌───┐ ┌───┐ ┌───┐ ┌───┐      \n   VALUES   │ 7 │ │ 2 │ │ 5 │ │ 0 │      \n            └───┘ └───┘ └───┘ └───┘      \n   INDEXES    0     1     2     3        \n

\n\n\n

The diagram above represents an array in computer memory. The boxes are values, and below are the indexes we can use to retrieve any element quickly. In JavaScript, that would be:

\n\n\n

\nconst array = [7, 2, 5, 0];\n\n// retrieving element at position 1\nconst value = array[1];\n

\n\n\n

Complexity (a.k.a “Big O” notation) for data structures

\n\n\n\n

Data structures can give your program a filing system, but how quickly can you find that data? This is where Big O notation comes in. It’s a mathematical way to describe how the execution time (or sometimes memory usage) of an algorithm grows as the input size increases.

\n\n\n\n

Let’s use an analogy to understand what all of that means. Imagine you’re on a search and rescue mission for a lost tourist in a national park. The efficiency of your search strategy determines how long it takes to find the tourist as the search area expands. This is similar to how Big O notation analyzes algorithms.

\n\n\n\n

Big O notation uses terms like O(1) for constant time, O(n) for linear time (proportional to input size), and O(n^2) for quadratic time (grows faster than the input size).

\n\n\n\n

Constant Time (O(1)) – This would be like having a magical GPS tracker that instantly locates the tourist, regardless of the park’s size (similar to the way we fetch elements from an array, given an index).
Linear Time (O(n)) – You send out a search party that methodically combs through the park, section by section. The larger the park (more input data), the longer it takes to find the tourist (linear growth in execution time). This is common for algorithms that iterate through a list once.
Quadratic Time (O(n^2)) – With no map or plan, each searcher wanders randomly until they bump into another searcher, then they pair up and continue searching independently. As the park gets bigger (more data), the number of these random encounters grows much faster (quadratic growth in execution time) compared to a linear search. This can happen with nested loops.

\n\n\n\n

By understanding the Big O of a search strategy (or any algorithm), you can choose the most efficient approach depending on the size of the search area (input data). You can also predict how well it will scale for larger datasets. This is crucial for choosing efficient algorithms (data structures) and avoiding code that crawls to a halt when faced with big data.

\n\n\n\n

In a perfect world, all data structures would have O(1) for finding, inserting and deleting elements. But unfortunately, that’s not the case!

\n\n\n\n

Now it might be more apparent why understanding data structures is important, and why choosing the correct data structure can make or break your program.

\n\n\n\n

An example: Finding an element in an array

\n\n\n\n

Let’s assume we have millions of Person records, and as an application requirement, we need to be able to find a person by name.

\n\n\n\n

A simplistic implementation using an array could look as such:

\n\n\n

\nconst people = [\n  { name: 'Mary', age: 35 },\n  { name: 'John', age: 28 },\n  // ... and millions more rows below\n]\n

\n\n\n

Then, to search for people by name, we would have to do something like this:

\n\n\n

\nconst found = []\n\nfor (let i = 0; i < people.length; i++) {\n  if (people[i].name === 'Mary') {\n    found.push(people[i])\n  }\n}\n

\n\n\n

Now that solution would be O(n), because if we have 1 million people (n = 1000000) then we would need to do work one million times.

\n\n\n\n

Can we find a person by name in a more efficient way?

\n\n\n\n

Enter the binary tree

\n\n\n\n

In computer science, a tree is a structure made up of elements — called nodes — which can have other elements as children.

\n\n\n

\n         ┌───┐             \n         │ 7 │             \n         └─┬─┘             \n           │               \n           │               \n     ┌─────┴──────┐        \n     │            │        \n     │            │        \n   ┌─▼─┐        ┌─▼─┐      \n   │ 3 │        │ 1 │      \n   └───┘        └─┬─┘      \n                  │        \n                  │        \n                ┌─▼─┐      \n                │ 9 │      \n                └───┘      \n

\n\n\n

In the example above, the root of the tree is the node 7, that node then has
two children, 3 and 1. And then 1 has only one child: 9.

\n\n\n\n

Let’s create a tree data structure:

\n\n\n

\nclass Node<T> {\n  value: T;\n  children: Node<T>[];\n\n  constructor(value: T) {\n    this.value = value;\n    this.children = [];\n  }\n\n  add(value: Node<T>): void {\n    this.children.push(value);\n  }\n}\n

\n\n\n

Above is a Node class in TypeScript. A node has a value attribute, as well as a children attribute, which is an array of nodes.

\n\n\n\n

If we wanted to recreate the tree from the diagram, that would look like this:

\n\n\n

\nconst root = new Node<number>(7);\nconst three = new Node<number>(3);\nconst one = new Node<number>(1);\nconst nine = new Node<number>(9);\n\nroot.add(three);\nroot.add(one);\none.add(nine);\n

\n\n\n

Now, let’s consider how this can help us find a person faster than in an array (O(n)).

\n\n\n\n

Binary search

\n\n\n\n

A binary tree is a type of tree that can only have two children, one on the left, and the other one on the right.

\n\n\n\n

If we follow a few rules to organize the nodes, we can use a binary search algorithm to quickly look up values.

\n\n\n\n

Those rules will be:

\n\n\n\n

If a children is smaller than its parent, it must be the left child
Otherwise, it must be the right child

\n\n\n\n

Let’s take a look at an example:

\n\n\n

\n         ┌───┐             \n         │ 7 │             \n         └─┬─┘             \n           │               \n           │               \n     ┌─────┴──────┐        \n     │            │        \n     │            │        \n   ┌─▼─┐        ┌─▼─┐      \n   │ 1 │        │ 9 │      \n   └───┘        └───┘      \n

\n\n\n

In this example the root is 7, because 1 is smaller than 7, it goes to the left. Also, because 9 is greater than 7, it goes to the right.

\n\n\n\n

A bigger tree could look like this:

\n\n\n

\n                 ┌───┐             \n                 │ 7 │             \n                 └─┬─┘             \n                   │               \n                   │               \n         ┌─────────┴─────────┐     \n         │                   │     \n         │                   │     \n       ┌─▼─┐               ┌─▼─┐   \n       │ 3 │               │ 9 │   \n       └─┬─┘               └─┬─┘   \n         │                   │     \n         │                   │     \n   ┌─────┴──────┐      ┌─────┘     \n   │            │      │           \n   │            │      │           \n ┌─▼─┐        ┌─▼─┐  ┌─▼─┐         \n │ 1 │        │ 5 │  │ 8 │         \n └───┘        └───┘  └───┘         \n

\n\n\n

In the example above, 7 has two children. 3 is on the left because is smaller. At the same time, 3 has two children as well, 1 is on the left because is smaller, and 5 on the right, as it’s bigger.

\n\n\n\n

Here’s something very important: All nodes on the left of 7 must be smaller than 7. In this case, 3, 1 and 5 are all smaller than 7, so it’s valid for our needs. The same applies for the right side of the 7, all values must be bigger than 7, independent of where they are in particular. Also, this rule applies recursively down the tree.

\n\n\n\n

Notice that the 9 doesn’t have a right child. That’s fine, not all nodes must have children, and they can also only have one.

\n\n\n\n

Now, let’s say we want to find the number 5. How can we go about it? An algorithm could be:

\n\n\n\n

Start at the root of the tree
Is it the number we want?
If so, return it
Otherwise\n
1. Is the value we are looking for smaller than the value of the current node?
2. If so, repeat the process for the left node if it exist, otherwise,
  return null
3. Otherwise, repeat the process of the right node if it exist, otherwise,
  return null
\n

\n\n\n\n

For the diagram above, this algorithm will first see the 7. Because the value we are looking for (5) is not 7, it will continue to search. Also, because 5 is smaller than 7, it will continue by searching the left node, which is 3. Now, because 5 is not 3, it will continue to search. It will use the right child of 3 because 5 is greater than 3. It finally arrives at the node with the value 5, and because both the node value and the value we are looking for match, it returns it!

\n\n\n\n

Notice that if our tree contains 1 million nodes, the search will not actually have to look through 1 million nodes, it will skip most of them. Actually, the algorithmic complexity for that operation is O(log(n)), meaning it scales beautifully with the input size, much faster than O(n)!

\n\n\n\n

How does this look in code? Let’s take a look.

\n\n\n\n

Binary Search Tree: Implementation

\n\n\n\n

First of all, we need to update our tree to be a binary tree and not just a generic tree that can take any number of children:

\n\n\n

\nclass Node<T> {\n  value: T;\n  left?: Node<T>;\n  right?: Node<T>;\n\n  constructor(value: T) {\n    this.value = value;\n  }\n\n  // ...\n}\n

\n\n\n

Now we need to implement the add method. To implement the add method we’ll need a way to compare values, because if a new value is greater than the node’s value, we want to add it as right child, otherwise, we want to add it as left child.

\n\n\n\n

To do that, we’ll make our values implement a compareTo method that will return:

\n\n\n\n

0 if the values are equal
A number greater than 0 if the other value is greater
A number smaller than 0 if the other value is smaller

\n\n\n\n

We can do that by defining an interface:

\n\n\n

\ninterface Comparable<T> {\n  compareTo(other: T): number;\n}\n

\n\n\n

And now implementing an actual Value Object:

\n\n\n

\nclass NumberValue implements Comparable<NumberValue> {\n  value: number;\n  constructor(value: number) {\n    this.value = value;\n  }\n\n  compareTo(other: NumberValue): number {\n    return other.value - this.value;\n  }\n}\n

\n\n\n

We’ll start with just numeric values to keep things simple, and then we’ll update it to handle actual Person objects.

\n\n\n\n

The main reason we need this object is because of the compareTo method. We can use that method to compare two value objects:

\n\n\n

\nconst a = new NumberValue(5)\nconst b = new NumberValue(7)\nconsole.log(a.compareTo(b)) // 2 (greater than 0), because 7 is greater than 5\n

\n\n\n

This way to compare might seem like an overkill, but I promise it will make sense!

\n\n\n\n

Now that we have our value objects and we can compare them with each other, we can use them in our Node:

\n\n\n

\nclass Node<T extends Comparable<T>> {\n    // ...\n}\n

\n\n\n

We enforce our Nodes to be of a comparable type, so we have access to compareTo. Now we can implement add:

\n\n\n

\nadd(newValue: T): void {\n  const comparison = this.value.compareTo(newValue);\n  if (comparison === 0) throw new Error(`Value ${newValue} already exists`);\n\n  // new value is smaller, so, add on the left\n  if (comparison < 0) {\n    if (this.left === undefined) {\n      // no left child, so create a new one\n      this.left = new Node(newValue);\n    } else {\n      // there is a left child, add the new value to that node instead\n      this.left.add(newValue);\n    }\n  } else {\n    // new value is greater, so, add on the right\n    if (this.right === undefined) {\n      // no right child, so create a new one\n      this.right = new Node(newValue);\n    } else {\n      // there is a right child, add the new value to that node instead\n      this.right.add(newValue);\n    }\n  }\n}\n

\n\n\n

We can now easily build a binary search tree:

\n\n\n

\nconst tree = new Node(new NumberValue(7));\ntree.add(new NumberValue(3));\ntree.add(new NumberValue(9));\ntree.add(new NumberValue(1));\n

\n\n\n

Notice that, while this is not bad performance (O(log(n))), it stills needs to navigate through a few nodes every time we add something. Unlike arrays, where adding something is of constant time (O(1)). So, we can already see nothing comes for free in data structures!

\n\n\n\n

The find method

\n\n\n\n

The find method is pretty similar to the add method:

\n\n\n

\nfind(needle: T): T | null {\n  const comparison = this.value.compareTo(needle);\n\n  if (comparison === 0) return this.value;\n  if (this.left !== undefined && comparison < 0) return this.left.find(needle);\n  if (this.right !== undefined && comparison > 0) return this.right.find(needle);\n\n  return null;\n}\n

\n\n\n

The algorithm will use compareTo and simply return the current value when the comparison matches, search on the left side when it’s a smaller value, and on the right side when it’s a greater value.

\n\n\n\n

And that’s it! We can search for a value in a pretty efficient way! Our example is quite small, but the bigger the tree gets, the more we’ll benefit from not having to traverse the whole tree.

\n\n\n

\nconst tree = new Node(new NumberValue(7));\ntree.add(new NumberValue(3));\ntree.add(new NumberValue(9));\ntree.add(new NumberValue(1));\n\ntree.find(new NumberValue(7)) // 7\ntree.find(new NumberValue(5)) // null\n

\n\n\n

I remember the first time I saw this I was blown away, I remember thinking something like: “Whoa, this is so cool. I could have never come up with something like this myself!”.

\n\n\n\n

And that also helped me understand that data structures are all about how you organize your data, and how you can quickly insert, retrieve or remove elements.

\n\n\n\n

As we’ve seen so far, for binary search trees, adding elements is slower than adding them to an array (O(log(n) vs O(1)) but finding elements is much faster (O(log(n) vs O(n)).

\n\n\n\n

Also, the tree uses more memory, because it needs to keep a lot objects in memory, while an array can just allocate a contiguous space of memory and just store values efficiently one after the other.

\n\n\n\n

Back to business: Finding people by name

\n\n\n\n

I promised you a way to find people by name and that’s what I’ll give you!

\n\n\n\n

We start by creating a Person object and implementing Comparable, as that’s required by our tree:

\n\n\n

\nclass Person implements Comparable<Person> {\n  name: string;\n  age: number;\n  city: string;\n\n  constructor(name: string, age: number, city: string) {\n    this.name = name;\n    this.age = age;\n    this.city = city;\n  }\n\n  compareTo(other: Person): number {\n    return this.name.localeCompare(other.name);\n  }\n}\n

\n\n\n

We use localeCompare to return either -1, 0 or 1 depending if one string comes before, after, or is the same as the order in a lexicographical (alphabetical, but including more characters) order.

\n\n\n\n

…and that’s it! We can use it like so:

\n\n\n

\n// some random people\nconst alice = new Person('Alice', 32, 'San Francisco');\nconst bob = new Person('Bob', 29, 'Frankfurt');\nconst juan = new Person('Juan', 41, 'Buenos Aires');\n\n// add them all to the tree, it doesn't really matter who goes first\nconst tree = new Node(alice);\ntree.add(bob);\ntree.add(juan);\n\n// we can now find a person by name in O(log(n)) time!\nconst found = tree.find(alice);\nconsole.log(found.name); // Alice\nconsole.log(found.age);  // 32\nconsole.log(found.city); // San Francisco\n

\n\n\n

Notice that because we followed good Object Oriented practices, our code is open for extension, but closed for modification, known as the Open-closed Principle.

\n\n\n\n

We can add new functionality without changing existing code, as we did here. We simply created a new Person class, then we didn’t need to “re-open” the Node class to implement our changes.

\n\n\n\n

Make the (hard) change easy, then make the easy change.
— Kent Beck

\n\n\n\n

Conclusion

\n\n\n\n

That was a lot! But hopefully it was clear enough to explain why not all algorithms are the same, and why it’s important to know at least the basics of your data structures. It’s not necessary to know everything about how they work under the hood, but just knowing how efficient their operations are and what problem they are supposed to solve is enough.

\n\n\n\n

If you are curious, feel free to check out the full source code at GitHub!

\n\n\n\n

Further Study

\n\n\n\n

If you like this topic, there are lots of things you can investigate, such as how to remove a node from a binary search tree, or other interesting data structures, from generic ones like Dictionaries (or Hash Tables) to very specific ones like Ropes.

\n\n\n\n

For a complete course on data structures I recommend Data Structures and Algorithms in Java by Robert Lafore. It’s an encyclopedic kind of book (~800 pages), but it does a very good job going from the most simple data structures to the most complex, and it includes a lot of them! Also it’s not like you have to know all of them, I find the basic ones to be the most important to know, such as Queues, Stacks, Linked Lists, and Trees. Don’t let the fact that it’s in Java stop you if you are interested. Java is fairly easy to read and you can practice by implementing the data structures in the language of your choice.

\n"}

My favorite Data Structure: The binary search tree

Simple data structures

Complexity (a.k.a “Big O” notation) for data structures

An example: Finding an element in an array

Enter the binary tree

Binary search

Binary Search Tree: Implementation

The find method

Back to business: Finding people by name

Conclusion

Further Study

Want updates on our latest blog posts? Subscribe to our newsletter!

Want updates on our latest blog posts?
Subscribe to our newsletter!