
[Guide] Computer Science For Beginners

Yihua Zhang

While most tech companies these days don’t require a computer science degree, the fact is that many developers do have one.

And so if you’re a Web Developer (or an aspiring Developer) who doesn’t come from a 'traditional' computer science background, not having a CS degree can be a source of imposter syndrome.


Coming from the fitness industry, this was exactly how I felt.

However, after working as a professional software engineer for almost a decade, I’ve filled this gap myself with great online resources and have even had job offers from multiple FAANG companies.

I'm not saying this to brag at all but simply to let you know that you don't need a CS degree and you can come from any background.

And even better news for you...

  1. The resources that exist online today are way better than what was available ~10 years ago when I was in your shoes
  2. More and more developers are being hired at companies of all sizes without computer science degrees
  3. Rather than you having to scour the internet to find the best resources to get on a level playing field with people that do have a CS degree, I figured I would put it all in one place for you (it's what I wish existed)!

And so in this post, we'll explore the key topics covered in a Computer Science Degree and how relevant they are to your job as a typical Web Developer. I’ll also include all the best resources to learn each topic.

So, let's get started!

Note: In this article and throughout my courses, I teach everything in JavaScript since it’s the most commonly used language for Web Developers.

The topics covered here are still relevant for all programming languages, but the syntax of the code you see will be JavaScript!

Also, this post is a little complex, but stick with it. It may seem alien at first, but will become more clear as you read through.

What is Computer Science?

In simple terms, Computer Science is the study of computers and how they work.

Everything from the hardware and software, to understanding the theory behind the algorithmic processes, computation, and automation.


Do you need to know this CS theory content to become a coder?

No. You can quite easily get a job as a beginner programmer or developer without a CS degree, or even without being fully aware of all the concepts covered here.

The reason is that a CS degree or course is concerned with the theory behind how computers work and scale, while programming is the application of those principles.

It's only once you start working on more complex scenarios and scalable projects, or applying to high-end companies, that this information becomes vital to succeed.

For example

Imagine you want to build a small wall in your garden. This would be relatively simple, right? You understand bricks and concrete and how they work together, and you can easily build that wall.

However, if you wanted to build a tower block, you would need a deeper understanding:

  • How many bricks would this need?
  • Is there a max weight that can be applied before the bricks crumble?
  • Are there other materials you can use?
  • How can you ensure it won't fail in worst-case scenarios?
  • What other safeguards or processes need to be applied?
  • How would you even start designing something like this?

This is when you would need to apply that deeper theory and understanding.


Likewise, imagine you're building a website for a small business.

You can set up a framework, build it, set up the hosting and shopping cart etc, which are all technical programming skills.

The principles to build Netflix would be similar, but the volume of traffic and users would be astronomically different.


Sure, more servers could help manage the larger volume of traffic, but it's only part of the solution, and that's where understanding the concepts and ideas in a CS degree would be required.

It's also partly where some of the imposter syndrome kicks in. Suddenly there are new problems that you not only didn't plan for, but weren't even aware were an issue - simply because they didn't come up during your previous experience.

The good news of course, is that we're going to cover them now 😀.

What's covered in a CS degree? The mile-high overview

In a computer science degree, the main topics you’ll learn are:

  1. Computational Complexity
  2. Data Structures
  3. Algorithms
  4. System Design, and
  5. Databases

Now, each topic is worth its own blog post but I’ll give an overview of each and what you can expect to learn, while providing some great resources to start learning each of them.

We’ll start with computational complexity, data structures, and algorithms since these topics are the most useful to your day to day work as a developer.

Not only that, but if you’re interviewing for software engineering roles at large tech companies aka FAANG (Facebook aka Meta, Apple, Amazon, Netflix, Google), you will need to know these three topics deeply.

Computational complexity

Computational complexity refers to the amount of time and resources (memory space) required by a computer algorithm to solve a problem.

It's important to understand that some problems are inherently more difficult to solve than others, and as a result, require more time and resources. The lower the requirements, the better.

An algorithm is a set of instructions for solving a problem or completing a task, but we’ll dive deeper into algorithms later. What's important to know about algorithms is that developers write them every day.

In fact, every function, no matter how simple or complex can be considered an algorithm.


As developers, we have to learn to write code that is as fast and efficient as possible, and knowing computational complexity will drastically help us understand how performant our code is, and recognize the different places where it can be improved.

Speaking of which...

Big O notation

Big O notation is what we use to describe how much time an algorithm takes to run or how much space (memory) the algorithm requires.

It expresses the upper bound of the growth rate of an algorithm's running time or space usage.

The performance is also relative to the size of the input passed into the algorithm. Typically, the larger the input, the more time and resources the algorithm takes.

It’s important to note here that the units for time and space are not tangible measurements. This means that we’re not measuring the time in milliseconds, seconds, minutes, etc, or the space in bytes, kilobytes or megabytes. Instead, it measures the rate of increase in resource consumption (time or space) relative to the size of the input.

Also, we use the term operations to denote the units of time or memory consumed.

Still with me so far? Here's where it gets a little more difficult to understand, but stick with it.

My fellow ZTM instructor (who has an incredible data structures and algorithms course) also put out a free Big O tutorial that you can check out but I'll also walk you through some of the key concepts below.

Big O complexity always comes in the form of O followed by brackets that contain the category of complexity, such as O(1), O(log n) or O(n log n).

Let’s take a look at the following chart to understand how these categories compare against each other:

Big O complexity chart: number of operations (y-axis) against the number of elements in the input (x-axis) for each complexity category.

Here, we can see that the larger the input (i.e. more Elements on the x-axis) the more resources the algorithm consumes (i.e. the number of operations on the y-axis).

Each line represents a different category of performance, and the order of these categories from best to worst is:

  1. O(1)
  2. O(log n)
  3. O(n)
  4. O(n log n)
  5. O(n²)
  6. O(2ⁿ)
  7. O(n!)

Let’s quickly run through what each category means first, then later when we talk about algorithms we’ll learn about how to determine what category an algorithm will fall into.

I’m going to talk about them from the perspective of time but the same explanation can apply for memory.

1) O(1) - Constant complexity

Constant time is the best category to have. This means that regardless of the input size (n), whether it’s zero elements, one element or one billion elements, the algorithm will take the same amount of time to run.
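To make this concrete, here's a tiny JavaScript sketch of a constant-time operation (the function name is just for illustration):

// O(1): no matter how many elements the array has,
// grabbing the first one is a single operation.
function getFirstElement(arr) {
  return arr[0];
}

getFirstElement([5, 2, 10]);           // 1 operation
getFirstElement(new Array(1000000));   // still 1 operation (the result is just undefined here)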

2) O(log n) - Logarithmic complexity

Logarithmic time is the next best time to have because it means the time taken barely increases while the input increases.

In Big O, whenever we see log we mean log base 2 (log₂), meaning O(log n) is actually O(log₂ n).

If you’re unfamiliar with logarithms, logarithms help us solve the inverse of exponents.

The logarithm of a number (n) is the exponent to which the base number (2) must be raised to produce it (n). So 2⁴ = 16, which means that log₂ 16 = 4.

We’ll see a real life example in an algorithm called binary search later when we look at algorithms.

Looking at three input sizes of 16, 1000, and 10,000,000, we see:

log₂ 16 = 4

log₂ 1000 ≈ 9.97

log₂ 10,000,000 ≈ 23.25

Even though our number of operations grows with the increasing inputs, it’s still significantly lower than the size of the input.

The input grows from 1,000 to 10,000,000 but our operations count only grows from 9.97 to 23.25.
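If you want to check these numbers yourself, JavaScript's built-in Math.log2 gives them directly:

// Compare how slowly log₂ n grows relative to n
[16, 1000, 10000000].forEach((n) => {
  console.log(n, Math.log2(n).toFixed(2));
});
// 16       4.00
// 1000     9.97
// 10000000 23.25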

3) O(n) - Linear complexity

Linear time means that the amount of resources consumed by the algorithm grows at the same rate that the input grows (for simplicity's sake, 1:1).

If the input is 1 element, it takes 1 operation; if the input is 100 elements, it takes 100 operations; if the input is 1,000,000 elements, it takes 1,000,000 operations.

While this sounds bad, it’s actually still pretty good as our line O(n) is still in the green zone in the chart above. This is because the next categories are significantly worse in their growth.
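A quick sketch of a linear-time function: summing an array touches every element exactly once, so doubling the input doubles the work.

// O(n): one operation per element in the input
function sumArray(numbers) {
  let total = 0;
  for (const num of numbers) {
    total += num;
  }
  return total;
}

sumArray([5, 2, 10, 99]); // 4 elements, roughly 4 operations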

4) O(n log n) - Loglinear complexity

Loglinear time is worse than linear time, and it’s a combination of linear and logarithmic.

It's commonly seen in recursive sorting algorithms and binary tree sorting algorithms. What it means is that each of the n elements in the input takes log₂ n operations to process, hence n times log₂ n.
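A classic example is merge sort: the array is split in half log₂ n times, and each level of splitting does O(n) work to merge the halves back together. Here's a minimal sketch:

// O(n log n): log₂ n levels of splitting, with O(n) merging work per level
function mergeSort(arr) {
  if (arr.length <= 1) return arr;

  const mid = Math.floor(arr.length / 2);
  const left = mergeSort(arr.slice(0, mid));
  const right = mergeSort(arr.slice(mid));

  // Merge the two sorted halves back together
  const merged = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    merged.push(left[i] <= right[j] ? left[i++] : right[j++]);
  }
  return merged.concat(left.slice(i), right.slice(j));
}

mergeSort([41, 4, 32, 10, 17, 14]); // [4, 10, 14, 17, 32, 41]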

5) O(n²) - Quadratic complexity

Quadratic time is very inefficient, and while some problems can only be solved in this complexity category, this and each subsequent category should be a warning sign that you need to optimize and refactor your solution.

Quadratic time represents a growth rate where the operations taken is the square of the size of the input.

This means an input of 2 elements has 4 operations; an input of 10 elements has 100 operations; and an input of 10,000 elements has 100,000,000 operations.

This means that each element in the input (n) requires n operations to run.
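The telltale sign of quadratic time is a loop nested inside another loop over the same input. For example, a naive duplicate check compares every element against every other element:

// O(n²): for each of the n elements we scan all n elements again
function hasDuplicate(arr) {
  for (let i = 0; i < arr.length; i++) {
    for (let j = 0; j < arr.length; j++) {
      if (i !== j && arr[i] === arr[j]) return true;
    }
  }
  return false;
}

hasDuplicate([5, 2, 10, 2]); // true, but it took up to 16 comparisons for just 4 elements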

6) O(2ⁿ) - Exponential complexity

Exponential Time is another extremely inefficient complexity category.

This means that the number of operations taken is exponentially growing relative to the size of the input.

This is even worse than quadratic time since quadratic time has the base as the input size (n), whereas in exponential time you are looking at an exponent equal to the input size (n).

We can see this by comparing quadratic and exponential times with an input size of 20:

20² = 400

2²⁰ = 1,048,576
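One place exponential time shows up is generating every possible subset of a collection: a set of n elements has 2ⁿ subsets, so the work doubles every time you add a single element. A minimal sketch:

// O(2ⁿ): each extra element doubles the number of subsets we have to build
function allSubsets(arr) {
  if (arr.length === 0) return [[]];

  const [first, ...rest] = arr;
  const withoutFirst = allSubsets(rest);
  const withFirst = withoutFirst.map((subset) => [first, ...subset]);

  return [...withoutFirst, ...withFirst];
}

allSubsets([1, 2, 3]).length; // 8, i.e. 2³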

7) O(n!) - Factorial complexity

Factorial complexity is the worst possible complexity.

It is an incredibly rare complexity to encounter, and it mostly shows up in brute-force algorithms that try every possible ordering (permutation) of the input, such as a naive solution to the travelling salesman problem.

Factorial is a mathematical operation often used to calculate the number of possible permutations of n elements.

9 factorial (or 9!) is calculated as:

9 x 8 x 7 x 6 x 5 x 4 x 3 x 2 x 1 = 362,880.

Compare this against quadratic and exponential complexity above with an input size of 20:

20! = 2.432902e+18 operations

Or, written another way, 2.432902 × 10¹⁸ operations.
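Where does factorial time actually come from? Generating every possible ordering (permutation) of an input is the textbook case, since n elements can be arranged in n! different ways. A rough sketch:

// O(n!): n choices for the first position, n - 1 for the second, and so on
function permutations(arr) {
  if (arr.length <= 1) return [arr];

  const result = [];
  for (let i = 0; i < arr.length; i++) {
    const rest = [...arr.slice(0, i), ...arr.slice(i + 1)];
    for (const perm of permutations(rest)) {
      result.push([arr[i], ...perm]);
    }
  }
  return result;
}

permutations([1, 2, 3]).length; // 6, i.e. 3!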

tl;dr

We want to run fast algorithms, and so the first five categories are the main ones you’ll see:

  1. O(1)
  2. O(log₂ n)
  3. O(n)
  4. O(n log₂ n)
  5. O(n²)

Don't worry because this will all make more sense as we continue to work through this guide.

In order to better understand these categories, we’ll have to learn about data structures and algorithms to see them applied, so let’s dive in while using the above list and chart as a reference.

Data Structures: An overview

Let’s start with data structures since we work with them every day and they build naturally into algorithms.

To understand data structures, we need to first understand what data is.

Data refers to any piece of information that is stored or transmitted in digital form. It can be anything from text, images, audio, video, or even something as simple as binary code.

These bits of data are the lifeblood of the internet, powering everything on the web.

As developers, we work with data everyday and have to find ways to store, retrieve, manipulate and present this data in the applications we build.

Databases are what we use to store data for the long term, but data structures are common formats that the data comes in.

Important: Different data structures have different quirks on how they work, with their own advantages and disadvantages.

To understand this, instead of data, imagine you had different clothes that needed to be stored and organized. We could store them in different boxes for summer, fall and winter.

We could get more granular and store them in different drawers for tops, bottoms, and coats. Depending on the clothing type, our storage would have to be able to accommodate it.

Storage for coats might be a closet with hangers, t-shirts might go in a small drawer, and pants in a tall cabinet.

All these different types of storage have unique features similar to the different types of data structures.

There are many different types of data structures such as:

  • Arrays
  • Objects (dictionaries if you’re coming from other programming languages)
  • Linked lists
  • Stacks
  • Queues
  • Trees, and
  • Graphs

Each data structure stores and accesses data differently and supports different additional operations, which gives each one its own advantages and disadvantages.

Let’s take a look at arrays and objects to better understand some of these advantages and disadvantages!

Arrays

If you’ve ever written any code before, you’ve most likely encountered an array.

Arrays are the most well-known and widely used data structure in computer science. An array is a collection of elements stored contiguously in memory.

These elements can be of any data type, meaning they can store integers, floats, strings, booleans and even other arrays!

const myArr1 = [5, 2, 10, 99, 50, 1000]

const myArr2 = ['hello', 'ni hao', 'bonjour', 'hola']

const myArr3 = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

Think of an array as a connected series of boxes, each of which can hold a value. Every box also has an index number, which we use to access what's in that box.

The first box is given an index of 0, and it increases by 1 for each subsequent box (this is called zero-based indexing).

const myArr = ['a', 'b', 'c', 'd', 'e']
//              0    1    2    3    4  <-- index numbers

If we wanted to access the third value ‘c’ in our myArr we would call myArr[2].

If we wanted to access the value ‘e’ we would call myArr[4].

Accessing a value from the array using the index takes O(1) time (constant time) which is as efficient as it gets.

The problem is that this requires us to know which index the value we want is under. Unless we’ve sorted the array in a way where we know what value maps to what index, we’re going to have to manually find the value in the array.

Maintaining a separate map of indexes to values is also inefficient if those values move around or if we add or remove values from the array.
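Here's that tradeoff in code: index access is a single jump, while finding a value you don't know the index of means scanning from the start (which is what the built-in indexOf does).

const letters = ['a', 'b', 'c', 'd', 'e'];

letters[2];           // 'c' (O(1): jump straight to the index)
letters.indexOf('e'); // 4   (O(n): scans from the start until it finds 'e')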

Despite this, arrays are still incredibly useful and common. We’ll explore more about arrays in algorithms when we talk about sorting and searching.

Objects

An object is a data structure that stores a collection of key-value pairs. Each key in the object maps to a corresponding value, much like how a word in a dictionary maps to its definition.

Unlike arrays, which use an index number to access their values, objects use keys to access their values.

Also, unlike arrays, where the index is always an integer, the keys in an object can be strings or numbers (JavaScript converts number keys to strings under the hood), and in some other programming languages dictionary keys can be any hashable value.

Finally, the values can be of any data type, including integers, strings, arrays, or even other objects.

Here is an object that maps the names of some fruits to their corresponding colours.

const myObj = {
	"apple": "red",
	"banana": "yellow", 
	"orange": "orange",
	"grapes": "purple"
}

We can access the value of a specific key by calling the key.

For example

myObj["apple"] would return us "red", whereas myObj["banana"] would give us "yellow".

Retrieving values here runs in constant time as well!

Objects are the most common data structures because we store a lot of data as objects. In frontend web development, almost all data is in JSON, which stands for JavaScript Object Notation. The reason it's so useful is that it's intuitive to use.

If we’re storing user profiles, we can use the keys as the attributes and the values as the details.

{
	"firstname": "Yihua",
	"lastname": "Zhang",
	"country": "Canada"
}

If we had multiple users, we commonly see them stored in arrays:

const userProfiles = [
	{
		"firstname": "Yihua",
		"lastname": "Zhang",
		"country": "Canada"
	},
	{
		"firstname": "Mo",
		"lastname": "Binni",
		"country": "Belgium"
	},
	{
		"firstname": "Sarah",
		"lastname": "Connor",
		"country": "Australia"
	}
]

As mentioned earlier, if we wanted to find Mo, unless we knew that he was at index 1, we would have to search through every user object in the array to find him.

Using an object instead, we could store users with their firstname as the key:

const userProfiles = {
	"Yihua": {
		"firstname": "Yihua",
		"lastname": "Zhang",
		"country": "Canada"
	},
	"Mo": {
		"firstname": "Mo",
		"lastname": "Binni",
		"country": "Belgium"
	},
	"Sarah": {
		"firstname": "Sarah",
		"lastname": "Connor",
		"country": "Australia"
	}
]

Now, we can find any user with just their first name. If we wanted Mo, we could call userProfiles["Mo"] and we’ll get his user object!

tl;dr

There's a time and place to use arrays or objects to store your data, and each have their own advantages and disadvantages. It’s up to us as developers to understand the problem we’re trying to solve and pick the correct data structure.

Now, the remaining data structures are outside of the scope of this article, but you can learn more about them in-depth in our Data Structures + Algorithms course.


Algorithms: An overview

We’ve circled around algorithms in the last two topics, but now we’re going to get into the meat of it!

An algorithm is a set of instructions for solving a problem or completing a task. In computer science, algorithms are used to solve all sorts of problems, such as the one I mentioned above about searching in arrays.

We can combine everything we’ve learned so far to understand how a binary search algorithm performs better than a brute-force algorithm.

For example

Let’s say we had an array containing random integers sorted in ascending order, and we don’t know how many numbers are going to be in the array:

[4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100]

A search algorithm finds the index of a given integer: if it exists, we return the index of the integer in the array; if it doesn’t, we return -1.

An intuitive solution is to start from the first integer and check if the value is what we’re looking for. If it is, we return the index; if it’s not, we move right by one index and repeat until we either find the value we’re looking for or reach the end of the array and return -1.

// Integer to find = 14

// Step 1
[4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100]
 ^ // No

// Step 2
[4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100]
    ^ // No

// Step 3
[4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100]
        ^ // Yes

// Return index 2

To evaluate the Big O time complexity of this solution, we just need to think about the worst case scenario.

For example

If we receive a value that doesn’t exist in the array, we have to check every value in the array. This means our solution runs in O(n) or linear time.

This means that if we have 100 integers we do 100 checks and if we have 1,000,000 integers we do 1,000,000 checks.
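Here's a minimal JavaScript sketch of that brute-force approach (often called linear search):

// Check each element from left to right: O(n) in the worst case
function linearSearch(arr, target) {
  for (let i = 0; i < arr.length; i++) {
    if (arr[i] === target) return i; // found it, return the index
  }
  return -1; // reached the end without finding it
}

linearSearch([4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100], 14); // 2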

According to our chart, this is a good time complexity, but we can do better with the binary search algorithm.

The binary search algorithm works if the values in our array are sorted in either ascending or descending order. Let’s take a look at how binary search works:

// Integer to find = 14

[4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100]

The first step is to go to the middle index of the array (if our array has an even number of elements, we can round up or down) and check if the value is equal to the integer we’re looking for.

// Integer to find = 14

[4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100]
                        ^
                        // middle index (value 50)

If this value is equal to our integer we return the index. If it isn’t, we check whether this value is greater than the one we’re looking for. If it's greater, we can throw away the middle and everything to its right; if it's smaller, we throw away the middle and everything to its left.

// Integer to find = 14

[4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100]
                        ^
// 50 is greater than 14, so we can throw 50 and everything to the right away

[4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100]
                        |-------------------------|
                        // All greater than 14

// Remaining search area
[4, 10, 14, 17, 32, 41]

With the right half of the array removed, we repeat the same steps.

We take the middle index, compare the value, and if it exists we return the index, if not we check if the current middle value is greater than the integer we’re looking for.

If the middle value is greater, then we know we want to continue searching through the values that are smaller, so we throw away the right side, and vice versa if it’s not.

We repeat this until we find our value, and if we can’t find it, we return -1.

// Integer to find = 14

// Remaining search area
[4, 10, 14, 17, 32, 41]
            ^
/*
Since our array has an even number of elements, we can pick the value to the
left or right of the middle; we just have to be consistent in our choice every
time. Let's choose left.
*/

[4, 10, 14, 17, 32, 41]
        ^
// We have found our value! Return the index 2

Binary search is a much more efficient algorithm: each iteration picks the middle of the remaining array and potentially throws away half of it, meaning with every iteration we cut the number of elements we have to search through in half!

Cutting our search space in half per iteration means that in the worst case, we’re dividing the total array by 2 until we end up with 1 number left.

This is exactly what the logarithm of base 2 does, as it’s the inverse of 2ˣ.

If 2⁵ is 2 doubling itself 5 times (2 x 2 x 2 x 2 x 2 = 32), then log₂ N asks how many times we can halve N until we end up with 1: log₂ 32 = 5.

This halving is exactly what binary search does on each iteration: it cuts the search space in half, meaning our Big O for binary search is O(log₂ n).

This is the second best category we can get, and it's better than our previous solution’s O(n)!
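Putting the whole thing together, here's a minimal sketch of binary search in JavaScript (choosing the left value when the remaining range has an even length, as above):

// O(log₂ n): every iteration throws away half of the remaining search area
function binarySearch(sortedArr, target) {
  let low = 0;
  let high = sortedArr.length - 1;

  while (low <= high) {
    const mid = Math.floor((low + high) / 2); // rounds down, i.e. "choose left"

    if (sortedArr[mid] === target) return mid; // found it
    if (sortedArr[mid] > target) {
      high = mid - 1; // throw away the middle and everything to its right
    } else {
      low = mid + 1; // throw away the middle and everything to its left
    }
  }

  return -1; // search area is empty, the value isn't in the array
}

binarySearch([4, 10, 14, 17, 32, 41, 50, 63, 71, 82, 90, 99, 100], 14); // 2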

tl;dr

There are numerous other algorithms, many of which are more complex than binary search and many that require a deep understanding of the more complex data structures, and also algorithmic patterns.

But most aren’t necessary in your day to day as a web developer unless you’re working at (or even applying to) a big tech company (Meta, Apple, Amazon, Netflix, Google, Microsoft, Tesla, etc.).

The software engineering interviews for these companies are almost exclusively data structures and algorithms.

At large tech companies you’ll see more complex data structures such as trees, graphs and linked lists occasionally in your work, but outside of these large companies it’s much more rare to encounter them.

Learn how to get hired at FAANG

Again, Andrei and I teach data structures and algorithms specifically for coding interviews and specifically for students who don't have a Computer Science degree but want to get on the same level as anyone who has a degree.

They are part of our Master the Coding Interview series of courses, so be sure to check out those courses below.

Whether you take our courses or not, Andrei and I (and many others) highly recommend this free CS50 lecture series by Harvard (the 2017 version is still the best one... the instructor is incredible).

System design: An overview

System design refers to the process of creating a high-level plan or blueprint for how a software system should be built.

In the world of web applications, this is required when applications grow in users and traffic, because the systems that support them also need to grow.

System design can be divided into two main categories: high-level and low-level.

So let's break them down.

High-level system design (also known as architectural design or system architecture)

High-level design is usually what people mean when they say system design.

Large systems are usually built as distributed systems, which often require a lot of computers (servers) working together to achieve a common goal. By dividing the workload among multiple computers, distributed systems can improve performance, scalability, and fault tolerance.

Before we dive into that though, we need to understand some key concepts related to distributed systems:

Node

In a distributed system, a node can be a computer, server, or device that has its own resources (processing power, memory, and storage) and usually its own IP address.

When you think about the backend application code you write, it usually gets hosted on a node when it’s live. Nodes can also host databases, services, or any other system component.

Nodes operate independently from each other and communicate with other nodes through a network to exchange information and coordinate what they’re doing.

Node Cluster

Multiple nodes that work together and pool their resources are called a cluster.

The nodes that live in these clusters usually run the exact same applications, and a cluster of nodes is treated as a single unit. Clusters have to balance the load of requests they receive among their nodes to ensure that no single node is overwhelmed or crashes.

Network

The communication between nodes in a distributed system occurs over a network, which can be a local area network (LAN) or a wide area network (WAN), such as the internet.

The network enables nodes to send and receive messages or data, allowing them to collaborate on tasks.

Scalability

One of the main advantages of distributed systems is their ability to scale. As the workload increases and the number of users grows, additional nodes can be added to the system to handle the increased demand.

This allows the system to maintain its performance and availability, even as it grows in size and complexity.

Systems will need to scale the number of nodes up and down depending on how much traffic is currently going through the system.

For example

E-commerce applications may need more nodes during Black Friday sales, versus their regular shopping season.

This means that they’ll need to be able to scale up when there is a huge spike in traffic, or scale down when that seasonal traffic is gone.

Consistency

In a distributed system, data is constantly flowing through multiple nodes and databases.

Maintaining consistency of data across all nodes can be challenging and usually there is a tradeoff of how readily available that data is to those requesting it.

Various consistency models, such as eventual consistency or strong consistency, are used to define the level of synchronization and data accuracy across different nodes. Systems like banks need near immediate consistency of banking data since accuracy about money is a must have.

Availability

In a distributed system, there are often millions of requests for data coming from users, application code, services etc. Retrieving and storing data, running application code and all the other things that happen in a system is often time consuming.

Users on the other hand are not sympathetic to the needs of our system and want things to happen fast, which means that many systems will need to prioritize being able to show them stale data, even if it’s slightly off, rather than wait until they’re sure the data is consistent.

The tradeoff for higher availability is weaker consistency: most systems that prioritize availability can ensure that the data will eventually be consistent, but it’s not the priority.

An example of this would be a messaging application, where users expect to be able to use it at any time and care less about whether every message is perfectly up to date.

Partition tolerance

Distributed systems are designed to be resilient when failures happen, such as hardware malfunctions, network issues, or software bugs. If one node fails, the other nodes can continue to operate, ensuring that the system remains functional.

In the world of distributed systems, partition tolerance is a must-have. Traffic spikes, servers crash, bad code gets pushed, security breaches happen; all manner of things can bring your system to its knees.

As a result, we need to design our systems to be resilient to these things through backup nodes, clustering, and other techniques!

High level system design is incredibly important for every developer to start learning. You can read my article that does a deep dive into System Design here.

You can also take my course on the basics of system design here.

Low-level system design (also known as detailed design or component design)

This stage involves breaking down the high-level design into smaller, more detailed components, modules, or classes. It includes defining algorithms, data structures, and interfaces for each component or module.

Low-level design is concerned with the specific implementation details of the system, such as coding and programming languages.

When building an application, you’ll see a lot of early discussions around how to architect the codebase.

This is why in all our courses that teach web development at Zero To Mastery, we teach low level design in the respective domain!

This means that if you learn React with me or React-Native with Mo, you’ll learn low level architecture patterns for how to structure the applications we build together.

There is a lot of bad software out there, but we want to teach you how to write good software, partly because it'll help you stand out from your peers, but also because it'll save you many headaches down the road by doing things the "right" way in the first place 🙂. Many courses don't bother teaching you this, but that's a miss in our opinion.

tl;dr

All in all, you don’t need to learn system design through a computer science degree. In fact, a lot of it doesn’t make sense until you work on a real application at scale to really solidify these concepts into a real world system.

Regardless, it’s still incredibly valuable to learn the theoretical side of system design to better understand how all of web development comes together!

Databases: An overview

So we’ve discussed using data directly in our web applications and computer programs using data structures. A database, on the other hand, is a software system that helps you store, manipulate and retrieve data in the long term.

As a developer, you’ll likely work with a database almost everyday, or at the very least the data that comes from a database.

In computer science, you learn not only how to work with databases, but also the principles that guide how they are designed. This is incredibly important, as there are multiple different databases, each with its own advantages and disadvantages, that you’ll learn about.

Each database is an extensive topic on its own, but in this article we’ll discuss some of the most common principles that guide database design.

We’ll discuss two popular database designs called ACID and BASE, but before we do that we need to understand the concept of transactions.

In databases, there are mainly three operations that are performed:

  • Inserting
  • Updating, or
  • Deleting data

A transaction is a sequence of these operations executed together as a single unit of work.

An example of this would be in a banking application. If Cindy sends $100 to Fred, that transaction requires us to perform two different operations:

  1. Subtract $100 from Cindy’s account in the database

  2. Add $100 to Fred’s account in the database

Simple!

These two operations are two different updates, each going to a different entry in the database but it's considered a single transaction of transferring $100 from Cindy to Fred.

When it comes to database design, particularly for ACID and BASE databases, they are designed with properties in mind that apply to their transactions and in turn to the rules that govern the database.

Another important thing to remember is that databases are typically split up or duplicated across multiple backup nodes. This is for redundancy in case one node goes down, so our system can still operate through the backups.

As a result, the data is going to live in multiple places, which means we need to figure out how to ensure that the data is the same across these sources.

These two designs of ACID and BASE have different opinions about how and when to do this which we’ll explore as well.

Let’s start with ACID.

ACID databases

A database is considered ACID compliant if its transactions have the four properties of atomicity, consistency, isolation and durability.

Atomicity

A transaction is atomic if it is treated as a single unit of work that either succeeds or fails completely. This means that all the operations within a transaction should be completed successfully, or none of them should be applied to the database.

If any operation fails, any changes this transaction would have made to the database are rolled back.

Looking at our above banking example of Cindy transferring $100 to Fred, our transaction is only considered complete if both operations succeed. If any of the operations fail, we rollback any data that has been modified by this transaction.

This makes sense because we don’t want our data to reflect that Cindy lost $100 but Fred didn’t gain $100.

We also don’t want our data to reflect that Fred gained $100 but Cindy never lost $100. Either outcome would leave our data incorrect!
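In code, this usually looks like wrapping both updates in a single transaction. The db client below and its method names are hypothetical (every database library has its own API), but the shape is the same: either both updates commit together or everything rolls back.

// Sketch only: "db", "beginTransaction", "query", "commit" and "rollback"
// are illustrative names, not a specific library's API.
async function transferMoney(db, fromId, toId, amount) {
  const tx = await db.beginTransaction();
  try {
    await tx.query('UPDATE accounts SET balance = balance - $1 WHERE id = $2', [amount, fromId]);
    await tx.query('UPDATE accounts SET balance = balance + $1 WHERE id = $2', [amount, toId]);
    await tx.commit();   // both operations succeed together...
  } catch (err) {
    await tx.rollback(); // ...or neither change is applied
    throw err;
  }
}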

Consistency

A transaction is consistent if the database remains in a consistent state before and after the transaction. This means that any changes made to the database must adhere to the database's predefined rules and constraints.

For example

If a database has a constraint that ensures that every record has a unique key, a transaction that violates this constraint should be rolled back.

Consistency also ensures that a transaction does not leave the database in an ambiguous state. This means that when a transaction is executed, it should not leave the database in a state where the data does not make sense or cannot be used.

Looking at our banking example with Fred and Cindy, our database could have a rule that a money transfer cannot be greater than the amount in the sender's account balance. Cindy cannot send $100 to Fred if her account balance is less than $100.

If her account has at least $100, the transaction succeeds and the database is in a consistent state with updated account balances.

If the transaction fails due to insufficient funds, the database remains consistent, and the transaction is rolled back.

Isolation

Isolation means that each transaction must be executed independently and should not be affected by other concurrent transactions that are executing at the same time.

This means that each transaction happens in a distinct order, without other transactions occurring at the same time, even if both fire concurrently. Any read or write operations performed on the database will not be impacted by other reads and writes from different transactions occurring at the same time.

There are usually systems in place to ensure this. One example is a global ordering queue, where each transaction waits in line so that every transaction completes in its entirety before the next one begins.

Let's look at another banking example.

Cindy has $300 in her account, and two transactions are executed concurrently. Transaction A has Cindy transferring $100 to Fred, while transaction B has Cindy transferring $250 to Samantha.

Remember our database rule that a transfer cannot happen unless there are sufficient funds.

Since both transactions are firing concurrently, we could see transaction A check Cindy’s account balance for sufficient funds, see $300 and start. While this is happening, transaction B also checks Cindy’s account for sufficient funds, also sees $300 and starts.

The problem is that if both transactions happen, Cindy would have -$50 in her bank account, meaning our database is inconsistent and the data incorrect. One of the two transactions should not have happened, since if either transaction had finished first, we would have seen that Cindy didn't have enough funds to perform the remaining transaction.

Durability

Once a transaction is committed, the changes made to the database must be permanent and survive any subsequent system failures. This means that the database must ensure that all committed transactions are recorded permanently and can be recovered in case of a system failure.

ACID databases are commonly used in mission-critical systems such as financial applications, where data consistency and reliability are critical requirements.

SQL (relational) databases are the most common and widely used ACID databases.

Examples of SQL databases include PostgreSQL, MySQL, Oracle Database and Microsoft SQL Server. You can learn extensively about SQL databases (and other databases) by taking Mo and Andrei’s Database course which focuses a lot of time on mastering SQL.

While the advantage of ACID databases is that you know the data in them is consistent, there is a large performance cost to running these kinds of databases.

To ensure that transactions are ACID compliant, there are numerous mechanisms, such as the global ordering queue, that make it slower to process multiple concurrent transactions.

In an ACID database, consistency is prioritized over availability, meaning that if there is a chance of an error in the data, the database will not hand data out when requested until it is sure the data is correct.

Not all parts of an application need this level of consistency, which is where a BASE designed database can come in.

BASE databases

A BASE database is a type of database management system that provides a relaxed consistency model and is designed to achieve higher availability and scalability.

A BASE database is designed to always provide high availability to users and clients asking for data, even at the cost of the data being stale.

This acronym is honestly a little confusing because the different properties overlap a lot, but this is likely due to trying to force the ACID/BASE metaphor to work.

The acronym BASE stands for 'Basically Available, Soft state, Eventually consistent'.

Basically Available

This means that the system is designed to ensure that data is available for read and write operations most of the time, even in the case of network partitions or node failures.

It prioritizes availability over consistency, meaning that the system may return stale or out-of-date data in some cases.

Soft State

The concept of "soft state" refers to the idea that the state of the system can change over time, even without input. This may make more sense when compared to an ACID database where the database can only change through external input (i.e. transactions).

The reason for this is because a BASE database does not prioritize consistency, which means the state of the data stored across nodes in a system may not be the same.

A BASE database may allow for temporary inconsistencies in data. This means that the data in the database may not be consistent at all times, but it should eventually converge to a consistent state.

In other words, the system's state is only an approximation of reality, and it may change in response to various factors such as network latency, node failures, and other external factors.

Eventually Consistent

A BASE database may allow for temporary inconsistencies in data, but it will eventually converge to a consistent state. This means that the data may not be consistent at all times, but the database will eventually reach a consistent state.

Examples of BASE databases include Cassandra, HBase, and Amazon DynamoDB.

There are tradeoffs you have to make with any system, and making those tradeoffs at the database level is one of the key ones to make. Usually large systems will run both ACID and BASE databases at different levels of the application to service different needs.

Understanding Databases is crucial to learn for every Web Developer, but you can pick it up as easily on the job as you can in a computer science degree.

Where to keep learning Computer Science

Whew! If you’ve made it all the way to the end, amazing work for sticking with me this far.

I know it was a lot to cover and fairly complex, so well done.

Of course we haven't covered everything here that you would cover in a CS degree.

Whaaaat? Why? Because your time is valuable!

The topics I covered are the ones that you're most likely to encounter on the job or in a coding interview which is all that 99% of people need to know.

For the 1% out there, a computer science degree will also teach other things like the relevant math for proofs, security and networking, operating systems, and how to build programming languages. But again, these are less useful for the majority of web developers.

So with that... I hope this guide helped introduce you to the basics and key topics of computer science that you'll need to learn and know as a web developer.

And hopefully you're excited to dive deeper!

If you have any questions, be sure to join our private Discord community by becoming a member of Zero To Mastery.

On Discord, you can ask me or other instructors and mentors or even other students currently taking any of our related Computer Science / Data Structure and Algorithm courses any questions you have.

I'm not trying to sell you but you honestly have nothing to lose... ZTM has a no questions asked 30-day money back guarantee. We are confident offering that because we've helped 1,000s of students just like you go from NO computer science background or experience to getting hired at top companies.

So as long as you're willing to put in the work, we can help you do the same! If you find it's not a good fit for you for whatever reason, no problem at all.

Hope to see you inside the Zero To Mastery Academy soon!
