When everything slows down during a deploy or a backup, it’s easy to blame your app.
What most people don’t realize, though, is that it’s not always the code’s fault. More often, the problem is how your traffic is being handled.
The good news is that there’s a solution called Quality of Service (QoS) that can keep your most important services running smoothly, even when things get busy. The problem? It’s often overlooked in DevOps, even though the tooling ships with virtually every Linux system and just needs to be set up.
In this guide, I’ll break down what QoS is, why it matters, how to use it to your advantage and set it up with Bash, and how to avoid the most common mistakes along the way.
Let’s dive in…
When multiple things on a system try to use the network at the same time, they compete for resources:
One tool might be uploading logs
Another might be downloading updates
Or maybe a deploy is running, or a backup just kicked off
All of this goes through the same connection, and by default, Linux doesn’t prioritize anything. It just treats everything as equally important and sends everything out as fast as it can. This lack of prioritization can create problems because a quick health check or a small API call might get stuck waiting behind a huge file transfer.
Even worse?
Your system will start to feel slow for your users, even though nothing is actually broken.
Quality of Service can help fix this, because it lets you control how different types of network traffic are handled. This means you can prioritize the most important parts and improve the user experience.
To be clear, you’re not increasing bandwidth or speeding anything up, but you are deciding what should go first, what can wait, and what should be slowed down if things get crowded. So from the user’s perspective, the things that matter stay fast.
You can kind of think of it like adding traffic lights to a junction. Sometimes one direction carries far more traffic and needs to be prioritized, but you also want to make sure the other lanes aren’t left sitting there for hours.
That’s what QoS is all about. Handling traffic better so the important things get through and the user experience stays as good as it can be. You’re not building more roads, but you are keeping things moving in the right order.
And the best part?
If you’re on Linux then you already have what you need because the tooling is built-in. You can use Bash to set it up, test it out, and gradually take control of how your system behaves under pressure.
You just need to know how to set it up and how it works, so let’s break that down now.
Linux gives you all the tools you need to shape traffic and prioritize what matters. However, none of it is turned on by default. So if you want to control how traffic flows through your system, you have to define the rules yourself.
That’s where a command-line tool called tc comes in. It stands for “traffic control,” and it lets you attach rules to your network interface — usually something like eth0, ens33, or enp0s3.
To see the available interfaces on your system, you can run:
ip link show
Once you know the name of your interface, you can then start applying traffic rules. But first, let’s understand how tc works behind the scenes.
Every time data leaves your system, it gets placed into a queue. That queue can be controlled in different ways using what Linux calls queuing disciplines, or qdiscs.
The default is usually something simple like FIFO (first in, first out). That just sends packets in the order they arrive. It’s fast, but it doesn’t give you any control.
For more flexibility, you can use a more powerful qdisc like HTB (Hierarchical Token Bucket). HTB lets you split traffic into classes and assign each one a speed limit or priority. That means you can give your monitoring traffic a guaranteed amount of bandwidth, cap your backups, or make sure certain ports always get through first.
To start clean, you should remove any existing qdisc before adding a new one. This command clears it:
tc qdisc del dev eth0 root 2>/dev/null
Now let’s set up HTB:
tc qdisc add dev eth0 root handle 1: htb default 10
This creates a root HTB qdisc and assigns a default class (10) to any traffic that doesn’t match other rules.
Next, you’ll define a class with a speed limit:
tc class add dev eth0 parent 1: classid 1:10 htb rate 1mbit ceil 1mbit
This creates a class with a 1 megabit per second limit. Anything that falls into this class will never go faster than that.
You can go further by creating additional classes and filters to match traffic based on IP address, port, or protocol. Here’s a basic example of matching traffic by port:
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
match ip dport 9000 0xffff flowid 1:10
This tells Linux: “any outgoing traffic to port 9000 should go into class 1:10,” which you already capped at 1mbit.
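If you’d rather match by destination IP instead of port, the same u32 matcher can do that too. Here’s a quick sketch (the address is just a placeholder):
tc filter add dev eth0 protocol ip parent 1:0 prio 1 u32 \
    match ip dst 203.0.113.10/32 flowid 1:10
Any traffic headed to that address now lands in the same capped class.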
You can now apply this logic to different types of traffic.
Want to deprioritize backups? Cap that port
Want to guarantee bandwidth to a health check? Assign it to a higher-priority class
And because you’re doing this in Bash, you can bundle these commands into a script, run them as part of a CI pipeline, or reset the rules on boot. Everything is transparent and under your control.
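Here’s a minimal sketch of what such a script might look like. The interface name, ports, and rates are all assumptions, and it adds a parent class (1:1) so the child classes can borrow spare bandwidth from each other. Adjust everything to match your own system.
#!/usr/bin/env bash
# qos-setup.sh: a minimal egress-shaping sketch (interface, ports, and rates are assumptions)
set -euo pipefail

IFACE="eth0"   # yours might be ens33, enp0s3, etc.

# Start clean: remove any existing root qdisc (ignore the error if there isn't one)
tc qdisc del dev "$IFACE" root 2>/dev/null || true

# Root HTB qdisc; anything unclassified falls into class 1:30
tc qdisc add dev "$IFACE" root handle 1: htb default 30

# Parent class sized to the (assumed) 10mbit uplink, so children can borrow from it
tc class add dev "$IFACE" parent 1: classid 1:1 htb rate 10mbit ceil 10mbit

# 1:10 - high priority, e.g. health checks on a hypothetical port 8080
tc class add dev "$IFACE" parent 1:1 classid 1:10 htb rate 4mbit ceil 10mbit prio 0

# 1:20 - hard-capped, e.g. backups on a hypothetical port 9000
tc class add dev "$IFACE" parent 1:1 classid 1:20 htb rate 1mbit ceil 1mbit prio 2

# 1:30 - the default class for everything else
tc class add dev "$IFACE" parent 1:1 classid 1:30 htb rate 5mbit ceil 10mbit prio 1

# Filters: steer traffic into the right class by destination port
tc filter add dev "$IFACE" protocol ip parent 1:0 prio 1 u32 \
    match ip dport 8080 0xffff flowid 1:10
tc filter add dev "$IFACE" protocol ip parent 1:0 prio 1 u32 \
    match ip dport 9000 0xffff flowid 1:20
Run it once at boot (or from your provisioning tooling) and the same rules apply every time, which is exactly what you want for something this easy to forget.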
So in short, Linux lets you control traffic using:
tc to define rules
qdiscs like HTB to organize it into classes
and filters to match the right traffic
And with Bash, you can manage it all in a repeatable way. Once you get comfortable with these tools, you can test, tweak, and adjust your QoS setup to fit exactly what your system needs.
So now let’s look at how to verify whether your rules are actually working.
It’s important to understand that Linux won’t throw an error if you apply a traffic rule that doesn’t match anything. And if something feels slow or unchanged, you’ll want to be able to confirm whether the limit is in place or not.
The good news is that there are a few simple tools that can help.
One of the easiest is nload, because it gives you a live, visual readout of how much traffic is moving through your network interface.
You can install it with your package manager:
sudo apt install nload # for Debian/Ubuntu
Then run:
nload eth0
This shows how much data is coming in and going out. If you’ve applied a QoS rule to limit traffic, you should see the outgoing rate stay near that limit.
Another tool is iftop, which breaks down traffic by connection and shows which remote hosts are sending or receiving the most data. It’s useful when you’re trying to spot something noisy or confirm that a background process is being throttled.
For more specific testing, you can try iperf3.
It generates artificial network traffic so you can see how your system performs under controlled conditions. Run it between two machines (or between a VM and the host) and you can measure throughput directly, both with and without QoS rules applied.
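A simple test might look something like this, where 192.168.1.50 stands in for whichever machine you use as the receiver:
# On the receiving machine, listen on the port you shaped earlier (9000)
iperf3 -s -p 9000

# On the machine you applied the rules to, push traffic at it for 10 seconds
iperf3 -c 192.168.1.50 -p 9000 -t 10
If the 1mbit cap on port 9000 is working, the reported throughput should sit right around that limit. Remove the rule and run it again, and you should see the difference immediately.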
You can also check your applied tc rules with:
tc qdisc show dev eth0
tc class show dev eth0
If you see your HTB classes and limits listed, you’ll know they’re active. If not, something didn’t stick, and now you know where to look.
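You can also add -s to see per-class statistics:
tc -s class show dev eth0
If the byte and packet counters for a class keep climbing while you generate matching traffic, you know the filter is actually doing its job.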
Handy right?
This kind of visibility is what makes QoS so useful. You’re not just setting rules and hoping for the best. You’re observing, adjusting, and shaping traffic in a way that actually fits your system.
But even with the right tools, things can still go wrong if you’re not careful. And in practice, that’s usually where most of the trouble starts.
QoS gives you powerful control over how your system handles traffic, but only if it's set up the right way. And let’s be honest, user error is where things often go sideways.
From using the wrong interface to copy-pasting complex rules without fully understanding them, it’s easy to misconfigure something and not realize until performance takes a hit.
So instead of stumbling into them later, let’s unpack what actually goes wrong, why it happens, and how to prevent it.
Linux doesn’t always name network interfaces the same way.
On some systems it might be eth0, but on others it could be ens33, enp0s3, or something else entirely. So if you copy a script that uses eth0 but your system calls the interface by a different name, the rule won’t apply, and you won’t get an error either.
It’ll just silently fail, so to avoid this, always run:
ip link show
This command lists all available network interfaces so you can be sure you're applying QoS rules to the right one. It's a simple step, but skipping it is one of the easiest ways to break your setup without realizing it.
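If you want a script that works across machines, one common trick is to look up whichever interface carries the default route instead of hard-coding a name. A rough sketch:
# Use the interface behind the default route (assumes a single default route)
IFACE=$(ip route show default | awk '{print $5; exit}')
echo "Applying QoS rules to $IFACE"
tc qdisc show dev "$IFACE"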
The tc tool doesn’t automatically replace existing settings, so if you run a new command without clearing out the previous configuration, Linux might stack rules on top of each other, or refuse to apply the new ones entirely. Either way, the result can be confusing because your changes might not take effect, or they might behave unpredictably.
This is especially common during testing. You try one setup, tweak it, run the script again… and nothing seems to change. But what’s really happening is that your old rule is still active underneath.
To avoid this, get into the habit of resetting the interface before adding new rules:
tc qdisc del dev eth0 root 2>/dev/null
This tells Linux to delete the root queuing discipline for the interface (which clears everything). The 2>/dev/null part just hides the error message in case there was nothing to delete.
It's a clean slate every time and that makes your scripts more reliable.
As I mentioned earlier, QoS can absolutely improve how your system performs, but only up to a point. It doesn’t increase bandwidth or magically make your network faster. What it does is change how that bandwidth gets used.
For example
If a service is slow and you assume QoS will make it faster, you might be solving the wrong problem. Maybe the real bottleneck isn’t network traffic at all. Maybe it’s disk speed, DNS lag, or CPU load. Or maybe your network isn’t even under pressure yet, so the QoS rules you apply don’t visibly do anything.
QoS works best when you know your limits and want to protect the right traffic under pressure. It won’t help if your system is already overloaded. And it won’t help if what you’re fixing isn’t actually a network problem.
The best way to use it is as a form of control and not as a cure-all.
Always measure first and find out what’s competing. Then use QoS to make sure your system behaves the way it should when things get crowded.
When you're experimenting with QoS, it's common to set hard limits to see how your system behaves.
For example
Maybe you throttle bandwidth to 1mbit per second so you can simulate slow network conditions. That’s a useful test, but if you forget to remove the rule afterward, you’ve just capped your system permanently.
This happens more often than you’d think, especially if the script runs at boot or gets added to a provisioning process. Suddenly, a deployment feels slow, logs lag, or your service starts timing out, and it’s not obvious why.
So the fix is simple. Just remember to treat test rules like any other temporary setting and clean them up when you’re done. That might mean manually removing the limit with:
tc qdisc del dev eth0 root
Or building a separate cleanup script so nothing gets left behind. Just make sure whatever you test doesn’t accidentally stick around longer than it should.
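For throwaway experiments, you can even let Bash do the cleanup for you with a trap, so the cap is removed when the script finishes (eth0 and the 1mbit rate are assumptions here):
#!/usr/bin/env bash
# Remove the test rule automatically when this script exits
trap 'tc qdisc del dev eth0 root 2>/dev/null' EXIT

tc qdisc add dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 1mbit ceil 1mbit

# ...run your slow-network test here...
sleep 300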
If you’ve ever looked up how to use tc, you’ve probably run into examples that feel more like cryptic puzzles than helpful commands. The syntax is dense, the flags aren’t self-explanatory, and most tutorials just throw a block of text at you with little explanation.
That’s why it’s so tempting to copy and paste someone else’s QoS rule into your script and hope it works. But this is one of the fastest ways to introduce bugs or slowdowns that are hard to diagnose, especially if the rule does something different than you think.
For example
Let’s say that you find a command online that sets a filter on port 22 to apply a bandwidth cap. So you paste it into your script without realizing that port 22 is used for SSH. Suddenly, your remote sessions are laggy or even cut off entirely, and it’s not obvious why because the rule technically “worked.”
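A quick sanity check before capping any port is to see what’s actually listening on it, for example:
sudo ss -tlnp | grep ':22 '
If sshd shows up in that output, you probably don’t want to throttle it.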
Or maybe you copy a rule that uses a queuing discipline like fq_codel or hfsc without understanding what it does. The rule applies, but traffic doesn’t behave the way you expect. You start changing things blindly, not realizing that the problem is a mismatch between the queuing strategy and your network conditions.
The solution is to start small. Don’t try to implement traffic shaping, filtering, and prioritization all at once. Begin with one simple rule like limiting all outgoing traffic to a known rate and build from there. And when you add complexity, make sure you understand what each flag and setting actually does.
And finally, always test changes in a controlled environment, and check your setup with commands like:
tc qdisc show dev eth0
tc class show dev eth0
That way, you can see exactly what’s applied and make sure it matches what you intended.
Most of the examples you’ll find online (and even the ones in this guide) focus on outgoing traffic. That’s because Linux’s tc tool is designed to shape what leaves the system, not what comes in. And for many DevOps workflows, shaping egress is enough, simply because things like backups, CI builds, or container logs usually send data out, and those can be easily throttled or prioritized.
But that’s not the whole picture.
Some of the most frustrating network issues happen on the inbound side.
For example
Maybe your app is getting flooded with incoming requests, or a remote data sync is pulling in large files faster than your system can process them.
Why does this matter?
Well, if you’re only shaping upload traffic, then none of that inbound pressure is being managed, and you’ll still see lag, dropped packets, or resource starvation.
The tricky part is that Linux can’t shape download traffic the same way it shapes uploads. By the time a packet arrives at your interface, it’s already here, which means that Linux doesn’t have a chance to delay or deprioritize it before it hits the system.
There are workarounds, in that you can sometimes limit incoming traffic indirectly by controlling the receiving application or using filtering and policing techniques. Also, some setups use ingress qdiscs or Intermediate Functional Block (ifb) devices to redirect incoming traffic through a virtual interface where shaping can happen. But these are more advanced and not always worth the complexity unless you're dealing with a very specific bottleneck.
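For completeness, here’s a rough sketch of the ifb approach, where incoming packets on eth0 are redirected to a virtual ifb0 device and the shaping happens on ifb0’s outgoing side. Treat it as a starting point rather than a drop-in config, and note that the interface and rate are assumptions:
# Load the ifb module and bring up a virtual device
modprobe ifb numifbs=1
ip link set ifb0 up

# Attach an ingress qdisc to the real interface and redirect all incoming IP traffic to ifb0
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
    action mirred egress redirect dev ifb0

# Now "incoming" traffic can be shaped by shaping ifb0's egress
tc qdisc add dev ifb0 root handle 1: htb default 10
tc class add dev ifb0 parent 1: classid 1:10 htb rate 5mbit ceil 5mbit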
The key takeaway is this:
If your traffic problem is coming into the system, shaping outbound traffic won’t fix it. And if you don’t know whether the issue is inbound or outbound, you won’t know where to focus your QoS rules.
Before applying anything, take a moment to observe your traffic flow. Are you sending too much? Receiving too much? Both? That one step will save you a lot of guesswork, and make your QoS setup far more effective.
One of the most common mistakes is when someone sets up QoS rules without first understanding what their traffic actually looks like.
You might assume your CI pipeline is the problem because it runs during business hours, or that a certain container is hogging bandwidth because it writes a lot of logs. So you set a traffic limit on that service, thinking you’re fixing a conflict. But if you haven’t measured anything, you’re just guessing and those guesses can cause more harm than good.
For example
Let’s say you apply a limit to your backup job because it “seems” like it’s interfering with your app. But after applying the rule, your system still lags.
So what’s happening?
Well in this situation, the problem wasn’t the backup. It was a metrics exporter running on a different port, pulling in external data every minute. And so without checking actual traffic flow, your QoS rule just missed the mark.
As a rule of thumb, before you start shaping or limiting anything, take time to observe.
Tools like these can help:
iftop: shows live bandwidth usage by connection or port
nload: shows total incoming and outgoing traffic in real time
netstat or ss: lists active connections and their states
ip -s link: shows traffic totals per interface
These tools don’t give you deep analytics, but they do give you just enough visibility to know what’s talking, how much, and where it’s going.
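A quick observation pass might look like this (again, eth0 is an assumption):
ip -s link show eth0   # total bytes and packets in and out
nload eth0             # live inbound vs outbound rates
iftop -i eth0          # which connections are moving the most data right now
ss -tunp               # active TCP/UDP connections and the processes behind them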
Because when you can see which services are active, what ports they use, and how much data they’re moving, you’ll set far more accurate and effective QoS rules. You’ll avoid throttling the wrong thing, and you’ll catch problems earlier, before they get baked into a script or deployment config.
QoS isn’t just about control. It’s about informed control. If you’re not measuring first, you’re not shaping traffic, you’re just guessing.
So as you can see, QoS isn’t a silver bullet, but it is one of the most overlooked tools in a DevOps engineer’s toolkit. It gives you control where there usually isn’t any, especially when traffic gets busy and the system starts to feel unpredictable.
And the best part? You don’t need a complex platform or external service to use it. If you’ve got a Bash terminal and a Linux machine, you’re ready to start.
So test it out. Apply a simple limit, watch what happens, and get familiar with the tools. The more you understand how your traffic behaves, the more control you’ll have over how your system performs when it matters most.
If you want to improve your DevOps skills, then check out my courses on Bash, Linux, Terraform and more.
All updated for 2025. They’ll give you the skills you need to become a DevOps Engineer this year, or fill in any gaps in your current knowledge.
Better still?
Once you become a ZTM member, you get access to all of these courses, as well as every other course in our library!
Not only that, but you can join our private Discord community and chat with me, other teachers, students, and working tech professionals so you’re never stuck.