Part 03
CI/CD Pipeline Design
Speaker 1: Picture manually copying files to dozens of servers after every tiny fix. It's
slow, error-prone, and frankly a bit soul-crushing.
Speaker 2: Exactly. Continuous integration and continuous delivery—CI/CD for
short—evolved to end that pain. They let us package code changes automatically and
push them to testing or production with a single command.
Speaker 1: Even non-technical teams appreciate the reliability. Instead of wondering if
the latest version actually made it to the customer site, everyone can see the pipeline
status in real time.
Speaker 2: Think of it like an assembly line in a factory. Once the pieces start moving,
the machinery takes care of the rest, freeing people to focus on improving the product
rather than shipping logistics.
Speaker 1: At its core, a CI/CD pipeline takes every change and runs it through the
same checklist—build the code, test it thoroughly, then deploy it in a predictable way.
Speaker 2: Automating these steps removes guesswork and inconsistency. The trick is
to build the application once, store that exact artifact, and promote it through test,
staging, and production environments.
Speaker 1: Quick feedback loops are essential. If a test fails minutes after a commit,
you can fix the issue before it grows.
Speaker 2: Security scans and quality gates slot neatly into this process as well, so new
features never skip the basic health checks. The pipeline should become the trusted
guardian that approves everything that ships.
Speaker 1: Once a developer commits code, the pipeline springs into action. A clean
environment spins up so any leftover files from previous builds can't cause trouble.
Speaker 2: The code is compiled or packaged, then a suite of automated tests checks
that nothing obvious broke. If those tests pass, the same package moves on to a
staging area where more realistic checks occur.
Speaker 1: Staging mirrors the production environment closely. It's where you might
run smoke tests or manual reviews if needed.
Speaker 2: Only when everything looks good does the pipeline promote the very same
build to production. That consistent flow—from commit to build to test to
deploy—means you always know exactly what you're releasing.
Speaker 1: One popular way to run a pipeline is through GitHub Actions. It's basically a
set of instructions written in a simple YAML file that tells GitHub what to do whenever
new code arrives.
Speaker 2: You can define separate jobs for building, testing, and deploying. That
separation keeps things tidy and makes it easy to see where a failure happens.
Speaker 1: The workflow can also reuse community-created actions for common tasks
like setting up a programming language or uploading artifacts.
Speaker 2: And because sensitive information is stored in encrypted secrets, you don't
have to hard-code passwords or access keys. It's like giving the pipeline a safe to keep
its credentials locked away while it works.
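Putting those pieces together, a workflow with separate build, test, and deploy jobs might look something like the sketch below. The script paths, artifact name, and the DEPLOY_TOKEN secret are illustrative assumptions, not a prescribed layout; only the actions/checkout and artifact actions are standard.

```yaml
# Hypothetical workflow file, e.g. .github/workflows/ci.yml.
name: ci

on:
  push:
    branches: [main]

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/build.sh            # assumed build script producing dist/
      - uses: actions/upload-artifact@v4   # store the one artifact we will promote
        with:
          name: app
          path: dist/

  test:
    needs: build
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./scripts/test.sh             # assumed test entry point

  deploy:
    needs: [build, test]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/download-artifact@v4 # promote the exact build that passed tests
        with:
          name: app
          path: dist/
      - run: ./scripts/deploy.sh dist/     # assumed deploy script
        env:
          DEPLOY_TOKEN: ${{ secrets.DEPLOY_TOKEN }}  # encrypted secret, never hard-coded
```

Note how the deploy job downloads the same artifact the build job uploaded, rather than rebuilding: that is the "build once, promote everywhere" principle in practice.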
Speaker 1: In the end, a solid pipeline is less about fancy tooling and more about
building trust. Every time code flows smoothly from a developer's laptop to production,
confidence grows across the team.
Speaker 2: Automating the repetitive steps frees everyone to focus on quality and new
ideas. Mistakes still happen, but the pipeline catches many of them before customers
ever notice.
Speaker 1: Think of it as a safety net and a speed booster rolled into one. Once you've
experienced reliable CI/CD, it's hard to imagine going back to manual releases.
Speaker 2: So the real takeaway is simple: invest in your pipeline early. It pays off with
faster feedback, fewer surprises, and calmer midnight releases—or better yet, no
midnight releases at all.
DevOps, SRE, and Platform Engineering Careers
Speaker 1: DevOps, SRE and platform engineering roles all grew from the need to ship
software quickly without breaking things. Ten years ago you would rarely see these
titles outside of tech giants. If you told someone you were a DevOps engineer, they
might have asked if you fixed printers. Companies like Netflix and Google proved that
automation and tight feedback loops were essential at scale, spawning whole teams
devoted to keeping services healthy.
Speaker 2: The roles overlap, yet each leans in a different direction. DevOps engineers
build the pipelines and smooth collaboration between development and operations.
SREs stand guard over reliability by codifying incident response and measuring service levels.
Platform engineers design the internal tools and "paved roads" that keep everyone
productive. Many of these professionals contribute to open source projects like
Kubernetes or Terraform, and remote-first cultures allow them to work across time
zones. All three paths pay well once you've mastered the tooling, though diversity
challenges remain an industry topic.
Speaker 1: DevOps engineers craft the pipelines that move code from commit to
production. A normal day might involve wiring up GitHub Actions or Jenkins jobs,
containerising apps with Docker and debugging a failed deployment at 3 AM. Their
motto is that "it works on my machine" is not an acceptable release strategy.
Speaker 2: Most start as system administrators or developers who discover a knack for
automation. They thrive on curiosity, clear communication and comfort with ambiguity.
Certifications like AWS DevOps Professional or the Kubernetes CKAD help them
progress. Junior salaries hover around $70k but can climb past $120k in senior roles,
often with remote work and on-call rotations. Small startups may have one DevOps
generalist, while large enterprises build teams of five or more. Over time you can grow
into architect or engineering manager positions and contribute to open source tooling
along the way.
Speaker 1: Site reliability engineers take system stability to heart. The discipline grew
out of Google's need to manage massive scale, so it's all about automation and
repeatable operations. An SRE might spend mornings writing Kubernetes operators and
afternoons running chaos experiments or tuning Prometheus alerts. When Netflix
battled early outages, their SREs pioneered chaos engineering to strengthen resilience.
Speaker 2: Most SREs come from software or DevOps backgrounds and are comfortable
coding and troubleshooting production issues. They remain calm when a 3 AM page
arrives and excel at communicating what went wrong. Certifications such as Google's
Professional Cloud DevOps Engineer bolster credibility. Entry salaries start near $80k,
but staff and principal SREs can exceed $140k. On-call rotations and incident
retrospectives are part of the culture, though many teams are remote friendly. Large
companies maintain dedicated SRE squads, while smaller firms rely on a handful of
specialists who obsess over metrics to keep services healthy.
Speaker 1: Platform engineers provide the reusable tooling that keeps teams
productive. Think of them as the people building the roads and traffic lights so
everyone else can drive smoothly. A platform engineer might design Kubernetes
templates, manage golden AMIs or maintain an internal developer portal. Spotify's
Backstage project is a great example of tooling that began inside one company and
became open source for all.
Speaker 2: Many platform engineers come from DevOps or SRE backgrounds and
discover they love building shared services more than maintaining single applications.
Empathy for developers, an eye for system design and a dash of product thinking all
help. Salaries often start around $90k and can reach $150k for senior architects,
especially if you hold cloud certifications. Larger enterprises may have teams of twenty
or more, while small start-ups rely on a single expert. Over time you can grow into a
platform architect or product manager role. Remember, nobody notices the road
builders until there's a pothole!
Speaker 1: Each path offers room to grow from hands-on engineer to strategic leader.
Small start-ups might only have one or two DevOps generalists, while mid-sized
companies carve out dedicated SRE and platform teams. Large enterprises can support
whole divisions focused on reliability or internal tooling.
Speaker 2: Certifications in AWS, GCP or Kubernetes help you stand out, and
remote-friendly cultures mean geography is less of a barrier. Expect the occasional
on-call rotation, but flexible hours are common. Junior roles start around $70k–$90k and
can rise above $150k as you progress to staff or principal levels. These careers reward
people who enjoy problem solving, clear communication and continuous learning.
Diversity and inclusion efforts are improving, yet more participation from
underrepresented groups is still needed. Whether you specialise or blend all three
paths, the demand for automation and reliability skills keeps climbing.
DORA Metrics
Speaker 1: Software teams often argue about speed versus stability. The DORA
(DevOps Research and Assessment) study cut through that noise by identifying four
metrics that predict success.
Speaker 2: Think of them as a health check for your delivery pipeline. When these
numbers improve, you know your process is maturing without sacrificing reliability.
Speaker 1: Deployment frequency tracks how often you successfully release code. Lead
time measures the journey from commit to production. Together they show how
smoothly work flows.
Speaker 2: Change failure rate looks at what proportion of releases cause problems.
Mean time to recovery tells you how quickly you can fix things when they break. High
performers excel on all four.
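The four metrics boil down to simple ratios and durations. One hedged formalization (the symbols are ours, not from the DORA reports):

```latex
\text{Deployment frequency} = \frac{\#\,\text{successful deployments}}{\text{time window}}
\qquad
\text{Lead time} = \operatorname{median}\!\left(t_{\text{deploy}} - t_{\text{commit}}\right)
```

```latex
\text{Change failure rate} = \frac{\#\,\text{deployments causing a failure}}{\#\,\text{deployments}}
\qquad
\text{MTTR} = \frac{\sum_i \text{restore time}_i}{\#\,\text{incidents}}
```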
Speaker 1: Numbers alone don't guarantee improvement. Track these metrics over time
and relate them to customer experience and business goals.
Speaker 2: If deployment frequency goes up but failure rate follows, you may need to
slow down and reinforce testing. Balanced metrics drive sustainable velocity.
Speaker 1: The takeaway is simple: measure what matters and act on it. DORA metrics
provide a clear lens on delivery performance.
Speaker 2: Use them to spark meaningful conversations about reliability and speed.
Continual tracking turns raw data into real improvement.
GitHub Actions Workflows
Speaker 1: Remember the bad old days of copying files to a server by hand? One
forgotten step and everything broke. GitHub Actions saves us from that drama by
automatically running the same steps every time code changes.
Speaker 2: Think of it like a recipe that bakes itself whenever someone adds a new
ingredient. Each change triggers a series of pre-defined tasks so that nothing is missed
and everyone can see exactly what happened. Automation isn't just a nice-to-have; it's
the safety net that keeps our future updates smooth and repeatable.
Speaker 1: Once you taste that reliability, you never want to go back to manual
deployments.
Speaker 1: A GitHub Actions workflow is defined in a YAML file. YAML is just a
human-readable list of instructions, like a shopping list written in plain text.
Speaker 2: Each workflow starts with a trigger—maybe someone pushed code, opened
a pull request, or a timer fired. That trigger launches one or more jobs which run on
"runners," the virtual machines GitHub provides or your own servers.
Speaker 1: Inside a job you define individual steps. Those steps can call reusable
actions from the community or simply run shell commands. Because the workflow lives
next to the code, it goes through the same pull request review process, making
automation changes visible and safe.
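The anatomy just described, a trigger launching a job on a runner, which executes steps, can be sketched like this. The Makefile target is an assumption; the cron line shows the timer trigger mentioned above.

```yaml
on:
  push:
    branches: [main]
  pull_request:
  schedule:
    - cron: '0 6 * * 1'             # a timer: every Monday at 06:00 UTC

jobs:
  checks:
    runs-on: ubuntu-latest          # a GitHub-hosted runner; self-hosted labels also work
    steps:
      - uses: actions/checkout@v4   # a reusable community/official action
      - run: make test              # a plain shell command (assumed Makefile target)
```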
Speaker 1: Picture updating a website by hand—copying files, running tests, and hoping
nothing breaks. With GitHub Actions, those chores become a push-button affair.
Speaker 2: A workflow might run unit tests on Windows, Linux, and macOS all at once
using a matrix job. If everything passes, another job bundles the code and deploys it to
your web host automatically.
Speaker 1: You can even have Actions send a Slack message when a build fails, like a
smoke detector for your code. These real-world examples save hours of manual effort
and catch mistakes before customers see them.
Speaker 2: And because the steps are version controlled, you can rewind or tweak them
just like regular code.
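A matrix job like the one described might be sketched as follows. The Python test command and the Slack webhook secret are illustrative assumptions; only the matrix strategy, setup-python action, and failure condition are standard features.

```yaml
name: tests
on: [push]

jobs:
  test:
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}       # one job per operating system, in parallel
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.12'
      - run: python -m pytest       # assumed Python test suite
      - if: failure()               # the "smoke detector": only fires on a failed run
        shell: bash
        run: curl -X POST -d '{"text":"Build failed"}' "$SLACK_WEBHOOK"
        env:
          SLACK_WEBHOOK: ${{ secrets.SLACK_WEBHOOK }}  # hypothetical incoming-webhook secret
```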
Speaker 1: When code is ready, another workflow job can deploy it. You can require
approvals before the deployment step runs, a bit like needing a manager sign-off before
shipping a product.
Speaker 2: GitHub calls these gated stages "environments." You can limit who can run
them and track every deployment's history. For cloud infrastructure, official actions
from providers like AWS or Azure handle most of the heavy lifting.
Speaker 1: Notifications are part of the story too. A simple step can send a chat
message or email so the team knows exactly when something went live—or if it failed.
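A gated deploy job might be sketched like this. The deploy script is an assumption; the environment name and the approval requirement attached to it are configured in the repository settings, not in the YAML itself.

```yaml
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production    # gated stage; required reviewers hold the run until approval
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh       # assumed deploy script
      - if: always()
        run: echo "Deploy finished with status ${{ job.status }}"  # stand-in for a chat/email step
```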
Speaker 2: You might be wondering how all this applies to your day job. For help desk
staff, automated tests mean fewer broken releases landing on your queue.
Speaker 1: If you’re in quality assurance, GitHub Actions can spin up fresh test
environments on demand so you spend your time testing features, not setting up
servers.
Speaker 2: Project managers get clear visibility because each workflow run records
when and how a release happened. And for small businesses, automation slashes the
manual labour that used to require extra headcount.
Speaker 1: Wherever you land in IT, having a trusty robot assistant frees you to focus
on the interesting problems.
Speaker 1: Keep each workflow job focused on a single task so failures are easy to
trace. Short jobs also finish faster, giving developers quick feedback.
Speaker 2: Limit permissions wherever possible and rotate secrets regularly. GitHub
lets you grant only the access a job truly needs, which reduces risk if credentials leak.
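Scoping permissions can be done directly in the workflow file. A least-privilege sketch, where the package-write grant and the build target are illustrative assumptions:

```yaml
permissions:
  contents: read        # workflow default: read-only repository access

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      packages: write   # only this job may push packages (illustrative)
    steps:
      - uses: actions/checkout@v4
      - run: make build # assumed build target
```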
Speaker 1: Track how long jobs take and watch for any slowdowns. If a five-minute build
suddenly takes twenty, that’s a sign something needs attention.
Speaker 2: Finally, reuse actions from the community or your own library to avoid
reinventing the wheel and keep your automation maintainable.
Speaker 2: When teams treat workflows like code, automation evolves right alongside
the project. Each improvement is recorded in version control so you always know why
something changed.
Speaker 1: GitHub Actions brings that automation directly into the repository. Build
steps, tests, deployments and notifications all happen in a single, auditable place. You
get faster releases, fewer mistakes and clear visibility for everyone involved.
Speaker 2: Whether you’re supporting a small website or a complex enterprise system,
the key takeaway is simple: start small, keep iterating and let the platform handle the
repetition so you can focus on delivering value.
SRE Error Budgets
Speaker 1: Imagine you promised your app would be available 99.9% of the time. That
still allows roughly nine hours of downtime a year.
Speaker 2: Site Reliability Engineering, or SRE, turns those promises into math we can
track. The key tools are service level objectives, or SLOs, and the error budgets tied to
them.
Speaker 1: By defining how reliable a service must be, we can decide when it's safe to
launch new features and when it's time to pause and fix instability.
Speaker 1: A service level objective is basically a reliability target, like "respond to user
requests within two seconds 99% of the time." It sets clear expectations.
Speaker 2: For non-technical folks, think of an SLO as a promise to customers. If we hit
that goal, users stay happy. If we miss it, they notice glitches or slow pages.
Speaker 1: We monitor these objectives continuously so we know if the service is
drifting away from the acceptable range before customers complain.
Speaker 1: An error budget is the small slice of allowable failure built into an SLO. If our
target is 99.9% uptime, that gives us about forty minutes of downtime each month.
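Spelling out the arithmetic behind those figures (assuming a 30-day month):

```latex
\text{allowed failure fraction} = 1 - 0.999 = 0.001
```

```latex
\text{monthly budget} = 0.001 \times 30 \times 24 \times 60 \,\text{min} \approx 43.2 \,\text{min}
\qquad
\text{yearly budget} = 0.001 \times 365 \times 24 \,\text{h} \approx 8.76 \,\text{h}
```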
Speaker 2: As outages or performance issues occur, we "spend" that budget. When it's
gone, engineering focuses on reliability work instead of shipping new features.
Speaker 1: This approach keeps everyone aligned. Product managers see how
instability eats into development time, while engineers know exactly when to slow their
release pace.
Speaker 1: Error budgets aren't just for ops teams. They spark conversations about risk
across the organization.
Speaker 2: When the budget burns down faster than expected, it's a sign our releases
might be too risky or our SLO is unrealistic.
Speaker 1: Teams can agree to slow deployments, add more testing, or even adjust the
SLO if customer impact warrants it. The data helps remove emotion from those
decisions.
Speaker 1: The real win is shared language. SLOs and error budgets help teams
quantify acceptable risk instead of arguing about perfection.
Speaker 2: They also create breathing room. When you're under budget, you can
innovate quickly. When it's nearly spent, stability becomes the priority.
Speaker 1: By treating reliability like any other feature, SRE practices keep users happy
and developers focused on the right work at the right time.
Trunk vs. Feature Branching
Speaker 1: Modern teams debate whether to work directly on the main branch or use
long-lived feature branches.
Speaker 2: Each method shapes how quickly changes integrate and how much merge
pain you face later.
Speaker 1: Trunk-based development keeps everyone committing to the same main
line.
Speaker 2: Small, incremental changes integrate quickly, and feature flags hide work in
progress.
Speaker 1: Feature branching isolates each piece of work so teams can experiment
safely.
Speaker 2: The downside is merges grow complex the longer a branch lives away from
trunk.
Speaker 1: Teams often blend approaches, keeping branches short and integrating
daily.
Speaker 2: The goal is fast feedback with just enough isolation for code review and
testing.
Speaker 1: Whether you prefer trunk-based or feature branches, keep merges small
and frequent.
Speaker 2: Continuous integration works best when nothing stays out of the main line
for long.