DevSecOps - Site Reliability Engineer (SRE) / US Gov
Description
Design, automate, deploy, and operate highly reliable cloud systems supporting mission-critical workloads for U.S. Government customers. This role is centered on DevSecOps and site reliability engineering, with a strong emphasis on deployment automation, operational stability, and system resilience across AWS GovCloud and AWS C2E environments.
You will be responsible for the reliability and operability of Quindar’s platform in production, ensuring systems are observable, fault-tolerant, and require minimal manual intervention. Your work will directly impact mission success by improving system uptime, deployment velocity, and operational confidence in constrained and classified environments.
A key focus of this role is building and evolving automated deployment pipelines, hardened runtime environments, and repeatable infrastructure patterns that support secure and scalable operations in regulated environments. You will also support and improve Quindar deployments to air-gapped networks, driving consistency, reliability, and performance across all environments.
As the organization grows, you will help define and implement best practices for availability, latency, incident response, and service-level objectives (SLOs). This role includes participation in incident response and a 24/7 on-call rotation, with a strong mandate to eliminate toil through automation and continuously improve system reliability. You will collaborate closely with frontend, backend, and platform engineers to ensure systems meet performance, reliability, and mission assurance requirements.
Technical Skills
- Strong experience with Kubernetes and containerized workloads in production environments
- Hands-on experience operating clusters in AWS EKS, Rancher, or similar platforms
- Experience supporting GovCloud, IL-enclave, or C2E environments
- Deep experience with CI/CD systems and deployment automation (GitLab preferred)
- Proficiency in Python and Infrastructure-as-Code tools (Terraform or similar)
- Experience with observability platforms (Grafana LGTM stack, Datadog, or equivalent)
- Strong understanding of distributed systems, APIs, databases, caching, and event-driven architectures
- Solid networking fundamentals (VPCs, VPNs, load balancers, TLS, service connectivity)
- Experience with Linux/Unix systems
- Familiarity with cloud security best practices, enclave boundaries, and secure system design
- Experience with identity and access management (AWS IAM, Auth0, Keycloak, ICAM patterns)
