About the Role

Site Reliability Engineer (Space Communications)

About Northwood:

Northwood is on a mission to transform connectivity between Earth and space and bring the benefits of space to the masses through innovations in space communications technologies. If you like building quickly and seeing your work deployed in locations around the globe with real impact, we want you at Northwood.

Role:

Northwood is looking for a Site Reliability Engineer to help build the monitoring and reliability systems that keep satellites connected to Earth. As we rapidly scale our ground station network across multiple continents, you'll build the observability infrastructure that ensures our space communications systems operate 24/7 for customers ranging from commercial satellite operators to national security missions.

This is a high-growth role where you'll evolve from building core monitoring systems to potentially leading infrastructure teams and architecting global-scale reliability platforms. You'll work directly with our founding engineering team to establish the monitoring, alerting, and deployment practices that will scale with us from startup to enterprise. If you're excited about space technology and want to build infrastructure that directly supports mission-critical satellite operations, this role offers that opportunity.

Responsibilities:

• Build and maintain the observability stack (Grafana, Prometheus, Loki, Vector, VictoriaMetrics) that monitors ground stations, satellite communication systems, and cloud infrastructure across multiple AWS regions

• Support CI/CD pipelines using GitLab and ArgoCD, partnering with development teams to ensure reliable deployments of mission-critical software

• Develop and maintain AWS infrastructure using Terraform, with a focus on multi-region reliability and automated scaling for ground station operations

• Deploy and manage Kubernetes applications with Helm, ensuring both developer productivity and system uptime for satellite communication services

• Establish monitoring strategies, alerting frameworks, and incident response procedures for infrastructure supporting real-time satellite communications

• Participate in the on-call rotation and lead post-incident reviews to continuously improve system reliability

Basic Qualifications

• 2-5 years of production infrastructure and monitoring experience with measurable reliability improvements

• Strong experience with Kubernetes, Docker, and container orchestration in production environments

• Hands-on experience with CI/CD tools and infrastructure as code (Terraform preferred)

• AWS experience with multi-service deployments and Python programming skills for automation

• Self-directed work style with the ability to own projects from conception to production in fast-moving environments

• Understanding of SRE principles, SLOs/SLIs, and systematic approaches to system reliability

Preferred Qualifications

• Experience with observability tools (Vector, Loki, Grafana, Prometheus) in production environments

• Familiarity with HashiCorp Vault, Okta, or similar identity/secrets management systems

• Previous experience scaling infrastructure at high-growth companies (startup to 100+ employees)

• AWS certification or demonstrated expertise with advanced cloud networking and security

• Linux system administration experience and networking fundamentals

• Interest in aerospace, telecommunications, or mission-critical systems

Additional Information:

To conform to U.S. Government space technology export regulations, including the International Traffic in Arms Regulations (ITAR), you must be a U.S. citizen, a lawful permanent resident of the U.S., a protected individual as defined by 8 U.S.C. 1324b(a)(3), or eligible to obtain the required authorizations from the U.S. Department of State.

Northwood is an Equal Opportunity Employer; employment with Northwood is governed on the basis of merit, competence and qualifications and will not be influenced in any manner by race, color, religion, gender, national origin/ethnicity, veteran status, disability status, age, sexual orientation, gender identity, marital status, mental or physical disability or any other legally protected status.

About the Company

Northwood Space

Northwood Space is a modern space infrastructure company specializing in the ground segment of the space industry. Their core mission is to build advanced technology that keeps the world connected through space, focusing on the development of scalable, resilient, and manufacturable ground gateways. These gateways are designed for volume manufacturing and iterative development, allowing for proactive deployments and dynamic capacity provisioning. Northwood’s horizontally scalable architecture ensures that their infrastructure can adapt to growing demands, while their emphasis on eliminating single points of failure enhances system resilience and reliability.

Potential employees might appreciate Northwood Space’s forward-thinking approach to space infrastructure, where innovation and adaptability are at the forefront. The company’s commitment to proactive development and robust engineering creates an environment where team members can contribute to cutting-edge solutions that have a global impact. Working at Northwood Space means being part of a mission-driven team dedicated to building the backbone of space connectivity, offering opportunities for growth, collaboration, and making a tangible difference in the rapidly evolving space technology sector.

Type: full-time
Department: Software
Location: Torrance, CA