The Senior Site Reliability Engineer will manage cloud infrastructure, improve reliability, enhance security, and collaborate with teams to design reliable systems.
Your Career, our Future—Together.
Ready to join something big? At SoundHound AI, we bring voice, generative, and conversational AI together to transform how people interact with products and services. From voice-enabled vehicles to food ordering and customer support, our multilingual, omnichannel technology already impacts hundreds of millions worldwide.
The Opportunity
This is a high-ownership role with direct influence over infrastructure decisions. The team has a clear roadmap focused on improving reliability, security posture, and operational maturity. The Senior Site Reliability Engineer helps build first-class infrastructure to deliver our best-in-class technology to the world. The infrastructure is large and complex, running in the cloud and on Kubernetes, so there's no shortage of interesting problems.
What You'll Do
- Build software and systems for cloud infrastructure management and automation (Terraform, Ansible, Oracle Cloud, GCP)
- Participate in developing frameworks for application deployment, customization, and upgrades (Kubernetes, ArgoCD, Vault, Jenkins)
- Ensure application and infrastructure security complies with ISO 27001 / SOX / PCI
- Improve observability, implement and measure key metrics, and define and enforce SLOs/SLAs (Prometheus, Grafana, ELK)
- Collaborate with engineering, quality engineering, and product management to architect and build highly available, reliable, and secure systems
What You'll Bring
- 8 years of experience working with cloud services at scale in a high-volume, customer-facing environment, plus a Bachelor's degree in Computer Science or equivalent
- Willingness to participate in an on-call rotation
- Extensive experience with Linux environments, security, and networking, and with scripting in Python, Go, or Bash
- Strong experience with monitoring and alerting tools such as Prometheus, Grafana, the ELK stack, and PagerDuty
- Experience deploying to cloud platforms and architectures, and with CI/CD and configuration management tools such as Ansible, Terraform, and Kubernetes
- Proficiency with a wide range of relevant server-side technologies such as Consul, Vault, Kafka, MongoDB, PostgreSQL, and MySQL
- A pragmatic, problem-solving approach to designing and implementing solutions
Workplace & Compensation
This role is available throughout Canada. Employees within a 100-kilometer radius of our Toronto office are expected to work from the office on three pre-scheduled “core days” each month to encourage cross-team connection and in-person collaboration.
Compensation includes salary, equity, comprehensive healthcare, paid time off, and other benefits. Our recruiting team will provide a specific salary range based on location and years of experience.
Let's Start the Conversation
Join SoundHound AI and collaborate with colleagues worldwide who are shaping the future of voice AI. Guided by our values (supportive, open, undaunted, nimble, and determined to win), we strive to build breakthrough AI experiences together.
We provide reasonable accommodations for individuals with disabilities throughout the hiring process and employment. To view our job applicant privacy policy, please visit https://static.soundhound.com/corpus/ta/applicantprivacynotice.html.
Discover more about our philosophy, benefits, and culture at https://www.soundhound.com/careers.
***Please beware of agency recruiters falsely stating that they represent SoundHound AI on job posts. Our job post above will note if we are utilizing a specific agency to assist with the search. Our recruiters use @soundhound.com email addresses exclusively.