Optum

Observability Engineer

In-Office

Chennai, Tamil Nadu

Senior level

Design, implement, and manage Kubernetes clusters, automation with Ansible, and CI/CD pipelines. Deploy and monitor machine learning models and cloud infrastructure, optimize performance and reliability, troubleshoot issues, enforce security best practices, and document DevOps processes across M&A partners.

The summary above was generated by AI

Requisition Number: 2341973
Optum is a global organization that delivers care, aided by technology, to help millions of people live healthier lives. The work you do with our team will directly improve health outcomes by connecting people with the care, pharmacy benefits, data and resources they need to feel their best. Here, you will find a culture guided by inclusion, talented peers, comprehensive benefits and career development opportunities. Come make an impact on the communities we serve as you help us advance health optimization on a global scale. Join us to start Caring. Connecting. Growing together.
DevOps / Observability Engineer
(Monitoring, Alerting & Data-Driven Operations)
Job Summary
We are seeking a DevOps / Observability Engineer with deep expertise in monitoring, alerting, metrics, and logging systems to help design, operate, and evolve our observability platforms across multiple environments, including M&A partner infrastructures.
This role is not a pure CI/CD or cloud automation position. Instead, it is focused on building robust, scalable, and intelligent monitoring and alerting systems, primarily using open-source and custom-built ("home-made") stacks.
The ideal candidate is passionate about metrics, signals, and system behavior, enjoys working close to the data, and is interested in forecasting, anomaly detection, and algorithmic approaches to infrastructure monitoring. Experience with MLOps and deploying data-driven models in the cloud is a strong plus.
You will work closely with platform, operations, and data teams to ensure high reliability, actionable alerting, and long-term observability maturity.
Key Responsibilities
Observability & Monitoring (Primary Focus)

Design, implement, and operate monitoring and alerting platforms across multiple internal and M&A partner environments.
Build and maintain metrics pipelines using tools such as Prometheus, Alertmanager, Grafana, and VictoriaMetrics (or similar time-series databases).
Develop high-quality alerting strategies (SLOs, SLIs, burn rates, anomaly detection) to reduce noise and improve signal quality.
Own logging architectures, including ingestion, retention, querying, and correlation with metrics and traces.
Work extensively with open-source observability tooling and contribute to or extend "home-made" solutions when off-the-shelf tools are insufficient.

Data, Forecasting & Intelligent Operations

Apply forecasting techniques and algorithms to capacity planning, trend analysis, and proactive alerting.
Collaborate with data scientists and ML engineers on data-driven monitoring, anomaly detection, or predictive reliability use cases.
Participate in MLOps workflows, including deploying, monitoring, and operating ML models in production environments.

Platform & Infrastructure (Supporting Focus)

Design and operate Kubernetes-based platforms, with a strong emphasis on observability, reliability, and performance.
Support infrastructure automation using Ansible and other configuration management tools.
Troubleshoot complex system issues across metrics, logs, Kubernetes, and underlying infrastructure.
Ensure security and operational best practices are applied across monitoring and infrastructure stacks.
Document architectures, operational practices, and observability standards.

Required Qualifications

Bachelor's degree in Computer Science, Engineering, or equivalent practical experience.
7+ years of experience in DevOps, SRE, Platform Engineering, or Observability-focused roles.
Strong, hands-on expertise in monitoring and alerting systems, including:

Prometheus (or compatible ecosystems)
Grafana
Alertmanager
Time-series databases (VictoriaMetrics strongly preferred)

Solid experience with logging systems and log/metric correlation.
Deep familiarity with open-source tooling and building/customizing internal platforms.
Strong Kubernetes experience, including troubleshooting production clusters.
Experience with automation tools such as Ansible.
Ability to reason about systems using metrics, data, and trends, not just dashboards.

Excellent problem-solving, communication, and collaboration skills.

Preferred Qualifications

Experience with containerization technologies such as Docker/Podman.
Experience with "Infrastructure as Code" (IaC) tools such as Terraform.
Familiarity with monitoring and logging tools such as Prometheus, Grafana, or the ELK stack.
Knowledge of scripting languages such as Python, Bash, PowerShell, or similar.

At UnitedHealth Group, our mission is to help people live healthier lives and make the health system work better for everyone. We believe everyone - of every race, gender, sexuality, age, location and income - deserves the opportunity to live their healthiest life. Today, however, there are still far too many barriers to good health which are disproportionately experienced by people of color, historically marginalized groups and those with lower incomes. We are committed to mitigating our impact on the environment and enabling and delivering equitable care that addresses health disparities and improves health outcomes - an enterprise priority reflected in our mission.

Top Skills

Ansible

AWS

Azure

Bash

Docker

ELK Stack

GitLab CI

GCP

Grafana

Jenkins

Kubernetes

Podman

PowerShell

Prometheus

Python

Terraform
