Khoros

Site Reliability Engineer III

Posted 9 Days Ago

Remote

Hiring Remotely in India

Senior level

As a Site Reliability Engineer III, you will manage Khoros applications and ensure infrastructure reliability while collaborating with development teams. Responsibilities include monitoring, troubleshooting, automating processes, participating in on-call rotations, and continuously improving systems. You will handle incident root causes, document procedures, and support a collaborative team environment.

The summary above was generated by AI

At Khoros, our passion is to help the world’s best brands create customers for life. We build products we’re proud of, and we’re passionate about customer success. As part of the Vista Equity family, you’ll receive best-in-class development opportunities and the ability to work with global brand customers like Samsung, HP, Sony, and Visa.

We are seeking a Site Reliability Engineer III for our Mission Critical Support Team to support our infrastructure, production data, and applications. This role, based in Bangalore, provides support to global locations. In this position, you will manage critical Khoros applications, ensuring the reliability, scalability, and performance of our infrastructure and applications. Collaborating closely with development teams, you will design, build, and maintain highly available systems capable of handling increasing user traffic and demand. The role requires close coordination with teams across application development, networks, security, management systems, storage, and databases. This specialized, senior-level position demands exceptional technical troubleshooting skills and plays a key role in problem resolution.

Responsibilities:

Manage environments on the Cloud.
Monitor, troubleshoot, and resolve issues related to infrastructure, applications, and services.
Monitor availability and maintain the systems in good health.
Implement automation tools and processes to improve efficiency and reliability.
Participate in on-call rotation and respond to incidents promptly.
Continuously evaluate and improve our systems and processes to enhance reliability and performance.
Document runbooks and procedures.
Work closely with first-level support groups as well as development groups.
Follow departmental change management procedures when defining, planning, and implementing changes, so that service disruption is minimized and Service Level Agreements are met.
Perform incident root cause analysis.
Run projects and issues independently as well as within a team.
Be a Team Player – work in a collaborative, team-oriented environment, share information, respect diverse ideas, interact with customers, and partner with cross-functional and remote teams.
Be Curious & Innovative – continuously update yourself with next-generation technology and development tools, and contribute to process development practices. Evaluate new technologies and software products to determine the feasibility and desirability of incorporating capabilities within the company's products.
Be Agile – with a strong sense of urgency and a desire to work in a fast-paced, dynamic environment to deliver solutions against strict timelines.

Requirements:

4+ years of experience as an SRE in fast-paced, high-traffic environments.
Experience deploying and maintaining applications in at least one cloud (AWS is a must-have; Azure/GCP are good to have).
Working knowledge of Linux and Windows operating systems.
Working knowledge of at least one scripting language: shell/Bash, Python, or PowerShell.
Understanding of containerization and orchestration technologies (e.g., Docker, Kubernetes).
Working knowledge of Jenkins, Ansible, Terraform, and ArgoCD (good to have).
Administration of databases (MS SQL, MongoDB, etc.).
Extensive experience with monitoring, logging, and observability tools (Sumo Logic, Datadog, AWS CloudWatch, AWS X-Ray, New Relic, Splunk, etc.).
Ability to debug issues and solve problems.
Excellent problem-solving and communication skills.
Ability to work independently and collaborate effectively in a team environment.
Familiarity with agile development methodologies is a plus.

About Khoros

The Khoros platform connects every facet of customer engagement, including digital contact centers, messaging, chat, online brand communities, CX analytics, and social media management, so brands can listen, respond, and act on customer conversations, creating deep relationships and fostering brand loyalty and advocacy.

Khoros offers a great working environment and competitive compensation and benefits packages. We're looking for fast-thinking, innovative, passionate team players who enjoy brainstorming new ideas and working with the best and brightest in the social media software industry.

Our Core Values

Accountability - We embrace an ownership mentality

Customer-Centricity - We are obsessed with achieving customer value

Agility - We move with urgency and purpose

Top Skills

Ansible

ArgoCD

AWS

Azure

Bash

Docker

GCP

Jenkins

Kubernetes

Linux

MongoDB

MS SQL

PowerShell

Python

Shell

Terraform

Windows
