Element451

Senior Platform Engineer

Posted An Hour Ago
Remote
Hiring Remotely in Canada
Senior level
Own platform reliability, security, and delivery: define SLIs/SLOs, operate observability and on-call, lead incident response, automate remediation and CI/CD, manage IaC and AWS infrastructure, ensure database durability and compliance (SOC 2, FERPA), and build developer-facing automation to scale operations.
The summary above was generated by AI

Element451 is building the AI-powered platform reshaping how colleges and universities recruit, enroll, and support their students - and the reliability of that platform is what earns the trust of the institutions that run on it. This is a rare chance to own reliability, operations, and security at the level where it actually happens: keeping a real production system healthy, fast, and safe, and building the automation that keeps it that way. If you're a broad operator who's happiest when a single week spans operations, delivery, security, infrastructure, and data - and you've done that work where things move fast and the edges go unowned - you'll feel at home here.


The Role

You'll keep Element451's platform reliable, secure, and operable - and build the delivery systems that let it scale without scaling the firefighting. This is a hands-on senior IC role, and a deliberately broad one: reliability and operations are the core, with CI/CD and delivery, security, infrastructure, and data reliability all real, recurring parts of the work. You'll partner closely with our Director of Platform Engineering, who owns the platform strategy; you own a large share of the operational and delivery execution that brings it to life.

We hold a high bar - and we give you the ownership, context, and support to meet it. The work has range and a fair amount of unpredictability; the people who thrive in it tend to want exactly that.

What You'll Own

You own the operational health of the platform - that it's available, fast, observable, and safe in production - and you build the automation that makes those properties durable rather than heroic. In practice, you:

  • Own the reliability discipline in practice - define and track SLIs and SLOs, keep the observability stack sharp (we use CloudWatch, Sentry, Papertrail, and LangSmith), and make system health and customer impact legible in real time.

  • Carry production operations day to day - participate in on-call, lead incident response, run blameless post-incident reviews, and drive issues to root cause rather than patching symptoms.

  • Treat operational toil as engineering work to eliminate - relentlessly automate remediation, sharpen alert quality, and drive down MTTD and MTTR rather than absorbing manual load.

  • Own and evolve the CI/CD and delivery platform alongside the Director of Platform Engineering - build pipelines, deployment automation, environment management, and release tooling - so shipping is routine, safe, and low-drama, with progressive delivery, automated rollback, and production validation gates as standard.

  • Build the developer-facing automation and paved roads that cut friction for the product engineering team - treating them as the platform's customer and making the reliable, secure path the easy path.

  • Be the platform's hands-on security operator - IAM and least-privilege hygiene, secrets management, threat detection and response (WAF, GuardDuty), and vulnerability triage and remediation against SLA. This is the security function in practice today; you partner with the Director of Platform Engineering on strategy and standards, and you're trusted to set the operational bar where none exists yet.

  • Keep the platform audit-ready by default - produce the infrastructure and reliability evidence that SOC 2 Type II and FERPA obligations depend on (we use Vanta), so audits are a byproduct of good operations, not a scramble.

  • Build and operate cloud infrastructure as code - AWS (ECS/Fargate, Lambda, SQS/SNS, EventBridge, S3/CloudFront, VPC) managed with Terraform, with no manual or snowflake infrastructure in any environment.

  • Plan and execute scaling ahead of product growth, and keep operational documentation current - failure modes and recovery procedures included.

  • Own the operational health of our data stores - MongoDB Atlas backups and recovery, performance tuning, and monitoring at scale - so data stays durable, performant, and recoverable.

How You'll Show Up

We hire for behavior as much as for experience. Our values describe how we work, and we look for people who already operate this way:

  • Understand the "why" before the "what." You dig for the reason behind the work and tie operational, delivery, and security decisions to real customer and business impact - rather than executing tickets on faith.

  • Own the outcome. You're on the hook for the result, not the task - reliability, security, and operability end to end - and you're genuinely unsatisfied with "it mostly works."

  • Take initiative to move work forward. You see the unowned edge and close it, surface risks early, and instinctively ask "how do we make this permanent and self-healing?" rather than waiting to be told.

  • Engage collaboratively to solve problems together. You treat the engineering team as the platform's customer, work in the open without heroics or silos, and are energized rather than rattled when a day spans an incident, a deploy, a security finding, and a database.

What You Bring
  • 7+ years across site reliability, operations, delivery, infrastructure, or platform engineering, with a track record of hands-on delivery - including real startup experience where you owned broad, shifting scope without a large team behind you.

  • A strong SRE foundation - SLI/SLO design, observability stack ownership, and incident response in a production environment with real customer impact.

  • Strong CI/CD and delivery-engineering chops - building and operating pipelines (GitHub Actions), Docker/ECR workflows, and ECS deployment automation, with progressive delivery and automated rollback - enough to co-own the delivery platform, not just consume it.

  • Security-operations depth you can own without a security team behind you - IAM governance and least-privilege, secrets management, network security and threat detection (WAF, GuardDuty), and vulnerability triage and remediation. You'll be the org's hands-on security practitioner, so the judgment to set an operational bar - not just follow one - matters here.

  • Deep, current AWS expertise - ECS/Fargate, Lambda, SQS/SNS, EventBridge, S3/CloudFront, VPC networking, IAM, and Secrets Manager - plus strong Terraform and infrastructure-as-code discipline across multi-environment systems.

  • Operational experience with MongoDB Atlas or a comparable managed database platform - backup and recovery, performance tuning, and monitoring at scale.

  • Working knowledge of compliance operations - SOC 2 Type II and FERPA - and producing audit evidence (we use Vanta) as a byproduct of good engineering rather than a separate effort.

  • Comfort operating as a high-output IC with broad domain ownership and a long execution horizon. Current familiarity with AI-assisted operations - intelligent alerting, anomaly detection, or AI-augmented incident response - is a plus.

Our Values
  • Impactful not Immediate - We prioritize and invest in initiatives that will be most impactful.

  • Progress before Perfection - We are action-oriented people. We are empowered to make decisions and achieve our goals.

  • Learners before Masters - We are curious and humble people who strive to constantly improve.

  • Together not Alone - We rally behind each other and pitch in to support the greater whole.

  • Customer Success not Support - We solve partner goals and prioritize their success.

Perks & Benefits

We invest in our team the same way we invest in our product - thoughtfully, and for the long term.

  • Competitive pay and full benefits - a salary calibrated to the seniority of the role, plus comprehensive medical & dental coverage for you and your family.

  • Truly remote - We're built remote-first, not remote-tolerant.

  • Time to recharge - flexible PTO, paid company holidays, tenure milestones that reward your commitment, and your birthday off.

  • Work that matters - every release helps students find their path to college and succeed once they get there. Your craft has a real human impact.

Our Interview Process

Our process is rigorous and designed to give real signal - for you and for us. Expect a live, interactive technical assessment with our engineers - real systems and real operational scenarios, not take-home busywork or trivia. You'll show how you reason about reliability, security, and trade-offs under pressure, and how you apply AI tooling in the work. We move quickly for the right people and give you a clear view of the bar before you accept.


