
Boomi

Software Principal Engineer - AI Quality

Remote
Hiring Remotely in India
Expert/Leader

About Boomi and What Makes Us Special

Are you ready to work at a fast-growing company where you can make a difference? Boomi aims to make the world a better place by connecting everyone to everything, anywhere. Our award-winning, intelligent integration and automation platform helps organizations power the future of business. At Boomi, you'll work with world-class people and industry-leading technology. We hire trailblazers with an entrepreneurial spirit who can solve challenging problems, make a real impact, and want to be part of building something big. If this sounds like a good fit for you, check out boomi.com or visit our Boomi Careers page to learn more.

How You'll Make An Impact
 

As a Principal AI Quality Lead, you will define and drive the quality engineering strategy for our production Generative AI and Agentic systems. You will establish automated evaluation frameworks, quality standards, and testing infrastructure that ensure our AI agents operate reliably, safely, and efficiently at scale. This is a high-impact technical leadership role where you'll build the foundation for trustworthy AI deployment, bridging AI/ML engineering expertise with quality engineering discipline. You'll architect the systems and practices that transform our approach from manual spot-checking to continuous, automated evaluation of complex agentic workflows.
 

What You Will Do

Quality Infrastructure & Automation

  • Architect and build automated evaluation frameworks for agentic workflows that assess behavior across effectiveness, efficiency, robustness, and safety dimensions.
  • Design continuous evaluation pipelines that automatically test multi-step reasoning, tool selection patterns, error handling, and behavioral regressions across diverse scenarios.
  • Establish observability requirements for AI agents including structured logging, trajectory tracing, and metrics collection for reasoning steps, tool calls, and execution paths.
  • Build regression detection systems that identify quality degradation when prompts, models, tools, or system components change (a minimal sketch of such a check appears after this group of bullets).
  • Create synthetic test data generation pipelines and curated evaluation datasets that cover edge cases, adversarial scenarios, and real-world variability.
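
To make the regression-detection bullet concrete, here is a minimal sketch of a behavioral check in pytest (one of the tools this posting names later). It is illustrative only: `run_agent`, `AgentResult`, and the golden-case schema are hypothetical stand-ins, not Boomi APIs.

```python
# A minimal sketch of a behavioral regression check, assuming a hypothetical
# run_agent() entry point and golden-case schema (neither is a Boomi API).
from dataclasses import dataclass, field

import pytest


@dataclass
class AgentResult:
    answer: str
    tool_calls: list[str] = field(default_factory=list)
    steps: int = 0


def run_agent(prompt: str) -> AgentResult:
    """Stand-in for the production agent under test."""
    return AgentResult(
        answer="Invoice INV-1042 was routed for approval.",
        tool_calls=["lookup_invoice", "route_for_approval"],
        steps=2,
    )


# Curated golden cases: expected tool sequence, a step budget, and a
# substring the final answer must contain.
GOLDEN_CASES = [
    {
        "id": "invoice-routing",
        "prompt": "Route invoice INV-1042 for approval.",
        "expected_tools": ["lookup_invoice", "route_for_approval"],
        "max_steps": 4,
        "must_contain": "approval",
    },
]


@pytest.mark.parametrize("case", GOLDEN_CASES, ids=lambda c: c["id"])
def test_agent_behavioral_regression(case):
    result = run_agent(case["prompt"])
    # Assert on behavior, not just output text: tool selection and step
    # budget catch drift that a plausible-looking final answer would hide.
    assert result.tool_calls == case["expected_tools"]
    assert result.steps <= case["max_steps"], "reasoning efficiency regressed"
    assert case["must_contain"] in result.answer.lower()
```

The design point: the assertions target behavior (tool sequence, step budget) rather than only the final answer, so a prompt or model change that alters the agent's trajectory fails the suite even when the output still reads plausibly.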

Quality Standards & Evaluation Methodology

  • Define comprehensive quality standards and evaluation methodologies specifically designed for agentic AI systems and LLM-based applications.
  • Establish key quality metrics, SLIs, and SLOs for agent behavior including task completion rates, reasoning efficiency, cost per resolution, and safety compliance (a toy computation of these SLIs appears after this group of bullets).
  • Create quality gates and acceptance criteria that balance speed-to-production with reliability requirements.
  • Develop responsible AI testing practices including bias detection, fairness evaluation, safety guardrails validation, and alignment verification.
  • Build tooling and frameworks that enable both automated evaluation at scale and targeted diagnostic testing for failure investigation.
  • Establish benchmark suites and golden datasets for continuous quality assessment across agent capabilities.
  • Architect evaluation approaches for complex AI behaviors including chain-of-thought reasoning, tool orchestration, multi-turn conversations, and context management.
  • Establish model evaluation practices including prompt testing, output validation, semantic correctness assessment, and hallucination detection.
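
As a toy illustration of the SLIs named above, the sketch below derives task completion rate, mean steps to completion, and cost per resolution from execution traces. The `Trace` schema is an assumption invented for this example, not a standard format.

```python
from dataclasses import dataclass


@dataclass
class Trace:
    """One agent run. Hypothetical fields, not a standard trace format."""
    completed: bool   # did the agent resolve the task?
    steps: int        # reasoning/tool-call steps taken
    cost_usd: float   # total token + tool cost for the run


def agent_slis(traces: list[Trace]) -> dict[str, float]:
    completed = [t for t in traces if t.completed]
    n_done = max(len(completed), 1)  # avoid division by zero
    return {
        "task_completion_rate": len(completed) / len(traces),
        "mean_steps_to_completion": sum(t.steps for t in completed) / n_done,
        # Failed runs still cost money, so total spend is divided by
        # resolutions only; failures push this SLI up.
        "cost_per_resolution": sum(t.cost_usd for t in traces) / n_done,
    }


if __name__ == "__main__":
    sample = [Trace(True, 3, 0.021), Trace(True, 7, 0.054), Trace(False, 12, 0.090)]
    print(agent_slis(sample))
    # ≈ {'task_completion_rate': 0.667, 'mean_steps_to_completion': 5.0,
    #    'cost_per_resolution': 0.0825}
```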

Cross-Functional Collaboration

  • Partner closely with AI Engineering, Platform, and Product teams to embed quality practices into the development lifecycle from design through deployment.
  • Serve as the technical authority on AI quality, influencing architectural decisions and advocating for quality as a foundational pillar.
  • Collaborate with Data Science and ML Engineering teams to align evaluation methodologies with model development practices.
  • Communicate quality insights, risk assessments, and recommendations clearly to technical and non-technical stakeholders.
  • Build cross-functional alignment on quality standards, evaluation criteria, and production readiness requirements.

Leadership & Culture

  • Mentor and develop AI quality engineers, elevating team capabilities in evaluation frameworks, AI/ML concepts, and automation practices.
  • Foster a culture of quality-first thinking, continuous improvement, and data-driven decision making.
  • Build the organizational capability for AI quality as a core competency, not a reactive testing phase.
     
The Experience You Bring

AI & Machine Learning Expertise

  • 7+ years of experience in AI/ML engineering, data science, or ML quality/evaluation roles with deep technical expertise in model development and evaluation.
  • Experience in LLM evaluation, Generative AI quality, or agentic system testing.
  • Strong understanding of transformer architectures, prompt engineering, retrieval-augmented generation (RAG), and agentic frameworks (ReAct, chain-of-thought, tool use patterns).
  • Deep knowledge of LLM failure modes including hallucinations, context limitations, prompt sensitivity, reasoning errors, and tool misuse.
  • Hands-on experience with major LLM platforms (OpenAI, Anthropic, AWS Bedrock, Azure OpenAI) and their evaluation capabilities.

Evaluation & Experimentation

  • Proven track record building automated evaluation systems for AI/ML models or agentic workflows at scale.
  • Strong experience with ML evaluation frameworks and tools (MLflow, Weights & Biases, LangSmith, custom evaluation pipelines).
  • Expertise in designing evaluation metrics for non-deterministic systems beyond simple accuracy measures.
  • Experience with A/B testing, experimentation frameworks, and statistical analysis for model comparison.
  • Background in observability, instrumentation, and monitoring for production AI systems.

Software Engineering

  • Advanced programming skills in Python with experience building production-quality evaluation frameworks and automation tooling.
  • Strong understanding of software architecture, APIs, distributed systems, and data pipelines.
  • Experience with test automation frameworks (pytest, unittest) and CI/CD integration.
  • Familiarity with infrastructure as code, containerization, and cloud platforms (AWS, Azure, GCP).
  • Ability to write clean, maintainable, well-documented code and technical specifications.

Data & Statistical Analysis

  • Strong analytical skills with the ability to design statistically sound evaluation methodologies.
  • Experience working with large-scale datasets, data quality assessment, and synthetic data generation.
  • Understanding of experimental design, hypothesis testing, and confidence intervals for evaluation results (a toy bootstrap comparison follows this list).
  • Ability to translate business requirements into measurable quality metrics and acceptance criteria.
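
To ground the statistics bullets above, here is a toy bootstrap comparison of pass rates between two prompt or model variants. The data is fabricated, and in practice a library such as scipy would replace the hand-rolled resampling.

```python
import random


def bootstrap_diff_ci(a: list[int], b: list[int], n_boot: int = 10_000,
                      alpha: float = 0.05) -> tuple[float, float]:
    """Percentile bootstrap CI for mean(b) - mean(a); entries are 1=pass, 0=fail."""
    diffs = sorted(
        sum(random.choices(b, k=len(b))) / len(b)
        - sum(random.choices(a, k=len(a))) / len(a)
        for _ in range(n_boot)
    )
    return diffs[int(n_boot * alpha / 2)], diffs[int(n_boot * (1 - alpha / 2))]


if __name__ == "__main__":
    random.seed(0)
    baseline = [1] * 78 + [0] * 22   # 78/100 eval cases pass with the current prompt
    candidate = [1] * 84 + [0] * 16  # 84/100 pass with the revised prompt
    lo, hi = bootstrap_diff_ci(baseline, candidate)
    print(f"pass-rate delta, 95% CI: [{lo:+.3f}, {hi:+.3f}]")
```

If the resulting interval straddles zero, the apparent improvement may be sampling noise rather than a real gain, which is exactly the trap that single-run comparisons of non-deterministic systems fall into.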
     

Communication & Collaboration

  • Excellent written and verbal communication skills with the ability to explain complex technical concepts clearly.
  • Experience presenting technical recommendations and quality assessments to senior leadership.
  • Proven ability to build consensus and drive adoption of new practices across engineering organizations.
  • Strong documentation skills for creating standards, runbooks, evaluation reports, and architectural specifications.
     

Learning & Innovation Mindset

  • Self-directed learner who stays current with rapidly evolving AI quality practices and industry research.
  • Comfortable operating in ambiguity and building new capabilities from the ground up.
     
Education & Experience
  • Master's in Computer Science, Machine Learning, Data Science, Statistics, or related field (or equivalent experience).
  • 7+ years of professional experience in AI/ML and software engineering, data science, ML operations, or ML quality roles.
  • Hands-on experience with LLM evaluation, Generative AI testing, or agentic system quality.
  • Demonstrated experience building automated evaluation frameworks and quality infrastructure for production AI systems.
     
What Sets You Apart
  • Published research or contributions in AI evaluation, LLM quality, agent benchmarking, or responsible AI.
  • Experience building AI quality practices from the ground up in production environments.
  • Deep expertise in agentic AI architectures including multi-agent systems, tool use, and autonomous decision-making.
  • Background in both ML engineering/research and quality engineering/evaluation roles.
  • Contributions to open-source AI evaluation frameworks or benchmarking tools.
  • Hands-on experience fine-tuning or developing LLMs/SLMs with corresponding evaluation methodologies.
  • Domain expertise in specific agentic AI applications (customer support, process automation, code generation, etc.).

Be Bold. Be You. Be Boomi. We take pride in our culture and core values and are committed to being a place where everyone can be their true, authentic self. Our team members are our most valuable resources, and we look for and encourage diversity in backgrounds, thoughts, life experiences, knowledge, and capabilities.  

All employment decisions are based on business needs, job requirements, and individual qualifications.

Boomi strives to create an inclusive and accessible environment for candidates and employees. If you need accommodation during the application or interview process, please submit a request to [email protected]. This inbox is strictly for accommodations; please do not send resumes or general inquiries.

Top Skills

AWS
Azure
GCP
LangSmith
MLflow
pytest
Python
unittest
Weights & Biases

