
Maincode

Signal Engineer

Reposted Yesterday
In-Office
Melbourne, Victoria, AUS
Entry level
The Signal Engineer will develop data pipelines that clean and process large datasets for training a language model, ensuring high-quality data through editorial judgment and engineering expertise.
The summary above was generated by AI
About the role

Matilda is Australia's LLM. What ends up in the corpus is what the model learns, so the quality of the data sets the ceiling on the quality of the model.

We're hiring a Signal Engineer to own that ceiling. You will build the pipelines that turn massive, messy, raw data into the dataset Matilda trains on. The work is part engineering, part editorial judgment, done in code.

A lot of the real gains in frontier models come from the data, yet most of that work remains underinvested across the field. It is one of the highest-leverage places you can spend your time as an engineer.


What you'll work on

- Pipelines that ingest, clean, dedupe, filter, and score training data at TB to PB scale

- Quality classifiers and heuristics that separate useful data from the rest

- Dataset mixture design and experiments on what actually improves the model

- Tools to explore, sample, and audit what's in the corpus

- Close work with researchers and training engineers so data choices connect to model behaviour

What we're looking for

- Strong engineer. Python, data tooling, distributed processing, clean pipelines.

- High attention to detail. Small errors compound fast at this scale.

- Taste and judgment about what good training data looks like.

- Comfort working with very large, very messy datasets.

- Curiosity about how data shapes model behaviour.

- High learning velocity. You don't need a PhD or prior LLM experience.

Nice to have

- Experience with web-scale corpora or pretraining data pipelines

- Experience working with unstructured text data

- Familiarity with distributed data frameworks (Spark, Ray, or similar)

- Exposure to deduplication, quality classification, or tokenisation

Note

Full-time role based in Melbourne, working closely with our in-person team. At this time we are not able to offer visa sponsorship, so applicants must have existing and unrestricted work rights in Australia.

HQ

Maincode Melbourne office: Melbourne, VIC 3000, Australia


