Maincode is building Australian-made AI models from the ground up. We train foundation models from scratch, design new reasoning architectures, and deploy them on state-of-the-art GPU clusters. Our data and infrastructure are entirely homegrown, from curation to large-scale training, to ensure independence, transparency, and excellence in model performance.
We’re looking for a Data Scientist to work at the intersection of large-scale data, machine learning, and AI systems. You’ll help source, analyse, and shape the datasets that train next-generation models, working closely with engineers and researchers to make data the backbone of Australia’s AI capability.
This role suits someone with strong data science fundamentals who’s comfortable working with large datasets and curious about how data powers AI training. Experience in model training or deep learning is a plus but not required; we’re happy to teach and support the right candidate.
What you’ll do
Explore, process, and analyse massive and diverse datasets, from text and structured data to code and multimodal content.
Design and implement scalable data workflows for cleaning, transforming, and validating high-volume datasets.
Build and maintain data pipelines that prepare training-ready data for large-scale AI models.
Develop tools and metrics for assessing dataset quality, diversity, and impact on model performance.
Collaborate with AI Researchers to align datasets with evolving model architectures and training objectives.
Support continuous improvement of data ingestion, curation, and evaluation systems.
Contribute to open discussions on data quality, ethics, and responsible dataset creation.
Who you are
Experienced in Python and familiar with data processing frameworks (e.g., Pandas, PySpark, Dask, or Ray).
Strong background in data analysis, feature engineering, and statistical modelling.
Comfortable working with large datasets (multi-terabyte scale or distributed systems).
Understanding of data quality, validation, and reproducibility principles.
Interested in or curious about machine learning, deep learning, or AI training pipelines.
Pragmatic, hands-on, and excited to learn new systems, tools, and techniques.
Motivated to help build Australian AI capability and world-class data infrastructure.
Why Maincode
Maincode is a small, highly technical team working at the frontier of Australian AI. We build foundation models from scratch rather than just fine-tuning existing ones, and the data you work on will directly shape the behaviour of cutting-edge systems.
You’ll be surrounded by people who:
Care deeply about data quality and scientific rigour.
Build systems that scale cleanly and transparently.
Enjoy experimenting, learning, and shipping fast.
Want to see Australia lead in independent AI innovation.
Maincode Melbourne, Victoria, AUS Office
Melbourne, VIC, Australia, 3000