Site Reliability Engineer

Luminance

Sydney, New South Wales, Australia

About

  • The Role
  • Luminance’s Site Reliability team combines strong problem solving, infrastructure tooling and wider DevOps practices to deliver Luminance’s unique software applications as a reliable service. The team plays a crucial role in incident response and issue resolution, swiftly addressing service interruptions to maintain the highest level of customer satisfaction. With a focus on automation, scalability, reliability and security, the team enables Luminance to provide a performant, seamless experience for its users. The Site Reliability team is a small, dynamic team of creative engineers who work together to tackle some of Luminance’s greatest challenges, with new problems and technology areas to dig into on a regular basis.
  • Roles and Responsibilities
  • System Monitoring: Implement, manage, and develop internal monitoring tools to ensure system health and quickly detect anomalies. Respond to and resolve incidents efficiently to maintain uptime.
  • Automation: Develop automation solutions for infrastructure management, issue resolution and deployment processes, streamlining operations and reducing manual work.
  • Infrastructure Management: Manage cloud infrastructure to ensure reliability and scalability, collaborating with teams to design robust solutions.
  • Incident Management: Conduct post-incident analysis to identify root causes, implement preventive measures, and enhance system resilience.
  • Security and Compliance: Maintain security best practices and compliance standards, working with security teams to address vulnerabilities proactively.
  • Collaboration and Communication: Partner with development and operations teams, fostering communication and promoting reliability best practices across the organization.
  • Requirements
  • Master’s degree in Computer Science, Engineering or a related subject from a Go8 university
  • Excellent problem-solving skills, including diagnosing issues within complex systems.
  • Ability and desire to identify root causes of issues, and propose and implement structural improvements.
  • Strong communication skills and the ability to perform under pressure in time-critical situations.
  • Knowledge of the design and operation of web-based software applications, built on technologies such as Node.js, PostgreSQL or Elasticsearch.
  • Knowledge of modern infrastructure and operational tooling within cloud-based architectures, such as Linux, Python, AWS, Ansible and Prometheus.