Site Reliability Engineer

Luminance

Sydney, New South Wales, Australia

2 hours ago

About

The Role
Luminance’s Site Reliability team combines strong problem solving, infrastructure tooling and wider DevOps practices to run Luminance’s unique software applications as a service. The team plays a crucial role in incident response and issue resolution, swiftly resolving service interruptions to maintain the highest level of customer satisfaction. With a focus on automation, scalability, reliability and security, the team enables Luminance to ensure a performant, seamless experience for its users. The Site Reliability team is a small, dynamic team of creative engineers who work together to tackle some of Luminance’s greatest challenges, with new problems and technology areas to dig into on a regular basis.

Roles and Responsibilities
System Monitoring: Implement, manage, and develop internal monitoring tools to ensure system health and quickly detect anomalies. Respond to and resolve incidents efficiently to maintain uptime.
Automation: Develop automation solutions for infrastructure management, issue resolution and deployment processes, streamlining operations and reducing manual work.
Infrastructure Management: Manage cloud infrastructure to ensure reliability and scalability, collaborating with teams to design robust solutions.
Incident Management: Conduct post-incident analysis to identify root causes, implement preventive measures, and enhance system resilience.
Security and Compliance: Maintain best security practices and compliance standards, working with security teams to address vulnerabilities proactively.
Collaboration and Communication: Partner with development and operations teams, fostering communication and promoting reliability best practices across the organization.

Requirements
Master’s degree in Computer Science, Engineering or a related subject from a Go8 university.
Excellent problem-solving skills, including diagnosing issues within complex systems.
Ability and desire to identify root causes of issues, and propose and implement structural improvements.
Strong communication skills and the ability to perform well in urgent situations.
Knowledge of the design and operation of web-based software applications, based on technologies such as Node.js, PostgreSQL or Elasticsearch.
Knowledge of modern infrastructure and operational tooling within cloud-based architectures, such as Linux, Python, AWS, Ansible and Prometheus.
