You are comfortable writing software to automate API-driven tasks at scale. SREs use Python and Go regularly but are also encouraged to contribute to the product codebase in Java, Scala, and Python.

You have at least three years of experience using a public cloud: AWS, GCP, Azure, SoftLayer, or OpenStack

You have used Ansible, Puppet, Chef, or another config management suite, know where it's broken, and are open to trying new alternatives

You have experience with project and roadmap planning

You have experience defining SLOs and measuring against them.

You are able to mentor junior engineers in SRE principles, tools, and execution.

What We Are Looking For

A healthy knowledge of Linux (have compiled your own kernel at some point, know how to trace syscalls, understand TCP, care about the difference between sysvinit/runit/systemd, etc.)

Relentless desire to automate and build software tools

Desire to represent work in git, driven by a GitHub workflow through issues and pull requests

Love open source development, and have contributed to some project somewhere (doesn't have to be ours), whether through mailing lists, patches, documentation, etc.

Enjoy working remotely and the communication it requires

Love a diverse environment, working with men and women all over the world

The engineer will be part of the SRE team and take a lead technical role in determining the reliability engineering needs of mission-critical systems and business processes, working with application development, infrastructure, and middleware teams to ensure the stability and reliability of the system. Engineering will proactively detect issues within the applications, platform, and network.

Responsibilities

Create operational tooling for monitoring, self-healing infrastructure, and testing

Design and create controlled in production systems

Work across teams to identify and fix issues that affect system reliability and performance

Partner with development teams to identify anti-patterns and optimization strategies, create fallback options, and help develop self-healing capabilities across the enterprise in a sustainable manner

Experience with cloud-based technologies and tools in configuration management, deployment, monitoring, and operations

Experience with engineering tools such as Terraform, Ansible, and Consul, and with Linux development environments.

Experience with application performance management, real user monitoring, infrastructure monitoring, and log analysis tools such as Apica, Nagios, Sensu, Sumo Logic, and New Relic, along with DevOps continuous delivery

Job Types: Full-time, Contract

Salary: $60.00 - $65.00 per hour

Experience:

Kubernetes: 2 years (Required)
AWS: 1 year (Required)
SRE: 8 years (Preferred)
Overall: 10 years (Required)


Site Reliability Engineer (Remote)
