Sr. Site Reliability Lead Engineer - 100% remote

$55 - $65/hour (Estimated)

Remote

Job details

Job Type

Contract

Full Job Description

Company Description

Spruce InfoTech is a leading information technology firm that provides a range of services to help clients change, manage, and transform their businesses through high-quality, innovative, and cost-effective solutions. We serve companies ranging from small businesses to Fortune 500 organizations and guide them in maximizing their IT investment and reducing the cost of acquiring new technologies.

Job Description

Role – Site Reliability Lead Engineer

100% remote

1 year in length

Status: USC/GC

Skills:

1. Kubernetes container management
2. Configuration management tools (Ansible, Puppet, etc.; Ansible preferred)
3. Medium VMware knowledge
4. Medium-high vRA and vRO knowledge, including workflow and blueprint configurations
5. JavaScript (AngularJS specifically; TypeScript is fine if the candidate can work backward to AngularJS)
6. Programming experience (Python, Java, or Grails) with CI/CD experience

NOTE - It is OK if the candidate does not have vRA or vRO experience, as long as the candidate can manage and work with an API and understands what needs to get done.

It is also OK if the candidate does not have AngularJS experience.

Site Reliability Engineering combines software and systems engineering to manage some of our customers' most complex environments. The client's large-scale, fault-tolerant environments, deployed for our customers, run some of the most complex applications. You will work as an SRE in our newly formed SRE managed services group.

SRE looks for creative ways to automate and secure our environments. SRE is a mindset and a set of engineering approaches to running better production systems. Much of our managed services work focuses on optimizing existing environments for our customers, building highly scalable infrastructure, and eliminating work through automation. You, as a software developer, are expected to use a variety of tools, including Kubernetes, Jenkins, Prometheus, Grafana, and more, to orchestrate these complex systems, ensure operational stability, and increase reliability. You will use your experience with languages and platforms such as C, C++, Java, Python, Node.js, JavaScript/TypeScript, Go, etc. to build custom tooling and modify application code to improve operational stability and ease of use. You will also work in a dynamic multi-cloud/hybrid-cloud environment including AWS, Azure, GCP, and VMware. The ideal SRE wants to limit time spent on operational work, proactively identifies potential ways our systems can fail, and values a blameless post-mortem when incidents occur.

Reliability is at the heart of our promise to our customers, so the SRE role is at the heart of our technical team. We're always on call to keep our environments up and running, ensuring our investors reliably earn staking rewards. You will be responsible for designing, implementing and maintaining these systems alongside other members of the support organization.

Responsibilities:

Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation and refinement.
Maintain services once they are live by measuring and monitoring availability, latency and overall system health.
Scale systems sustainably through mechanisms like automation and evolve systems by pushing for changes that improve reliability and velocity.
Practice sustainable incident response and blameless post-mortems.
Understand and support the entire stack, from hardware and into the cloud.
Engage in governance models and publish best practices and consumption models.
Learn new technologies.
Implement software development practices and maintain operational code implementing SSDLC concepts.
Interface with end users and application developers in an agile framework to ensure continuous improvement of the environment.
Implement all SRE principles and concepts as applicable in an environment.
Manage complex environments, including the VMware suite of products (vRA suite and vSphere ESXi).
Manage and constantly improve cloud high availability, fault tolerance, disaster recovery, and automation.
Design infrastructure, and analyze and troubleshoot large-scale distributed systems.
Strive to drive change with a complete sense of ownership.
Establish yourself as a trusted, proactive advisor to customers.
Perform root cause analysis (RCA) on incidents.
Create custom reports, where applicable, to estimate and forecast environment capacity and costs.
Manage all SRE and DevOps/software development framework tools for CI/CD, automated testing, etc.
Train and coach customers and internal infrastructure teams in SRE principles.
Build and maintain environment auto-heal frameworks and operational code.

Minimum Qualifications:

BS degree in Computer Science or a related technical field involving coding (e.g., physics or mathematics), or equivalent practical experience
Minimum of 5 years of SRE/DevOps or software development experience preferred.
Hands-on experience with DevOps, IaC, algorithms, and software development.
Hands-on experience in at least two of the following: C, C++, Java, Python, Go, Perl, Node, JavaScript, TypeScript, or Ruby.
Hands-on experience with container orchestration (Docker and Kubernetes).
Hands-on experience with cloud provider APIs and best practices (AWS, Azure, and GCP).
Hands-on experience managing complex environments, including the VMware suite of products (vRA suite and vSphere ESXi).
Ability to maintain, manage, and constantly improve cloud high availability, fault tolerance, disaster recovery, and automation.
Ability to interact with customers and act as a strong customer advocate, providing trusted, proactive advice.
Ability to design infrastructure, and to analyze and troubleshoot large-scale distributed systems.
Proven experience taking a systematic problem-solving approach, coupled with strong communication skills, drive, and a sense of ownership.
Ability to think on your feet and take ownership of an issue as a challenge.
Proven experience with DevOps tools and principles.

Preferred Qualifications:

At least 5-7 years of DevOps/SRE or software development experience preferred.
Proven experience taking a systematic problem-solving approach, coupled with strong communication skills, drive, and a sense of ownership.
AIOps experience preferred (building event correlation frameworks, etc.).
Knowledge of advanced concepts of blockchain preferred.
Ability to work cross functionally across multiple business units.
Experience with data structures, complexity analysis and software design.
Experience contributing to or maintaining open-source projects.
Master's degree in Computer Science preferred.

Additional Information

All your information will be kept confidential according to EEO guidelines.
