Site Reliability Engineer (SRE)

Total-TECH Co.

The Job Description

  1. Triage and handle node health issues during working hours.
  2. Participate in firefighting alongside development engineers.
  3. Own the design, execution, and support of the product's deployment topology through infrastructure as code.
  4. Own and maintain the distribution, scaling, metrics collection, and monitoring of multiple clusters.
  5. As a stakeholder, support engineers in defining the resourcing for the services they are building.
  6. Own the running of our CI/CD systems and work with the Testing Engineers to create a well-tested product.
  7. Improve and own operational processes.
  8. Maintain knowledge of, and a focus on, the security of the topologies running in production.
  9. Plan the growth of the infrastructure based on business needs and inputs.

Requirements:

  • Experience with Kubernetes, Docker, and Helm.
  • Very comfortable operating in Linux, including knowledge of Bash.
  • Experience with a cloud hosting platform (ideally GCP, but AWS or Azure also count).
  • Able to write code in Python.
  • Experience deploying and maintaining modern CI/CD systems (Zuul, CircleCI, Concourse, etc.).
  • Knowledge of and passion for infrastructure as code.