
Total-TECH Co.
"The Job Description"
1. Ensure System Uptime and Reliability: Monitor and maintain cloud-based applications and infrastructure, ensuring minimal downtime and efficient incident response.
2. Build and Optimize Monitoring and Alerting Systems: Set up and continuously improve comprehensive monitoring and alerting frameworks to detect and address issues proactively.
3. Cloud Infrastructure Management: Manage, optimize, and scale systems on Azure cloud platforms, ensuring high performance and cost-effectiveness.
4. Incident Management and Response: Act as the first line of defense in identifying, diagnosing, and resolving technical issues in real time, or escalating them to the appropriate teams.
5. Technical Expertise: Experience with cloud platforms (AWS, Azure, GCP), reliability and scalability testing, monitoring tools, incident response, and disaster recovery.
6. Tooling and Observability: Leverage technologies such as Grafana for observability and Argo for CI/CD automation, enhancing our ability to respond swiftly and effectively to infrastructure needs.
7. Collaboration: Work closely with cross-functional teams to align on SRE best practices, share insights, and support development and operational goals.
8. Language Skills: Fluent spoken and written Arabic/English.