Systems Engineer - AIOps AWS
Chennai, India · Full-time
ID: 836 · Posted: May 27, 2026 · Openings: 1 · Experience: 4-6 Years · Annual Salary: 12 to 18L
Hot Duties:
Manage million-dollar AWS infrastructure using automation, AIOps, GenAI, Terraform, Jenkins, Ansible, and GitHub Actions for intelligent operations.
Duties:
- Strong AWS expertise with Amazon ECS, container deployments, microservice architectures, and AI/ML infrastructure provisioning using Bedrock and SageMaker.
- Skilled in configuring Apache/Nginx with load balancing, SSL/TLS, and AI-driven traffic routing.
- Hands-on with Terraform for modular IaC provisioning, including AI-ready environment builds.
- Proficient in CI/CD using GitHub Actions and Jenkins, with AI-assisted testing and deployment confidence scoring.
- Experienced with Ansible for configuration management, system hardening, and AI-assisted drift detection.
- Strong scripting in Python, Bash, and PowerShell, with experience calling LLM APIs for intelligent automation.
- Solid DevOps/SRE understanding including AIOps, observability, self-healing infrastructure, and incident response.
- Experienced with New Relic and CloudWatch, and skilled in ML-based alerting, APM, and AIOps dashboards.
- Knowledgeable in vulnerability management, AI-assisted scanning, remediation, and LLM governance controls.
- Strong problem-solving, communication, and documentation skills with an AI-first engineering mindset.
Requirements:
- Collaborate with technical teams, stakeholders, and vendors to architect and deliver AI-augmented solutions on AWS, spanning multi-account environments, IAM, VPC, and ECS, while effectively communicating and escalating issues.
- Write and maintain documentation covering Linux system administration, network topology, firewall configurations, LLMOps runbooks, and architecture decision records.
- Implement infrastructure best practices using Terraform for modular IaC and Ansible for configuration management and system hardening, while participating in code reviews and adhering to Agile/Waterfall processes.
- Independently manage containerized workloads with Docker and Kubernetes, orchestrating microservices, managing Helm charts, and identifying intelligent process improvements.
- Build and optimize GitHub Actions and Jenkins CI/CD pipelines with AI-assisted testing, deployment scoring, and automated release workflows.
- Oversee incident response using New Relic APM, performing AI-assisted root cause analysis, intelligent alert triage, and automated diagnostics across distributed systems.