Client Details
The customer is a telecom CSP that manages a fleet of servers running products for telecom companies.
Context
The operations team wanted to streamline incident handling by applying reusable workflows that fix recurring problems automatically.
Challenges
- The CSP was looking for an efficient, automated way to deal with a large number of incidents.
- They wanted to build an auto-remediation platform that would execute various health check scripts and remediation scripts in a parallel workflow.
- The feedback from the health check scripts would be fed into a remediation workflow along with a probability score, after which the remediation workflow would start executing.
Solution
- InfraCloud recommended the Fission function platform on top of Kubernetes for executing individual checks.
- To compose the individual checks into a workflow, we used a combination of a Kafka queue and Fission Workflows.
- This allowed parallel execution of health checks and faster response as a result of concurrency. We also modeled the remediation workflows similarly.
- Some of the remediation workflows required strict guarantees of sequential execution, which Fission Workflows provides natively.
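The pattern described above (parallel health checks feeding a scored decision, followed by strictly ordered remediation) can be sketched in plain Python. This is only an illustration of the control flow, not the customer's implementation: it uses a local thread pool as a stand-in for Fission functions and Kafka, and every name in it (`CheckResult`, `check_disk`, `check_service`, `remediate`, the `threshold` parameter) is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class CheckResult:
    name: str
    healthy: bool
    score: float  # hypothetical probability that remediation is needed


def check_disk() -> CheckResult:
    # Placeholder: a real check would inspect the target server.
    return CheckResult("disk", healthy=False, score=0.9)


def check_service() -> CheckResult:
    # Placeholder: a real check would probe the running service.
    return CheckResult("service", healthy=True, score=0.1)


def run_checks_in_parallel(
    checks: List[Callable[[], CheckResult]],
) -> List[CheckResult]:
    """Run all health checks concurrently; results keep submission order."""
    with ThreadPoolExecutor(max_workers=max(1, len(checks))) as pool:
        futures = [pool.submit(check) for check in checks]
        return [future.result() for future in futures]


def remediate(
    results: List[CheckResult],
    threshold: float,
    steps: List[Callable[[List[str]], None]],
) -> List[str]:
    """Run remediation steps one after another if any score crosses the threshold."""
    log: List[str] = []
    worst = max((r.score for r in results), default=0.0)
    if worst < threshold:
        return log  # nothing to remediate
    for step in steps:  # strictly sequential: each step finishes before the next
        step(log)
    return log
```

With these placeholder checks, `remediate(run_checks_in_parallel([check_disk, check_service]), 0.5, steps)` would run every step in `steps` in order, because the disk check reports a score above the threshold. How the real platform combined scores from several checks is not stated in this case study; taking the maximum is an assumption made for the sketch.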
Implementation Details
The platform for health check execution and remediation ran in a private data center.
Why InfraCloud?
- Our long history in the programmable infrastructure space, from VMs to containers, gives us an edge.
- We are one of the cloud native technology thought leaders (speakers at various global CNCF conferences, authors, etc.).
- Our DevOps engineers have pioneered DevOps at Fortune 500 companies.
- Our teams have worked on everything from data centers to application deployment, across all phases of the SDLC, bringing a holistic view of systems.