If you have come across The Startup Zeitgeist post on Hacker News, then as an operations person there is one thing you cannot miss: the emergence of Slack and the ChatOps ecosystem surrounding it. Slack has definitely disrupted how we communicate with teams, and with machines too. But beyond team communication, such tools have enabled a lot more.
In simple words, ChatOps enables people to get work done through chat tools. It makes complex tasks self-service in a team environment, so that the feedback loop is faster and people are empowered.
In this post we will explore the capabilities that one should keep in mind when building a ChatOps platform. A lot of what is needed in a “ChatOps platform” might be application- or organization-specific, but we want to draw a blueprint from which you can pick and choose to build your own. We have intentionally focused on capabilities rather than on specific tools or platforms. At the same time, each section mentions some tools that largely accomplish the capability being discussed.
The chat platform is no doubt one of the main pieces of a ChatOps platform, and it is the way users interface with it. The interface allows forming groups and discussions, sharing files and so on. But probably the key differentiator of modern chat platforms compared to traditional ones is integrations. These platforms integrate with a variety of services and chatbots to accomplish a lot more than traditional chat platforms. A chat platform which does not integrate with anything external is an absolute deal breaker for building a ChatOps platform. The most common alternatives fulfilling this capability are Slack, Mattermost, Campfire and of course the good old IRC.
Chatbot platforms form the core of a ChatOps platform and do all the orchestration between multiple systems. These platforms provide a wide variety of plugins to interact with multiple systems, and the extensibility to write your own plugins easily. This is one area where a lot of customization will happen over time, and an out-of-the-box (OOTB) installation probably won’t be of much use. It is also important to choose a platform written in a programming language your team is comfortable with, so that customizations are easier to build. The popular options are Lita written in Ruby, Hubot written at GitHub in JavaScript and the Python-based Err.
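To make the plugin idea concrete, here is a minimal, framework-free sketch of how a bot might route chat messages to handlers. It does not use the API of Lita, Hubot or Err; `MiniBot`, `command`, `handle` and the `opsbot` name are all hypothetical names chosen for illustration.

```python
import re
from typing import Callable, List, Optional, Pattern, Tuple

# Hypothetical sketch: these names do not come from Lita, Hubot or Err.
Handler = Callable[[re.Match], str]


class MiniBot:
    def __init__(self, name: str) -> None:
        self.name = name
        self._routes: List[Tuple[Pattern[str], Handler]] = []

    def command(self, pattern: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a handler for messages matching `pattern`."""
        def register(handler: Handler) -> Handler:
            self._routes.append((re.compile(pattern, re.IGNORECASE), handler))
            return handler
        return register

    def handle(self, message: str) -> Optional[str]:
        """Reply only when the message is addressed to the bot by name."""
        prefix = self.name + " "
        if not message.lower().startswith(prefix.lower()):
            return None  # ordinary team chatter, not meant for the bot
        body = message[len(prefix):].strip()
        for pattern, handler in self._routes:
            match = pattern.fullmatch(body)
            if match:
                return handler(match)
        return f"Sorry, I don't know how to '{body}'."


bot = MiniBot("opsbot")


@bot.command(r"deploy (?P<app>[\w-]+) to (?P<env>\w+)")
def deploy(match: re.Match) -> str:
    # A real plugin would call a deployment tool here; this only echoes the request.
    return f"Deploying {match['app']} to {match['env']}"
```

Typing `opsbot deploy web-api to staging` in a channel would be routed to `deploy`. Real bot frameworks add adapters for each chat platform on top of a routing core like this, which is why the language the bot is written in matters for your own plugins.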
While the chat platform and chatbots provide plenty of integrations OOTB, there are some integrations which are an absolute must for a successful ChatOps platform.
Most of a system’s health information comes from monitoring systems (such as Zabbix, Nagios and Sensu) and log management platforms (the likes of the ELK stack and Splunk). It is essential to be able to integrate with these systems and pull out as much data as possible without leaving the chat console. It should be possible to monitor not only the health of systems but also of services and APIs. For example, an API may not be down, but the service might be degraded because one or two instances are down at times. If the API or service is public facing, updating its status with services such as StatusPage is also a critical factor.
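The difference between "down" and "degraded" can be sketched as a small classification over per-instance health checks. This is an illustrative sketch only: the function names and status labels are assumptions, and a real integration would read the instance states from your monitoring system rather than from a dictionary.

```python
from typing import Dict


def service_status(instances: Dict[str, bool]) -> str:
    """Classify a service from per-instance health checks.

    `instances` maps an instance name to True (healthy) or False (down).
    The labels returned here are illustrative, not tied to any monitoring tool.
    """
    if not instances:
        return "unknown"
    healthy = sum(1 for ok in instances.values() if ok)
    if healthy == len(instances):
        return "operational"
    if healthy == 0:
        return "down"
    return "degraded"  # still serving traffic, but with reduced capacity


def status_message(service: str, instances: Dict[str, bool]) -> str:
    """Format a one-line summary a bot could post in the channel."""
    healthy = sum(1 for ok in instances.values() if ok)
    state = service_status(instances)
    return f"{service}: {state} ({healthy}/{len(instances)} instances healthy)"
```

For a hypothetical `payments-api` with one of two instances down, `status_message` reports the service as degraded rather than down, which is the distinction the chat console should surface.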
Developers and support engineers spend a lot of their time and focus interacting with the systems that enable delivery of software. A ChatOps platform should enable interacting with such systems, for example getting the status of a certain deployment or of a given build. Some basic operations on the source code management system are also useful in enabling faster communication. Being able to interact with ticketing systems is another important feature of a ChatOps platform.
A ChatOps platform should enable ops teams to take action on infrastructure right from the chat interface. This has the advantage of enabling teams that do not have access to the machines, while also letting the whole team track changes closely. What level of integration exists with the likes of Capistrano, Chef, Puppet, Ansible and SaltStack, and what additional work will be required to fully enable the team, is a key criterion in building the platform.
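One way to picture "action without machine access, tracked as a team" is a gate that checks who may trigger an action and records every attempt. The sketch below is a hypothetical illustration: `ActionGate`, `AuditEntry` and the permission model are assumptions, and `task` stands in for whatever call your configuration management or deployment tool actually needs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Dict, List, Set


@dataclass
class AuditEntry:
    user: str
    action: str
    target: str
    allowed: bool
    at: datetime


@dataclass
class ActionGate:
    # Maps an action name to the chat users allowed to trigger it (hypothetical policy).
    permissions: Dict[str, Set[str]]
    log: List[AuditEntry] = field(default_factory=list)

    def run(self, user: str, action: str, target: str,
            task: Callable[[str], str]) -> str:
        """Run `task(target)` if `user` may perform `action`; log every attempt."""
        allowed = user in self.permissions.get(action, set())
        self.log.append(
            AuditEntry(user, action, target, allowed, datetime.now(timezone.utc))
        )
        if not allowed:
            return f"@{user} is not allowed to {action} {target}"
        return task(target)
```

Because denied attempts are logged as well as successful ones, the audit trail shows the team who tried to change what, not only what changed.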
Most systems today are built for a 24×7 world, and managing the on-call rotation can get fairly complex. Integrating with a system which handles escalation policies and on-call rotation, and which notifies the right person at the right time, is critical for the uptime and success of such online systems. Some systems which come to mind are PagerDuty and OpsGenie.
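To show what such a system has to compute, here is a deliberately simplified round-robin rotation with an escalation chain. It is a sketch under stated assumptions (fixed-length shifts, no overrides, holidays or time zones) and does not reflect how PagerDuty or OpsGenie work internally; the function names and parameters are hypothetical.

```python
from datetime import date
from typing import List


def _current_index(rotation: List[str], start: date, today: date,
                   shift_days: int) -> int:
    """Index of the person whose shift covers `today`."""
    if not rotation:
        raise ValueError("rotation must not be empty")
    shifts_elapsed = (today - start).days // shift_days
    return shifts_elapsed % len(rotation)


def on_call(rotation: List[str], start: date, today: date,
            shift_days: int = 7) -> str:
    """Return who is on call today in a simple round-robin rotation."""
    return rotation[_current_index(rotation, start, today, shift_days)]


def escalation_chain(rotation: List[str], start: date, today: date,
                     levels: int = 2, shift_days: int = 7) -> List[str]:
    """Primary first, then the next people in the rotation as escalation levels."""
    first = _current_index(rotation, start, today, shift_days)
    count = min(levels, len(rotation))
    return [rotation[(first + i) % len(rotation)] for i in range(count)]
```

A bot command such as "who is on call?" could call `on_call`, and an unacknowledged alert could walk through `escalation_chain` in order. Real rotations quickly need overrides and schedules per team, which is why integrating with a dedicated service is usually the better choice.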
ChatOps enables a great deal of collaboration and openness between teams while getting things done at high speed. We are at a very nascent stage of ChatOps and the possibilities are endless; for example, check out the talk here. There is a dedicated ChatOps topic on Reddit, and the discussions there are defining the future. We would love to hear your ChatOps story.