Mattermost is the industry’s leading open-source enterprise-grade messaging platform. Customers including Intel, Ubisoft, Samsung, Cigna, BNP, European Commission, Social Security Administration, and Affirm use Mattermost to enable their teams to collaborate securely and privately anywhere. Many of the world’s leading privacy-conscious enterprises, like the US Department of Defense, work better by using Mattermost to connect people, tools, and automation and increase developer collaboration. Our private cloud messaging platform offers secure, configurable, highly scalable messaging using web, mobile, and desktop applications and provides deep integrations with hundreds of SaaS and on-premises tools and applications.
We value high-impact work, ownership, self-awareness, and a focus on customer success. If these values match who you are, we hope you'll learn more about working at Mattermost.
We are looking for an engineer to lead our Site Reliability Engineering team for Mattermost’s new SaaS offering. You have a strong blend of software development, infrastructure and networking skills with a keen sense for leadership. Leading a team that ensures the high reliability of user-facing production services is an area you have lots of experience in. You can keep cool under pressure and lead your team in the development of systems and processes to allow for the effective resolution of incidents.
Responsibilities:
- Lead a team of engineers focused on maintaining high reliability of Mattermost’s SaaS offering
- Build services and tools to ensure the stability of production services
- Set technical vision and innovate to be at the forefront of self-healing SaaS services
- Help drive efforts for compliance certifications in our SaaS (SOC2, GDPR, FedRAMP, etc.)
- Define infrastructure as code with Terraform and other tools
- Write thoughtful and high-quality code in Go
- Develop services to handle automatic recovery from incidents and disasters
- Automate incident or disaster simulations to identify blind spots
- Lead hiring for your team
- Execute our performance management process to ensure a high level of performance within the team
- Work with other Leads to follow our engineering best practices, and ensure alignment with our Leadership Principles
- Implement, maintain and tune monitoring and alerting systems
- Deploy applications to and manage Kubernetes clusters
- Respond on-call to incidents with quick and effective resolutions
Requirements:
- Bachelor's degree in Computer Science or related fields, or significant professional DevOps or SRE experience
- Experience with SRE and DevOps methodologies
- 3+ years as a lead engineer or similar role
- 5+ years of previous experience as a developer or SRE with operational responsibilities
- Strong experience running reliable, high-scale applications with Kubernetes in production
- Strong experience with AWS and other cloud providers
- Strong knowledge of container systems such as Kubernetes & Docker
- Experience defining and leading on-call rotations for highly available SaaS services
- Previous experience achieving compliance certifications, audits, and remediation for a public SaaS. Examples include one or more of: SOC2, PCI, HIPAA, GDPR, FedRAMP
- Solid programming skills and experience with or an ability to quickly become proficient in Go
- Experience working with infrastructure as code tools, such as Terraform
- Ability and willingness to be on-call
Preferences:
- Experience with distributed application systems using HTTP, WebSockets, RPC, pub/sub, etc. at scale
- Open source contributions to related projects
- Knowledge of Grafana and Prometheus
- Comfortable with GitHub, Jira, Jenkins, CircleCI
- Experience working in open source communities