Okta is seeking a Site Reliability Engineering (SRE) Manager to lead our Microservices and Container team.
This is an exciting time at Okta. As our company grows, we are investing heavily in new self-healing services, containerization, and next-generation automation.
The ideal candidate:
- Has a track record of leading or managing high-performing teams while remaining hands-on
- Has experience running production-quality containers on cloud-based infrastructure such as Amazon Web Services
- Has operated complex custom applications on UNIX/Linux and/or Enterprise Java platforms
- Is passionate about automation and uses agile software development methodologies to deliver it
Job Duties and Responsibilities:
- Mentor and manage a team of experienced engineers using agile development
- Partner with recruiting to hire staff at our HQ and remote sites
- Manage and own delivery of new infrastructure components:
  - Collaborate with TPM, architects, and executive management
  - Conduct design and code reviews
  - Partner with Okta security teams
- Continuously refine monitoring processes, thresholds, and configuration
- Respond to issues and escalations and participate in a management on-call rotation
- Work closely with product developers to ensure new features have the proper operational support and maintainability
Minimum REQUIRED Knowledge, Skills, and Abilities:
- Demonstrated track record of leading or managing a team
- Experience with microservices and container services such as Kubernetes and Docker
- Experience managing Linux systems in production
- Proficiency in at least one scripting language (Bash, Perl, Ruby, Python)
- Experience operating and troubleshooting a complex, multi-tier service running in the cloud
- Prior experience in a software development, DevOps, or SRE role