At Okta our motto is "Always On", and nowhere do we embrace that more than in Technical Operations. We strive to build the most reliable and performant systems on the planet through the skillful use of automation. We've created an integrated system that securely connects any person via any device to the technologies they need to do their most significant work.
If you like to be challenged and have a passion for solving problems at scale through automation, testing and tuning, we would love to hear from you. The ideal candidate exemplifies the ethic "If you have to do something more than once, automate it" and can rapidly self-educate on new concepts and tools.
Job Duties and Responsibilities:
As a Database Reliability Engineer, you will own all technical aspects of our data services tier. Reporting to the Manager of Technical Operations, you will partner with our core product engineers and our growing DBRE team to be the last word on performance and scaling. You will also play a key role as we evolve our architecture to meet the demands of Okta's enormous growth and of the millions of users who rely on us for uninterrupted access to business-critical enterprise and consumer applications.
- Ensure effective performance and 24x7 availability of the production database systems
- Design, automate and document operational processes, tasks and configuration management
- Lead efforts to tune, scale and benchmark the data services infrastructure
- Work closely with performance engineers and core product engineers on a wide range of topics
- Contribute to automation, including configuration management with Chef and provisioning, and automate any other repetitive tasks
- Track resource usage trends and take preventive action to maintain full health
- Monitor security and database operations alerts, and take preventive or corrective action to resolve issues
- Participate in on-call rotation and occasional off-hour activities
Minimum Required Knowledge, Skills, Abilities and Qualities:
- Proficient- to expert-level MySQL administration
- Experience with MySQL / Percona Server 5.6 / 5.7 / 8.0 and Aurora
- Proficiency with Chef or Puppet for configuration management
- Proficiency in a Linux environment, including Linux internals and tuning
- Experience as a first responder for the data tier on a high-traffic site
- Experience working in AWS (EC2 / EBS / S3 Snapshots / Aurora / RDS)
- Security-conscious, self-motivated, accountable, collaborative, humble and reliable
- Proficiency automating administrative tasks using Ruby, Python, shell scripting or Ansible
Bonus Knowledge, Skills, Abilities:
- Experience using Ansible to manage infrastructure
- Experience administering Cassandra / Elasticsearch
- Tech blogging or contributions to open source projects