Description
Summary of This Role
Responsible for availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. Creates a bridge between development and operations by applying a software engineering mindset to system administration topics. Splits time between operations/on-call duties and developing systems and software that help increase site reliability and performance.
What Part Will You Play?
- Practicing chaos engineering: thinking laterally about how our systems might fail in theory, designing tests to demonstrate how they behave in practice, and then formulating and implementing remediation plans as appropriate.
- Pushing our systems to their limits, and then coming up with designs for how to get them to the next performance tier.
- Using DevOps and GitOps practices to improve automation and processes so that self-service becomes possible.
- Safeguarding reliability: ensuring that our services are highly available, resilient against disasters, self-monitoring, and self-healing.
- Running “game days” to test assumptions about reliability and learn what will break before it matters to customers.
- Reviewing designs with an eye toward increasing the holistic stability of our platform and identifying potential risks.
- Building systems to proactively monitor the health, performance and security of our production and non-production virtualized infrastructure.
- Improving our monitoring and alerting systems to make sure engineers get paged when it matters (and don’t get paged when it doesn’t).
- Troubleshooting systems and network issues, alongside our Technical Operations Team.
- Evolving our SDLC, practices, and tooling to account for Site Reliability considerations and best practices.
- Developing runbooks and improving documentation.
What Are We Looking For in This Role?
Minimum Qualifications
- BS in Computer Science, Information Technology, Business / Management Information Systems or related field
- Typically a minimum of 2 years of relevant experience
Preferred Qualifications
- Linux/UNIX experience with RHEL and AIX systems
- Ability to read, write, and update shell scripts
- Understanding of PCI requirements and workflows
- Familiarity with key encryption processes and procedures
What Are Our Desired Skills and Capabilities?
- Skills / Knowledge - Developing professional expertise; applies company policies and procedures to resolve a variety of issues.
- Job Complexity - Works on problems of moderate scope where analysis of situations or data requires a review of a variety of factors. Exercises judgment within defined procedures and practices to determine appropriate action. Builds productive internal/external working relationships.
- Supervision - Normally receives general instructions on routine work and detailed instructions on new projects or assignments.
- Experience with public and private clouds, Jenkins, Terraform, Ansible, OpenShift, Kubernetes, or AWS EKS