Job Description:
Paymentology is the first truly global issuer-processor, giving banks and fintechs the technology, team and experience to rapidly issue and process Mastercard, Visa and UnionPay cards across more than 60 countries, at scale.
Our advanced, multi-cloud platform, offering both shared and dedicated processing instances, vast global presence and richer, real-time data, set us apart as the leader in payments.
Role Summary:
* Maintain, improve, and ensure high availability, scalability, and performance of our platform.
Key Responsibilities:
Platform Reliability and Scalability:
* Build software that improves the scalability and reliability of Paymentology services.
* Ensure platform services meet required uptime and service quality levels.
* Contribute to the design of reliable cloud infrastructure and implement reusable cloud-uptime components as code.
* Regularly review and optimise SRE practices, tools, and methodologies to enhance overall system reliability and team efficiency.
Observability and Automation:
* Contribute to the design, implementation, and maintenance of observability and monitoring solutions that track the platform's health, cost-effectiveness, reliability, and scalability, and that surface potential issues early.
* Develop and implement automation scripts and tools to streamline operations and reduce manual interventions.
* Enable product teams to self-serve by participating in the development of a developer platform.
* Play an active role in incident response, diagnosing and resolving production issues quickly to minimise downtime.
Standards Compliance:
* Support product teams in building services that adhere to our security and quality standards.
* Work closely with engineering, operations, and product teams to ensure reliability is considered throughout the end-to-end software development lifecycle.
Requirements:
* Bachelor's Degree in Computer Science, Information Technology, or related field.
* A minimum of 3 years in a dedicated SRE role, as well as 5+ years of prior software development experience.
* Comprehensive understanding of large-scale distributed platform architecture.
* Extensive hands-on cloud experience, particularly with AWS.
* Proven experience developing scalable, modular infrastructure-as-code projects using tools such as Terraform, CloudFormation, Puppet, and Ansible.
* Practical experience with Docker and container orchestrators such as Kubernetes, AWS ECS, and AWS EKS.
* Experience in administering or integrating identity management systems for SSO, including AWS IAM, Okta, and Active Directory.
* Experience with disaster recovery and redundancy strategies in both cloud and on-premises environments.
* Proficiency with leading monitoring tools, such as Datadog, Honeycomb.io, Splunk, Prometheus, Grafana, ELK Stack, and New Relic.
* Programming expertise, especially in JVM languages (e.g., Java, Kotlin, Scala), and working experience with relational databases (e.g., SQL Server, PostgreSQL).
* Familiarity with industry-leading CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, AWS CodePipeline, CircleCI, and Argo CD.
* Track record of defining platform-level and end-to-end SLIs and SLOs, meeting the associated SLAs, and fostering accountability for them.
* Ability to navigate complex situations and lead effective post-incident reviews (PIRs).
* Knowledge of implementing solutions to reduce Mean Time to Identify (MTTI) and Mean Time to Resolve (MTTR).
* Expertise in implementing best practices for load balancing, fault tolerance, and resource allocation to maintain service quality and efficiency at scale.
* Understanding of security best practices within cloud environments.
About Us:
We're a global company, fuelled by our diverse, experienced and innovative Paymentologists who play a crucial part in our global mission to advance the world through payments.
We value making a difference in the lives of the people who work for us and who live in the communities where we operate. You can look forward to working with a diverse, global team in which Paymentologists at every level contribute to that mission.