Site Reliability Engineer
National Express LTD, Digbeth, Birmingham
Salary not available. View on company website.
- Full time
- Permanent
- Onsite working
Posted 1 week ago, 5 Nov
Closing date: not specified
Job Ref: 56e45513b38349b18389b0cf8357bc10
Full Job Description
National Express are recruiting an experienced Site Reliability Engineer to join our team, based at Head Office, Birmingham. As the successful candidate, with a focus on infrastructure, you will play a pivotal role in ensuring the reliability, performance, and security of our distributed infrastructure environment. You will leverage your deep technical expertise and problem-solving skills to support, evaluate, build, deliver, and maintain a high-quality infrastructure that meets the evolving needs of our business.
- Design, implement, and maintain highly available and scalable systems on AWS
- Develop and maintain automation scripts and tools to streamline operations and reduce manual tasks
- Monitor system performance, identify bottlenecks, and implement optimizations to improve response times
- Forecast resource requirements and ensure adequate capacity to meet business needs
- Operate and maintain traditional IT infrastructures, cloud ecosystems, and IT services to meet business needs. Manage compute, storage, and networking environments in an MSP environment
- Manage infrastructure using IaC tools (e.g., Terraform, CloudFormation) to ensure consistency and reproducibility
- Implement robust monitoring and alerting systems to proactively identify and address issues
- Contribute to security best practices and implement measures to protect our systems and data
- Manage relationships with 3rd party vendors for technical support, build, and maintenance
- Participate in projects and service improvements related to infrastructure and data centre services. Identify, own, and implement proactive maintenance plans, including upgrades and patches
- Provide specialist-level incident and problem management support. Perform problem identification, root cause analysis, and recommend service improvements
- Meet service level agreements (SLAs) for infrastructure services
- Adhere to National Express's processes, including change controls, problem records, and supportability of technology
- Maintain infrastructure supporting documentation in line with improvements and maintenance activities
- Drive continuous improvement initiatives to enhance system reliability and efficiency
- Three or more years of hands-on experience, with a deep understanding of and proficiency in AWS services (e.g., EC2, S3, RDS, Lambda, CloudFront) and best practices
- Experience working within an ITIL framework in organisations of 3000+ users
- Excellent customer-facing skills, including critical issue escalation and resolution, root cause analysis, and accountability
- Understanding of, and hands-on experience with, Microsoft Server platforms, hyper-converged infrastructure, domain services, backup, business continuity, disaster recovery, and data centre operations
- Proven experience with Infrastructure as Code (IaC) tools like Terraform
- Strong scripting skills in Python or Bash
- Experience with monitoring and observability tools such as AWS CloudWatch, Grafana, or Datadog
- Knowledge of containerisation technologies (Docker, Kubernetes)
- Understanding of serverless technologies and their use cases
- Knowledge of microservices architectures
- Solid understanding of networking concepts (TCP/IP, DNS, routing)
- Complimentary coach travel for a Nominated Person or complimentary bus travel for a Spouse or Partner
- 50% discount for friends and family on full fares on our coach services
- Life Assurance
- Company pension
- Employee Assistance programme
- Private online GP service