Site Reliability Engineer

National Express LTD, Digbeth, Birmingham

Salary not available. View on company website.

  • Full time
  • Permanent
  • Onsite working

Posted 5 Nov

Closing date: not specified

Job Ref: 56e45513b38349b18389b0cf8357bc10

Full Job Description

National Express are recruiting an experienced Site Reliability Engineer to join our team, based at Head Office, Birmingham. As the successful candidate, with a focus on infrastructure, you will play a pivotal role in ensuring the reliability, performance, and security of our distributed infrastructure environment. You will leverage your deep technical expertise and problem-solving skills to support, evaluate, build, deliver, and maintain a high-quality infrastructure that meets the evolving needs of our business.

Key responsibilities

  • Design, implement, and maintain highly available and scalable systems on AWS
  • Develop and maintain automation scripts and tools to streamline operations and reduce manual tasks
  • Monitor system performance, identify bottlenecks, and implement optimizations to improve response times
  • Forecast resource requirements and ensure adequate capacity to meet business needs
  • Operate and maintain traditional IT infrastructures, cloud ecosystems, and IT services to meet business needs. Manage compute, storage, and networking environments in an MSP environment
  • Manage infrastructure using IaC tools (e.g., Terraform, CloudFormation) to ensure consistency and reproducibility
  • Implement robust monitoring and alerting systems to proactively identify and address issues
  • Contribute to security best practices and implement measures to protect our systems and data
  • Manage relationships with third-party vendors for technical support, build, and maintenance
  • Participate in projects and service improvements related to infrastructure and data centre services. Identify, own, and implement proactive maintenance plans, including upgrades and patches
  • Provide specialist-level incident and problem management support. Perform problem identification, root cause analysis, and recommend service improvements
  • Meet service level agreements (SLAs) for infrastructure services
  • Adhere to National Express's processes, including change controls, problem records, and supportability of technology
  • Maintain infrastructure supporting documentation in line with improvements and maintenance activities
  • Drive continuous improvement initiatives to enhance system reliability and efficiency

Skills and experience

  • Three or more years of hands-on experience, leading to a deep understanding of and proficiency in AWS services (e.g., EC2, S3, RDS, Lambda, CloudFront) and best practices
  • Experience working within an ITIL framework in organisations of 3000+ users
  • Excellent customer-facing skills, including critical issue escalation and resolution, root cause analysis, and accountability
  • Understanding of, and hands-on experience with, Microsoft Server platforms, hyper-converged infrastructure, domain services, backup, business continuity, disaster recovery, and data centre operations
  • Proven experience with Infrastructure as Code (IaC) tools like Terraform
  • Strong scripting skills in Python or Bash
  • Experience with monitoring and observability tools such as AWS CloudWatch, Grafana, or Datadog
  • Knowledge of containerisation technologies (Docker, Kubernetes)
  • Understanding of serverless technologies and their use cases
  • Knowledge of microservices architectures
  • Solid understanding of networking concepts (TCP/IP, DNS, routing)

Benefits

  • Complimentary coach travel for a Nominated Person or complimentary bus travel for a Spouse or Partner
  • 50% discount for friends and family on full fares on our coach services
  • Life Assurance
  • Company pension
  • Employee Assistance programme
  • Private online GP service