About Lithium

Lithium is a nearshore staffing company that helps IT companies by offering their dev teams an extra hand from highly skilled, trusted nearshore developers in Latin America.

We are a people-first organization that was founded to be a great place for all people. A place where team members can be open, transparent, and find big challenges. And a place where you will find a horizontal structure and a chill environment.

We value people with strong technical skills who are collaborative, curious, and results-driven, and who take ownership. We embrace people who want to be themselves, have daily flexibility, grow, learn, and make a difference wherever the opportunity presents itself.

About the Client

Our client is a leading provider of technology solutions to the global airline and travel industry, with a vision to provide a next-generation, end-to-end platform that enables airline commerce, offering a full suite of innovative retailing, distribution, and fulfillment solutions.

About the Role

We are looking for a talented and experienced Site Reliability Engineer (SRE) to join our dynamic team. As an SRE, you will play a crucial role in ensuring the reliability, scalability, and performance of our systems.

Responsibilities

Collaborate with cross-functional teams to design, implement, and maintain highly available and scalable infrastructure on the AWS cloud platform.
Manage and optimize AWS services such as EKS/ECS, RDS, EC2 (ELB, NLB, ASG), DMS, and Lambda (Python) to ensure maximum efficiency and reliability.
Build and maintain infrastructure as code (IaC) using Terraform to automate provisioning and configuration management.
Utilize Spinnaker for automated deployments and continuous integration/continuous delivery (CI/CD) pipelines.
Implement and maintain monitoring, alerting, and observability solutions to detect and respond to incidents.
Collaborate with the development teams to design and implement application performance improvements.
Manage and optimize Elastic Cloud for Elasticsearch and support real-time search and analytics capabilities.
Work with Kafka for event streaming and data integration.

Technical requirements

3-5 years of experience in an SRE or similar role, such as DevOps.
Expertise in AWS, including services like EKS/ECS, RDS, EC2, DMS, and Lambda (Python).
Proficiency in Terraform for infrastructure as code.
Strong experience in managing and optimizing cloud resources to ensure high availability and performance.

Qualifications

Bachelor’s degree in Computer Science, Information Technology, or a related field (or equivalent experience).
Proven experience as a Site Reliability Engineer or in a similar role.
Strong problem-solving skills and the ability to work in a fast-paced, collaborative environment.
Excellent communication and teamwork skills.
Relevant AWS certifications are a plus.

Nice-to-Have

Experience with Spinnaker for CI/CD.
Familiarity with Elastic Cloud for Elasticsearch.
Knowledge of Kafka for event streaming.

Other important requirements

Upper-Intermediate English Level
Good communication skills
Full-time availability to join the team

Job Category: Site Reliability Engineer

Job Type: Full Time

Job Location: Remote

English Level: Advanced
