Richard Manso

Contact Your Job Consultant

Richard Manso

MY EMAIL

r.manso@digitalrepublictalent.com


OVERVIEW

Our client is a modern marketing company. They are a team of smart, dedicated, creative individuals across multiple offices. Their clients are some of the biggest names in tech, media, and entertainment, and they work closely with their customers to develop highly customized, data-driven software that addresses their most pressing marketing and advertising challenges. They are seeking a talented Cloud Site Reliability Engineer to join their team.

As a Cloud Site Reliability Engineer, you can expect to work on a wide variety of projects alongside world-class software engineers and data scientists. You’ll work with different teams across the organization, designing, implementing and operationalizing infrastructure. Along the way, you will have the freedom to build your own tooling and automation where applicable, prototype systems engineering ideas, and advise on improvements to their infrastructure.

Their work is project-centric and focused on using the best tools for the job, all while embracing core SRE principles. These include, but are not limited to, Infrastructure as Code [Terraform, Packer], monitoring and observability [Prometheus, Grafana, Kiali], continuous integration and delivery [CircleCI], logging and alerting [CloudWatch, Fluentd, VictorOps], and the elimination of toil by automating and streamlining existing systems.

Most of their infrastructure is cloud-based, and they make heavy use of Docker and container orchestrators [Kubernetes, ECS] for running their microservice applications. Their data stores range from PostgreSQL to Redshift to Redis, depending on the use case and scale of the data, and they like using Apache Airflow for ETL pipelining and process automation.

Client Stack

  • Docker & Kubernetes – kops, EKS
  • Python
  • Terraform
  • AWS – EC2, ELB, S3, ECS, RDS Postgres, Redshift
  • Airflow
  • CircleCI
  • RabbitMQ
  • Monitoring (ELK, Prometheus, Grafana, Sentry)

Your responsibilities would include:

  • Creating, configuring and maintaining cloud-based infrastructure and services for the rapid development and monitoring of complex data science and analytics applications.
  • Building really cool stuff! They are constantly engineering and automating to improve the reliability and stability of their applications (and to minimize repetitive work).
  • Being an active member of the larger software engineering team, helping to improve the organization’s SDLC process and minimizing time from code-complete to production.
  • Answering the call – responding to pages and alerts as needed to restore services.

What you will bring:

  • 2-5 years of experience as a Software/DevOps/Site Reliability Engineer.
  • Solid Bash/Linux skills and proficiency in at least one high-level language (like Python).
  • Experience with containerized development using Docker.
  • Proficiency building, configuring and maintaining cloud resources, particularly in AWS.
  • Experience with modern datastores at medium-large scale (Redshift, Postgres, MySQL, Mongo).
  • Ability to conduct root-cause analysis and troubleshoot errors/outages across the stack.
  • Measured opinions on system architecture and app design backed by sound logic and experience.
  • Ability to create your own tooling for automating or simplifying day-to-day operational tasks.
  • Willingness to work from the office once offices reopen; this is a requirement of the role.

Bonus points if you:

  • Are familiar with general tools and principles used in data engineering/ETL processes (e.g., Redshift, S3, Airflow, etc.).
  • Have scripted cloud resources using Terraform.
  • Have a solid understanding of AWS cloud resources.
  • Have written and shipped microservices of your own to production.
  • Have set up and administered container orchestrators like ECS and Kubernetes.
  • Are familiar with building CI/CD tooling around existing apps.
  • Are experienced in Python.
  • Have experience with AWS EMR, Athena, Glue or Apache Spark.

What you’ll get in return:

    • You’ll work with really smart people and on some really cool stuff
    • You’ll receive a competitive basic salary and bonus 
    • Unlimited PTO
    • Equity plan with profit-sharing
    • Vacation and anniversary cash bonuses
    • Generous medical plan
    • Paid parental leave
    • Company-paid cell phone service
    • In-house barista plus cold brew on tap!
    • Fully stocked kitchen
    • Weekly company lunches

How do you apply?

If you’re interested in the Cloud Site Reliability Engineer role, please apply via the link on this page or contact Digital Republic Talent by phone or email. Email r.manso@digitalrepublictalent.com or call the office on +44 7823 530190 / +1 628 239 3600. Visit the website at www.digitalrepublictalent.com. You can also find out more on LinkedIn, Instagram or Facebook.
