Site Reliability Engineer

Anova
The go-to IoT company for the Oil & Gas industry

Job details

  • Full-time
  • Porto, PT
  • Requires Work Permit
    Requires that you're a citizen or have a valid work permit / visa sponsorship to work in the country in which this position is based.
  • Senior
  • Required language: English
  • Nice to have language: Portuguese
  • DevOps / Sysadmin
  • Must Have: Distributed Systems
  • Other Required: C#, Java, Python
  • Nice to Have: Agile Methodologies, Internet of Things, Microsoft Azure

Apply now

Sign up to apply

Or sign up to refer and earn a reward of €300

Intro

ISA is an Internet of Things company for the Industrial Gases, Rail, Energy, Oil & Gas, Water/Wastewater markets.

We are looking for a Site Reliability Engineer who will be our champion for availability, scalability, latency, and efficiency. You will be part of a team that will build the next generation of cloud-native, highly scalable micro-services for digital twins, analytics, web and mobile apps.

Your responsibilities will include:

  • Build a world-class IoT platform
  • Develop a micro-services architecture, ensuring security, high availability and scalability
  • Ensure that cloud operations can be executed with no customer downtime
  • Collaborate with the product teams to design and develop systems that are resilient and highly performant at scale
  • Monitor infrastructure, measuring availability and system health
  • Perform blameless root cause analyses on outages and ensure action items are done
  • Collaborate with customer support in recovering from outages
  • Troubleshoot complex incidents in highly distributed systems
  • Shorten time to detection by improving the accuracy of alarms
  • Be a key stakeholder in the design of services so that they are resilient from day 0

Main requirements

  • Degree in Computer Science or a related engineering field
  • Minimum 2 years in an SRE role or similar
  • Experience in designing resilient and fault-tolerant systems
  • Experience with at least one of the following: Java, C#, Python
  • Experience in debugging complex, distributed systems 
  • Love for automation
  • Fluency in English, written and spoken

Nice to have

  • Familiarity with IoT and cloud projects
  • Experience working in a high-growth, startup-like environment
  • Experience with Azure services is a plus
  • Experience with Docker and Kubernetes is a plus
  • Experience with automation and IaC (Terraform, Chef, etc.) is a plus
  • Understanding of monitoring tools such as Prometheus, ELK, Grafana
  • Experience in troubleshooting and debugging
  • Experience with public clouds such as AWS and Azure
  • Understanding of data stream platforms, message brokers and queues (e.g. Kafka, RabbitMQ, Azure Service Bus, Azure Event Hubs)
  • Experience working with Agile methodologies (e.g. Scrum)

Benefits & Perks

  • We are constantly looking for brilliant people who appreciate complexity and challenges
  • We offer a relaxed working environment with no dress code. Be part of a small product development team, where everybody has a voice and your opinion is heard
  • We offer health insurance and fruit for healthy snacks
  • We provide a high-end portable workstation and additional 27" 1440p monitors as working tools

Remote Details

First 3 months onsite.
Lisbon, Porto, and time zones from GMT-5 to GMT+1 are OK.
