Site Reliability Engineer

Bangalore, Karnataka, India
Job ID: JR0031409


Job Title:

Site Reliability Engineer

Role Overview:

We are hiring a Site Reliability Engineer who will improve and maintain software development, test, and live infrastructure and services. You will be articulate and have experience with Linux and other *NIX derivatives. Your primary mission as an SRE is to work with the development, technical operations, quality assurance, and product management teams to ensure the uptime and performance of the Skyhigh Security Enterprise Cloud Security Solution.

About Us

At Skyhigh Security, we have a bold vision to Secure the World’s Data. Our mission is to protect organizations with cloud-native security solutions that are both data-aware and simple to use. We go beyond data access and focus on data use, allowing organizations to collaborate from any device and from anywhere without sacrificing their security. We strive to live our values every day. We want to Lead the Industry, we actively embrace our differences, we love to learn together, and we choose to celebrate each other often!

About the role:

  • Perform Incident Management and Change Management to maintain the continuous availability of all Cloud Infrastructure services

  • Ensure all SRE and operating procedures are maintained and executed.

  • Maintain a 24×7 production environment with a high level of service availability, perform quality reviews, and manage operational issues.

  • Explore new cloud technologies, features, and tools to improve the platform, and automate using Bash, Python, Perl, etc.

  • Implement automation and orchestration for manual processes required to operate and deploy cloud services, and be at the heart of developing new ideas into internal tools by working closely with teams.

  • Analyze alarms and dashboards to identify problem areas, report incidents, troubleshoot, and escalate as required.

  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.

  • Perform ticket review and updates through the JIRA ticketing tool.

  • Manage and maintain runbooks and standard operating procedures.

  • Manage, coordinate, and document all types of maintenance and outage events.

  • Must take initiative and be proactive.

  • Must take on the responsibility to learn new products and procedures.

  • Implement proactive monitoring, alerting, trend analysis, and self-healing systems.

  • Understand the existing architecture and work with various Engineering teams to develop and execute strategies to provide a high-quality Global production service.

  • You are responsible for debugging and identifying the cause of problems and outages.

  • You will be flexible to work in a 24×7 environment (rotational shifts).

About you:

  • You will have 8+ years of experience in production application and systems support.

  • System admin experience on Linux environments.

  • Ability to understand networking and its components

  • Good experience with public cloud technology (AWS)

  • Experience identifying thresholds and setting up monitoring for infrastructure and applications

  • Experience with Grafana, ELK, CloudWatch, Opsgenie, PagerDuty, etc.

  • Strong communication and analytical/problem-solving skills.

  • Network knowledge (TCP/IP, UDP, DNS, load balancing); prior network administration experience is a big plus

  • Experience in writing Root Cause Analysis documents

  • Experience with source control tools such as GitHub, SVN, or Perforce

  • A systematic approach and the ability to drive problems to resolution

  • Experience configuring and managing web servers (Apache, Tomcat, Nginx)

  • Ability to script/program in one or more high-level languages, such as Python, Go, etc.

  • Experience with or knowledge of GCP or Azure is good to have.

  • Experience with deployment tools such as Jenkins, TeamCity, Harness, etc.

  • Experience with configuration management tools such as Salt, Puppet, Ansible, etc.

  • Experience in the security domain is an added advantage

  • Experience with continuous integration and deployment automation tools such as Jenkins, Harness, AWS CloudFormation, Salt, Puppet, Chef, or Ansible

  • Experience with SQL (MySQL) and NoSQL databases (Redis, Couchbase, Cassandra, Crate)

  • Experience with open-source technologies (Kafka, Memcached, Redis, Hadoop, HBase, ZooKeeper, Oozie)

Company Benefits and Perks:

We work hard to embrace diversity and inclusion and encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.

  • Pension and Retirement Plans
  • Medical, Dental and Vision Coverage
  • Paid Time Off
  • Paid Parental Leave
  • Support for Community Involvement

We're serious about our commitment to diversity, which is why we prohibit discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.

