Site Reliability Engineer

Bangalore, Karnataka, India
Job ID: JR0032850


Job Title:

Site Reliability Engineer

Role Overview:

We are hiring a Site Reliability Engineer who will improve and maintain software development, test, and live infrastructure and services. You will be articulate and have experience with Linux and other *NIX derivatives. Your primary mission as an SRE is to work with the development, technical operations, quality assurance, and product management teams to ensure the uptime and performance of the Skyhigh Security Enterprise Cloud Security Solution.

About Skyhigh Security:

Skyhigh Security is a dynamic, fast-paced, cloud company that is a leader in the security industry.  Our mission is to protect the world’s data, and because of this, we live and breathe security. We value learning at our core, underpinned by openness and transparency.

Since 2011, organizations have trusted us to provide them with a complete, market-leading security platform built on a modern cloud stack. Our industry-leading suite of products radically simplifies data security through easy-to-use, cloud-based, Zero Trust solutions that are managed in a single dashboard, powered by hundreds of employees across the world. With offices in Santa Clara, Aylesbury, Paderborn, Bengaluru, Sydney, Tokyo and more, our employees are the heart and soul of our company.

Skyhigh Security is more than a company; here, when you invest your career with us, we commit to investing in you. We embrace a hybrid work model, creating the flexibility and freedom you need from your work environment to reach your potential. From our employee recognition program to our ‘Blast Talks’ learning series and team celebrations (we love to have fun!), we strive to be an interactive and engaging place where you can be your authentic self.

We are on social media too! Follow us on LinkedIn and Twitter @SkyhighSecurity.

About the role:

  • Perform incident management and change management to maintain the continuous availability of all cloud infrastructure services.

  • Ensure all SRE and operating procedures are maintained and executed.

  • Maintain a 24×7 production environment with a high level of service availability, perform quality reviews, and manage operational issues.

  • Explore new cloud technologies, features, and tools to improve the platform, and automate using Bash, Python, Perl, etc.

  • Implement automation and orchestration for the manual processes required to operate and deploy cloud services, and work closely with teams to turn new ideas into internal tools.

  • Analyze alarms and dashboards to identify problem areas, report incidents, troubleshoot, and escalate as required.

  • Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.

  • Perform ticket review and updates through the JIRA ticketing tool.

  • Manage and maintain runbooks and standard operating procedures.

  • Manage, coordinate, and document all types of maintenance and outage events.

  • Must take initiative and be proactive.

  • Must take on the responsibility to learn new products and procedures.

  • Implement proactive monitoring, alerting, trend analysis, and self-healing systems.

  • Understand the existing architecture and work with various engineering teams to develop and execute strategies to provide a high-quality global production service.

  • You are responsible for debugging problems and outages and identifying their cause.

  • You will be flexible to work in a 24×7 environment (rotational shifts).

About you:

  • You will have 8+ years of experience in production application and systems support.

  • System admin experience on Linux environments.

  • Ability to understand networking and its components

  • Good experience with public cloud technology (AWS)

  • Experience identifying thresholds and setting up monitoring for infrastructure and applications

  • Experience with Grafana, ELK, CloudWatch, OpsGenie, PagerDuty, etc.

  • Strong communication and analytical/problem-solving skills.

  • Network knowledge (TCP/IP, UDP, DNS, load balancing); prior network administration experience is a big plus

  • Experience in writing Root Cause Analysis documents

  • Experience with source control tools such as GitHub, SVN, or Perforce

  • A systematic approach and the drive to take problems to resolution

  • Experience configuring and managing web servers (Apache, Tomcat, Nginx)

  • Ability to script/program in one or more high-level languages, such as Python, Go, etc.

  • Experience with or knowledge of GCP and Azure is good to have.

  • Experience with deployment tools such as Jenkins, TeamCity, Harness, etc.

  • Experience with configuration management tools such as Salt, Puppet, Ansible, etc.

  • Experience in the security domain will be an added advantage

  • Experience with continuous integration and deployment automation tools such as Jenkins, Harness, AWS CloudFormation, Salt, Puppet, Chef, or Ansible

  • Experience with SQL (MySQL) and NoSQL databases (Redis, Couchbase, Cassandra, Crate)

  • Experience with open-source technologies (Kafka, Memcached, Redis, Hadoop, HBase, Zookeeper, Oozie)

Company Benefits and Perks:

We work hard to embrace diversity and inclusion and encourage everyone to bring their authentic selves to work every day. We offer a variety of social programs, flexible work hours and family-friendly benefits to all of our employees.

  • Retirement Plans
  • Medical, Dental and Vision Coverage
  • Paid Time Off
  • Paid Parental Leave
  • Support for Community Involvement

We're serious about our commitment to diversity, which is why we prohibit discrimination based on race, color, religion, gender, national origin, age, disability, veteran status, marital status, pregnancy, gender expression or identity, sexual orientation, or any other legally protected status.
