Principal Engineer
Job ID: R0000079969
Job family: Technology Engineering
Schedule: Full time
Location: Target Corporation India Pvt. Ltd., Bangalore, Karnataka, India, 560045
Target is an iconic brand, a Fortune 50 company and one of America’s leading retailers.
Target as a tech company? Absolutely. We’re the behind-the-scenes powerhouse that fuels Target’s passion and commitment to cutting-edge innovation. We anchor every facet of one of the world’s best-loved retailers with a strong technology framework that relies on the latest tools and technologies—and the brightest people—to deliver incredible value to guests online and in stores. Target Technology Services is on a mission to offer the systems, tools and support that guests and team members need and deserve. Our high-performing teams balance independence with collaboration, and we pride ourselves on being versatile, agile and creative. We drive industry-leading technologies in support of every angle of the business, and help ensure that Target operates smoothly, securely and reliably from the inside out.
As a Principal Engineer, you set the strategy for software development and/or infrastructure engineering at Target. You set the direction for how software and infrastructure engineering efforts will be designed, developed, and operationalized across multiple portfolios, and you drive adoption across TTS. You lead and approve software and infrastructure engineering efforts to meet functional and non-functional requirements. You are a thought leader and mentor for internal and external technical talent and actively contribute to the external technical community.
As a member of the SRE team you will work with other SRE and portfolio Engineers to produce mission-critical infrastructure, tools, and processes that will ensure the highest level of availability and reliability of our platforms and services. As a senior member of the team you will be expected to work with management, peers, team members and guests to define and implement the technical vision of the team.
You're right for the job if you're comfortable with deep technical topics in Linux, networking, and distributed architectures. You will work cross-functionally with a variety of teams and be a core contributor to every significant engineering service or solution that we deliver to our stakeholders. You'll excel if you have enthusiasm for digging deep and a flair for sharp technical communication, prioritization and organization. You will work directly with our Software Engineering teams to build our next generation “always up” cloud-based e-commerce/Retail and Enterprise platform.
Site Reliability Engineers are hybrid systems and software engineers who take ownership of reliability, scalability, automation, and other issues related to the availability of Target’s e-commerce/Retail and Enterprise platforms. Our goal is to build, scale and guard the systems that delight our guests. To do so, you will need strong skills in the following areas:
- Design, write and build tools to improve the reliability, latency, availability and scalability of Target’s e-commerce/Retail and Enterprise products.
- Engender reliability and availability starting with metrics and measurements
- Enable scaling by providing tools, developing training and/or augmenting processes
- Build tools and automation to prevent recurrence of problems in mission-critical products and services.
- Augment existing instrumentation to build a cohesive picture of the characteristics of our systems with special attention to points of failure.
- Participate in capacity planning, demand forecasting, software performance analysis and system tuning.
- Develop a deep understanding of the various services and applications that come together to deliver Target’s e-commerce/Retail and Enterprise products
- Drive the definition and adoption of SLIs and SLOs at both the service and experience levels.
- Design new tools to monitor and create smart alerts that help discover failures/issues in a timely fashion and work with engineers to identify root cause and fix issues
- Influence, design and create new architectures, standards and methods for large-scale enterprise systems.
- Root-cause complex problems involving multiple parties, networks, hardware and software that relate to scaling and performance
- Participate in on-call rotation.
- Secure the system from issues, be they real, perceived or theoretical
- Maintain a strong focus on collecting metrics and drawing inferences from them
- Experience with configuration management tools such as Ansible, SaltStack, Chef and Puppet
- Build and drive the automation systems that maintain system health
- Eliminate single points of failure and regularly test disaster recovery and high availability (HA).
Additional responsibilities may include:
- Drives standardization and service-focused instrumentation. Provides subject matter expertise. Resolves break/fix scenarios, engaging broader teams as necessary, and partners with or leads others to achieve continuous improvement. Contributes to command and control activities focused on rapid restoration of service during complex outages. Participates in the 24/7 on-call rotation. May work independently or as part of a team on more complex projects. Provides mentoring and guidance to more junior team members.
- Creates systems engineering and architectural documentation to be used by others to build and maintain systems.
- Scripting and Development responsibilities: Designs and develops software in several modern languages. Designs large, complex database-backed systems and has an expert understanding of DB schema and query performance. Given a broad set of goals, can create detailed requirements and technical design specifications. Designs modular systems to be co-developed by teams of less experienced SREs. Designs horizontally scalable solutions with innovative use of storage and networking, including a good understanding of APIs for integration with other systems. Utilizes professional best practices in day-to-day work, such as revision control and unit testing. Applies statistical data analysis techniques.
- Networking responsibilities: Recommends or helps architect an entire system. Acts as an expert in capturing and interpreting network traffic with tcpdump, snoop, and other network sniffers. Understands and applies knowledge of most protocols (TCP/IP, HTTP, UDP, etc.).
- Application Technologies: Provides expert recommendations and advice to the team and/or department in the areas of web services, OS, and storage, including being an active liaison to Development, product and the Business.
- Analyzes systems and makes recommendations to prevent possible problems. Takes lead on issue resolution activities using knowledge of complex and company-wide systems.
- Leads end-to-end audits of monitors and alarms based on subsystem knowledge. Takes the lead on defining the requirements for new command and control tools.
- Utilizes time management and project management skills to lead the resolution of issues in a timely and organized manner, effectively communicating necessary information. May consult directly with developers or third-party vendors; provides subject matter expertise.
- Consistent exercise of independent judgment and discretion in matters of significance.
- Other duties and responsibilities as assigned.
- 12+ years in a software development, DevOps, or SRE role.
- Experience in designing, investigating, analyzing and troubleshooting large-scale enterprise systems.
- Methodical and systematic problem-solving approach, combined with a solid awareness of ownership, initiative and drive.
- Fluency in running services at scale; in-depth understanding of Unix system internals and networking.
- Networking knowledge and in-depth understanding of network concepts, such as protocols (TCP/IP, UDP, ICMP, etc.), MAC addresses, IP packets, DNS, OSI layers, and load balancing.
- Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way. Experience administering Linux systems in a production environment.
- Programming experience in one or more of the following languages: Go, Java, Python, Ruby, Shell
- Bachelor's Degree in Computer Science or a related field, or relevant work experience
- Experience with distributed version control like Git or similar
- Experience with IaaS and PaaS providers such as AWS, Azure, GCP, and private cloud
- Experience with enterprise monitoring solutions like AppDynamics, New Relic, Prometheus, Graphite, Nagios, Sensu and Splunk
- Familiarity with continuous integration/deployment processes and tools such as Travis, Drone, Jenkins, Docker, Maven, Nexus, etc.
What’s it like to work here? We’re asked that a lot. Target respects and values the individuality of all team members and guests, and we have lots of fun in all that we do.