Infra L1 Support Engineer
UST
Job Details
Location: Bangalore
Experience: 2+ years
Salary: 10 LPA
Last Date: 31/05/2026
Job Description
Join our Global Infrastructure team and work on high-performance GPU environments supporting critical operations and platform stability.
We are looking for an Infra Support Engineer to handle L1/L2 support activities across GPU infrastructure environments. The role involves monitoring systems, identifying and triaging incidents, performing basic troubleshooting and remediation, executing operational runbooks, and collaborating with the SRE team for escalations and service reliability improvements.
The ideal candidate should have strong troubleshooting skills, knowledge of Linux and infrastructure fundamentals, exposure to monitoring tools, and a passion for infrastructure operations and reliability engineering. Excellent communication and incident management skills will be an added advantage.
Key Responsibilities
• Provide L1/L2 technical support for AI infrastructure, including GPU/CPU nodes, networking, storage, orchestration, and platform services, through ticketing tools, email, Slack, and other messaging platforms.
• Support GPU cluster delivery activities such as provisioning, image deployment, network validation, BIOS/firmware upgrades, and GPU driver/runtime installation.
• Monitor infrastructure health, dashboards, and service indicators while responding to alerts as part of scheduled operations support.
• Perform incident triage, analyze impact, gather required logs/details, and execute standard operational runbooks for quick mitigation.
• Escalate critical issues to SRE teams with proper incident documentation, logs, and troubleshooting details.
• Maintain incident records, provide timely updates to stakeholders, and support status communication during outages.
• Handle routine operational tasks including health checks, log reviews, capacity monitoring, and basic automated fixes.
• Participate in post-incident reviews and contribute ideas to improve reliability and reduce recurring issues.
• Support SOP improvements, validate operational runbooks, and document new infrastructure procedures.
• Collaborate with infrastructure, development, and SRE teams to improve system stability and operational efficiency.
Required Skills
Server administration, SRE
Eligibility Criteria
2+ years of experience in IT operations, server administration, SRE, DevOps, or technical support.
Interview Preparation Guide
1. Linux & Shell
Practice top, vmstat, iostat, dmesg, journalctl for live troubleshooting
Know systemd service management (systemctl start/stop/status/restart)
Be fluent in log parsing with grep, awk, sed, tail -f
Understand filesystem hierarchy (/etc, /var/log, /proc, /sys)
Have a scripting example ready (Bash automation you've written)
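If you need a scripting example to talk through, a small log-triage helper is one option. The sketch below uses Python rather than Bash and assumes syslog-style lines (the layout used by /var/log/syslog and the default journalctl output). The sample lines, host name, and keyword list are invented for illustration.

```python
import re
from collections import Counter

# Assumed line layout: "Mon DD HH:MM:SS host unit[pid]: message"
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+[\d:]+\s+(\S+)\s+([\w.\-]+)(?:\[\d+\])?:\s+(.*)$"
)

def count_errors(lines, keywords=("error", "failed", "fatal")):
    """Count lines per unit whose message contains any keyword (case-insensitive)."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not follow the assumed layout
        _host, unit, message = match.groups()
        if any(word in message.lower() for word in keywords):
            counts[unit] += 1
    return counts

# Hypothetical sample input, not taken from a real system.
SAMPLE_LINES = [
    "May 12 10:01:02 node01 kubelet[1234]: Failed to pull image 'registry.example/app:1.2'",
    "May 12 10:01:05 node01 sshd[2201]: Accepted publickey for ops from 10.0.0.5",
    "May 12 10:01:09 node01 kubelet[1234]: Error syncing pod, skipping",
    "May 12 10:02:11 node01 sshd[2210]: error: maximum authentication attempts exceeded",
    "garbage line without a timestamp",
]

if __name__ == "__main__":
    for unit, n in count_errors(SAMPLE_LINES).most_common():
        print(f"{unit}: {n}")
```

In an interview, it helps to explain why unparseable lines are skipped rather than treated as failures, and how the keyword list would be tuned for a given service.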
2. Networking
Explain DNS resolution → TCP handshake → TLS → HTTP flow clearly
Know tools: ping, traceroute, tcpdump, curl -v, ss, nslookup
Understand TCP vs UDP, common ports (22, 53, 80, 443)
Be ready to explain VLANs, subnets, and firewall rules
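Subnet questions often come down to quick arithmetic, which can be checked with Python's standard `ipaddress` module. The following is a minimal sketch; the addresses are arbitrary examples, and the "usable hosts" figure applies the classic minus-two rule, which does not hold for /31 and /32 prefixes.

```python
import ipaddress

def describe_subnet(cidr):
    """Summarize the network that contains the given address/prefix."""
    net = ipaddress.ip_network(cidr, strict=False)  # strict=False allows host bits
    return {
        "network": str(net.network_address),
        "broadcast": str(net.broadcast_address),
        "netmask": str(net.netmask),
        # Classic rule: subtract network and broadcast addresses.
        "usable_hosts": max(net.num_addresses - 2, 0),
    }

def same_subnet(ip_a, ip_b, prefix_len):
    """Return True if both addresses fall in the same network for this prefix."""
    net_a = ipaddress.ip_interface(f"{ip_a}/{prefix_len}").network
    return ipaddress.ip_address(ip_b) in net_a

if __name__ == "__main__":
    print(describe_subnet("10.20.30.77/26"))
    print(same_subnet("10.20.30.77", "10.20.30.100", 26))
    print(same_subnet("10.20.30.77", "10.20.30.130", 26))
```

Two hosts that fail the `same_subnet` check need a router (or a misconfigured mask fixed) to reach each other, which is a common root cause behind "ping works to some nodes but not others".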
3. Monitoring & Logging
Describe a Prometheus + Grafana stack you've used or built
Know PromQL basics (rate, increase, histogram_quantile)
Explain metrics vs logs vs traces and when each is useful
Mention alerting setup and how you tuned noisy alerts
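To explain what `rate` does to a counter, it can help to work through the idea by hand. The sketch below is a simplified approximation, not Prometheus's implementation: it sums increases between samples, treats any drop in value as a counter reset, and divides by the elapsed time. Prometheus additionally extrapolates toward the window boundaries, which is omitted here. The sample values are made up.

```python
def simple_rate(samples):
    """Approximate per-second rate of a counter.

    samples: list of (timestamp_seconds, counter_value), sorted by time.
    Returns None when fewer than two samples or no elapsed time.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    prev_value = samples[0][1]
    for _ts, value in samples[1:]:
        if value < prev_value:
            # Counter went down: assume it restarted from zero.
            increase += value
        else:
            increase += value - prev_value
        prev_value = value
    elapsed = samples[-1][0] - samples[0][0]
    if elapsed <= 0:
        return None
    return increase / elapsed

if __name__ == "__main__":
    # Hypothetical scrape every 15 s, with a process restart before t=45.
    scrapes = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
    print(simple_rate(scrapes))
```

The reset handling is the part worth highlighting: without it, a restart would show up as a large negative rate, which is why raw counter deltas are rarely graphed directly.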
4. Kubernetes
Know core objects: Pod, Deployment, Service, ConfigMap, Namespace
Practice debug commands: kubectl logs, describe, exec, get events
Understand CrashLoopBackOff, ImagePullBackOff, OOMKilled scenarios
Explain how rolling updates and health checks work
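The failure states above can be read directly from a pod's status. The sketch below assumes the pod has already been loaded as a dictionary (for example from the JSON that `kubectl get pod <name> -o json` prints) and inspects the `containerStatuses` entries. The sample pod and the short finding labels are hypothetical.

```python
def diagnose_pod(pod):
    """Return (container_name, finding) tuples for common failure states."""
    findings = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        name = cs.get("name", "?")
        waiting = cs.get("state", {}).get("waiting") or {}
        last_terminated = cs.get("lastState", {}).get("terminated") or {}

        reason = waiting.get("reason")
        if reason in ("ImagePullBackOff", "ErrImagePull"):
            findings.append((name, "image-pull"))   # check image name, tag, registry auth
        elif reason == "CrashLoopBackOff":
            findings.append((name, "crash-loop"))   # check logs of the previous run

        if last_terminated.get("reason") == "OOMKilled":
            findings.append((name, "oom-killed"))   # compare usage with memory limit
    return findings

# Hypothetical pod status, trimmed to the fields the function reads.
SAMPLE_POD = {
    "metadata": {"name": "trainer-0"},
    "status": {
        "containerStatuses": [
            {
                "name": "trainer",
                "restartCount": 7,
                "state": {"waiting": {"reason": "CrashLoopBackOff"}},
                "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}},
            },
            {
                "name": "sidecar",
                "restartCount": 0,
                "state": {"waiting": {"reason": "ImagePullBackOff"}},
                "lastState": {},
            },
        ]
    },
}

if __name__ == "__main__":
    for container, finding in diagnose_pod(SAMPLE_POD):
        print(f"{container}: {finding}")
```

Note that a container can be in CrashLoopBackOff because its last run was OOMKilled, so the two findings often appear together; the last terminated state is usually the more useful clue.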
5. NVIDIA GPU Infrastructure
Read and interpret nvidia-smi output (utilization, memory, temperature)
Understand CUDA driver vs toolkit compatibility
Know how Kubernetes schedules GPUs via device plugins
Be aware of common issues: GPU memory leaks, driver mismatches
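For health checks, `nvidia-smi` can emit machine-readable output, for example with `--query-gpu=index,utilization.gpu,memory.used,memory.total,temperature.gpu --format=csv,noheader,nounits`. The sketch below parses text in that shape and flags GPUs worth a closer look. The sample output, thresholds, and flag names are assumptions chosen for illustration, not vendor guidance.

```python
import csv
import io

FIELDS = ["index", "util_pct", "mem_used_mib", "mem_total_mib", "temp_c"]

def parse_gpu_csv(text):
    """Parse 'index, util, mem.used, mem.total, temp' rows into dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if len(record) != len(FIELDS):
            continue  # blank or unexpected line
        try:
            values = [int(v) for v in record]
        except ValueError:
            continue  # non-numeric field such as "[N/A]"
        rows.append(dict(zip(FIELDS, values)))
    return rows

def flag_gpus(rows, max_temp_c=85, max_mem_fraction=0.95, idle_mem_fraction=0.5):
    """Return {gpu_index: [flags]} using assumed, tunable thresholds."""
    flagged = {}
    for row in rows:
        flags = []
        mem_fraction = row["mem_used_mib"] / row["mem_total_mib"]
        if row["temp_c"] > max_temp_c:
            flags.append("hot")
        if mem_fraction > max_mem_fraction:
            flags.append("mem-high")
        if row["util_pct"] == 0 and mem_fraction > idle_mem_fraction:
            flags.append("idle-mem-held")  # possible leaked or orphaned process
        if flags:
            flagged[row["index"]] = flags
    return flagged

# Hypothetical output for three 80 GiB GPUs.
SAMPLE_OUTPUT = "0, 97, 79012, 81920, 84\n1, 0, 80500, 81920, 61\n2, 45, 20000, 81920, 88\n"

if __name__ == "__main__":
    print(flag_gpus(parse_gpu_csv(SAMPLE_OUTPUT)))
```

A GPU with zero utilization but most of its memory in use is a typical sign of a stale process still holding allocations, which is one way the "memory leak" symptom shows up in practice.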
6. Troubleshooting Mindset
Interview Process
1st Round: Written Test
2nd and 3rd Rounds: Technical Interview
4th Round: HR Interview