5 Infrastructure Lead jobs in Pakistan
DevOps Lead - Bare-Metal & GPU Infrastructure (Linux)
Posted 13 days ago
Job Description
Company Description
Runware is the fastest AI-as-a-Service platform for media generation
Runware is an AI-as-a-Service platform that delivers real-time inference at 5-10× lower cost than competitors. Our platform is purpose-built for speed & efficiency: custom GPU design, server setup, and datacenter architecture matched with performance-optimized software and a best-in-class API. Engineering teams who work with Runware save up to 80% on inference, improve response times, and scale instantly across 300K+ AI models, all through a single flexible API. Usage-based pricing and on-demand capacity are already battle-tested by Wix, OpenArt, NightCafe, Freepik, and thousands more. Backed by Insight Partners, a16z Speedrun, Begin Capital, and Zero Prime.
Join Runware to power the AI products that are changing the world
At Runware you'll collaborate with the world's leading AI teams, turning cutting-edge research into breakthrough products for thousands of clients. New models hit the market every week, and our job isn't just to keep pace—it's to stay two steps ahead, delivering unbeatable speed and performance every time.
That takes a special kind of teammate: driven, self-directed, lightning-quick to learn, and rock-solid reliable. If you thrive on building ambitious things with people who work hard, care for one another, and refuse to settle for "good enough," you'll feel right at home.
Resumés matter, but passion, grit, and proof of excellence matter more—whether you honed your skills in a research lab, at work, or taught yourself at 2 a.m. If that sounds like you, let's talk.
About The Role
This is a full-time remote role for a DevOps Lead - Bare-Metal & GPU Infrastructure (Linux). The successful candidate will be responsible for ensuring 99.999% service availability and optimum usage/scale infrastructure ratios while shipping code across hundreds of Linux GPU servers in multiple data-center locations.
Responsibilities
- Fleet reliability - design and automate HA architectures that tolerate node, rack, or site failure without user impact
- Ultra-fast delivery - build zero-touch CI/CD pipelines (GitOps, progressive rollout, instant rollback) that push config or container changes globally in under 10m
- Bare-metal lifecycle - PXE/Redfish/IPMI bootstrapping, firmware & driver orchestration, per-node GPU tuning, automated de-commissioning
- Kubernetes on metal - multi-cluster control-plane HA, GPU scheduling, CNI overlay (Cilium/Calico), MetalLB/Ingress →
- Observability at scale - end-to-end metrics, logs, traces, actionable SLO dashboards, and predictive auto-healing
- Incident command - primary on-call lead; run blameless post-mortems and automate root-cause fixes
- Capacity bursts - script server bring-up (Ansible/Terraform/Cluster-API) so 100+ new GPUs go live in minutes
- Security & compliance - kernel-level hardening, secrets management, GPU multi-tenancy isolation, continuous CVE patching
- Mentorship - guide a small SRE/DevOps pod, set coding standards, and champion best practices
In Your First 12 Months You Will:
- Cut average deployment latency to ≤ 2m end-to-end, with one-click rollbacks
- Maintain ≤ 5 min total annual user-visible downtime (five nines) across all sites
- Automate server bring-up to
- Reduce P1 incidents by ≥ 60% through predictive alerting and auto-remediation
- Deliver fully auditable, Git-centric change pipelines adopted by 100% of engineering
Requirements
- 5+ yrs Linux SRE/DevOps with 100+ bare-metal node fleets; 2+ yrs as technical lead
- Deep knowledge of NVIDIA/AMD GPU servers, high-speed interconnects (40 GbE+/InfiniBand/RoCE), NVMe/RDMA storage
- Proven record sustaining ≥ 99.999% uptime in latency-sensitive, high-variance demand environments
- Expert in Kubernetes on bare metal (Cluster-API, Kube-Virt, GPU Operator), advanced CNI, custom schedulers, and etcd care-and-feeding
- Strong skills in Go or Python, plus Bash; you write the tools you can't find
- Infrastructure-as-Code mastery (Terraform, Ansible, Packer), GitOps workflows, and container build systems
- Monitoring/alerting stacks (Grafana), chaos/latency testing, synthetic probes
- Clear architectural thinking, crisp documentation, and calm communication under pressure
Ready to architect zero-downtime, sub-minute rollouts for thousands of GPUs? Apply and let's run the world's AI together.
Benefits
We're a remote-first collective, meeting in person twice a year to plan, brainstorm, celebrate wins, and enjoy some face-to-face time. We have core hours for cooperative working and calls, but outside of that your calendar is yours. Work the hours that let you perform at your peak while also building a healthy life.
Our release cycles are fast and intense, but they're followed by real downtime. After big pushes we expect the team to unplug, recharge, and come back ready & stronger than ever for the next leap.
- Generous paid time off - vacation, sick days, public holidays
- Meaningful stock options - share in the upside you create
- Remote-first setup - work from home anywhere we can employ you
- Flexible hours - own your schedule outside core collaboration blocks
- Family leave - paid maternity, paternity, and caregiver time
- Company retreats - twice-yearly gatherings in inspiring locations
- Seniority level: Mid-Senior level
- Employment type: Full-time
- Job function: Other
- Industries: IT Services and IT Consulting
Cloud DevOps Engineer / Cloud Infrastructure Administrator
Posted 13 days ago
Job Description
AlphaBOLD is actively seeking a Sr DevOps Engineer to enhance its IT/DevOps team. In this role, you will work closely with the DevOps Lead to support and implement effective cloud infrastructure and deployment strategies across various platforms. We are seeking a DevOps Engineer with expertise in designing and operating CI/CD automation pipelines using Azure DevOps.
The ideal candidate will develop and manage CI/CD processes for various applications, automate processes, support development teams with technical questions on continuous integration/delivery, and manage the full release process.
Technology Stack:
Our technology stack includes Azure, Azure DevOps, GitHub, AWS, Bitbucket, Jenkins, Docker, Kubernetes, Puppet, MySQL, MongoDB, PHP, Node.js, Flutter, Go, SignalR, and more.
Requirements:
- Strong knowledge of public cloud platforms (Azure, AWS, GCP).
- Sound knowledge of CI/CD processes, tools, and best practices.
- Expertise in Docker, Kubernetes, AWS ECS, Azure container app/instances, and ECR.
- Ability to implement infrastructure as code, with expertise in Terraform for infrastructure deployments.
- Expertise in Azure DevOps, Jenkins for automated deployments.
- Familiarity with DevOps as a process to enhance product delivery and support.
- Ability to transform an organization from Agile/Waterfall to DevOps and CI/CD.
- Help reduce deployment time from days to minutes with simple Git push deployments.
- Intelligent monitoring of application performance to optimize operations.
- Hands-on knowledge of cloud computing, containers, and Kubernetes from a system developer or SRE perspective.
- Strong knowledge of Windows and Linux, including popular Linux distributions.
- Experienced in Bash and Python scripting.
- Minimum Bachelor’s Degree in Computer Science or Information Technology.
- 4-6 years of relevant experience.
- Proven work experience as a DevOps Engineer or similar role.
- Excellent problem-solving, communication, and interpersonal skills.
- Excellent organizational skills with the ability to work collaboratively on multiple projects in a deadline-oriented environment.
- Excellent customer service skills.
- Passion for technology and learning.
- Ability to communicate and interact effectively and professionally with technical and non-technical staff (both verbal and written).
- Ability to quickly adapt to technology and/or application changes and business delivery priorities.
- Microsoft Azure Administrator and DevOps Engineer Expert certifications will be a plus.
Benefits:
- Competitive salary and benefits
- Dollar Pegging
- Internet and Gym Reimbursements.
- Company Sponsored Subsidized Lunch
- Paid holidays and vacations.
- Medical outpatient reimbursement and inpatient facility.
- Opportunities to make a difference in a small, yet highly productive environment.
- Provident Fund
- Employee Centric Benefits and Policies
- Company sponsored certifications
- 1-5 Year service rewards
- USA-H1 Visa Sponsorship
Cloud Infrastructure and Database Operations Engineer
Posted 13 days ago
Job Description
We are looking for a highly skilled Cloud Infrastructure and Database Operations Engineer. The purpose of the Cloud Infrastructure and Database Operations Engineer is to deliver a sustainable, high-quality, end-to-end cloud infrastructure hosting service that supports the Company’s products and the Inspire platform. This role is responsible for continuously improving the infrastructure service of the SaaS platform to enhance the organization’s and its clients’ capabilities.
The primary areas of focus for this role include platform infrastructure delivery, platform infrastructure operations, service management, and database administration.
Responsibilities:
- Maintain the continuous availability of the Inspire SaaS Product cloud infrastructure through automated monitoring, redundancy, resilience, and service protection.
- Perform critical and major incident services related to the Inspire SaaS Infrastructure to minimize service impact.
- Work closely with software development and support teams on product delivery, security, compliance, and benchmarking.
- Maintain and administer highly available database systems in a 24/7 cloud environment. Ensure databases are secure, available, and recoverable.
- Demonstrate strong database performance tuning skills in a large database environment serving millions of requests.
- Work with internal stakeholders to ensure governance of Company’s data and compliance with policies (GDPR, CCPA, SOC, PCI, etc.).
- Manage stakeholder expectations by keeping them informed of relevant SaaS Infrastructure risks and issues.
- Manage Windows and Network Operations within VM/Azure.
- Monitor, troubleshoot, and install cloud-based platforms across all systems to meet defined business service levels.
- Research and resolve customer-impacting database events efficiently.
- Coordinate scheduled maintenance activities.
- Other duties as needed.
Requirements:
- Undergraduate degree in a related field (e.g., BSc in Computer Science).
- 5+ years of experience in Azure, AWS, SaaS, or Cloud environments.
- Experience in cloud and IT infrastructure managed services with a global delivery model for enterprise SaaS companies.
- Strong understanding of cloud computing concepts, virtualization, storage solutions, and networking. Certifications such as CCNA, AWS, or Azure are preferred.
- Experience in IT software development with knowledge of SDLC methodologies (Agile, Kanban, Scrum).
- Familiarity with next-gen technologies, including Product Engineering, Cloud, Data, SaaS products, Artificial Intelligence, and Machine Learning.
- Strong understanding of database technologies, administration, performance tuning, and high availability (ideally SQL Server DBA).
- Technical knowledge of Microsoft Windows Server technologies and administration.
- Ability to troubleshoot TCP/IP, DNS, and firewall issues.
Perks and Benefits:
- Provident Fund and Medical Allowances
- Professional Training and Certifications
- Paid Time Off
- Semi-Annual Performance Bonus and Awards
Cloud Infrastructure and Database Operations Engineer
Posted 25 days ago
Job Description
We are looking for a highly skilled Cloud Infrastructure and Database Operations Engineer. The purpose of the Cloud Infrastructure and Database Operations Engineer is to deliver a sustainable, high-quality, end-to-end cloud infrastructure hosting service that supports the Company’s products and the Inspire platform. This role is responsible for continuously improving the infrastructure service of the SaaS platform to enhance the organization’s and its clients’ capabilities.
The primary areas of focus for this role include platform infrastructure delivery, platform infrastructure operations, service management, and database administration.
Responsibilities:
- Maintain the continuous availability of the Inspire SaaS Product cloud infrastructure through automated monitoring, redundancy, resilience, and service protection.
- Perform critical and major incident services related to the Inspire SaaS Infrastructure to minimize service impact.
- Work closely with software development and support teams on product delivery, security, compliance, and benchmarking.
- Maintain and administer highly available database systems in a 24/7 cloud environment. Ensure databases are secure, available, and recoverable.
- Demonstrate strong database performance tuning skills in a large database environment serving millions of requests.
- Work with internal stakeholders to ensure governance of Company’s data and compliance with policies (GDPR, CCPA, SOC, PCI, etc.).
- Manage stakeholder expectations by keeping them informed of relevant SaaS Infrastructure risks and issues.
- Manage Windows and Network Operations within VM/Azure.
- Monitor, troubleshoot, and install cloud-based platforms across all systems to meet defined business service levels.
- Research and resolve customer-impacting database events efficiently.
- Coordinate scheduled maintenance activities.
- Other duties as needed.
Requirements:
- Undergraduate degree in a related field (e.g., BSc in Computer Science).
- 5+ years of experience in Azure, AWS, SaaS, or Cloud environments.
- Experience in cloud and IT infrastructure managed services with a global delivery model for enterprise SaaS companies.
- Strong understanding of cloud computing concepts, virtualization, storage solutions, and networking. Certifications such as CCNA, AWS, or Azure are preferred.
- Experience in IT software development with knowledge of SDLC methodologies (Agile, Kanban, Scrum).
- Familiarity with next-gen technologies, including Product Engineering, Cloud, Data, SaaS products, Artificial Intelligence, and Machine Learning.
- Strong understanding of database technologies, administration, performance tuning, and high availability (ideally SQL Server DBA).
- Technical knowledge of Microsoft Windows Server technologies and administration.
- Ability to troubleshoot TCP/IP, DNS, and firewall issues.
Perks and Benefits:
- Provident Fund and Medical Allowances
- Professional Training and Certifications
- Paid Time Off
- Semi-Annual Performance Bonus and Awards