5 Infrastructure Lead jobs in Pakistan

DevOps Lead - Bare-Metal & GPU Infrastructure (Linux)

Runware Inc.

Posted 13 days ago

Job Description

Company Description

Runware is the fastest AI-as-a-Service platform for media generation

Runware is an AI-as-a-Service platform that delivers real-time inference at 5-10× lower cost than competitors. Our platform is purpose-built for speed & efficiency: custom GPU design, server setup, and datacenter architecture matched with performance-optimized software and a best-in-class API. Engineering teams who work with Runware save up to 80% on inference, improve response times, and scale instantly across 300K+ AI models, all through a single flexible API. Usage-based pricing and on-demand capacity are already battle-tested by Wix, OpenArt, NightCafe, Freepik, and thousands more. Backed by Insight Partners, a16z Speedrun, Begin Capital, and Zero Prime.

Join Runware to power the AI products that are changing the world

At Runware you'll collaborate with the world's leading AI teams, turning cutting-edge research into breakthrough products for thousands of clients. New models hit the market every week, and our job isn't just to keep pace—it's to stay two steps ahead, delivering unbeatable speed and performance every time.

That takes a special kind of teammate: driven, self-directed, lightning-quick to learn, and rock-solid reliable. If you thrive on building ambitious things with people who work hard, care for one another, and refuse to settle for "good enough," you'll feel right at home.

Resumés matter, but passion, grit, and proof of excellence matter more—whether you honed your skills in a research lab, at work, or taught yourself at 2 a.m. If that sounds like you, let's talk.

About The Role

This is a full-time remote role for a DevOps Lead - Bare-Metal & GPU Infrastructure (Linux). The successful candidate will be responsible for ensuring 99.999% service availability and optimum usage/scale infrastructure ratios while shipping code across hundreds of Linux GPU servers in multiple data-center locations.

Responsibilities

Fleet reliability - design and automate HA architectures that tolerate node, rack, or site failure without user impact

Ultra-fast delivery - build zero-touch CI/CD pipelines (GitOps, progressive rollout, instant rollback) that push config or container changes globally in under 10 minutes

Bare-metal lifecycle - PXE/Redfish/IPMI bootstrapping, firmware & driver orchestration, per-node GPU tuning, automated de-commissioning

Kubernetes on metal - multi-cluster control-plane HA, GPU scheduling, CNI overlay (Cilium/Calico), MetalLB/Ingress → <50 ms failover

Observability at scale - end-to-end metrics, logs, traces, actionable SLO dashboards, and predictive auto-healing

Incident command - primary on-call lead; run blameless post-mortems and automate root-cause fixes

Capacity bursts - script server bring-up (Ansible/Terraform/Cluster-API) so 100+ new GPUs go live in minutes

Security & compliance - kernel-level hardening, secrets management, GPU multi-tenancy isolation, continuous CVE patching

Mentorship - guide a small SRE/DevOps pod, set coding standards, and champion best practices

In Your First 12 Months You Will:

Cut average deployment latency to ≤ 2 minutes end-to-end, with one-click rollbacks

Maintain ≤ 5 min total annual user-visible downtime (five nines) across all sites

Automate server bring-up to <10 min from rack power-on to production workload

Reduce P1 incidents by ≥ 60% through predictive alerting and auto-remediation

Deliver fully auditable, Git-centric change pipelines adopted by 100% of engineering

Requirements

5+ yrs Linux SRE/DevOps with 100+ bare-metal node fleets; 2+ yrs as technical lead
Deep knowledge of NVIDIA/AMD GPU servers, high-speed interconnects (40 GbE+/InfiniBand/RoCE), NVMe/RDMA storage
Proven record sustaining ≥ 99.999% uptime in latency-sensitive, high-variance demand environments
Expert in Kubernetes on bare metal (Cluster-API, Kube-Virt, GPU Operator), advanced CNI, custom schedulers, and etcd care-and-feeding
Strong skills in Go or Python, plus Bash; you write the tools you can't find
Infrastructure-as-Code mastery (Terraform, Ansible, Packer), GitOps workflows, and container build systems
Monitoring/alerting stacks (Grafana), chaos/latency testing, synthetic probes
Clear architectural thinking, crisp documentation, and calm communication under pressure

Ready to architect zero-downtime, sub-minute rollouts for thousands of GPUs? Apply and let's run the world's AI together.

Benefits

We're a remote-first collective, meeting in person twice a year to plan, brainstorm, celebrate wins, and enjoy some face-to-face time. We have core hours for cooperative working and calls, but outside of that your calendar is yours. Work the hours that let you perform at your peak while also building a healthy life.

Our release cycles are fast and intense, but they're followed by real downtime. After big pushes we expect the team to unplug, recharge, and come back ready & stronger than ever for the next leap.

Generous paid time off - vacation, sick days, public holidays
Meaningful stock options - share in the upside you create
Remote-first setup - work from home anywhere we can employ you
Flexible hours - own your schedule outside core collaboration blocks
Family leave - paid maternity, paternity, and caregiver time
Company retreats - twice-yearly gatherings in inspiring locations

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Other
Industries: IT Services and IT Consulting

DevOps Lead - Bare-Metal & GPU Infrastructure (Linux)

Karachi, Sindh Runware Inc.

Posted 13 days ago

Cloud DevOps Engineer / Cloud Infrastructure Administrator

Lahore, Punjab AlphaBOLD

Posted 13 days ago

Job Description

AlphaBOLD is seeking a Senior DevOps Engineer to strengthen its IT/DevOps team. In this role, you will work closely with the DevOps Lead to support and implement effective cloud infrastructure and deployment strategies across various platforms. The role calls for expertise in designing and operating CI/CD automation pipelines using Azure DevOps.

The ideal candidate will develop and manage CI/CD processes for various applications, automate processes, support development teams with technical questions on continuous integration/delivery, and manage the full release process.

Technology Stack:

Our technology stack includes Azure, Azure DevOps, GitHub, AWS, Bitbucket, Jenkins, Docker, Kubernetes, Puppet, MySQL, MongoDB, PHP, Node.js, Flutter, Go, SignalR, and more.

Requirements:

Strong knowledge of public cloud platforms (Azure, AWS, GCP).
Sound knowledge of CI/CD processes, tools, and best practices.
Expertise in Docker, Kubernetes, AWS ECS, Azure Container Apps/Instances, and ECR.
Ability to implement infrastructure as code, with expertise in Terraform for infrastructure deployments.
Expertise in Azure DevOps and Jenkins for automated deployments.
Familiarity with DevOps as a process to enhance product delivery and support.
Ability to transform an organization from Agile/Waterfall to DevOps and CI/CD.
Ability to help reduce deployment time from days to minutes with simple Git push deployments.
Intelligent monitoring of application performance to optimize operations.
Hands-on knowledge of cloud computing, containers, and Kubernetes from a system developer or SRE perspective.
Strong knowledge of Windows and of Linux and its popular distributions.
Experience with Bash scripting and Python.

Required Skills & Experience:

Minimum Bachelor’s Degree in Computer Science or Information Technology.
4-6 years of relevant experience.
Proven work experience as a DevOps Engineer or similar role.
Excellent problem-solving, communication, and interpersonal skills.

Roles & Responsibilities:

Excellent organizational skills with the ability to work collaboratively on multiple projects in a deadline-oriented environment.
Excellent customer service skills.
Passion for technology and learning.
Ability to communicate and interact effectively and professionally with technical and non-technical staff (both verbal and written).
Ability to quickly adapt to technology and/or application changes and business delivery priorities.
Microsoft Azure Administrator and DevOps Engineer Expert certifications are a plus.

What We Offer:

Competitive salary and benefits
Dollar Pegging
Internet and Gym Reimbursements.
Company Sponsored Subsidized Lunch
Paid holidays and vacations.
Medical outpatient reimbursement and inpatient facility.
Opportunities to make a difference in a small, yet highly productive environment.
Provident Fund
Employee Centric Benefits and Policies
Company sponsored certifications
1-5 Year service rewards
USA-H1 Visa Sponsorship

Cloud Infrastructure and Database Operations Engineer

Lahore, Punjab Cinnova Technologies, LLC

Posted 13 days ago

Job Description

We are looking for a highly skilled Cloud Infrastructure and Database Operations Engineer. The purpose of the Cloud Infrastructure and Database Operations Engineer is to deliver a sustainable, high-quality, end-to-end cloud infrastructure hosting service that supports the Company’s products and the Inspire platform. This role is responsible for continuously improving the infrastructure service of the SaaS platform to enhance the organization’s and its clients’ capabilities.

The primary areas of focus for this role include platform infrastructure delivery, platform infrastructure operations, service management, and database administration.

Responsibilities:

Maintain the continuous availability of the Inspire SaaS Product cloud infrastructure through automated monitoring, redundancy, resilience, and service protection.
Perform critical and major incident services related to the Inspire SaaS Infrastructure to minimize service impact.
Work closely with software development and support teams on product delivery, security, compliance, and benchmarking.
Maintain and administer highly available database systems in a 24/7 cloud environment. Ensure databases are secure, available, and recoverable.
Demonstrate strong database performance tuning skills in a large database environment serving millions of requests.
Work with internal stakeholders to ensure governance of Company’s data and compliance with policies (GDPR, CCPA, SOC, PCI, etc.).
Manage stakeholder expectations by keeping them informed of relevant SaaS Infrastructure risks and issues.
Manage Windows and Network Operations within VM/Azure.
Monitor, troubleshoot, and install cloud-based platforms across all systems to meet defined business service levels.
Research and resolve customer-impacting database events efficiently.
Coordinate scheduled maintenance activities.
Other duties as needed.

Requirements:

Undergraduate degree in a related field (e.g., BSc in Computer Science).
5+ years of experience in Azure, AWS, SaaS, or Cloud environments.
Experience in cloud and IT infrastructure managed services with a global delivery model for enterprise SaaS companies.
Strong understanding of cloud computing concepts, virtualization, storage solutions, and networking. Certifications such as CCNA, AWS, or Azure are preferred.
Experience in IT software development with knowledge of SDLC methodologies (Agile, Kanban, Scrum).
Familiarity with next-gen technologies, including Product Engineering, Cloud, Data, SaaS products, Artificial Intelligence, and Machine Learning.
Strong understanding of database technologies, administration, performance tuning, and high availability (ideally SQL Server DBA).
Technical knowledge of Microsoft Windows Server technologies and administration.
Ability to troubleshoot TCP/IP, DNS, and firewall issues.

Perks and Benefits:

Provident Fund and Medical Allowances
Professional Training and Certifications
Paid Time Off
Semi-Annual Performance Bonus and Awards

Cloud Infrastructure and Database Operations Engineer

Lahore, Punjab Cinnova Technologies, LLC

Posted 25 days ago
