Meta is seeking a forward-thinking, experienced AI/ML (Artificial Intelligence/Machine Learning) Product Hardware Platform Lead Engineer to join the Data Center Site Operations team. The Product Hardware Platform Engineering (PHE) team is responsible for the overall performance of Meta's production compute, storage, and AI/ML platforms throughout their lifecycles in our data centers. This role will lead the subset of the PHE team that focuses on AI/ML platform hardware.

AI/ML is an important priority for Meta and involves complex GPU-based systems operating in shared computing clusters. The role is focused on maintaining and improving the health of the AI/ML platforms from verification testing through mass production to end-of-life. Key responsibilities include identifying systemic hardware, firmware, and tooling issues; engaging in hands-on problem solving; and collaborating effectively with cross-functional engineering and tooling teams to improve the performance of the fleet.

Our data centers, and the tens of thousands of servers installed in them, are the foundation on which our rapidly scaling infrastructure operates efficiently and on which our innovative services are delivered. Meta is at the leading edge of the global data center industry in terms of both how data centers are designed and how they are operated.

This person should enjoy working in a fast-paced environment where adaptability and flexibility will be key to their success. We seek an individual who can quickly absorb and understand the technical challenges of subject matter experts and local site operations teams, create alignment between these globally distributed teams and partner organizations, and set informed priorities and direction while securing buy-in and commitment from relevant stakeholders.