Help us discover
a new generation
of medicines


Lead Data Engineer, Distributed Computing

Location: San Diego, CA or Remote

Position Description: Empirico, an early-stage biotechnology company, is looking for a Lead Data Engineer who is motivated by the opportunity to develop high-performance distributed data systems that allow us to analyze massive and complex biological datasets. You will work closely with scientists and engineers who have a passion for building novel systems and analytical approaches for the treatment and prevention of disease.


Your responsibilities will focus on the design, implementation, and deployment of modern distributed data systems and pipelines.  You will be expected to:

  • Be a domain expert in the area of distributed data systems
  • Collaborate closely with an interdisciplinary team of scientists and engineers to design, develop, deploy, and support scalable data systems and pipelines
  • Take the lead on complex data-related problems around the modeling, processing, and analysis of the largest genetic and phenotypic datasets available
  • Take a leading role in identifying and resolving performance bottlenecks and pain points throughout our data infrastructure
  • Enhance and support our data platform as it scales


Qualifications:

  • 5+ years of industry experience building distributed data systems
  • Strong technical skillset that spans a broad range of technologies, programming languages, and paradigms
  • Expertise with Apache Spark or similar distributed frameworks and technologies
  • Proficient with multiple programming languages (preferably Scala, Python, and/or Java)
  • Experience processing, modeling, and analyzing large and heterogeneous datasets
  • Demonstrated ability to write robust, readable, and testable code
  • An interest in learning about the intersection of software engineering, genetics, and drug discovery
  • Applicants must have authorization to work in the United States

Empirico is a venture-backed, next-generation therapeutics company founded on using huge biological datasets, human genetics, and programmable biology to power novel target discovery and development. Empirico’s Precision Insights Platform was purpose-built for therapeutic discovery and leverages a world-leading dataset and advanced algorithmic approaches to identify and prioritize therapeutic targets with a high probability of translational success. High-priority therapeutic targets are experimentally validated in-house before progressing through pre-clinical development with the optimal therapeutic modality. Empirico is headquartered in San Diego, CA with laboratories in Madison, WI.

To all agencies: Please do not contact any employee of Empirico about this position. All resumes submitted by agencies to any employee of Empirico via email or in any form and by any method will be deemed the sole property of Empirico, unless such agencies were engaged by Empirico for this position and a valid agreement is in place.

Empirico is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, sexual orientation, gender identity, disability status, protected veteran status, or any other characteristic protected by law.