Applying machine learning to maximize the potential in sequencing data
Looking forward, new sequencing instruments can lead to dramatic breakthroughs in the field. We believe machine learning (ML) can further unlock the potential of these instruments. Our new research partnership with Pacific Biosciences (PacBio), a developer of genome sequencing platforms, is a great example of how Google’s machine learning and algorithm development tools can help researchers extract more information from sequencing data.
PacBio’s long-read HiFi sequencing provides the most comprehensive view of genomes, transcriptomes and epigenomes. Using PacBio’s technology in combination with DeepVariant, our award-winning variant detection method, researchers have been able to accurately identify diseases that are difficult to diagnose with other methods.
Additionally, we developed a new open-source method called DeepConsensus that, in combination with PacBio’s sequencing platforms, produces more accurate sequencing reads. This boost in accuracy will help researchers apply PacBio’s technology to more challenges, such as completing the human genome sequence and assembling the genomes of all vertebrate species.
Supporting more equitable genomics resources and methods
Like other areas of health and medicine, the genomics field grapples with health equity issues that, if not addressed, could exclude certain populations. For example, the overwhelming majority of participants in genomic studies have historically been of European ancestry. As a result, the genomics resources that scientists and clinicians use to identify and filter genetic variants and to interpret the significance of these variants are not equally powerful across individuals of all ancestries.
In the past year, we’ve supported two initiatives aimed at improving methods and genomics resources for under-represented populations. We collaborated with 23andMe to develop an improved resource for individuals of African ancestry, and we worked with the UCSC Genomics Institute to develop pangenome methods. The pangenome work was recently published in Science.
In addition, we recently published two open-source methods that improve genetic discovery by more accurately identifying disease labels and improving the use of health measurements in genetic association studies.
We hope that developing these methods and sharing them with the genomics community will improve overall health and the understanding of biology for everyone. Working together with our collaborators, we can put this work to use in real-world applications.