Denis Jacob Machado
I am a bioinformatician and phylogeneticist aiming to bridge the gap between pure and applied research. To that end, I am involved in research projects that range from viral genome annotation and evolution to comparative genomics of non-model organisms. These projects are connected by the methodology used to understand evolutionary processes and reconstruct the resulting evolutionary histories. In response to the COVID-19 pandemic, I've shifted my attention to developing phylogenetic methodologies and bioinformatics tools to understand the fundamental origins of emergent viral diseases and propose strategies to prevent or respond faster to new viral threats.
The HPC bottlenecks in the genomic surveillance of SARS-CoV-2
The coronavirus disease 2019 (COVID-19) pandemic has seen a massive application of in silico approaches, ranging from deciphering the virus's infectivity in humans to drug discovery. Unfortunately, each of these applications faces its own high-performance computing (HPC) bottlenecks. This work will focus on the HPC bottlenecks in genomic surveillance of SARS-CoV-2, the virus that causes COVID-19. Genomic surveillance of SARS-CoV-2 can help monitor how new variants emerge, identify the most significant changes associated with each new variant, and understand how these changes might impact health. Some HPC challenges are not unique to the genomic surveillance of SARS-CoV-2. They include database, disk, CPU, network, and memory bottlenecks. However, other bottlenecks arise from co-opting phylogenetic tools that evolutionary biologists developed to categorize the diversity of life. On the one hand, many of these phylogenetic tools were not designed to be highly parallelizable or to process the amount of data that high-throughput DNA sequencing produces. On the other hand, these tools frequently involve heuristic solutions that require expert interpretation and are based on biological and philosophical assumptions that may be opaque to the end user. This work will highlight the importance of data and knowledge sharing in overcoming the bioinformatics bottlenecks so that pandemic variants can be tracked faster. It will also argue that HPC applications to genomic surveillance depend on continuous rather than reactive research.