Tag Archives: comparative genomics

SFASTA: Fast Index Building

What is SFASTA? Genomic sequences and bioinformatics-adjacent sequences (RNA, proteins, peptides) are typically stored as FASTA files. Sequencing reads coming off a machine are stored as FASTQ files, which add a quality score for each nucleotide. Currently, both are human-readable plaintext files. As sequencing volume increases, we need to be able to process many more gigabytes and terabytes of these files rapidly and…
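The SFASTA format itself isn't described in this excerpt, so the sketch below only illustrates the plaintext formats it starts from. It shows a FASTQ reader that decodes the per-nucleotide quality scores (assuming Phred+33 encoding and unwrapped four-line records), plus a naive FASTA index that maps each record name to the byte offset of its header so a reader can seek straight to it instead of scanning the whole file. `parse_fastq` and `index_fasta` are hypothetical helper names, not part of SFASTA.

```python
import io


def parse_fastq(handle):
    """Yield (name, sequence, qualities) for each record in a FASTQ text stream.

    Assumptions: unwrapped four-line records (@header, sequence, +, quality)
    and Phred+33 encoding, where each quality score is ord(char) - 33.
    """
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of stream
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # One quality score per nucleotide, decoded from its ASCII character.
        yield header[1:].split()[0], seq, [ord(c) - 33 for c in qual]


def index_fasta(data):
    """Map each FASTA record name to the byte offset of its '>' header line."""
    index = {}
    offset = 0
    for line in data.splitlines(keepends=True):
        if line.startswith(b">"):
            # Record name is the first whitespace-delimited token after '>'.
            index[line[1:].split()[0].decode()] = offset
        offset += len(line)  # keepends=True, so newline bytes are counted
    return index


if __name__ == "__main__":
    reads = io.StringIO("@read1 lane=1\nACGT\n+\nII#5\n")
    for name, seq, quals in parse_fastq(reads):
        print(name, seq, quals)  # read1 ACGT [40, 40, 2, 20]

    fasta = b">chr1 first\nACGT\nAC\n>chr2\nGG\n"
    print(index_fasta(fasta))  # {'chr1': 0, 'chr2': 20}
```

A production index such as samtools' `.fai` also records each sequence's length and line layout so arbitrary subsequences can be fetched directly; this sketch stores only where each record starts.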

Machine Learning for Variant Calling with DeepVariant from Google Brain

Last December Google Brain released DeepVariant, a machine-learning-based variant caller built on convolutional neural networks. While PacBio and Nanopore (long-read) sequencing are becoming more mainstream, massive amounts of 2nd-generation sequencing* data already exist for many populations, and that data remains highly useful. For the Medicago HapMap project, we have 262 accessions sequenced with 2nd-generation technology at varying depths.…

ODG, the Omics Database Generator, has been published

ODG: Omics Database Generator has been published in BMC Bioinformatics and is available online now. ODG takes user-supplied -omics data, integrates it into a coherent database, and generates a web-based user interface for it. Advanced users can also query the database directly, either through a programming language or with the Cypher query language. ODG uses Neo4j’s…
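As a sketch of the "through a programming language" route, the function below runs a parameterized Cypher query through a Neo4j session and returns plain dictionaries. The `Gene` label and `name` property are hypothetical placeholders: ODG's actual schema isn't given in this excerpt, so adjust them to match the generated database.

```python
def gene_neighbors(session, gene_name, limit=25):
    """Return the relationships and neighbors of one node via Cypher.

    `session` is a Neo4j driver session (or any object with a compatible
    `run(query, **params)` method). The `Gene` label and `name` property
    are hypothetical and must match the real ODG schema.
    """
    query = (
        "MATCH (g:Gene {name: $name})-[r]-(n) "
        "RETURN type(r) AS relationship, labels(n) AS labels, n.name AS neighbor "
        "LIMIT $limit"
    )
    # Values are passed as query parameters rather than formatted into the
    # string, which lets Neo4j reuse the query plan and avoids Cypher injection.
    result = session.run(query, name=gene_name, limit=limit)
    return [record.data() for record in result]
```

With the official Python driver, a session comes from `GraphDatabase.driver(uri, auth=(user, password)).session()`; the URI and credentials depend on how the ODG database was deployed.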