Category Archives: Machine Learning

SFASTA: Fast Index building

What is SFASTA? Genomic and bioinformatic-adjacent sequences (RNA, protein, peptides) are stored as FASTA files. Sequencing reads off a machine are stored as FASTQ files, which add a quality score for each nucleotide. Currently, these are human-readable plaintext files. As sequencing increases, we need to be able to process many more gigabytes and terabytes of… Read More »

Machine Learning for Variant calling with DeepVariant from Google Brain

Last December Google Brain released DeepVariant, a machine-learning-based variant caller that uses convolutional neural networks. While PacBio and Nanopore (long-read) sequencing are becoming more mainstream, massive amounts of 2nd generation sequencing* data already exist for populations, and that data is still very useful. For the Medicago HapMap project, we have 262 accessions with various depth of… Read More »

Importing GloVe Embeddings into TensorFlow

GloVe is a useful tool for rapidly generating word embeddings. I am using this with DNA sequences now to experiment with machine learning techniques in genomics. Loading these embeddings into TensorFlow is essential for my experiments. Here is how to do it in Python.
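
As a rough illustration of the idea, here is a minimal sketch, not the code from the full post. It assumes the embeddings are in GloVe's plain-text output format (one token per line, followed by its space-separated vector values). The function names `load_glove` and `embed`, and the toy 3-mer tokens and values, are made up for this example. TensorFlow is only imported inside `embed`, so the parsing step runs on its own.

```python
import io


def load_glove(handle):
    """Parse GloVe's text format: one token per line, followed by its vector."""
    vocab = {}    # token -> row index
    vectors = []  # row index -> list of floats
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token, values = parts[0], [float(x) for x in parts[1:]]
        if vectors and len(values) != len(vectors[0]):
            raise ValueError(f"inconsistent dimension for token {token!r}")
        vocab[token] = len(vectors)
        vectors.append(values)
    return vocab, vectors


def embed(vocab, vectors, tokens):
    """Look up token vectors with TensorFlow (imported lazily)."""
    import tensorflow as tf

    table = tf.constant(vectors, dtype=tf.float32)  # shape: (vocab_size, dim)
    ids = tf.constant([vocab[t] for t in tokens], dtype=tf.int32)
    return tf.nn.embedding_lookup(table, ids)       # shape: (len(tokens), dim)


# Toy stand-in for a GloVe vectors file; the tokens and values are invented.
sample = io.StringIO("ACG 0.1 0.2 0.3\nCGT 0.4 0.5 0.6\nGTA 0.7 0.8 0.9\n")
vocab, vectors = load_glove(sample)
```

With TensorFlow installed, `embed(vocab, vectors, ["ACG", "GTA"])` should return a 2 x 3 float tensor. For a real vectors file, pass an open file handle to `load_glove` instead of the `StringIO` object.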

Machine Learning Crash Course from Google

Earlier this month Google made their internal Machine Learning Crash Course available. You can read more about it on their developer blog. I have a few machine learning projects going, mostly to learn but also to create an alignment-free sequence origin-identification tool. The (unorganized, incomplete) code is available at my GitHub repository. I’m curious about methods… Read More »