Bio* Library for Clojure.

March 13, 2014

Biotools is my basic bioinformatics file parsing library. You can find it on GitHub. It can parse BLAST+ tabular output (-outfmt "6 std qlen slen"), ExPASy ENZYME.dat, FASTA, GFF3, FPKM tracking files from Cufflinks, InterProScan tab-delimited output, Gene Ontology/Plant Ontology OBO format (any ontology in OBO format), PMN Pathways format, and PSI-MITAB 2.5. All of these parsers are set up to work with reducers and other parallel Clojure constructs. Some, such as the GFF and FASTA parsers, have built-in functions that allow random access to very large files and will create the appropriate index for you.
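To give a rough idea of what parsing with reducers looks like, here is a minimal sketch that counts BLAST+ hits per query from tabular lines. It does not use the Biotools API: the names blast-columns, parse-blast-line, count-hits-per-query, and sample-lines are hypothetical and exist only for this illustration. It relies only on clojure.core.reducers and clojure.string, and it assumes the 14 columns produced by -outfmt "6 std qlen slen".

```clojure
(require '[clojure.core.reducers :as r]
         '[clojure.string :as str])

;; The 12 "std" fields of BLAST+ tabular output, followed by qlen and slen.
(def blast-columns
  [:qseqid :sseqid :pident :length :mismatch :gapopen
   :qstart :qend :sstart :send :evalue :bitscore :qlen :slen])

(defn parse-blast-line
  "Hypothetical helper: turn one tab-delimited line into a map.
  All values are left as strings."
  [line]
  (zipmap blast-columns (str/split line #"\t")))

(defn count-hits-per-query
  "Count hits per query ID. `lines` should be a vector (or another
  foldable collection) so that r/fold can split the work across threads."
  [lines]
  (r/fold
    ;; combine step: merge the partial counts from two chunks
    (r/monoid (partial merge-with +) hash-map)
    ;; reduce step: bump the counter for this hit's query ID
    (fn [acc hit]
      (let [q (:qseqid hit)]
        (assoc acc q (inc (get acc q 0)))))
    (r/map parse-blast-line lines)))

;; Made-up example records, for illustration only.
(def sample-lines
  ["q1\ts1\t98.5\t100\t1\t0\t1\t100\t1\t100\t1e-50\t180\t100\t120"
   "q1\ts2\t90.0\t80\t8\t0\t1\t80\t5\t84\t1e-20\t95\t100\t200"
   "q2\ts1\t99.0\t50\t0\t0\t1\t50\t10\t59\t1e-25\t99\t50\t120"])

(count-hits-per-query sample-lines)
;; => {"q1" 2, "q2" 1}
```

For a real file, the lines would come from something foldable rather than a literal vector. iota's vec function is one option for that, since it exposes a text file as a collection of lines that works with r/fold.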

It serves as the foundation for my Omics Database Generator (ODG), handling all the parsing for the database creation software.

I plan to work on the documentation, examples, and tests in the future, but if you have any questions, feel free to ask right away. I built this library because I needed to parse many files quickly using parallel techniques, and I thought it would be a good way to learn Clojure and reducers. It is a mostly independent library: it depends only on iota and the Clojure core libraries, although that may change in the future. It does not depend on BioJava at this time, and I do not want to write idiomatic wrappers for other libraries, as I usually don't have that kind of time (unfortunately). Any comments or critiques are welcome.