Project Euler for Bioinformatics

Drkcore

27 07 2014 bioinformatics Python


I came across Rosalind, a site a lot like Project Euler, so I played around with it a bit.

You pick up algorithms while having fun solving bioinformatics problems, so it seemed like a good way to learn BioPython.

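For the GC content problem (find the record in a FASTA file with the highest GC content and print its ID and GC percentage), I assume there is already a method for computing GC content, but I don't know it, so I computed it naively. I considered using a list comprehension and picking the result out with max, but that felt like too much hassle, so I went with a plain loop here too.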

import sys
from Bio import SeqIO

def count_gc(seq_record):
    # GC content (%) of one record; uses the argument, not the global loop variable
    seq = seq_record.seq
    return (seq.count("G") + seq.count("C")) * 100.0 / len(seq)

if __name__ == '__main__':
    fasta = sys.argv[1]
    highest_id = ""
    highest_content = -1.0  # below any real GC%, so even a 0% record gets picked
    # Scan every record, keeping the ID with the highest GC content seen so far
    for seq_record in SeqIO.parse(fasta, "fasta"):
        c = count_gc(seq_record)
        if c > highest_content:
            highest_id = seq_record.id
            highest_content = c

    print(highest_id)
    print("{0:.6f}".format(highest_content))
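For comparison, here is a rough sketch of the max-based version I skipped above. To keep it runnable without BioPython, it swaps SeqIO.parse for a tiny hand-written FASTA reader; `read_fasta`, `gc_percent`, `highest_gc` and the two sample records are all hypothetical names and toy data for illustration, not BioPython APIs or real Rosalind input. It uses a generator expression with `max(..., key=...)` rather than building a full list.

```python
def read_fasta(lines):
    """Hypothetical minimal FASTA reader: yields (id, sequence) pairs.

    The ID is the first word of the header line, as SeqIO does.
    """
    record_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(chunks)
            header = line[1:].split()
            record_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if record_id is not None:
        yield record_id, "".join(chunks)


def gc_percent(seq):
    """Percentage of G and C in seq (case-insensitive); 0.0 for an empty sequence."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) * 100.0 / len(seq)


def highest_gc(lines):
    """Return (id, GC%) of the record with the highest GC content."""
    return max(
        ((rid, gc_percent(seq)) for rid, seq in read_fasta(lines)),
        key=lambda pair: pair[1],
    )


if __name__ == "__main__":
    # Toy input (made up for illustration)
    sample = [">Rosalind_0001", "GGCC", ">Rosalind_0002", "ATGC"]
    best_id, best_gc = highest_gc(sample)
    print(best_id)                      # Rosalind_0001
    print("{0:.6f}".format(best_gc))    # 100.000000
```

If several records tie for the highest value, `max` returns the first one, the same as the `>` comparison in the loop version.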

I used to do my bioinformatics programming in BioPerl, so I've hardly ever done sequence analysis with BioPython.

Bioinformatics Programming Using Python: Practical Programming for Biological Data (Animal Guide)
Mitchell L Model
O'Reilly Media (2009-12-08)
