BiopythonはbioperlよりもPDBファイルの処理が速かった

06 09 2006 perl bioinformatics Python bioperl Tweet

BiopythonはbioperlよりもPDBファイルの処理が速かった

みんなのpythonも読んだし、フレームワークでも試してみるかと、djangoとかTurbogearsに手をだしたら、火傷した。

djangoは一通りサンプル動作をさせることができたけど、Turbogearsはコントローラのあたりからさっぱりわからん。
いきなりフレームワークは無謀すぎたかなということで、Biopythonでも入れて、もう少しpython慣れすることに。

easy_install numpy

wget http://www.egenix.com/files/python/egenix-mx-base-2.0.6.tar.gz
python setup.py install

svn co http://www.reportlab.co.uk/svn/public/reportlab/trunk
python setup.py install

wget http://biopython.org/DIST/biopython-1.42.tar.gz
python setup.py install

easy_installはperlでいうところのcpanコマンドみたいで楽チンです（ちゃんと動けば）。動かないのは地味にダウンロード、展開、setup.pyを行なった。

さて、インストールも無事に終了したところで、bioperlと比べてみた。比較したのはpdbファイルの処理。というのは、FMO用のインプット作るときに使っているPDBパーザーつまりBio::Structure::IOがやたらと遅く、不満だったからというありがちな理由。

コードはこんな感じで、1fatっていうpdbファイル読み込んでCalphaを出力してみた。

python

from Bio.PDB.PDBParser import PDBParser

parser=PDBParser(PERMISSIVE=1)
structure=parser.get_structure("1fat", "1fat.pdb")
for model in structure.get_list():
    for chain in model.get_list():
        for residue in chain.get_list():
            if residue.has_id("CA"):
                ca_atom=residue["CA"]
                print ca_atom.get_coord()

perl

use strict;
use Bio::Structure::IO;

my $pdb_file = "1fat.pdb";
my $structio =  Bio::Structure::IO->new(-file => "$pdb_file",
                            -format => 'PDB');

my $struc = $structio->next_structure;
for my $chain ($struc->get_chains) {
    for my $res ($struc->get_residues($chain)) {
        for my $atom ($struc->get_atoms($res)) {
            print join " ",$atom->xyz,"\n" if $atom->pdb_atomname eq " CA ";        
        }
    }
}

で、ベンチマーク

$ time python test.py
real    0m1.161s
user    0m0.996s
sys     0m0.116s

$ time perl pdbtest.pl
real    0m3.183s
user    0m3.056s
sys     0m0.044s

平均とってないけど、何回か実行してみてもpythonのほうがbioperlよりも3倍くらい速かった。bioperlのコードってなんだかなと思うことがままある。pdb_atomnameなんかも4文字で前後にスペース入ってるし。というわけで、SBDD周りのプログラミングはbiopythonのほうが扱いやすい感じがしてる。

みんなのPython
柴田淳
ソフトバンククリエイティブ / ?円 ( 2006-08-22 )

余談だが、Turbogears普通に触れる程度までpython覚えてやる（意地でも）とか思ったので、Dive Into Pythonも読み中。

Drkcore

BiopythonはbioperlよりもPDBファイルの処理が速かった