Drkcore

    full_results %>%
      mutate(Significant = adj.P.Val < p_cutoff & abs(logFC) > fc_cutoff) %>%
      mutate(Rank = 1:n(), Label = ifelse(Rank < topN, Symbol, "")) %>%
      ggplot(aes(x = logFC, y = B, col = Significant, label = Label)) +
      geom_point() +
      geom_text_repel(col = "black")

2014-01-26 R Python

Books for learning pandas and ggplot

Lately I have been drawing my graphs with pandas + ggplot, and a late-night batch job generates a lot of graphs every day.

rpy2?

Not really using it these days.

The Python version of ggplot aims to provide the same API as the R version, so reading a ggplot cookbook pays off. I hope hexagonal density plots get implemented soon.
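Until hexagonal binning arrives in the Python ggplot, one possible stopgap is matplotlib's own `hexbin`. The sketch below is not ggplot and not what the batch job actually runs; the function name `hexbin_png` and the synthetic data are assumptions made purely for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend, suitable for an unattended batch job
import matplotlib.pyplot as plt
import numpy as np


def hexbin_png(x, y, gridsize=30):
    """Render a hexagonal density plot of (x, y) and return it as PNG bytes.

    Hypothetical helper for illustration; not part of any library.
    """
    fig, ax = plt.subplots(figsize=(5, 4))
    # Each hexagon is coloured by the number of points that fall inside it.
    hb = ax.hexbin(x, y, gridsize=gridsize, cmap="viridis")
    fig.colorbar(hb, ax=ax, label="count")
    ax.set_xlabel("x")
    ax.set_ylabel("y")
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)  # release the figure so a long-running batch does not leak memory
    return buf.getvalue()


# Synthetic, correlated data (assumption: stands in for real measurements).
rng = np.random.default_rng(0)
x = rng.normal(size=2000)
y = x + rng.normal(scale=0.5, size=2000)
png_bytes = hexbin_png(x, y)
```

In a nightly job the returned bytes would simply be written to whatever file the report expects.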

R Graphics Cookbook: Recipes for Creating Graphs with ggplot2 (Rグラフィックスクックブック ―ggplot2によるグラフ作成のレシピ集)
Winston Chang
O'Reilly Japan / 3,570 yen (2013-11-30)

The book below is also a useful reference.

R Programming for Graphics (グラフィックスのためのRプログラミング)
H. Wickham
Maruzen Publishing / 4,200 yen (2012-02-29)

And for data manipulation, use pandas.

Introduction to Data Analysis with Python: Data Processing with NumPy and pandas (Pythonによるデータ分析入門 ―NumPy、pandasを使ったデータ処理)
Wes McKinney
O'Reilly Japan / 3,780 yen (2013-12-26)

For machine learning, scikit-learn will do.

    library(plyr)
    library(ggplot2)
    library(scales)

    setwd("/Users//kzfm/lang/rcode/tw")

    # Read the tab-separated tweet log and count tweets per day
    tweets <- read.delim("data.tsv", sep="\t", stringsAsFactors=FALSE, header=TRUE)
    tweet.counts <- ddply(tweets, .(Date), nrow)

    # Build the full daily date range so that days without tweets are included
    date.range <- seq.Date(from=as.Date("2012-10-20"), to=as.Date("2013-8-12"), by="day")
    date.strings <- strftime(date.range, "%Y-%m-%d")
    dates <- data.frame(date.strings)

    # Outer-join the counts onto the full range and set missing days to 0
    all.data <- merge(dates, tweet.counts, by.x=c("date.strings"), by.y=c("Date"), all=TRUE)
    names(all.data) <- c("Date", "Counts")
    all.data$Counts[is.na(all.data$Counts)] <- 0
    all.data$Date <- as.Date(all.data$Date)

    # Line chart of daily counts with rotated date labels every two weeks
    ggplot(all.data, aes(x=Date, y=Counts)) +
      geom_line() +
      scale_x_date(breaks="2 week", labels=date_format("%Y-%m-%d")) +
      theme(axis.text.x=element_text(angle=-90))
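Since the data manipulation is meant to happen in pandas, the preparation step of the R script above (count per day, then fill the missing days with 0) could be sketched roughly as follows. The inline sample data and the `daily_counts` helper are assumptions for illustration; the real `data.tsv` is only assumed to have a `Date` column, as the R code implies. Unlike R's `merge(..., all=TRUE)`, `reindex` drops any dates that fall outside the requested range.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.tsv: one row per tweet with a Date column.
SAMPLE_TSV = (
    "Date\tText\n"
    "2012-10-20\thello\n"
    "2012-10-20\tworld\n"
    "2012-10-23\tagain\n"
)


def daily_counts(tweets, start, end):
    """Count rows per day and fill days without rows with 0."""
    # Equivalent of ddply(tweets, .(Date), nrow): number of rows per date.
    counts = tweets.groupby("Date").size()
    counts.index = pd.to_datetime(counts.index)
    # Equivalent of the seq.Date + merge + NA -> 0 steps.
    full_range = pd.date_range(start=start, end=end, freq="D")
    filled = counts.reindex(full_range, fill_value=0)
    return filled.rename_axis("Date").reset_index(name="Counts")


tweets = pd.read_csv(io.StringIO(SAMPLE_TSV), sep="\t")
all_data = daily_counts(tweets, "2012-10-20", "2012-10-24")
print(all_data)
```

The resulting frame has the same `Date` and `Counts` columns as `all.data` in the R version, so it can be handed to a plotting layer unchanged.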