[Gene Data Processing with Perl] 1. Basic DNA File Handling and Base Composition Statistics

2017-10-05 21:24:03 Source: CSDN Author: bincheng_


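I am still a beginner in bioinformatics. A couple of days ago I learned Perl's basic syntax, and then, by working through the book "Beginning Perl for Bioinformatics", I built a basic framework connecting bioinformatics data processing with Perl programming.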

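Below is a small piece of code I wrote while studying, for computing basic statistics on a DNA sequence. Later I may bundle the handy little tools I come across into a .pm module for easy reuse: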

package dna_analyze;
use strict;
use warnings;

# Clean the data: read the file, drop whitespace (including newlines) and commas,
# and return the sequence as a list of single characters.
sub clean_data {
    my $dna_filename = shift @_;
    open(my $dna_fh, '<', $dna_filename)
        or die "Can't open the file $dna_filename: $!\nThe subroutine is going to exit!\n";
    my @DNA = <$dna_fh>;
    close $dna_fh;
    print "Have read the sequence!\n";
    my $DNA = join('', @DNA);
    $DNA =~ s/[\s,]//g;    # \s already matches \n
    @DNA = split('', $DNA);
    print "Have cleaned the data!\n";
    return @DNA;
}

# The base number counter
sub base_analyze {
    my $dna_filename = shift @_;
    my @DNA = clean_data($dna_filename);    # must be declared with my under "use strict"
    my ($count_of_A, $count_of_T, $count_of_C, $count_of_G, $count_of_others) = (0) x 5;
    my $total_count = 0;
    foreach (@DNA) {
        $total_count++;
        if    ($_ eq "A") { $count_of_A++; }
        elsif ($_ eq "T") { $count_of_T++; }
        elsif ($_ eq "G") { $count_of_G++; }
        elsif ($_ eq "C") { $count_of_C++; }
        else { print "This is a wrong base: $_\n"; $count_of_others++; }
    }
    die "No bases found in $dna_filename!\n" unless $total_count;    # avoid division by zero
    my %count = (A => $count_of_A, T => $count_of_T, C => $count_of_C,
                 G => $count_of_G, 'wrong base' => $count_of_others);
    foreach my $base ('A', 'T', 'C', 'G', 'wrong base') {
        print "The count of $base is: $count{$base}\n";
        printf "The percent of %s is: %.2f%%\n", $base, $count{$base} / $total_count * 100;
    }
    # Return order: (total, A, C, T, G, others)
    return ($total_count, $count_of_A, $count_of_C, $count_of_T, $count_of_G, $count_of_others);
}

# The amount of C and G bases
sub CG_analyze {
    my $dna_filename = shift @_;
    my @basic_data = base_analyze($dna_filename);    # not base_analysis(my $dna_filename)
    my $count_of_CG = $basic_data[2] + $basic_data[4];    # C + G
    my $percent_of_CG = $count_of_CG / $basic_data[0] * 100;
    print "The total number of base 'C' and 'G' is: $count_of_CG\n";
    printf "The percent of 'C' and 'G' is: %.2f%%\n", $percent_of_CG;
    return $percent_of_CG;
}

1;    # a .pm module must end with a true value


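That is the code. It counts the bases in a DNA sequence, computes the percentage of each base, and reports the C+G (GC) content: very basic routines that nearly every genome analysis needs.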
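The same statistics can also be computed without splitting the sequence into an array. The sketch below is an alternative, not the code above: it assumes the sequence has already been cleaned into a single string (as clean_data does before splitting), and the helper names count_bases and gc_percent are hypothetical. It relies on Perl's tr/// operator, which returns the number of matched characters when the replacement list is empty. GC content is (C + G) / total * 100, with the total including unrecognized characters, as in base_analyze.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count each base in an already-cleaned sequence string.
# tr/X// with an empty replacement list counts X without modifying the string.
sub count_bases {
    my ($seq) = @_;
    $seq = uc $seq;    # assumption: lowercase bases are treated as valid
    my %count = (
        A => ($seq =~ tr/A//),
        T => ($seq =~ tr/T//),
        C => ($seq =~ tr/C//),
        G => ($seq =~ tr/G//),
    );
    # Anything that is not A, T, C or G (e.g. N) counts as "other"
    $count{other} = length($seq) - ($count{A} + $count{T} + $count{C} + $count{G});
    return %count;
}

# Hypothetical helper: GC content in percent, (C + G) / total * 100
sub gc_percent {
    my ($seq) = @_;
    my $total = length $seq;
    die "Empty sequence\n" unless $total;
    my %count = count_bases($seq);
    return ($count{C} + $count{G}) / $total * 100;
}

my $seq = "ATGCGCNNAT";    # made-up example sequence
my %count = count_bases($seq);
printf "%s: %d\n", $_, $count{$_} for qw(A T C G other);
printf "GC content: %.1f%%\n", gc_percent($seq);
```

For the example string this prints a count of 2 for each of A, T, C, G and other, and a GC content of 40.0%. Using tr/// avoids building a one-element-per-base array, which may matter for long sequences, but it does not report each invalid character the way base_analyze does.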
