小编给大家分享一下perl如何提取miRNA前后500bp序列,相信大部分人都还不怎么了解,因此分享这篇文章给大家参考一下,希望大家阅读完这篇文章后大有收获,下面让我们一起去了解一下吧!
在没有位置信息时,提取miRNA前后500bp的序列
在提取miRNA前后500bp序列时,若没有其位置信息,提取会比较麻烦,可用以下方法提取:
1.首先利用blast软件将miRNA的前体序列与基因组比对,获取前体序列的位置信息,比对方法:
makeblastdb -in Donkey_Hic_genome.20180408.fa -dbtype nucl -title Donkey_Hic_genome.20180408.fa blastall -i miRNA_Pre.fa -d Donkey_Hic_genome.20180408.fa -p blastn -e 0.01 -b 5 -v 5 -m 8 -o blast.out #Donkey_Hic_genome.20180408.fa 参考基因组染色体序列 #miRNA_Pre.fa miRNA的前体序列
由于序列可能较短,所以e-value值可调高一些。比对结果如下:
bta-miR-34c Chr20 100.00 113 0 0 1 113 39385825 39385713 6e-57 224 bta-miR-370 Chr07 100.00 109 0 0 1 109 70327160 70327268 1e-54 216 bta-miR-34b Chr20 100.00 112 0 0 1 112 39386420 39386309 2e-56 222 bta-miR-383 Chr27 100.00 110 0 0 1 110 21958895 21959004 4e-55 218 bta-miR-449a Chr01 100.00 112 0 0 1 112 103603665 103603554 2e-56 222 bta-miR-378 Chr09 100.00 111 0 0 1 111 51049164 51049054 9e-56 220 bta-miR-378 Chr13 96.88 32 1 0 67 98 32631593 32631624 3e-06 56.0 bta-miR-378 Chr13 93.94 33 2 0 62 94 29900614 29900582 2e-04 50.1 bta-miR-378 Chr07 100.00 27 0 0 71 97 65072541 65072515 1e-05 54.0 bta-miR-378 Chr30 96.55 29 1 0 69 97 28926810 28926782 2e-04 50.1 bta-miR-378 Chr18 93.94 33 2 0 62 94 21721834 21721866 2e-04 50.1 bta-miR-206 Chr08 100.00 112 0 0 1 112 78794660 78794771 2e-56 222 bta-miR-1 Chr15 100.00 108 0 0 1 108 48711439 48711546 5e-54 214 bta-miR-1 Chr07 88.89 63 7 0 32 94 26712206 26712268 2e-10 69.9 bta-let-7c Chr18 100.00 112 0 0 1 112 32412721 32412610 2e-56 222 bta-let-7c Chr20 89.19 74 8 0 18 91 29703021 29703094 1e-14 83.8
然后对比对结果筛选,尽量选择完全匹配,并且匹配长度最长的。筛选后的blast.out结果如下:
bta-miR-34c Chr20 100.00 113 0 0 1 113 39385825 39385713 6e-57 224 bta-miR-370 Chr07 100.00 109 0 0 1 109 70327160 70327268 1e-54 216 bta-miR-34b Chr20 100.00 112 0 0 1 112 39386420 39386309 2e-56 222 bta-miR-383 Chr27 100.00 110 0 0 1 110 21958895 21959004 4e-55 218 bta-miR-449a Chr01 100.00 112 0 0 1 112 103603665 103603554 2e-56 222 bta-miR-378 Chr09 100.00 111 0 0 1 111 51049164 51049054 9e-56 220 bta-miR-206 Chr08 100.00 112 0 0 1 112 78794660 78794771 2e-56 222 bta-miR-1 Chr15 100.00 108 0 0 1 108 48711439 48711546 5e-54 214 bta-let-7c Chr18 100.00 112 0 0 1 112 32412721 32412610 2e-56 222
3.接下来就可以提取序列啦,我有提供脚本哦,用法:
perl /share/work/wangq/script/lv/miRNA_500_fasta.pl -id blast.out -fa Donkey_Hic_genome.20180408.fa -out result.fa -miR miRNA.fa -l 500
-id 后跟筛选的blast结果;-fa后跟参考基因组染色体序列;-miR后跟miRNA的序列文件;-l后跟提取miRNA前后多长序列,默认500。
提取结果(miRNA序列大写标注,其余小写):
>bta-miR-34c acaaggcacagcatcacccgccgccctgctgggaggaggccgccaccctccgcggcgaactgccgacccgaagcgtttgagaggagaaagctgcgcttcgagactggatgcgtcagcatttcttccgcgcggcgcggcgagcgcgtggtccgcgtgcgagcaaaccccctgcaaacgcaggcgggctgatgcttctagctggagttaaacagttagctatcactaggagtaaaagcaagattgggcgatcccatgtaaggaaagcaaaatctggggcgtttagtagctttaattgtaaggtgtaaagacatttgaattgctgggggaagccccctgtgtaacgttcagagctgtgagtcactgtgcctattttccattgtttatcagggtactcaccaatccaccaactaaagtgagtaccacggagccagtgatctgcctgtcacgacgcatgggggcaccaacttgagactgaagtttgtgatgaaagtaaagcttttttgctgtgagtctagttactAGGCAGTGTAGTTAGCTGATTGctaataataccaatcactaaccacacggccaggtaaaaagatttgggaattcatccagatgagctgcgtgtgcacaccagtgggtttgggggcaagaaggggattggaataccctaatagtacgcattgcctgtttatccatagctcagccaagagagaaatcagcattttagctgctaaatatacaacatatgtagtaaatatacatttttaacatataatttttcaatattccttcaggcgcttaaccaaaaaaattttcagatatattgaggaatagaggttttctctcttagctcatttatgtcattgtgtaaacttgtgattattttgaactatctgtaagactgtattactattttgaaagaataaagtgccctaaagtcataaattgtggtcattcatgtgtccattgcctttcctaagttggctttatgatgtccttctcagcgcctgccacagtaattataactttcctatcgctaaaatggttaaattgttcctagattaacaaaatctcccgggaaggctattcacaagcactttttagtatttttctaagaacacgtta >bta-miR-370 tgtttattgttcgtgtctcagagttgctctgaggccagtgtgctgggcacccagcaggccatctgggaggggagaaaggaagctagccatgtatgtcagtaggggtcagcgtggaggcctttggggtttctgggaaacggttcgaacatggagaatctcgtgatgtggacgcccccgagggcagccccatttcatgaactggatcccttagtcggatttctgttttccagggcacttgtttaagctttcatgccgtgtccaaaaaaaagagcagatcccaagagtttcctgcccgaggcccgtcttgccagaagccctcggcttggcctgagcatcgagatcattcctctaatcagcctgggggtggtctgacccgcggggtgggcggcgtggggcggggaggggcagggcgtgtgcggggcgggagctgctgggggagctggcggcggttcctttcaccctcgccgtggacccgcgggggcgtggctgtcctcggtctacaaatcgcgcaagtcggggcacaagacagagaggccaggtcacgtctctgcagttacacagctcatgagtGCCTGCTGGGGTGGAACCTGGTctgtctgtctgtctagcaccacagctcgggcgctgctgcagagggaacaaagatttgggtgagggcctcagagacgggctggggaaggggtttactcgggctgactttgacatgaaaacaatagctaattctgcttggggcctagtgctgtgactatcttacagatgggaaacaggcacagacaggttagttaactttcctgatgctttccagctcatcaggattggaggctggatttggaggcaggcggtgtggttcgagcatatgtgctctaaccattgggaaacactgcctcctgactgtgagcacacgtagagatggcacatggagtccaagaatctgggtttgagtccttttaccactgcctggtgggtgtactgtccagtcaatcatacatatttcattcttcctcttgacagagtctctgcagaggcccagcgagtcgttggcccgtgggaaatatttttttaagaatgcatgtgtgtgatagaattttatatctatcgaaggggagggg
脚本代码如下:
#!/usr/bin/perl -w use strict; use Getopt::Long; use Bio::SeqIO; use Bio::Seq; my $version = "1.3"; my %opts; GetOptions(\%opts, "id=s", "fa=s", "miR=s", "l=s","out=s","h"); if(!defined($opts{out}) || !defined($opts{fa}) || !defined($opts{miR}) || !defined($opts{id}) ||defined($opts{h})) { print <<"Usage End."; Description: $version:lefse analysis Usage Forced parameter: -id blast.out <infile> must be given -out outfile <outfile> must be given -fa genome fasta file <infile>must be given -miR miRNA fasta file <infile>must be given -l length <int> Other parameter: -h Help document Usage End. exit; } my $len = $opts{l}; $len ||=500; my $in = Bio::SeqIO->new(-file => "$opts{fa}" , -format => 'Fasta'); my %fasta; while ( my $seq = $in->next_seq() ) { my($id,$sequence)=($seq->id,$seq->seq); $fasta{$id}=$sequence; } my $ina = Bio::SeqIO->new(-file => "$opts{miR}" , -format => 'Fasta'); my %mi; while ( my $seq = $ina->next_seq() ) { my($id,$sequence)=($seq->id,$seq->seq); $mi{$id}=$sequence; } open(IN,"$opts{id}") ||die "open file $opts{id} faild.\n"; open(OUT,">$opts{out}") ||die "open file $opts{out} faild.\n"; while(<IN>){ chomp; my @line = split("\t"); if($line[8] < $line[9]){ my $miR = lc(substr( $fasta{$line[1]},$line[8]-1, $line[9] - $line[8]+1)); my $small = lc($mi{$line[0]}); $miR =~ s/$small/$mi{$line[0]}/; my $before = lc(substr( $fasta{$line[1]},$line[8]-$len-1, $len)); my $laft = lc(substr( $fasta{$line[1]},$line[9], $len)); print OUT ">$line[0]\n$before$miR$laft\n"; }elsif($line[8] > $line[9]){ my $miR = lc(substr( $fasta{$line[1]},$line[9]-1, $line[8] - $line[9]+1)); my $before = lc(substr( $fasta{$line[1]},$line[9]-$len-1, $len)); my $laft = lc(substr( $fasta{$line[1]},$line[8], $len)); my $gene = $before.$miR.$laft; $gene = &reverse_complement_IUPAC($gene); my $small = lc($mi{$line[0]}); $gene =~ s/$small/$mi{$line[0]}/; print OUT ">$line[0]\n$gene\n"; } } close(IN); close(OUT); sub reverse_complement_IUPAC { my $dna = shift; # reverse the DNA sequence my $revcomp = reverse($dna); # complement the reversed DNA sequence $revcomp =~ tr/ABCDGHMNRSTUVWXYabcdghmnrstuvwxy/TVGHCDKNYSAABWXRtvghcdknysaabwxr/; return $revcomp; }
以上是“perl如何提取miRNA前后500bp序列”这篇文章的所有内容,感谢各位的阅读!相信大家都有了一定的了解,希望分享的内容对大家有所帮助,如果还想学习更多知识,欢迎关注亿速云行业资讯频道!
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。