这篇文章主要为大家展示了“merged.gtf如何合并同一转录本的exon位置”,内容简而易懂,条理清晰,希望能够帮助大家解决疑惑,下面让小编带领大家一起研究并学习一下“merged.gtf如何合并同一转录本的exon位置”这篇文章吧。
在merged.gtf文件中有所有外显子的信息,下面的脚本可根据此文件提取转录本的所有外显子位置信息。
merged.gtf文件实例:
Chr00 Cufflinks exon 37990 38333 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "1"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1"; Chr00 Cufflinks exon 38607 38710 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "2"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1"; Chr00 Cufflinks exon 38814 38898 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "3"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1"; Chr00 Cufflinks exon 42611 42713 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "4"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1"; Chr00 Cufflinks exon 42906 43203 . + . gene_id "XLOC_000001"; transcript_id "TCONS_00000001"; exon_number "5"; gene_name "MD00G1000200"; oId "CUFF.2.1"; nearest_ref "mRNA:MD00G1000200"; class_code "j"; tss_id "TSS1";
输出文件示例:
Chr00 + XLOC_000001 MD00G1000200 TCONS_00000001 exon 37990-38333 38607-38710 38814-38898 42611-42713 42906-43203 Chr00 + XLOC_000001 MD00G1000200 TCONS_00000002 exon 38005-38333 38607-38710 38814-38898 42611-42726 42906-43167 Chr00 + XLOC_000002 MD00G1000400 TCONS_00000003 exon 50386-50877 Chr00 + XLOC_000003 MD00G1000500 TCONS_00000004 exon 76659-76991 77468-77544 77649-77715 77889-77970 78355-78424 Chr00 + XLOC_000004 MD00G1000600 TCONS_00000005 exon 101951-102138 102228-102398 102957-103004 103099-103138 103227-103327 Chr00 + XLOC_000004 MD00G1000600 TCONS_00000006 exon 102003-102138 102228-102398 102957-103004 103099-103138 103227-103327 Chr00 + XLOC_000005 MD00G1000700 TCONS_00000007 exon 105542-105626 105926-106541 108356-108832 Chr00 + XLOC_000005 MD00G1000700 TCONS_00000009 exon 105542-105626 105926-106541 108902-109696 Chr00 + XLOC_000005 MD00G1000700 TCONS_00000008 exon 105542-105626 105926-106541 108949-109696 Chr00 + XLOC_000006 MD00G1001100 TCONS_00000010 exon 276592-277221 280928-280975
其中第一列为染色体;第二列为正负链;第三列是gene_id;第四列为gene_name;第五列为转录本ID;之后是外显子的起始位置信息
代码如下:
#!/usr/bin/perl -w use strict; use warnings; use Getopt::Long; use Config::General; use Cwd qw(abs_path getcwd); use FindBin qw($Bin $Script); my $version = "1.2"; ## prepare parameters ####################################################################### ## ------------------------------------------------------------------------------------------- ## GetOptions my %opts; GetOptions(\%opts, "gtf=s", "od=s", "h"); my $od = $opts{od}; $od = abs_path($od); mkdir $od unless(-d $od); open(IN,"$opts{gtf}") ||die "open file $opts{gtf} failed."; open(OUT,">$opts{od}/merged.tpm") ||die "open file $opts{od}/merged.tpm failed."; while(<IN>){ next if(/^#/); chomp; my($chr,$a,$exon,$start,$end,$c,$link,$d,$lin) = split("\t",$_); $lin=~/transcript_id \"([^\"]*)\"/; my $trans = $1; $lin=~/gene_name \"([^\"]*)\"/; my $gene_name= $1; $lin =~/gene_id \"([^\"]*)\"/; my $gene_id= $1; $lin =~/transcript_id \"([^\"]*)\"/; my $trans_id = $1; print OUT join("\t",$chr,$exon,$start,$end,$link,$gene_id,$trans_id)."\n"; } close(IN); close(OUT); open(IN,"$opts{od}/merged.tpm") ||die "open file $opts{od}/merged.tpm failed."; open(OUT,">$opts{od}/merged.gtf") ||die "open file $opts{od}/merged.gtf failed."; my $cmd=""; my $key=""; while(<IN>){ next if(/^#/); chomp; my ($chr,$exon,$start,$end,$link,$gene_id,$gene_name,$trans_id) = split("\t",$_); if($key eq $trans_id){ $cmd .= "\t".$start."-".$end; }else{ $key = $trans_id; if($cmd ne ""){ print OUT $cmd."\n"; } $cmd = join("\t",$chr,$link,$gene_id,$trans_id,$exon,$start."-".$end); } } close(IN); close(OUT);
以上是“merged.gtf如何合并同一转录本的exon位置”这篇文章的所有内容,感谢各位的阅读!相信大家都有了一定的了解,希望分享的内容对大家有所帮助,如果还想学习更多知识,欢迎关注亿速云行业资讯频道!
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。