awk: create a table from a file
I have a commandlog file and want to select some of its information in table format. The input looks like this:
    ####################################################################################################
    # starting pipeline @ mon jul 29 12:22:56 cest 2013
    # input files: test.fastq
    # output log: .bpipe/logs/27790.log
    # stage results
    mkdir ./qc_graphics_results/
    ####################################################################################################
    # starting pipeline @ mon jul 29 12:22:57 cest 2013
    # input files: test.fastq
    # output log: .bpipe/logs/27790.log
    # stage statistics_graph_2
    fastqc test.fastq -o ./qc_graphics_results/
    mv .qc_graphics_results/*fastqc .qc_graphics_results/fastqc
    ####################################################################################################
    # starting pipeline @ mon jul 29 12:24:18 cest 2013
    # input files: test.fastq
    # output log: .bpipe/logs/27790.log
    # stage gc_content [all]
    # stage dinucleotide_odds [all]
    # stage sequence_duplication [all]
    prinseq-lite.pl -fastq test.fastq -graph_data test.dinucleotide_odds.gd -graph_stats dn -out_good null -out_bad null
    prinseq-lite.pl -fastq test.fastq -graph_data test.sequence_duplication.gd -graph_stats da -out_good null -out_bad null
    prinseq-lite.pl -fastq test.fastq -graph_data test.gc_content.gd -graph_stats gc -out_good null -out_bad null
The desired output is a table with each stage and its command, like this:
    stage results                     mkdir ./qc_graphics_results/
    stage statistics_graph_2          fastqc test.fastq -o ./qc_graphics_results/
    stage gc_content [all]            prinseq-lite.pl -fastq test.fastq -graph_data test.gc_content.gd -graph_stats gc -out_good null -out_bad null
    stage dinucleotide_odds [all]     prinseq-lite.pl -fastq test.fastq -graph_data test.dinucleotide_odds.gd -graph_stats dn -out_good null -out_bad null
    stage sequence_duplication [all]  prinseq-lite.pl -fastq test.fastq -graph_data test.sequence_duplication.gd -graph_stats da -out_good null -out_bad null
I have been trying with awk using the following code, but it doesn't work. Any suggestions?
    cat commandlog.txt | awk '/^#\ stage*/{print $0} !/^#.*/{print $0}' | awk '{ if ($0 ~ /^#*/){ if (b=1){next} else {a=$0 b=1 next;} else { if (NF!=0){func=$0 b=0 print $a\t$func\n}}' > ./statistic_files/user_options
Save the following in a file named awk0:
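    # Skip blank lines.
    NF == 0 { next }

    # Skip comment lines that are not "# stage ..." headers.
    substr($1, 1, 1) == "#" && $2 != "stage" { next }

    # "# stage NAME" (three fields): remember it for the next command line.
    $2 == "stage" && NF == 3 {
        stage_name = $2 " " $3
        next
    }

    # First command after a single-stage header: print the pair.
    stage_name != "" {
        print stage_name, $0
        stage_name = ""
        next
    }

    # "# stage NAME [all]" headers: collect the names of parallel stages.
    $2 == "stage" {
        arr[$3] = ""
        next
    }

    # Any other command: pair it with each collected stage whose name it mentions.
    {
        for (i in arr) {
            if (match($0, i) != 0)
                print "stage", i, $0
        }
    }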
Then run it with:

    cat commandlog.txt | awk -f awk0 > ./statistic_files/user_options
Output:

    stage results mkdir ./qc_graphics_results/
    stage statistics_graph_2 fastqc test.fastq -o ./qc_graphics_results/
    stage dinucleotide_odds prinseq-lite.pl -fastq test.fastq -graph_data test.dinucleotide_odds.gd -graph_stats dn -out_good null -out_bad null
    stage sequence_duplication prinseq-lite.pl -fastq test.fastq -graph_data test.sequence_duplication.gd -graph_stats da -out_good null -out_bad null
    stage gc_content prinseq-lite.pl -fastq test.fastq -graph_data test.gc_content.gd -graph_stats
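If a real two-column table is wanted, the same idea can be adapted to print the stage name and the command separated by a tab. What follows is only a rough sketch under a few assumptions of mine: the wrapper name `stage_table` is made up for illustration, a tab is assumed to be an acceptable column separator, the stage names are reset at the start of every block of `# stage` headers, and `index()` is used in place of `match()` so that stage names are compared as plain text rather than as regular expressions. As in the output above, only the first command after a single-stage header is kept.

```shell
#!/bin/sh
# stage_table is a hypothetical helper name (not part of awk or bpipe).
# It reads a command log from file arguments or stdin and prints
# one "stage<TAB>command" row per matched command.
stage_table() {
    awk '
    BEGIN { OFS = "\t" }                    # assumed separator: a tab

    NF == 0 { next }                        # skip blank lines

    # "# stage NAME ..." header: collect the stage name.
    $1 == "#" && $2 == "stage" {
        if (!collecting) {                  # first header of a new block
            n = 0
            collecting = 1
        }
        names[++n] = $3
        used = 0
        next
    }

    /^#/ { next }                           # every other comment line

    # Command lines.
    {
        collecting = 0
        if (n == 1) {                       # single stage: keep first command only
            if (!used) {
                print names[1], $0
                used = 1
            }
            next
        }
        for (i = 1; i <= n; i++)            # parallel stages: match by name
            if (index($0, names[i]) > 0)
                print names[i], $0
    }
    ' "$@"
}
```

It could then be called as `stage_table commandlog.txt > ./statistic_files/user_options`. Matching a command to a stage by looking for the stage name inside the command line is a heuristic; it works for the log shown here because each prinseq-lite.pl command writes to a file named after its stage.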
Good luck!