Keep one ID based on value of a column?

anjalianjali

Hello,

I have a tab-delimited file:

GH76.hmm - 358 VENTURIA_I_00885.t1 - 411 7.50E-83 273.9 26.1 1 1 7.80E-85 2.30E-82 272.3 26.1 15 354 24 406 21 410 0.87
GH105.hmm - 332 VENTURIA_I_00885.t1 - 411 7.80E-10 33.7 5.3 1 2 8.80E-07 0.00026 15.5 1.9 63 153 159 250 131 260 0.78
GH105.hmm - 332 VENTURIA_I_00885.t1 - 411 7.80E-10 33.7 5.3 2 2 2.70E-07 7.90E-05 17.2 0.1 12 104 275 378 268 383 0.73
AA3_2.hmm - 570 VENTURIA_I_04612.t1 - 614 2.80E-98 324.9 0 1 1 3.70E-100 3.60E-98 324.5 0 2 566 34 608 33 610 0.87
AA3.hmm - 618 VENTURIA_I_04612.t1 - 614 7.50E-91 300.5 0 1 1 9.70E-93 9.50E-91 300.1 0 81 398 28 609 22 613 0.86
AA3_3.hmm - 591 VENTURIA_I_04612.t1 - 614 2.30E-57 189.7 0 1 2 5.00E-49 4.90E-47 155.6 0 3 463 36 508 34 515 0.81
AA3_3.hmm - 591 VENTURIA_I_04612.t1 - 614 2.30E-57 189.7 0 2 2 3.40E-11 3.30E-09 30.7 0 511 583 531 604 525 611 0.87

For each ID in column 4, I want to keep only the line with the smallest e-value in column 7.
I tried the command below, but it gives no output:

$cat ./file2.csv | sed '/#/d'| sed '/\n/d' | awk -F'[\t]' '$7 > smallest[$4] { smallest[$7]=$4; line[$1] = $0 };END { for (id in smallest) { print line[id] }}'



The output should look like this:


GH76.hmm - 358 VENTURIA_I_00885.t1 - 411 7.50E-83 273.9 26.1 1 1 7.80E-85 2.30E-82 272.3 26.1 15 354 24 406 21 410 0.87
AA3_2.hmm - 570 VENTURIA_I_04612.t1 - 614 2.80E-98 324.9 0 1 1 3.70E-100 3.60E-98 324.5 0 2 566 34 608 33 610 0.87


Thank you.
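One possible reason the command above prints nothing useful is that the two arrays are keyed inconsistently. `smallest` is written with the e-value (`$7`) as the key but read with the ID (`$4`), and `line` is stored under column 1 but looked up with the keys of `smallest`, so `line[id]` is always empty. The test `$7 > smallest[$4]` also selects larger values rather than smaller ones, and `sed '/\n/d'` does not remove blank lines because sed's pattern space does not contain the newline. Below is a minimal sketch of one way to do it, assuming the file really is tab-separated and that column 7 always holds a number. The function name `best_hit_per_id` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: for each ID in column 4, print the line whose
# column-7 e-value is smallest. IDs are printed in order of first appearance.
# Assumes tab-separated input; on ties the first line seen is kept.
best_hit_per_id() {
    awk -F'\t' '
        /^#/ || NF == 0 { next }              # skip comment and blank lines
        !($4 in best) || $7 + 0 < best[$4] {  # first hit for this ID, or a smaller e-value
            if (!($4 in best)) order[++n] = $4  # remember first-seen order of IDs
            best[$4] = $7 + 0                 # "+ 0" forces a numeric comparison
            line[$4] = $0                     # keep the whole line for this ID
        }
        END { for (i = 1; i <= n; i++) print line[order[i]] }
    ' "$@"
}
```

It could be called as `best_hit_per_id ./file2.csv`. Both arrays are keyed by the ID in column 4, so the lookup in `END` finds the stored line. If the columns turn out to be separated by spaces rather than tabs, dropping `-F'\t'` lets awk split on any whitespace.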
 
