How to use awk, sed, grep

Discussion in 'Command Line' started by xiaoxiaodong2013, Oct 31, 2013.

  1. xiaoxiaodong2013

    Question:
    I want to replace each line beginning with ">gi" with just the "NW_*.1" ID it contains. Which Linux command should I use so that only the ">gi" lines are changed?

    >gi|417531841|ref|NW_004080164.1| Ovis aries breed Texel chromosome 1 genomic scaffold, Oar_v3.1 OAR1, whole genome shotgun sequence
    ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
    >gi|417531785|ref|NW_004080165.1| Ovis aries breed Texel chromosome 2 genomic scaffold, Oar_v3.1 OAR2, whole genome shotgun sequence
    ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
    ……
    ……
    ……
    >gi|5835554|ref|NC_001941.1| Ovis aries mitochondrion, complete genome
    ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
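Before changing anything, it helps to list exactly which lines a substitution would touch. The sketch below assumes the records are in a file named genome.fa (a hypothetical name), cut down to two records with shortened sequences. The `^` anchor makes grep match ">gi" only at the very start of a line, so sequence lines can never match.

```shell
# Hypothetical sample file: two records from the question, sequences shortened.
cat > genome.fa <<'EOF'
>gi|417531841|ref|NW_004080164.1| Ovis aries breed Texel chromosome 1 genomic scaffold, Oar_v3.1 OAR1, whole genome shotgun sequence
ATGGGGACATGACCGGGAGG
>gi|5835554|ref|NC_001941.1| Ovis aries mitochondrion, complete genome
ATGGGGACATGACCGGGAGG
EOF

# Print only the header lines (">gi" at the start of the line).
grep '^>gi' genome.fa

# Count them and compare with the number of records you expect.
grep -c '^>gi' genome.fa
```

Note that the last record in the question is NC_001941.1, not an NW_ ID, so a pattern written only for "NW_" would leave that header unchanged.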

  2. GrumpyOldMan

  3. xiaoxiaodong2013

    It's a very good link, but I want to keep the NW_*.1 ID. It is different on every line that starts with ">gi", so the replacement has to change from line to line, and I still don't know how to do that. The output lines should be:

    NW_004080164.1
    ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
    NW_004080165.1
    ATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCCCATGGGGACATGACCGGGAGGTGGGCAAGGAGAGCGTCTACAGCTCAGGGGAGCCAGGGATCACCGCTCC
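One way to get that output with sed is to capture the fourth "|"-separated field of each ">gi" header and replace the whole line with it. This is a sketch assuming every header has the gi|number|ref|ID| layout shown in the question; genome.fa and renamed.fa are hypothetical file names. It keeps the leading ">" because FASTA readers rely on it to recognize header lines; drop the ">" from the replacement if you really want the bare ID. Lines that do not start with ">gi" are not matched, so the sequence lines pass through unchanged.

```shell
# Hypothetical input with the same layout as the question (sequences shortened).
cat > genome.fa <<'EOF'
>gi|417531841|ref|NW_004080164.1| Ovis aries breed Texel chromosome 1 genomic scaffold, Oar_v3.1 OAR1, whole genome shotgun sequence
ATGGGGACATGACCGGGAGG
>gi|5835554|ref|NC_001941.1| Ovis aries mitochondrion, complete genome
ATGGGGACATGACCGGGAGG
EOF

# -E enables extended regexes; [|] matches a literal "|".
# Group 1 captures the fourth field (the NW_ or NC_ accession),
# and the whole header line is replaced by ">" plus that accession.
sed -E 's/^>gi[|][^|]*[|][^|]*[|]([^|]*)[|].*/>\1/' genome.fa > renamed.fa

cat renamed.fa
```

Writing to a new file instead of editing in place leaves the original .fa untouched if the pattern turns out not to fit some header.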
  4. GrumpyOldMan

    You have the lines you want to operate on, so based on what's in the tutorial, simply try some modifications on the command line. That's how I figure things out.
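For anyone experimenting along those lines, here is one way the renaming could be done with awk. It is a sketch assuming the gi|number|ref|ID| header layout from the question; genome.fa and renamed.fa are hypothetical file names. Setting the field separator to "|" makes the accession simply field $4 on each ">gi" header, and every other line is printed as-is.

```shell
# Hypothetical input with the same layout as the question (sequences shortened).
cat > genome.fa <<'EOF'
>gi|417531841|ref|NW_004080164.1| Ovis aries breed Texel chromosome 1 genomic scaffold, Oar_v3.1 OAR1, whole genome shotgun sequence
ATGGGGACATGACCGGGAGG
>gi|5835554|ref|NC_001941.1| Ovis aries mitochondrion, complete genome
ATGGGGACATGACCGGGAGG
EOF

# -F'|' splits each line on "|": $1 = ">gi", $2 = gi number, $3 = "ref", $4 = accession.
# Header lines are printed as ">" plus the accession; "next" stops there so the
# second rule, which prints every other line unchanged, does not print them again.
awk -F'|' '/^>gi[|]/ { print ">" $4; next } { print }' genome.fa > renamed.fa

cat renamed.fa
```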
  5. bioinfornatics

    In bioinformatics we do not convert a standard format (FASTA) into a non-standard one; on top of that, it writes the same data to disk a second time and duplicates information…
  6. xiaoxiaodong2013

    Yeah, I resolved it. Thanks for the link.

    I needed this because the sequence IDs in the .fa file have to match the ones in the .gff file when I use TopHat with the two files together. If I don't change them, I get an error like: could not build bowtie index with err=1.
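Since the problem was IDs in the .fa file not matching the .gff file, one way to check a renamed file before handing both to TopHat is to compare the FASTA header IDs with column 1 (the seqid column) of the GFF. This sketch uses hypothetical files renamed.fa and annotation.gff with made-up sample contents; it prints every seqid that appears in the GFF but has no matching FASTA header, so empty output means the IDs agree.

```shell
# Use byte-wise ordering so sort and comm agree on the order of lines.
export LC_ALL=C

# Hypothetical renamed FASTA and a tiny made-up GFF3 file (tab-separated).
printf '>NW_004080164.1\nATGGGGACATGACCGGGAGG\n>NC_001941.1\nATGGGGACATGACCGGGAGG\n' > renamed.fa
printf '##gff-version 3\nNW_004080164.1\tRefSeq\tregion\t1\t20\t.\t+\t.\tID=r1\nNC_001941.1\tRefSeq\tregion\t1\t20\t.\t+\t.\tID=r2\n' > annotation.gff

# FASTA IDs: header lines with the ">" removed and anything after the first whitespace dropped.
sed -n 's/^>\([^[:space:]]*\).*/\1/p' renamed.fa | sort -u > fa_ids.txt

# GFF seqids: column 1 of every non-comment, non-empty line.
awk -F'\t' '!/^#/ && NF > 0 { print $1 }' annotation.gff | sort -u > gff_ids.txt

# Lines only in gff_ids.txt are seqids with no matching FASTA header.
comm -13 fa_ids.txt gff_ids.txt
```

If any ID is printed, that GFF seqid has no header with the same name in the FASTA file.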
