J'essaie de lire un fichier de séquences nucléotidiques contenant l'identifiant et la séquence. Les séquences sont par défaut séparées par de nouvelles lignes après 70 bits de séquences nucléotidiques.
Le fichier d'entrée (seq.txt) ressemble à ceci.
id seq 0 seqgb_AY741213_Organism_Influenza_A_virus__A_b... NaN 1 ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAA... NaN 2 ATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAAAGAA... NaN 3 CGTACTGGACAAGACACACAACGGGAAGCTCTGCGAGCTAGATGGA... NaN 4 TGTAGTGTAGCTGGATGGCTCCTCGGAAACCCAATGTGTGACGAAT... NaN 5 ACATAGTAGAGAAGGCCAGTCCAGCCAATGACCTCTGTTACCCAGG... NaN 6 GAAACACCTATTGAGCAGAATAAACCATTTTGAGAAAATTCAGATC... NaN 7 CATGAAGCCTCATCAGGGGTGAGCTCAGCATGTCCATACCAGGGGA... NaN 8 TATGGCTTATCAAAAAGAACAGTGCATACCCAACAATAAAGAGGAG... NaN 9 TCTTTTGGTACTGTGGGGGATTCACCATCCTAATGATGCGGCAGAG... NaN 10 ACCACCTATATTTCCGTTGGAACATCAACACTAAACCAGAGATTGG... NaN 11 AAGTAAATGGGCAAAGTGGAAGAATGGAGTTCTTCTGGACAATTTT... NaN 12 CGAGAGTAATGGAAATTTCATTGCTCCAGAATATGCATACAAAATT... NaN 13 ATGAAAAGTGAATTGGAATATGGTAACTGCAACACCAAGTGTCAAA... NaN 14 GTATGCCATTCCACAACATACACCCTCTCACCATCGGGGAATGCCC... NaN 15 AGTCCTTGCGACAGGGCTCAGAAATAGCCCTCAAAGAGAGAGAAGA... NaN 16 GCTATAGCAGGGTTTATAGAGGGAGGATGGCAGGGAATGGTAGATG... NaN 17 ATGAGCAGGGGAGTGGATACGCTGCAGACAAAGAATCCACTCAAAA... NaN 18 GGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTT... NaN 19 AGGAGAATAGAAAATTTAAACAAGAAGATGGAGGACGGATTCCTAG... NaN 20 TTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTC... NaN 21 GGTCCGACTACAACTTAGGGATAATGCAAAGGAGCTGGGTAACGGT... NaN 22 GATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTATGACTACC... NaN 23 TAAACAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAAC... NaN 24 AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTA... NaN 25 TCGTTACAATGCAGAATTTGCATTTGA NaN 26 seqgb_EU676325_Organism_Influenza_A_virus__A_b... NaN 27 TTTAGCAAAAGGCAGGGGTATATCTGTCAAAATGGAGAAAATAGTG... NaN 28 GTTAAAAGTGATCAGATTTGCATTGGTTACCATGCAAACAACTCGA... NaN 29 AAAAGAACGTTACTGTTACACATGCCCAAGACATACTGGAAAAGAC... NaN .. ... ... 598 GATAATGAATGTATGGAAAGTGTGAGAAACGGAACGTATGACTACC... NaN 599 TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAAT... NaN 600 TACAGTGGCAAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTA... NaN 601 TCATTACAATGCAGAATTTGCATTTAAATTG NaN 602 seqgb_DQ999880_Organism_Influenza_A_virus__A_c... NaN 603 ATGGAGAGAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTTAAAA... NaN 604 ATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAAGGAA... NaN 605 CATACTGGAAAAGACACACAACGGGAAGCTCTGCGATCTAGATGGA... NaN 606 TGTAGTGTAGCTGGATGGCTCCTCGGAAACCCAATGTGTGACGAAT... NaN 607 ACATAGTGGAGAAGGCCAATCCAGTCAATGACCTCTGTTACCCAGG... NaN 608 GAAACACCTATTGAGCAGAATAAACCATTTTGAGAAAATTCAGATC... NaN 609 CATGAAGCCTCATTAGGGGTGAGCTCAGCATGTCCATACCTGGGAA... NaN 610 TATGGCTTATCAAAAAGAACAGTACATACCCAACAATAAAGAGGAG... NaN 611 TCTTTTGGTACTGTGGGGGATTCACCATCCTAATGATGCGGCAGAG... NaN 612 ACCACCTATATTTCTGTTGGGACATCAACACTAAACCAGAGATTGG... NaN 613 AAGTAAACGGGCAAAGTGGAAGGATGGAGTTCTTCTGGACAATTTT... NaN 614 CGAGAGTAATGGAAATTTCATTGCTCCAGAATATGCATACAAAATT... NaN 615 ATGAAAAGTGAATTGGAATATGGTAACTGCAACACCAAGTGTCAAA... NaN 616 GTATGCCATTCCACAATATACACCCTCTCACTATCGGGGAATGCCC... NaN 617 AGTCCTTGCGACTGGGCTCAGAAATAGCCCTCAAAGAGAGAGAAGA... NaN 618 GCTATAGCAGGTTTTATAGAGGGGGGATGGCAGGGAATGGTAGATG... NaN 619 ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAA... NaN 620 GGTCAACTCGATAATTGACAAAATGAACACTCAGTTTGAGGCCGTT... NaN 621 AGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGGTTCCTAG... NaN 622 TTCTGGTTCTCATGGAAAATGAGAGAACCCTAGACTTTCATGACTC... NaN 623 GGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGT... NaN 624 GATAATGAATGTATGGAAAGTGTGAGAAACGGAACGTATGACTACC... NaN 625 TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAAT... NaN 626 TACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTA... NaN 627 TCGTTACAATGCAGAATTTGCATTAAATTG NaN [628 rows x 2 columns]
J'ai écrit ce code:
import pandas as pd
import numpy as np
data = pd.read_csv('seq.txt', sep=',',delim_whitespace = True, names=["id", "seq"], skip_blank_lines = True, index_col=False) # , dtype='unicode'
dataframe = pd.DataFrame(data)
print(dataframe)
Et le résultat est:
seqgb_AY741213_Organism_Influenza_A_virus__A_blackbird_Hunan_1_2004_H5N1___Strain_Name_A_blackbird_Hunan_1_2004_Segment_4_Subtype_H5N1_Host_Blackbird, ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC ATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTACACATGCTCAAGA CGTACTGGACAAGACACACAACGGGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAA AGGAGAATAGAAAATTTAAACAAGAAGATGGAGGACGGATTCCTAGATGTCTGGACTTATAATGCTGAAC TTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTCAAGAACCTTTACGAAAA GGTCCGACTACAACTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCACAAATGT GATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTATGACTACCCGCAGTATTCAGAAGAAGCAAGAC TAAACAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTC AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTATCTTTATGGATGTGCTCCAATGGA TCGTTACAATGCAGAATTTGCATTTGA seqgb_EU676325_Organism_Influenza_A_virus__A_brown-head_gull_Thailand_vsmu-4_2008_H5N1___Strain_Name_A_brown-head_gull_Thailand_vsmu-4_2008_Segment_4_Subtype_H5N1_Host_Brown-Headed_Gull, TTTAGCAAAAGGCAGGGGTATATCTGTCAAAATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTT GTTAAAAGTGATCAGATTTGCATTGGTTACCATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGG AAAAGAACGTTACTGTTACACATGCCCAAGACATACTGGAAAAGACACACAACGGGAAGCTCTGCGATCT AGATGGAGTGAAGCCTCTAATTTTGAGAGATTGTAGTGTAGCTGGATGGCTCCTCGGAAACCCAATGTGT GACGAATCTCCAATGGGGGCGATAAACTCTAGTATGCCATTCCACAATATACACCCTCTCACCATCGGGG AATGCCCCAAATATGTGAAATCAAACAGATTAGTCCTTGCGACTGGGCTCAGAAATAGCCCTCAAAGAGA GAGAAGAAGAAAAAAGAGAGGATTATTTGGAGCTATAGCAGGTTTTATAGAGGGAGGATGGCAGGGAATG GTAGATGGTTGGTATGGGTACCACCATGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTC ATGACTCAAATGTCAAGAACCTTTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGG TAACGGTTGTTTCGAGTTCTATCATAAATGTGATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTAT GACTACCCACAGTATTCAGAAGAAGCAAGACTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAA TAGGAATTTACCAAATACTGTCAATTTATTCTACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGC TGGTCTATCCTTATGGATGTGCTCCAATGGGTCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTC AGATTGAG seqgb_EF178528_Organism_Influenza_A_virus__A_brown-headed_gull_Thailand_VSMU-28-SPK_2005_H5N1___Strain_Name_A_brown-headed_gull_Thailand_VSMU-28-SPK_2005_Segment_4_Subtype_H5N1_Host_Brown-Headed_Gull, AGCAAAAGCAGGGGTATAATCTGTCAAAATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTT AAAAGTGATCAGATTTGCATTGGTTACCATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAA AGAACGTTACGAATGATGCAATCAACTTCGAGAGTAATGGAAATTTCATTGCTCCAGAGTATGCATACAA AATTGTCAAGAAAGGGGACTCAACAATTATGAAAAGTGAATTGGAATATGGTAACTGCAACACCAAGTGT CAAACTCCAATGGGGGCGATAAACTCAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGC CGTTGGAAGGGAATTTAACAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGGTTC CTAGATGTCTGGACTTATAATGCTGAACTTCTGGTTCTCCTGGAAAATGAGAGAACTCTAGACTTTCATG ACTCAAATGTCAAGAACCTTTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAA CGGTTGTTTCGAGTTCTATCATAAATGTGATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTATGAC TACCCACAGTATTCAGAAGAAGCAAGACTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAG GAATTTACCAAATACTGTCAATTTATTCTACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGG TCTATCCTTATGGATGTGCTCCAATGGGTCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTCAGA T seqgb_CY091790_Organism_Influenza_A_virus__A_chicken_Ampenan_BBVD-282_2007_H5N1___Strain_Name_A_chicken_Ampenan_BBVD-282_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATT TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA CACATGCCCAAGACATACTGGAAAAGGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGA ACCTCTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT CTATCACAAATGTGATAATGAATGTATGGAAAGTATAAGAAACGGAACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTACCAAATAC TGTCGATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT GTGCTCCAATGGATCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTCAGATTGTAGTTAAA seqgb_KT216634_Organism_Influenza_A_virus__A_chicken_Anhui_MG08_2008_H9N2___Strain_Name_A_chicken_Anhui_MG08_2008_Segment_4_Subtype_H9N2_Host_Chicken, AGCAAAAGCAGGGGAATTTCACAACCACTCAAGATGGAGACAGTATCACTAATAAATATACTACTAGTAG TAACAGTAAGCAATGCAGATAAAATCTGCATCGGCTATCAATCAACAAATTCCACAGAAACTGTAGACAC ACTAACAGAAAACAATGTCCCTGTGATTGTAATTGCAATGGGGTTTGCTGCCTTCTTGTTCTGGGCCATG TCCAATGGGTCTTGCAGATGCAACATTTGTATATAATTGGCAAAAACACCCTTGTTTCTACT seqgb_KY005855_Organism_Influenza_A_virus__A_chicken_Anhui_MZ33_2016_H5N6___Strain_Name_A_chicken_Anhui_MZ33_2016_Segment_4_Subtype_H5N6_Host_Chicken, ATGGAGAAAATAGTGCTTCTTCTTGCAGTGGTTAGCCTTGTTAAAGGTGATCAGATTTGCATTGGTTACC ATGCAAACAACTCGACTGAGCAGGTTGACACGATAATGGAAAAAAACGTCACTGTTACACATGCTCAAGA CATACTAGAAAGGAATATGGCAATTGCAACACCAAATGTCAAACTCCAATAGGGGCGATAAACTCTAGTA TGCCATTCCACAATATACACCCTCTCACTATCGGGGAGTGCCCCAAATATGTGAAATCAAACAAATTAGT CCTTGCGACTGGGCTCAGAAATAGTCGAATCCACCCAAAAGGCAATAGATGGAGTTACCAATAAGGTCAA CTCGATAATTGACAAAATGAACACTCAGACGGATTCCTAGATGTCTGGACTTATAATGCTGAACTTTTAG TTCTCATGGAAAATGAGAGAACTCTAGATTTCCATGACTCAAATGTCAAGAACCTTTATGACAAAGTCCG ACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAATGGTTGTTTCGAGTTCTATCACAAATGTGATAAT GAATGTATGGAAAGTGTGAGGAATGGGACGTATGACTACCCCCAGTATTCAGAAGAAGCAAGATTAAAAA GGGAAGAAATAAGCGGAGTGAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTCAACAGT GGCGGGTTCCCTAGCACTGGCAATCATTGTGGCTGGTCTATCTTTATGGATGTGCTCCAATGGGTCGTTA CAATGCAGAATTTGCATTTAA seqgb_KY005863_Organism_Influenza_A_virus__A_chicken_Anhui_MZ34_2016_H5N6___Strain_Name_A_chicken_Anhui_MZ34_2016_Segment_4_Subtype_H5N6_Host_Chicken, ATGGAGAAAAGAAGAACGATGCATACCCAACAATAAAAATGAGCTACAATAACACCAATAGGGAAGATCT TTTGATACTGTGGGGGATTCATCATTCCAATAATGCAGAAGAGCAGACAAATCTCTATAAAAACCCAACC ACCTATGTTTCCGTTGGGACATCAACATTAAACCAGAGAGTGGTGCCAAAAATAGCTACTAGATCCCAAG TAAACGGGCAAAGTGGAAGAATGGATTTCTTCTGGACAATTTTAAAACCGGATGATGCAATCCACTTCGA GAGTAATGGAAATTTTATTGCTCCAGACTATCGGGGAGTGCCCCAAATATGTGAAATCAAACAAATTAGT CCTTGCGACTGGGCTCAGAAATAGTCCTCTAAGAGAAAGAAGAAGAAAAAGAGGATTATTTGGAGCCATA GCAGGGTTTATAGAGGGAGGATGGCAAGGAATGGTAGATGGTTGGTATGGGTACCACCATAGCAATGCAC AAGGGAGTGGGTATGCTGCAGACAGAGAATCCACCCAAAAGGCAATAGATGGAGTTACCAATAAGGTCAA CTCGATAATTGACAAAATGAACACTCAATTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAACGGAGA ATAGAGAATTTAAATAAGAAAATGGAAGACGGATTCCTAGATGTCTGGACTTATAATGCTGAACTTTTAG TTCTCATGGAAAATGAGAGAACTCTAGATTTCCATGACTCAAATGTCAAGAACCTTTATGACAAAGTCCG ACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAATGGTTGTTTCGAGTTCTATCACAAATGTGATAAT GAATGTATGGAAAGTGTGAGGAATGGGACGTATGACTACCCCCAGTATTCAGAAGAAGCAAGATTAAAAA GGGAAGAAATAAGCGGAGTGAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTCAACAGT GGCGGGTTCCCTAGCACTGGCAATCATTGTGGCTGGTCTATCTTTATGGATGTGCTCCAATGGGTCGTTA CAATGCAGAATTTGCATTTAA seqgb_CY091815_Organism_Influenza_A_virus__A_chicken_Badung_BBVD-277_2007_H5N1___Strain_Name_A_chicken_Badung_BBVD-277_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATT TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA CACATGCCCAAGACATACTGGAAAAGACACACAACGGGAAGCTCTGTGATCTAGATGGAGTGAAGCCTCT AATTTTAAGAGATTGTAGTGTAGCTGGATGGCTCCTCGGGAACCCAATGTGTGATGAATTCATCAATGTA CCGGAATGGTCTTACATAGTGGAGAACAGGGGTGAGCTCAGCATGTCCATACCTGGGAACGCCCTCCTTT TTTAGAAATGTGGTATGGCTTATCAAAAAGAACAGTACATACCCAACAATAAAAAGAAGCTACAATAATA CCAACCAAGAAGATCTTTTGGTACTGTGGGGGATTCACCATCCTAATGATGCGGCAGAGCAAACGAGGCT ATATCAAAATCCAATCACCTATATTTCCGTTGGGACATCAACACTGAACCAGAGATTGGTACCAAAAATA GCTACCAGAACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGGGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC TGTCAATTTATTCAACAGTAGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT GTGCTCCAATGGATCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA seqgb_CY091816_Organism_Influenza_A_virus__A_chicken_Badung_BBVD-288_2007_H5N1___Strain_Name_A_chicken_Badung_BBVD-288_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCCGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA CACATGCCCAAGACATACTGGAAAAGGCACACAACGGGAAGCTCTGTGATCTAGATGGAGTGAAGCCTCT AATTTTAAGAGATTGTAGTGTAGCCGGATGGCTCCTCGGGAACCCAATGTGTGACGAATTCATCAATGTA CCGGAATGGTCTTACATAGTGGAGAACAGGGGTGAGCTCAGCATGTCCATACCTGGGAACGCCCTCCTTT TTTAGAAATGTGGTATGGCTTATCAAAAAGAACAGTACATACCCAACAATAAAAAGAAGCTACAATAATA CCAACCAGGAAGATCTTTTGGTACTGTGGGGGATTCACCATCCTAATGATGCGGCTGAGCAAACGAAGCT ATATCAAAATCCAACCACCTATATTTCCGTTGGGACATCAACACTAAATCAGAGATTGGTACCAAAAATA GCTACTAGATCCAAAGTAAACGGACAAAGTGGAAGGATGGAGTTCTTCTGGACAATTTTAAAACCCAATG ATGCAATCAACTTCGAGAGTAATGGAAATTTCATTGCTCCAGAATATGCCTACAAAATTGTCAAGAAAGG GGACTCAGCAATTATGAAAAGTGAATTGGAATATGGCAACTGCAACACCAAATGTCAAACTCCAATGGGG GCGATAAACTTGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA seqgb_CY091819_Organism_Influenza_A_virus__A_chicken_Badung_BBVD-328_2007_H5N1___Strain_Name_A_chicken_Badung_BBVD-328_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA CACATGCCCAAGACATACTAGAAAAGGCACACAACGGGAAGCTCTGTGATCTAGATGGAGTGAAGCCTCT AATTTTAAGAGATTGTAGTGTAGCCGAGCAGAATAAACCATTTTGAGAAAATTCAGATCATCCCCAAAAG TTCTTGGTCCGACCATGAAGCCTCGTCAGGGGTGAGCTCAGCATGTCCATACCTGGGAACGCCCTCCTTT TTTAGAAATGTGGTATGGCTTATCAAAAAGAACAGTACATACCCAACAATAAAAAGAAGCTACAATAATA CCAACCAGGAAGATCTTTTGGTACTGTGGGGGATCCACCATCCTAATGATGCGGCTGAGCAAACGAAGCT ATATCAAAATCCAACCACCTATATTTCCGTTGGGACATCAACACTAAATCAGAGATTGGTACCAAAAATA GCTACTAGATCCAAAGTAAACGGACAAAGTGGAAGGATGGAGTTCTTCTGGACAATTTTAAAACCCAATG ATGCAATCAACTTCGAGAGTAATGGAAATTTCATTGCTCCAGAATATGCCTACAAAATTGTCAAGAAAGG GGACTCAGCAATTATGAAAAGTGAATTGGAATATGGCAACTGCAACACCAAATGTCAAACTCCAATGGGG GCGATAAACTCTAGTATGCCATTCCACAACATACACCCTCTCACCATCGGGGAATGCCCCAAATATGTGA AATCAAACAGATTAGTCCTTGCGACTGGGCTCAGAAATAGCCCCCAAAGAGAGAGAAGAAGAAAAAAGAG AGGACTATTTGGAGCTATAGCAGGTTTTATAGAGGGTGGATGGCAGGGAATGGTAGATGGTTGGTATGGG TACCACCATAGCAATGAGCAAGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATG GAGTCACCAATAAGGTCAATTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT TAATAACTTAGAAAGGAGAATAGAGACTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGAGGAGATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATTTTTATGGAT GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA seqgb_CY091820_Organism_Influenza_A_virus__A_chicken_Badung_BBVD-342_2007_H5N1___Strain_Name_A_chicken_Badung_BBVD-342_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCCGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA CACATGCCCAAGACATACTGGAAAAGGCACACAACGGGAAGCTCTGTGATCTAGATGGGGTGAAGCCTCT AATTTTAAGAGATTGTAGTGTAGCCGTTATAGAGGGTGGATGGCAGGGAATGGTAGATGGTTGGTATGGG TACCACCATAGCAATGAGCAAGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATG GAGTCACCAATAAGGTCAACTCGATTATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCCTAGATGTCTGGACT TATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTTTAGACTTTCATGACTCAAATGTTAAGA ACCTCTACGACAAAGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA seqgb_GQ122391_Organism_Influenza_A_virus__A_chicken_Bali_UT2091_2005_H5N1___Strain_Name_A_chicken_Bali_UT2091_2005_Segment_4_Subtype_H5N1_Host_Chicken, ATGGAGAAAATAGTGCTTCTTCTTGCAACAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC ATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTACACATGCCCAAGA CATACTGGAAAAAACACACAACGGGAATGGCAGGGAATGGTAGATGGTTGGTATGGGTACCACCATAGCA ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA GGTCAACTCAATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAA AGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTTCTAGATGTCTGGACTTATAATGCCGAAC TTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGAACCTCTACGACAA GGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCACAAATGT GATAATGAATGTATGGAAAGTATAAGAAACGGAACGTATAACTACCCGCAGTATTCAGAAGAAGCAAGAT TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTC AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGATGTGCTCCAATGGA TCGTTACAATGCAGAATTTGCATTTAA seqgb_GQ122392_Organism_Influenza_A_virus__A_chicken_Bali_UT2092_2005_H5N1___Strain_Name_A_chicken_Bali_UT2092_2005_Segment_4_Subtype_H5N1_Host_Chicken, ATGGAGAAAATAGTGCTTCTTCTTGCAACAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC ATGCAAACAATTCAACAGAGCAGGTTGCCCTCAAAGAGAGAGAAGAAGAAAAAAGAGAGGACTATTTGGA GCTATAGCAGGTTTTATAGAGGGAGGATGGCAGGGAATGGTAGATGGTTGGTATGGGTATCACCATAGCA ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA GGTCAACTCAATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAA AGGAGAATAGAATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGAACCTCTACGACAA GGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCACAAATGT GATAATGAATGTATGGAAAGTATAAGAAACGGAACGTATAACTACCCGCAGTATTCAGAAGAAGCAAGAT TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTACCAAATACTGTCAATTTATTC AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGATGTGCTCCAATGGA TCGTTACAATGCAGAATTTGCATTTAA seqgb_DQ083551_Organism_Influenza_A_virus__A_chicken_Bangkok_Thailand_CU-3_04_H5N1___Strain_Name_A_chicken_Bangkok_Thailand_CU-3_04_Segment_4_Subtype_H5N1_Host_Chicken, ATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC ATGCAAACAACTCGACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTACACATGCCCAAGA CATACTGGAAAAGACTTTCATTGCTCCAGAATATGCATACAAAATTGTCAAGAAAGGGGACTCAACAATT ATGAAAAGTGAATTGGAATATGGTAAATGGCAGGGAATGGTAGATGGTTGGTATGGGTACCACCATAGCA ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA GGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAACAACTTAGAA AGGAGAATAGAAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCATAAATGT GATAATGAATGTATGGAAAGTGTAAGAAACGGAACGTATGACTACCCGCAGTATTCAGAAGAAGCAAGAC TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAATTTACCAAATACTGTCAATTTATTC TACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTATCCTTATGGATGTGCTCCAATGGG TCGTTACAATGCAGAATTTGCATTTAAATTTG seqgb_CY091797_Organism_Influenza_A_virus__A_chicken_Bangli_BBVD-245_2007_H5N1___Strain_Name_A_chicken_Bangli_BBVD-245_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCTGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT TGCATTGGTTACCATGCAAACAATTCAACAGAGCAGGTTGACACAATAATGGAAAAGAACGTTACTGTTA CACATGCCCAATTAGTCCTTGCGACTATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCCTAGATGTCTGGACT TATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTTTAGACTTTCATGACTCAAATGTTAAGA ACCTCTACGACAAAGTCCGACTACAGCTTAGGGATAATGCAAAGGAGTTGGGTAACGGTTGTTTCGAGTT CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA seqgb_CY091801_Organism_Influenza_A_virus__A_chicken_Bangli_BBVD-562_2007_H5N1___Strain_Name_A_chicken_Bangli_BBVD-562_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCTGTCATTCGAGAGTAATGGAGGGCTCAGAAATAGCCCCCAAAGAGAGAGAAGAAGAAAAAAGAG AGGACTATTTGGAGCTATAGCAGGTTTTATAGAGGGTGGATGGCAGGGAATGGTAGATGGTTGGTATGGG TACCACCATAGCAATGAGCAAGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAAATG GAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCCTAGATGTCTGGACT TATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTTTAGACTTTCATGACTCAAATGTTAAGA ACCTCTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGAGGAGATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATTTTTATGGAT GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA seqgb_CY091803_Organism_Influenza_A_virus__A_chicken_Bangli_BBVD-575_2007_H5N1___Strain_Name_A_chicken_Bangli_BBVD-575_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCCGTCAGAGCTATAGCAGGTTTTATAGAGGGTGGATGGCAGGGAATGGTAGATGGTTGGTATGGG TACCACCATAGCAATGAGCAAGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATG GAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCTTAGATGTCTGGACT TATAATGCTGAGCTTCTGGTTCTCATGGAAAATGAGAGAACTTTAGACTTTCATGACTCAAATGTTAAGA ACCTCTACGACAAAGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT GTGCTCCAATGGATCATTACAGTGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA seqgb_GQ122399_Organism_Influenza_A_virus__A_chicken_Banten_UT6025_2006_H5N1___Strain_Name_A_chicken_Banten_UT6025_2006_Segment_4_Subtype_H5N1_Host_Chicken, ATGGAGAAAATAGTGCTTCTTCTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC ATGCAAACAATCAGGGCTCAGAAAGGATGGCAGGGAATGGTAGATGGTTGGTATGGGTACCATCATAGCA ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA GGTCAACTCAATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAATAACTTAGAA AGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTTCTAGATGTCTGGACTTATAATGCCGAAC TTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGAACCTCTATGACAA GGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCACAAATGT GATAATGGATGTATGGAAAGTATAAGAAACGGAACGTATAACTACCCGCAGTATTCAGAAGAAGCAAGAT TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAACTTATCAAATACTGTCAATTTATTC AACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGATGTGTTCCAATGGA TCGTTACAATGCAGAATTTGCATTTAA seqgb_CY091789_Organism_Influenza_A_virus__A_chicken_Buleleng_BBVD-545b_2007_H5N1___Strain_Name_A_chicken_Buleleng_BBVD-545b_2007_Segment_4_Subtype_H5N1_Host_Chicken, TCAATCCGTCAAAATGGAGAAAATAGTGCTTCTTCTTGCAATAGCCAGTCTTGTTAAAGGTGATCAGATT TGCATTGGTTACCATGAAAAGTGAATTGGAATATGGCAACTGCAACACCAAATGTCAAACTCCAATGGGG GCGATAAACTCTAGTATGCCATTCCATGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATG GAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATT TAATAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGATTCCTAGATGTCTGGACT TATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTTAAGA ACCTCTACGACAAAGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTT CTATCACAAATGTGATGATGAATGTATGGAAAGTGTAAGAAATGGGACGTATAACTACCCGCAGTATTCA GAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGGGTAAAATTGGAATCAATAGGAATTTACCAAATAC TGTCAATTTATTCAACAGTGGCGAGTTCCCTAGCACTGGCAATCATGATGGCTGGTCTATCTTTATGGAT GTGCTCCAATGGATCATTACAATGCAGAATTTGCATTTAAATTTGTGAGTTTAGATTGTAGTTAAA seqgb_HQ200590_Organism_Influenza_A_virus__A_chicken_Cambodia_047LC3_2005_H5N1___Strain_Name_A_chicken_Cambodia_047LC3_2005_Segment_4_Subtype_H5N1_Host_Chicken, AGCAAAAGCAGGGGTTTAATCTGTCAAAATGGAGAAAATAGTGCTTCTTTTTGCGATAGTCAGTCTTGTT AAAAGTGATCAGATGGGACTCAACAATTATGAAAAGTGAATTGGAATATGGTAACTGCAACACCAAGTGT CAAACTCCAATGGGGGCGATAAACTCCAATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTC AAAAGGCTATAGATGGAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGC CGTTGGAAGGGAATTTAACAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGGTTC CTAGATGTCTGGACTTATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTCCATG ACTCAAATGTCAAGAACCTTTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAA CGGTTGTTTCGAGTTCTATCACAAATGTGATAATGAATGTATGGAAAGTGTGAGAAACGGAACGTATGAC TACCCGCAGTATTCAGAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAG GAATTTACCAAATACTGTCAATTTATTCTACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGG TCTATCCTTATGGATGTGCTCCAATGGGTCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTCAGA TTGTAGTTAAAAACACCCTTGTTTCTACT seqgb_HQ200554_Organism_Influenza_A_virus__A_chicken_Cambodia_047LC3b_2005_H5N1___Strain_Name_A_chicken_Cambodia_047LC3b_2005_Segment_4_Subtype_H5N1_Host_Chicken, AGCAAAAGCAGGGGTTTAATCTGTCAAAATGGAGAAAATAGTGCTTCTTTTTGCGATAGTCAGTCTTGTT AAAAGTGATCAGATTTGCATTGGTTACCATGCAAACAACTCAACAGAGCAGGTTGACACAATAATGGAAA AGAACGTTACTGTTACACATGCCCAAGACATACTGGAAAAGACACATAACGGGAAGCTCTGCGATCTAGA TGGAGTGAAGCCTCTAATTTTGAGAGATTGTAGTGTAGCTGGATGGCTCCTCGGAAACCCAATGTGTGAC GAATTCATCAATGTGCCGGAATGGTCGAGCTATAGCAGGTTTTATAGAGGGAGGATGGCAGGGAATGGTA GATGGTTGGTATGGGTACCACCATAGCAATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTC AAAAGGCTATAGATGGAGTCACCAATAAGGTCAACTCGATCATTGACAAAATGAACACTCAGTTTGAGGC CGTTGGAAGGGAATTTAACAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGGTTC CTAGATGTCTGGACTTATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTCCATG ACTCAAATGTCAAGAACCTTTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAA CGGTTGTTTCGAGTTCTATCACAAATGTGATAATGAATGTATGGAAAGTGTGAGAAACGGAACGTATGAC TACCCGCAGTATTCAGAAGAAGCAAGATTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAG GAATTTACCAAATACTGTCAATTTATTCTACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGG TCTATCCTTATGGATGTGCTCCAATGGGTCGTTACAATGCAGAATTTGCATTTAAATTTGTGAGTTCAGA TTGTAGTTAAAAACACCCTTGTTTCTACT seqgb_EU620652_Organism_Influenza_A_virus__A_chicken_Thailand_NS-339_2008_H5N1___Strain_Name_A_chicken_Thailand_NS-339_2008_Segment_4_Subtype_H5N1_Host_Chicken, AGCAAAAGCAGGGGTCTGATCTGTCAAAATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTT AAAAGTGATCAAATTTGCATTGGTATAAGGTCAACTCGATAATTGACAAAATGAACACTCAGTTTGAGGC CGTTGGAAGGGAATTTAACAACTTAGAAAGGAGAATAGAGAATTTAAACAAGAAGATGGAAGACGGGTTC CTGGATGTCTGGACTTATAATGCTGAACTTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATG ACTCAAATGTCAAGAACCTTTACGACAAGGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAA CGGCTGTTTCGAGTTCTATCATAAATGTGATAATGAATGTATGGAAAGTGTGAGAAACGGAACGTATGAC TACCCGCAGTATTCAGAAGAAGCAAAACTAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAG GAATTTACCAAATACTGTCAATTTATTCTACAGTGGCAAGTTCCCTAGCACTGGCAATCATGGTAGCTGG TCTATCCTTATGGATGTGCTCCAATGGGTCATTACAATGCAGAATTTGCATTAAATTGGAGTCA seqgb_EU850416_Organism_Influenza_A_virus__A_chicken_Thailand_NS-341_2008_H5N1___Strain_Name_A_chicken_Thailand_NS-341_2008_Segment_4_Subtype_H5N1_Host_Chicken, ATGGAGAAAATAGTGCTTCTTTTTGCAATAGTCAGTCTTGTTAAAAGTGATCAGATTTGCATTGGTTACC ATGCAAACAACTCGACAGAGCAGGTTCTCACCATCGGGGAATGCCCCAAATATGTGAAATCAAATAGATT AGTCCTTGCGACTGGGCTCAGAAATAGCCCTCAAAGAGAGAGAAGAAGAAAAAAGAGAGGATTATTTGGA GCTATAGCAGGTTTTATAGAGGGAGGATGGCAGGGAATGGTAGATGGTTGGTATGGGTACCACCATAGCA ATGAGCAGGGGAGTGGGTACGCTGCAGACAAAGAATCCACTCAAAAGGCAATAGATGGAGTCACCAATAA GGTCAACTCGATAATTGACAAAATGAACACTCAGTTTGAGGCCGTTGGAAGGGAATTTAACMACTTAGAA AGGAGGATAGAGAATTTAAACAAGAAGATGGAAGACGGGTTCCTAGATGTCTGGACTTATAATGCTGAAC TTCTGGTTCTCATGGAAAATGAGAGAACTCTAGACTTTCATGACTCAAATGTCAAGAACCTTTACGACAA GGTCCGACTACAGCTTAGGGATAATGCAAAGGAGCTGGGTAACGGCTGTTTCGAGTTCTATCATAAATGT GATAATGAATGTATGGAAAGTGTGAGAAACGGAACGTATGACTACCCGCAATATTCAGAAGAAGCAAAAC TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAATTTACCAAATACTGTCAATTTATTC TACAGTGGCAAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTATCCTTATGGATGTGCTCCAATGGG TCATTACAATGCAGAATTTGCATTTAAATTG seqgb_DQ999880_Organism_Influenza_A_virus__A_chicken_Thailand_PC-168_2006_H5N1___Strain_Name_A_chicken_Thailand_PC-168_2006_Segment_4_Subtype_H5N1_Host_Chicken, ATGGAGAGAATAGTGCAGGGATAATGCAAAGGAGCTGGGTAACGGTTGTTTCGAGTTCTATCATAAGTGT GATAATGAATGTATGGAAAGTGTGAGAAACGGAACGTATGACTACCCGCAGTATTCAGAAGAAGCAAAAC TAAAAAGAGAGGAAATAAGTGGAGTAAAATTGGAATCAATAGGAATTTACCAAATACTGTCAATTTATTC TACAGTGGCGAGTTCCCTAGCACTGGCAATCATGGTAGCTGGTCTATCCTTATGGATGTGCTCCAATGGG TCGTTACAATGCAGAATTTGCATTAAATTG
Comment peut Je dépouille la nouvelle ligne présente entre une seule séquence en utilisant des pandas. Merci d'avance !!
3 Réponses :
Presque par définition, les sauts de ligne sont une partie cruciale des fichiers CSV, il n'y a donc aucun moyen de faire en sorte que read_csv de Pandas les ignore. Le mieux est de supprimer manuellement les sauts de ligne, comme ceci:
import pandas as pd
import re
with open ("seq.txt", "r") as myfile:
data=myfile.readlines()
data = re.sub('\n', '', ''.join(data))
data = data.split(',')
df = pd.DataFrame([data], names=["id", "seq"])
Je ne pense pas que cela fonctionne tout à fait car cela concatène les identifiants avec les séquences. Si vous ajoutez data = re.sub ('seqgb', ', seqgb', data) [1:] avant la ligne split , c'est mieux. De plus, le DataFrame n'a pas d'argument names , donc à la place vous pouvez faire df = pd.DataFrame ({'id': data [0 :: 2] , 'seq': données [1 :: 2]})
Merci beaucoup pour l'aide.
@MattPitkin merci, vous avez tout à fait raison. Je l'ai fait à la hâte, avec une seule séquence.
Vous pouvez lire manuellement le fichier et le convertir en un pandas DataFrame avec quelque chose comme:
import pandas as pd
with open('seg.txt', 'r') as fp:
lines = fp.readlines()
data = {'id': [], 'seq': []}
sequence = ''
for line in lines:
if line[0] == '\n':
if len(sequence) != 0:
data['seq'].append(sequence)
sequence = ''
# skip empty lines
continue
if ',' in line:
data['id'].append(line.split(',')[0])
else:
# concatenate lines with sequences
sequence += line.strip()
# add on last sequence
if len(sequence) != 0:
data['seq'].append(sequence)
# create dataframe
df = pd.DataFrame(data)
Merci beaucoup pour l'aide!! J'apprécie vraiment cela.