I obtained the required text from the KEGG API. The raw response has the following format:
[string 1]:
b">hsa:10056 K01890 phenylalanyl-tRNA synthetase beta chain [EC:6.1.1.20] | (RefSeq) FARSB, FARSLB, FRSB, HSPC173, NEDBLLA, PheHB, PheRS; phenylalanyl-tRNA synthetase subunit beta (A)\nMPTVSVKRDLLFQALGRTYTDEEFDELCFEFGLELDEITSEKEIISKEQGNVKAAGASDV\nVLYKIDVPANRYDLLCLEGLVRGLQVFKERIKAPVYKRVMPDGKIQKLIITEETAKIRPF\nAVAAVLRNIKFTKDRYDSFIELQEKLHQNICRKRALVAIGTHDLDTLSGPFTYTAKRPSD\nIKFKPLNKTKEYTACELMNIYKTDNHLKHYLHIIENKPLYPVIYDSNGVVLSMPPIINGD\nHSRITVNTRNIFIECTGTDFTKAKIVLDIIVTMFSEYCENQFTVEAAEVVFPNGKSHTFP\nELAYRKEMVRADLINKKVGIRETPENLAKLLTRMYLKSEVIGDGNQIEIEIPPTRADIIH\nACDIVEDAAIAYGYNNIQMTLPKTYTIANQFPLNKLTELLRHDMAAAGFTEALTFALCSQ\nEDIADKLGVDISATKAVHISNPKTAEFQVARTTLLPGLLKTIAANRKMPLPLKLFEISDI\nVIKDSNTDVGAKNYRHLCAVYYNKNPGFEIIHGLLDRIMQLLDVPPGEDKGGYVIKASEG\nPAFFPGRCAEIFARGQSVGKLGVLHPDVITKFELTMPCSSLEINVGPFL\n"
After decoding it as UTF-8:
[string 2]:
>hsa:10056 K01890 phenylalanyl-tRNA synthetase beta chain [EC:6.1.1.20] | (RefSeq) FARSB, FARSLB, FRSB, HSPC173, NEDBLLA, PheHB, PheRS; phenylalanyl-tRNA synthetase subunit beta (A)
MPTVSVKRDLLFQALGRTYTDEEFDELCFEFGLELDEITSEKEIISKEQGNVKAAGASDV
VLYKIDVPANRYDLLCLEGLVRGLQVFKERIKAPVYKRVMPDGKIQKLIITEETAKIRPF
AVAAVLRNIKFTKDRYDSFIELQEKLHQNICRKRALVAIGTHDLDTLSGPFTYTAKRPSD
IKFKPLNKTKEYTACELMNIYKTDNHLKHYLHIIENKPLYPVIYDSNGVVLSMPPIINGD
HSRITVNTRNIFIECTGTDFTKAKIVLDIIVTMFSEYCENQFTVEAAEVVFPNGKSHTFP
ELAYRKEMVRADLINKKVGIRETPENLAKLLTRMYLKSEVIGDGNQIEIEIPPTRADIIH
ACDIVEDAAIAYGYNNIQMTLPKTYTIANQFPLNKLTELLRHDMAAAGFTEALTFALCSQ
EDIADKLGVDISATKAVHISNPKTAEFQVARTTLLPGLLKTIAANRKMPLPLKLFEISDI
VIKDSNTDVGAKNYRHLCAVYYNKNPGFEIIHGLLDRIMQLLDVPPGEDKGGYVIKASEG
PAFFPGRCAEIFARGQSVGKLGVLHPDVITKFELTMPCSSLEINVGPFL
Now my goal is to delete everything after the first space on the first line and leave the remaining lines unmodified. The desired result is:
[string 3]:
>hsa:10056
MPTVSVKRDLLFQALGRTYTDEEFDELCFEFGLELDEITSEKEIISKEQGNVKAAGASDV
VLYKIDVPANRYDLLCLEGLVRGLQVFKERIKAPVYKRVMPDGKIQKLIITEETAKIRPF
AVAAVLRNIKFTKDRYDSFIELQEKLHQNICRKRALVAIGTHDLDTLSGPFTYTAKRPSD
IKFKPLNKTKEYTACELMNIYKTDNHLKHYLHIIENKPLYPVIYDSNGVVLSMPPIINGD
HSRITVNTRNIFIECTGTDFTKAKIVLDIIVTMFSEYCENQFTVEAAEVVFPNGKSHTFP
ELAYRKEMVRADLINKKVGIRETPENLAKLLTRMYLKSEVIGDGNQIEIEIPPTRADIIH
ACDIVEDAAIAYGYNNIQMTLPKTYTIANQFPLNKLTELLRHDMAAAGFTEALTFALCSQ
EDIADKLGVDISATKAVHISNPKTAEFQVARTTLLPGLLKTIAANRKMPLPLKLFEISDI
VIKDSNTDVGAKNYRHLCAVYYNKNPGFEIIHGLLDRIMQLLDVPPGEDKGGYVIKASEG
PAFFPGRCAEIFARGQSVGKLGVLHPDVITKFELTMPCSSLEINVGPFL
Now my question is:
How can I process the multi-line string directly in memory, without first saving it to a file, and only then save the result to a file?
Writing the downloaded text to a file and then processing that file feels superfluous.
This is the code that gets the text:
import urllib.request

def getHtml(url):
    # Fetch the URL and return the response body decoded as UTF-8 text
    request = urllib.request.Request(url)
    response = urllib.request.urlopen(request)
    return response.read().decode("utf-8")

url1 = "http://rest.kegg.jp/get/hsa:10056/aaseq"
text = getHtml(url1)
The content of "text" is shown in [string 2] above.
I know that a single line can be trimmed using split:
>>> str1 = "hsa:10056 K01890 phenylalanyl-tRNA synthetase beta chain [EC:6.1.1.20] | (RefSeq) FARSB, FARSLB, FRSB, HSPC173, NEDBLLA, PheHB, PheRS; phenylalanyl-tRNA synthetase subunit beta (A)"
>>> str2 = str1.split(" ")[:1]
>>> print(str2)
['hsa:10056']
But the problem now is that "text" is a multi-line string. I only need to process its first line, and I don't know how to do that.
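One possible approach, sketched below, is to split the string once at the first newline, trim only the header line, and join the pieces back together before writing anything to disk. The function names trim_fasta_header and save_text are hypothetical helpers introduced for illustration, and the sample string is a shortened stand-in for the real response. This is a minimal sketch that assumes the header is always the first line of the text.

```python
def trim_fasta_header(text):
    """Keep only the first whitespace-delimited token of the first line.

    Hypothetical helper: assumes the header is the first line of `text`.
    """
    # Split once at the first newline: header line vs. everything after it.
    header, sep, body = text.partition("\n")
    # split(None, 1) splits at the first run of whitespace only.
    parts = header.split(None, 1)
    short_header = parts[0] if parts else header
    # Re-attach the untouched sequence lines.
    return short_header + sep + body


def save_text(path, text):
    """Write the processed string to a file (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


if __name__ == "__main__":
    # Shortened stand-in for the string returned by getHtml(url1).
    sample = (
        ">hsa:10056 K01890 phenylalanyl-tRNA synthetase beta chain (A)\n"
        "MPTVSVKRDLLFQALGRTYTDEEFDELCFEFGLELDEITSEKEIISKEQGNVKAAGASDV\n"
        "VLYKIDVPANRYDLLCLEGLVRGLQVFKERIKAPVYKRVMPDGKIQKLIITEETAKIRPF\n"
    )
    print(trim_fasta_header(sample), end="")
```

With the real data this would be used as save_text("some_name.fasta", trim_fasta_header(text)), where the file name is whatever you choose. No intermediate file is needed because the string is processed entirely in memory.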