Pandas: how to get a row_number within groups defined by multiple columns, ordered by another column

import pandas as pd

df = pd.DataFrame({"key1" : ["a","a","a","b","b"],
    "key2" : ["c","d","c","c","d"],
    "data" : [1,10,2,3,30]})

>>> df
  key1 key2  data
0    a    c     1
1    a    d    10
2    a    c     2
3    b    c     3
4    b    d    30


Desired output:
  key1 key2  data  row_number
0    a    c     1           1
1    a    d    10           1
2    a    c     2           2
3    b    c     3           1
4    b    d    30           1

I want to group by key1 and key2, sort each group by data, and number the rows within each group (like row_number in SQL). How can this be done? The following approach, which I found by searching, did not work:

df["row_number"] = df["data"].groupby(df["key1","key2"]).rank(ascending=True,method="first")
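
A note on why this line fails, with a minimal sketch of one possible fix. `df["key1","key2"]` asks for a single column labelled by the tuple `("key1", "key2")`, which does not exist in this frame, so pandas raises a KeyError before any ranking happens. The sketch below passes the key names as a list to `groupby` instead. The cast to `int` is an added choice, not part of the original attempt, since `rank` returns floats.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "a", "b", "b"],
    "key2": ["c", "d", "c", "c", "d"],
    "data": [1, 10, 2, 3, 30],
})

# Group by both key columns, then rank "data" within each group.
# method="first" breaks ties by order of appearance, like SQL's row_number.
ranked = df.groupby(["key1", "key2"])["data"].rank(method="first", ascending=True)

# rank() returns floats; cast to int so the column holds 1, 2, 3, ...
df["row_number"] = ranked.astype(int)

print(df)
```

This keeps the rows in their original order, so the result lines up with the desired output shown above.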

Python pandas

May.28,2022

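def cumsum_seq(v):
    # Sort one (key1, key2) group by data, then turn the column of 1s
    # into a running count 1, 2, 3, ...
    sub = v.sort_values('data')
    sub['seq'] = sub['seq'].cumsum()
    return sub.loc[:, ['data', 'seq']]

# Helper column of 1s; its cumulative sum within each group is the row number.
df['seq'] = 1
# reset_index() restores key1/key2 as columns; 'level_2' holds the original row index.
df.groupby(['key1', 'key2']).apply(cumsum_seq).reset_index().drop(columns='level_2')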

Result:

  key1 key2  data  seq
0    a    c     1    1
1    a    c     2    2
2    a    d    10    1
3    b    c     3    1
4    b    d    30    1
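
The answer above returns the rows regrouped by key, so the original row order is lost. If that order matters, one alternative is sketched below, assuming a plain sort followed by `cumcount` is acceptable: sort by data, count rows within each group, and rely on index alignment when assigning the result back. The choice of `kind="mergesort"` is an assumption added here so that ties in data keep their original relative order.

```python
import pandas as pd

df = pd.DataFrame({
    "key1": ["a", "a", "a", "b", "b"],
    "key2": ["c", "d", "c", "c", "d"],
    "data": [1, 10, 2, 3, 30],
})

# Sort a copy by "data" (stable sort), then number the rows inside each
# (key1, key2) group. cumcount() starts at 0, so add 1.
numbered = (
    df.sort_values("data", kind="mergesort")
      .groupby(["key1", "key2"])
      .cumcount()
    + 1
)

# Assignment aligns on the index, so each number lands on its original row
# and df itself is never reordered.
df["row_number"] = numbered

print(df)
```

Unlike the `apply` version, this needs no helper column and no `reset_index` step.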
