How do I delete rows in pandas where a column's value matches a regular expression?

I have two columns of existing data, as shown in the following figure:

clipboard.png

I want to delete the rows whose value in column a matches a regular expression (for example, values beginning with 0002). How should I write this?

Addendum: the above is just an example. The values do not necessarily start with a fixed prefix, so I would prefer to match with regular expressions from the re module. Thanks for the answer below.

Another addendum: there is one more small problem. There may be more than one pattern to match, so I need a loop. The code is as follows:

        for shield in shields:
            shield = shield.strip()
            print(":", shield)
            data.loc[:, "c"] = data["a"].map(lambda x: 1 if re.match(shield, x) else np.nan)
            data = data[data["c"].isnull()]

This will report a warning:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: http://pandas.pydata.org/pand...
  self.obj[item] = s

Specifically, in the last statement I overwrite data directly with the filtered result. That is not very clean, but I do need to keep overwriting it with the new data. How can I write this better?

Jun.16,2022

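# Boolean mask: True where column a starts with '0002'
fun = df['a'].apply(lambda x: x.startswith('0002'))
# Keep the rows that do not match (print() call form for Python 3)
print(df[~fun])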
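Since the question asks for regular expressions rather than a fixed prefix, the same filter can probably be written with Series.str.match, which anchors the pattern at the start of each string in the same way re.match does. This is a minimal sketch, assuming column a holds strings; the sample data and the pattern are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the real frame from the question is not shown.
df = pd.DataFrame({"a": ["0002abc", "0003def", "0002xyz", "1000ghi"],
                   "b": [1, 2, 3, 4]})

pattern = r"0002"  # example pattern; any regular expression can go here

# str.match anchors at the start of each string, like re.match.
# na=False treats missing values as "no match", so those rows are kept.
matches = df["a"].str.match(pattern, na=False)

# Keep only the rows that do NOT match the pattern.
df = df[~matches]
print(df)
```

If the pattern may occur anywhere in the value rather than at the start, Series.str.contains is the analogous call.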

I guess column a holds hexadecimal numbers?

del_bool_list = df['a'].apply(lambda x : not str(x).startswith('0002'))
df = df[del_bool_list]
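For the follow-up about several patterns, one possible approach is to build a boolean mask per pattern, combine the masks, and filter once. That way no helper column is assigned on a sliced frame, which is what appears to trigger the SettingWithCopyWarning in the question. This is a sketch assuming shields is a list of pattern strings as in the question; the sample values are invented.

```python
import pandas as pd

# Hypothetical data and patterns, for illustration only.
data = pd.DataFrame({"a": ["0002abc", "0003def", "ffx1", "1000ghi"],
                     "b": [1, 2, 3, 4]})
shields = [" 0002 ", "ff.1\n"]  # patterns with stray whitespace, as if read from a file

# Start with "nothing matches" and OR in the matches of each pattern.
to_drop = pd.Series(False, index=data.index)
for shield in shields:
    shield = shield.strip()
    to_drop |= data["a"].str.match(shield, na=False)

# Filter once; .copy() makes the result an independent frame,
# so later column assignments on it do not raise the warning.
data = data[~to_drop].copy()
print(data)
```

Overwriting data with the filtered result is fine here; the explicit .copy() just makes it clear that the new frame no longer refers to the old one.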

At first I was misled by the DataFrame.drop() method. I felt I had to use it to delete the rows, and then I realized it is fine to simply overwrite the DataFrame with the filtered result.
Finally, drop() works best when you already have definite index labels to pass. Otherwise you have to convert the list of boolean values into a list of index labels, which is simple but not concise.
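To illustrate the conversion mentioned above, here is a small comparison of the two routes. It is only a sketch: the sample frame is hypothetical, and the variable names dropped and filtered are chosen for this example.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"a": ["0002abc", "0003def", "0002xyz"], "b": [1, 2, 3]})

# Boolean mask: True for the rows to delete.
mask = df["a"].str.startswith("0002")

# drop() takes index labels, so the mask is first turned into
# the labels of the rows to remove.
dropped = df.drop(index=df[mask].index)

# Boolean indexing selects the remaining rows without that conversion.
filtered = df[~mask]

print(dropped.equals(filtered))
```

Both routes leave the same rows, so plain boolean indexing is usually the shorter way to write it.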
