I have a DataFrame with two columns of data, as shown in the following figure:
I want to delete the rows whose value in column a matches a regular expression (for example, values beginning with 0002). How should I write this?
Addendum: the above is just an example. The values do not necessarily start with a fixed prefix, so I would prefer to match with regular expressions (the re module). Thanks for the answer below.
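For reference, here is a minimal sketch of dropping rows by a single regular expression. The sample values and the pattern are assumptions made up for illustration, since the original figure is not available. It relies on pandas' `Series.str.match`, which, like `re.match`, only matches at the start of each string.

```python
import pandas as pd

# Hypothetical sample data standing in for the missing figure.
data = pd.DataFrame({
    "a": ["0001abc", "0002def", "0003ghi", "0002xyz"],
    "b": [1, 2, 3, 4],
})

pattern = r"0002"  # assumed example pattern: values beginning with 0002

# Boolean mask: True where column "a" matches at the start of the string.
# na=False treats missing values as non-matching.
mask = data["a"].str.match(pattern, na=False)

# Keep only the rows that do NOT match; .copy() makes the result independent.
filtered = data[~mask].copy()
print(filtered)
```

Filtering with a boolean mask avoids the need for a temporary helper column.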
Another addendum: there is now one more small problem. There may be more than one pattern to match, so I need a loop. The code is as follows:
for shield in shields:
    shield = shield.strip()
    print(":", shield)
    data.loc[:, "c"] = data["a"].map(lambda x: 1 if re.match(shield, x) else np.nan)
    data = data[data["c"].isnull()]
This reports a warning:
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py:543: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pand...
  self.obj[item] = s
Specifically, in the last line of the loop I overwrite data directly with the filtered result. That is not very clean, but I do need to keep replacing it with the new data on each iteration. How can I write this better?
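One possible way to write the loop, sketched under assumptions: the warning most likely comes from assigning the helper column "c" to a frame that is itself the result of an earlier filter, so building a boolean mask and never writing into the sliced frame should avoid it. The sample data and the contents of `shields` below are hypothetical.

```python
import pandas as pd

# Hypothetical sample data and patterns (e.g. lines read from a file).
data = pd.DataFrame({
    "a": ["0001abc", "0002def", "0003ghi", "0004xyz"],
    "b": [1, 2, 3, 4],
})
shields = ["0002\n", " 0004 ", ""]

# Strip whitespace and drop empty patterns; an empty regex matches every string.
patterns = [s.strip() for s in shields if s.strip()]

# Variant 1: loop, rebinding the name to the filtered frame each time.
# No column is assigned on the filtered frame, so nothing is set on a slice.
result = data
for pat in patterns:
    keep = ~result["a"].str.match(pat, na=False)
    result = result[keep]

# Variant 2: combine all patterns into one alternation and filter once.
# Each pattern is wrapped in a non-capturing group so "|" cannot leak between them.
combined = "|".join(f"(?:{p})" for p in patterns)
result_once = data[~data["a"].str.match(combined, na=False)]

print(result)
```

If a helper column really is needed on the filtered frame, calling `.copy()` on the filtered result before assigning to it is another commonly used option. Variant 2 assumes at least one non-empty pattern; with an empty list the combined regex would be empty and match every row.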