coding can be implemented with any of the four methods above. What's the difference between them? The most obvious one is the length of the script, which is already visible at a glance.
<H1>get_dummies method</H1>
import pandas as pd
from pandas import DataFrame

df7 = DataFrame({"key": list("bbacab"),
                 "data1": range(6)})
dummies = pd.get_dummies(df7.key, prefix="key")
dummies
   key_a  key_b  key_c
0      0      1      0
1      0      1      0
2      1      0      0
3      0      0      1
4      1      0      0
5      0      1      0
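A common follow-up is to attach the indicator columns back to the rest of the data. The snippet below is a minimal sketch of that step, not part of the original walkthrough: the name `df_with_dummies` is made up for illustration, and `dtype=int` is an assumption added so that newer pandas versions, which return booleans by default, still print 0/1.

```python
import pandas as pd

# Same toy frame as in the text
df7 = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

# dtype=int keeps 0/1 output on newer pandas versions (assumption, see lead-in)
dummies = pd.get_dummies(df7["key"], prefix="key", dtype=int)

# Keep the numeric column and append the indicator columns by index
df_with_dummies = df7[["data1"]].join(dummies)
print(df_with_dummies)
```

Because `join` aligns on the index, this only works as written when `dummies` still carries the same index as `df7`.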
<H1>LabelEncoder plus get_dummies method</H1>
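This method still relies mainly on get_dummies; LabelEncoder only converts the string labels to integer codes first.
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
# le.fit(df7["key"])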
df7["key2"] = le.fit_transform(df7["key"])
pd.get_dummies(df7.key2)
   0  1  2
0  0  1  0
1  0  1  0
2  1  0  0
3  0  0  1
4  1  0  0
5  0  1  0
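One drawback of this route is that the dummy columns are named after the integer codes rather than the original labels. As a rough sketch, the fitted encoder's `classes_` attribute can be used to map the codes back; the helper name `code_to_label` and the `key_` prefix are choices made here for illustration, and `dtype=int` is again only there to keep 0/1 output on newer pandas.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df7 = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

le = LabelEncoder()
df7["key2"] = le.fit_transform(df7["key"])

# classes_ holds the original labels in sorted order; position == integer code
code_to_label = dict(enumerate(le.classes_))

dummies = pd.get_dummies(df7["key2"], dtype=int)
# Rename the integer column names 0, 1, 2 back to readable labels
dummies = dummies.rename(columns=lambda code: "key_" + str(code_to_label[code]))
print(dummies)
```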
<H1>label_binarize method</H1>
from sklearn.preprocessing import label_binarize

lab = label_binarize(df7["key"], classes=["a", "b", "c"])
lab
array([[0, 1, 0],
       [0, 1, 0],
       [1, 0, 0],
       [0, 0, 1],
       [1, 0, 0],
       [0, 1, 0]])
# Join the indicator columns to df7 and rename them
df7.join(pd.DataFrame(lab)).rename(columns={0: "key_a", 1: "key_b", 2: "key_c"})
   data1 key  key2  key_a  key_b  key_c
0      0   b     1      0      1      0
1      1   b     1      0      1      0
2      2   a     0      1      0      0
3      3   c     2      0      0      1
4      4   a     0      1      0      0
5      5   b     1      0      1      0
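Two behaviours of label_binarize are worth checking before relying on it, since they differ from get_dummies. The sketch below uses small made-up label lists (not data from the post) to show that the column layout follows the `classes` argument rather than the values actually present, and that a two-class problem yields a single column instead of two.

```python
from sklearn.preprocessing import label_binarize

# Three classes: one column per entry in `classes`
three = label_binarize(["b", "b", "a", "c", "a", "b"], classes=["a", "b", "c"])

# "c" never occurs in the data, but still gets its own (all-zero) column
fixed = label_binarize(["a", "b"], classes=["a", "b", "c"])

# Two classes: the result collapses to a single 0/1 column
two = label_binarize(["b", "b", "a", "a"], classes=["a", "b"])

print(three.shape)
print(fixed.shape)
print(two.shape)
```

For a binary label this means the rename step with three column names would not carry over unchanged.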
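<H1>OneHotEncoder</H1>
from sklearn.preprocessing import OneHotEncoder

# Note: scikit-learn 1.2 renamed this argument to sparse_output
onehot_encoder = OneHotEncoder(sparse=False)
integer_encoded = df7["key2"].values.reshape(len(df7["key2"]), 1)
onehot_encoded = onehot_encoder.fit_transform(integer_encoded)
onehot_encoded
# The result is an array, as with label_binarize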
array([[ 0.,  1.,  0.],
       [ 0.,  1.,  0.],
       [ 1.,  0.,  0.],
       [ 0.,  0.,  1.],
       [ 1.,  0.,  0.],
       [ 0.,  1.,  0.]])
pd.DataFrame(onehot_encoded)
# The columns can then be attached to df7 with join
     0    1    2
0  0.0  1.0  0.0
1  0.0  1.0  0.0
2  1.0  0.0  0.0
3  0.0  0.0  1.0
4  1.0  0.0  0.0
5  0.0  1.0  0.0
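One practical difference from the other three approaches is that a fitted OneHotEncoder remembers its categories and can be reused on new data. The following is a hedged sketch rather than the post's own code: it assumes a scikit-learn version that accepts string columns directly (0.20 or later), skips the LabelEncoder step, and calls `toarray()` on the default sparse result to avoid depending on the `sparse`/`sparse_output` keyword. The label `"d"` and the names `onehot_df` and `new_rows` are invented for the example.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df7 = pd.DataFrame({"key": list("bbacab"), "data1": range(6)})

# handle_unknown="ignore" encodes labels unseen during fit as all zeros
encoder = OneHotEncoder(handle_unknown="ignore")
# Default output is sparse; toarray() converts it to a dense array
encoded = encoder.fit_transform(df7[["key"]]).toarray()

# categories_ records the labels learned during fit, one array per input column
column_names = ["key_" + label for label in encoder.categories_[0]]
onehot_df = pd.DataFrame(encoded, columns=column_names, index=df7.index)

# Reuse the fitted encoder on new rows; "d" is a hypothetical unseen label
new_rows = pd.DataFrame({"key": ["a", "d"]})
new_encoded = encoder.transform(new_rows).toarray()

print(onehot_df)
print(new_encoded)
```

get_dummies, by contrast, derives its columns from whatever values appear in the frame it is given, so applying it separately to training and new data can produce mismatched columns.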