The source of the problem: every example, both in books and online, takes a batch of data with known outcomes, splits it into a training set and a test set, then compares the predictions against the true values and reports the accuracy, and that's it. A perhaps naive question: I have such a labelled dataset, and now I want to predict outcomes for new data that has no outcomes yet. So the question is: how should the data to be predicted be handled, and how should its feature engineering be done consistently with that of the training and test sets? In other words, how do I apply the same feature engineering to the dataset without outcomes as to the data used for training, so that its predictions can then be obtained? My code so far is below, followed by a sketch of what I guess the missing step looks like.
"""2 """
-sharp
x_train,x_test,y_train,y_test= train_test_split(x,y,test_size=0.01)
-sharp
dict= DictVectorizer(sparse=False)
-sharp
x_train = dict.fit_transform(x_train.to_dict(orient="records"))
print(dict.get_feature_names())
x_test = dict.transform(x_test.to_dict(orient="records"))
print(x_train)
"""3"""
-sharp,
dec = DecisionTreeClassifier(max_depth=12,min_samples_leaf=1)
-sharp
dec.fit(x_train,y_train)
-sharp
y_predict = dec.predict(x_test)-sharp-sharp-sharp
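Is the right approach simply to keep the DictVectorizer that was fitted on the training set and reuse its transform() on the data I want to predict? Below is a minimal sketch of what I imagine; x_new is a hypothetical DataFrame holding the unlabelled data with the same feature columns as x, and the score() line is just the accuracy check the books report, not part of my original code.

# Hypothetical x_new: the unlabelled data to be predicted, with the same
# feature columns as x but no label column. Only transform() it with the
# vectorizer already fitted on x_train, so it produces the same feature
# columns in the same order as the training data.
x_new_vec = dict.transform(x_new.to_dict(orient="records"))

# Predict labels for the unlabelled data with the trained tree
y_new = dec.predict(x_new_vec)
print(y_new)

# Sanity check: accuracy on the held-out test set
print(dec.score(x_test, y_test))

Is this the right way to do it, or is there a more standard workflow for feature engineering on data that has no labels?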