My question: I have three sentences,

case1 = u"to deal with the management in front of the shop in Li Village"
case2 = u"Licun River Patrol"
case3 = "I am doing river management work by the Licun River"

How do I compare these three sentences against sentences related to "Licun River Governance"? Which techniques are relevant here? I hope someone can suggest a solution.
The first approach:

import jieba
from gensim import corpora, models, similarities

# compute the similarity between str1 (the query keyword) and str2, and return the score
def simicos(str1, str2):
    # segment str2
    word_list = [word for word in jieba.cut(str2)]
    all_word_list = [word_list, []]
    # segment str1
    word_test_list = [word for word in jieba.cut(str1)]
    # build the dictionary
    dictionary = corpora.Dictionary(all_word_list)
    # bag-of-words corpus
    corpus = [dictionary.doc2bow(word) for word in all_word_list]
    word_test_vec = dictionary.doc2bow(word_test_list)
    # train the TF-IDF model
    tfidf = models.TfidfModel(corpus)
    # print(tfidf[corpus])
    similar = similarities.SparseMatrixSimilarity(
        tfidf[corpus], num_features=len(dictionary.keys()))
    sim = similar[tfidf[word_test_vec]]
    # print(sim)
    return sim[0]
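To illustrate what the gensim pipeline above computes, here is a standard-library-only sketch of TF-IDF weighting plus cosine scoring. The tokenization, token lists, and the smoothed idf formula are illustrative assumptions, not gensim's exact internals (gensim's default idf differs slightly):

```python
import math
from collections import Counter

def tfidf_vec(tokens, df, n_docs):
    """TF-IDF weight per term; add-one smoothing keeps idf finite for unseen terms."""
    tf = Counter(tokens)
    return {t: c * math.log((1 + n_docs) / (1 + df[t]))
            for t, c in tf.items()}

def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# toy pre-segmented documents (stand-ins for jieba output)
docs = [["river", "management", "licun"],
        ["shop", "management", "licun"]]
df = Counter()
for d in docs:
    df.update(set(d))          # document frequency of each term

query = ["licun", "river", "governance"]
qv = tfidf_vec(query, df, len(docs))
scores = [cosine(qv, tfidf_vec(d, df, len(docs))) for d in docs]
```

Because "licun" and "management" occur in every document, their idf is zero, so only the rarer overlapping term "river" drives the score: the first document outranks the second.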
The second approach:

import math
import jieba.posseg as posseg

# compare str1 and str2 by cosine similarity over their nouns and verbs
def simicos(str1, str2):
    cut_str1 = [w for w, t in posseg.lcut(str1) if "n" in t or "v" in t]
    cut_str2 = [w for w, t in posseg.lcut(str2) if "n" in t or "v" in t]
    all_words = set(cut_str1 + cut_str2)
    freq_str1 = [cut_str1.count(x) for x in all_words]
    freq_str2 = [cut_str2.count(x) for x in all_words]
    sum_all = sum(map(lambda z, y: z * y, freq_str1, freq_str2))
    sqrt_str1 = math.sqrt(sum(x ** 2 for x in freq_str1))
    sqrt_str2 = math.sqrt(sum(x ** 2 for x in freq_str2))
    # guard against empty vectors to avoid division by zero
    if sqrt_str1 == 0 or sqrt_str2 == 0:
        return 0.0
    return sum_all / (sqrt_str1 * sqrt_str2)
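The count-based cosine in this second function can be checked in isolation on pre-segmented token lists (jieba and the POS filter are bypassed here; the tokens are illustrative):

```python
import math

def count_cosine(tokens1, tokens2):
    """Cosine similarity over raw term-frequency vectors, as in simicos above."""
    vocab = set(tokens1 + tokens2)
    f1 = [tokens1.count(t) for t in vocab]
    f2 = [tokens2.count(t) for t in vocab]
    dot = sum(a * b for a, b in zip(f1, f2))
    n1 = math.sqrt(sum(a * a for a in f1))
    n2 = math.sqrt(sum(b * b for b in f2))
    # guard against empty token lists
    return dot / (n1 * n2) if n1 and n2 else 0.0

print(count_cosine(["river", "patrol"], ["river", "patrol"]))    # close to 1.0
print(count_cosine(["river", "patrol"], ["shop", "management"])) # 0.0: no shared terms
```

This makes the limitation visible: "patrol" and "governance" are distinct tokens, so pure count overlap scores them as unrelated even though they are semantically close.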
Neither of these two schemes achieves word-meaning (semantic) matching. How can I better match a sentence against a published task? The task text may be very long and describes what is done every day. What would be a better way to match it?