For example, I have a corpus:
corpus = [" ",
          " ",
          " ",
          " "]
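When I run:
from sklearn.feature_extraction.text import TfidfVectorizer
vectorizer = TfidfVectorizer(min_df=1)
X = vectorizer.fit_transform(corpus)  # learn the vocabulary and IDF weights, return a sparse matrix
print(X.toarray())                    # densify and print the TF-IDF matrix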
the output looks normal:
[[0.         0.52640543 0.         0.         0.         0.52640543
  ...        0.66767854 0.         0.        ]
 [0.         0.         0.52547275 0.         0.         0.41428875
  0.52547275 0.         0.         0.         0.52547275]
 [0.4472136  0.         0.         0.4472136  ...
  0.4472136  0.4472136  0.4472136  0.        ]
 [0.         0.6191303  0.         0.78528828 0.         0.
  ...                                                   ]]
But when my corpus list is very large, for example 10,000 documents, and I run the same code, every row of the matrix is 0 except for one.
Why is this?
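One thing worth ruling out before assuming the rows are really empty: for arrays larger than its print threshold, NumPy prints only a summary (the first and last three rows and columns, joined by `...`), and with 10,000 documents the vocabulary is large, so the printed corners can be almost all zeros. Counting nonzero entries per row avoids relying on the printout; on the SciPy CSR matrix returned by `fit_transform`, `X.getnnz(axis=1)` gives that count. The sketch below is a stdlib-only stand-in, not scikit-learn's own code: it assumes the documented `TfidfVectorizer` defaults (lowercasing, token pattern `(?u)\b\w\w+\b`, `smooth_idf=True`, raw term counts, L2 normalization), and the corpus in it is hypothetical.

```python
import math
import re
from collections import Counter

# Default TfidfVectorizer token pattern: runs of 2+ word characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tfidf_rows(corpus):
    """Return (vocabulary, rows); each row is a sparse dict {term: weight}.

    Mimics the defaults lowercase=True, smooth_idf=True, sublinear_tf=False,
    norm='l2'. An empty dict means the document has no nonzero features.
    """
    docs = [Counter(TOKEN_RE.findall(doc.lower())) for doc in corpus]
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for counts in docs for term in counts)
    # Smoothed IDF: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n) / (1 + d)) + 1 for t, d in df.items()}
    rows = []
    for counts in docs:
        weights = {t: c * idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalise each non-empty row; keep empty rows empty.
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return sorted(df), rows


# Hypothetical documents; "x" has only a 1-character token.
corpus = ["the cat sat", "the dog sat", "a cat and a dog", "x"]
vocab, rows = tfidf_rows(corpus)
print(len(vocab))              # number of columns
print([len(r) for r in rows])  # nonzero entries per row
```

This prints `5` and `[3, 3, 3, 0]`: the last document is a genuinely all-zero row, because the default token pattern drops single-character tokens, while the other rows are nonzero even though most of their columns are 0.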