How to encode the words in a vocabulary efficiently?

I have two approaches in mind. One is to assign a number to each word directly:

// let weight=
//     {
//         "": 10,
//         "": 5,
//         "": 7,
//         "": 4,
//         "": 7,
//         "ufo": 3,
//     }
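
For illustration, here is a minimal sketch of the first approach, assuming the numbers are meant as compact integer IDs (the weights in the snippet above may mean something else). The helper names `buildVocabulary` and `encode`, and the sample words, are hypothetical and not from the original post.

```javascript
// Hypothetical helper: give every distinct word the next free integer ID.
function buildVocabulary(words)
{
    const wordToId = new Map()
    for (const word of words)
    {
        if (!wordToId.has(word))
        {
            wordToId.set(word, wordToId.size)
        }
    }
    return wordToId
}

// Hypothetical helper: replace each word with its ID, or -1 if it is unknown.
function encode(words, wordToId)
{
    return words.map(w => wordToId.has(w) ? wordToId.get(w) : -1)
}

const vocab = buildVocabulary(["ufo", "alien", "ufo", "sky"])
const ids = encode(["sky", "ufo", "moon"], vocab)
console.log(ids) // [ 2, 0, -1 ]
```

A sequence of small integers is usually cheaper to store and compare than the original strings, but the vocabulary map itself still has to be kept somewhere.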

The other is to encode each character as the binary form of its Unicode code point (which is what codePointAt returns, not UTF-8 bytes):

let str=""

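// Concatenate the binary form of every code point in the string.
function hash(str)
{
    // Start from an empty string; starting from the number 0 prepends a stray "0".
    let strcode = ""
    // for...of iterates by code point, so surrogate pairs stay together.
    for (const ch of str)
    {
        strcode += ch.codePointAt(0).toString(2)
    }
    return strcode
}

console.log(hash(str))
// original output (including the stray leading "0"): 0101011011111101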

But neither encoding reduces the amount of data.
I need this encoding so that text similarity can be calculated later. Thank you.

Jul.07,2022

To vectorize the text before calculating similarity, you can also use models such as TF-IDF or LSI.
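
As a rough sketch of the TF-IDF route, the snippet below weights each term and compares documents with cosine similarity. It assumes the text is already split into word tokens (Chinese text would need a word segmenter first) and uses one common smoothed IDF variant; the function names and sample documents are made up for illustration.

```javascript
// Term frequency: raw count of each token in one document.
function termFrequency(tokens)
{
    const tf = new Map()
    for (const t of tokens)
    {
        tf.set(t, (tf.get(t) || 0) + 1)
    }
    return tf
}

// Smoothed IDF (an assumed variant): log((1 + N) / (1 + df)) + 1.
function inverseDocumentFrequency(docs)
{
    const df = new Map()
    for (const tokens of docs)
    {
        for (const t of new Set(tokens))
        {
            df.set(t, (df.get(t) || 0) + 1)
        }
    }
    const idf = new Map()
    for (const [t, n] of df)
    {
        idf.set(t, Math.log((1 + docs.length) / (1 + n)) + 1)
    }
    return idf
}

// Sparse TF-IDF vector stored as a Map from term to weight.
function tfidfVector(tokens, idf)
{
    const vec = new Map()
    for (const [t, count] of termFrequency(tokens))
    {
        vec.set(t, count * (idf.get(t) || 0))
    }
    return vec
}

// Cosine similarity between two sparse vectors.
function cosineSimilarity(a, b)
{
    let dot = 0
    let normA = 0
    let normB = 0
    for (const [t, w] of a)
    {
        normA += w * w
        if (b.has(t)) dot += w * b.get(t)
    }
    for (const w of b.values()) normB += w * w
    if (normA === 0 || normB === 0) return 0
    return dot / Math.sqrt(normA * normB)
}

const docs = [
    ["ufo", "sighting", "sky"],
    ["ufo", "sky", "lights"],
    ["recipe", "soup", "salt"],
]
const idf = inverseDocumentFrequency(docs)
const vectors = docs.map(d => tfidfVector(d, idf))
console.log(cosineSimilarity(vectors[0], vectors[1])) // shares "ufo" and "sky"
console.log(cosineSimilarity(vectors[0], vectors[2])) // no shared terms, so 0
```

Storing each vector as a sparse Map keeps memory proportional to the number of distinct terms in a document rather than to the size of the whole vocabulary.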


Encoding cannot reduce the amount of data; compression can.
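
To make the distinction concrete, here is a small sketch that assumes a Node.js environment and uses its built-in zlib module; the repeated sample text is invented for the demonstration. The compressed bytes are only for storage or transfer and must be decompressed before any similarity calculation.

```javascript
const zlib = require("zlib")

// Highly repetitive sample text, so gzip has something to exploit.
const text = "ufo sky ufo lights ".repeat(200)
const raw = Buffer.from(text, "utf8")

// gzipSync returns a Buffer holding the compressed bytes.
const packed = zlib.gzipSync(raw)

// gunzipSync restores the original bytes exactly.
const restored = zlib.gunzipSync(packed).toString("utf8")

console.log(raw.length, packed.length)
console.log(restored === text) // true
```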
