I need to read a large number of JSON files, re-parse them, and import the data into Elasticsearch. The files are stored in separate date folders, and a single folder is about 80 GB. I haven't actually counted the files, but there should be more than 10,000 of them.
My current approach: open a thread pool and have multiple threads read different JSON files. Each file contains an array of records in the form [data1, data2, data3, data4, ...]. I parse the file, traverse each record, transform it into the JSON format I need, and index it into Elasticsearch.
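The file-discovery and thread-pool part could look roughly like the sketch below. It is a minimal outline, not your actual code: the class name, the folder layout, and the `processFile` callback (standing in for "parse the array, transform each record, index into Elasticsearch") are all assumptions.

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.List;
import java.util.concurrent.*;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JsonFolderImporter {
    // Recursively collect every .json file under the root holding the date folders.
    static List<Path> listJsonFiles(Path root) throws IOException {
        try (Stream<Path> s = Files.walk(root)) {
            return s.filter(p -> p.toString().endsWith(".json"))
                    .collect(Collectors.toList());
        }
    }

    // Submit one task per file to a fixed-size pool. processFile is a stand-in
    // for the real work: parse the JSON array, transform each element, bulk-index.
    static void importAll(Path root, int threads, Consumer<Path> processFile)
            throws IOException, InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Path file : listJsonFiles(root)) {
            pool.submit(() -> processFile.accept(file));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```

One file per task keeps the unit of work coarse enough that tracking progress per file (for resuming later) stays simple.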
The import goes through Elasticsearch's bulk API, flushing a batch every 1,000 documents.
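The batch-every-1,000 logic can be isolated from the Elasticsearch client like this. This is only a sketch: `BulkBatcher` is a hypothetical helper, and the `flusher` callback is where the real bulk call (e.g. building a `BulkRequest` with the Elasticsearch Java client) would go.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BulkBatcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> flusher; // would wrap the actual Elasticsearch bulk request
    private final List<T> buffer = new ArrayList<>();

    public BulkBatcher(int batchSize, Consumer<List<T>> flusher) {
        this.batchSize = batchSize;
        this.flusher = flusher;
    }

    // Buffer one transformed document; flush automatically at the batch size.
    public void add(T doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) flush();
    }

    // Send whatever is buffered; call once more at end-of-file for the remainder.
    public void flush() {
        if (!buffer.isEmpty()) {
            flusher.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

Calling `flush()` after the last record of each file matters, otherwise the final partial batch (under 1,000 documents) never gets indexed.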
I want to add something like resumable import (checkpoint/restart): when my program dies halfway through, I don't want to start over from scratch, but instead resume from where it stopped last time. I haven't figured out how to do this. I'm working in Java.
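One common way to get this, sketched below under the assumption that a whole file is the unit of progress: keep a progress file listing every JSON file that has been fully indexed, append to it (one line per file) as each file finishes, and on restart load it and skip those files. The class name and format are hypothetical; a crash then costs at most re-importing the one file that was in flight.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.HashSet;
import java.util.Set;

public class ImportCheckpoint {
    private final Path progressFile;
    private final Set<String> done = new HashSet<>();

    // On startup, load the set of already-completed files (if a previous run left one).
    public ImportCheckpoint(Path progressFile) throws IOException {
        this.progressFile = progressFile;
        if (Files.exists(progressFile)) {
            done.addAll(Files.readAllLines(progressFile, StandardCharsets.UTF_8));
        }
    }

    // Check before submitting a file to the pool; skip it if a prior run finished it.
    public boolean isDone(String fileKey) {
        return done.contains(fileKey);
    }

    // Call only after the file's last batch has been flushed to Elasticsearch.
    public synchronized void markDone(String fileKey) throws IOException {
        Files.write(progressFile,
                (fileKey + System.lineSeparator()).getBytes(StandardCharsets.UTF_8),
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        done.add(fileKey);
    }
}
```

Since re-running a file re-indexes its documents, using a deterministic `_id` per record (so a replay overwrites rather than duplicates) makes the file-level granularity safe.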