I've just started using Spark Streaming and have a few questions about checkpointing. There are two types of checkpoint: metadata checkpointing (for the driver) and data checkpointing. The manual says data checkpoints are written only if you use a stateful transformation. So...
The data source is Kafka, and one field is a timestamp. We want to calculate the difference between the timestamps of two consecutive records, add a new field to store that value, and send the record out. I looked into this: should I use reduceByKeyAndWindow? Wit...
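reduceByKeyAndWindow aggregates all values that fall inside a window, so it is not a natural fit for pairing each record with its immediate predecessor; per-key state (what updateStateByKey/mapWithState provide in Spark Streaming) matches the problem better. A plain-Python sketch of that per-key logic, not actual Spark code — the names `with_time_diff` and `last_seen` are illustrative:

```python
def with_time_diff(records):
    """records: iterable of (key, timestamp) in arrival order.

    Yields (key, timestamp, diff), where diff is the gap from the previous
    timestamp seen for the same key, or None for the first record of a key.
    """
    last_seen = {}  # per-key state, analogous to Spark's keyed state store
    for key, ts in records:
        prev = last_seen.get(key)
        last_seen[key] = ts
        yield (key, ts, None if prev is None else ts - prev)

# Simulated stream: two devices interleaved.
events = [("deviceA", 100), ("deviceA", 160), ("deviceB", 50), ("deviceA", 200)]
result = list(with_time_diff(events))
```

In Spark Streaming the `last_seen` dictionary would be the keyed state that the framework checkpoints for you; the diff then becomes the new field attached to each outgoing record.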
Error log: line 341 of com.slhan.service.BusinessService gets the value of a broadcast variable. 18/09/08 13:50:02 ERROR scheduler.JobScheduler: Error running job streaming job 1536385800000 ms.1 java.io.IOException: com.esotericsoftware.kr...
Purpose: two large datasets in Spark need to be joined. Both inputs contain the field userid, and they must be associated on userid. I hope to avoid a shuffle. Completed: I pre-processed both datasets into 1w (10,000) f...
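One way to avoid a shuffle at join time is to pre-partition both datasets with the same partitioner on userid, so matching keys are guaranteed to land in the same partition and the join can run partition-locally (in Spark terms: `partitionBy` with the same HashPartitioner on both RDDs, or bucketing for DataFrames). A plain-Python sketch of the idea, assuming integer userids; the helper names are illustrative, not Spark API:

```python
NUM_PARTS = 4  # illustrative partition count

def partition_by_userid(rows, num_parts=NUM_PARTS):
    """Assign each row to a partition by userid (assumes integer userids)."""
    parts = [[] for _ in range(num_parts)]
    for row in rows:
        parts[row["userid"] % num_parts].append(row)
    return parts

def local_join(part_a, part_b):
    """Join two co-located partitions on userid, with no data movement."""
    index = {}
    for b in part_b:
        index.setdefault(b["userid"], []).append(b)
    return [(a, b) for a in part_a for b in index.get(a["userid"], [])]

users = partition_by_userid([{"userid": 1, "name": "u1"},
                             {"userid": 2, "name": "u2"}])
orders = partition_by_userid([{"userid": 1, "item": "book"},
                              {"userid": 5, "item": "pen"}])

# Because both sides used the same partitioner, partition i of `users`
# only ever matches partition i of `orders` -- no shuffle is needed.
joined = [pair for i in range(NUM_PARTS)
          for pair in local_join(users[i], orders[i])]
```

The key design point is that both inputs must use the *same* partitioner and partition count; if they differ, Spark must still repartition one side, which reintroduces the shuffle.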
1. The JSON data is as follows: {"id": 11, "data": [{"package": "com.browser1", "activetime": 60000}, {"package": "com.browser6", "activetime": 1205000}, {"package": "com.browser7", "activetime": 1205000}]} {"id": 12...
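Assuming the goal is to flatten each record into one row per element of the `data` array (what `explode` does in Spark SQL), here is a plain-Python sketch of the transformation on the first record from the question (the second record is truncated there, so only the first is used):

```python
import json

raw = ('{"id": 11, "data": ['
       '{"package": "com.browser1", "activetime": 60000}, '
       '{"package": "com.browser6", "activetime": 1205000}, '
       '{"package": "com.browser7", "activetime": 1205000}]}')

record = json.loads(raw)

# One output row per element of the "data" array, keyed by the record id.
rows = [(record["id"], d["package"], d["activetime"]) for d in record["data"]]
```

Each output tuple corresponds to one row of the flattened table: (id, package, activetime).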
The figure is as follows: def update_model(rdd), mixture_model: it's OK to declare mixture_model directly inside update_model, but then every foreachRDD call re-declares the MixtureModel, which makes it impossible to update the model in real time ...
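A common workaround is to declare the model once in driver scope and let the function passed to foreachRDD close over it, so each batch updates the same object instead of re-declaring it (the updated parameters then still need to be re-broadcast if executors read them). A plain-Python sketch of the closure pattern, with `MixtureModel` as a hypothetical stand-in for the real model class:

```python
class MixtureModel:
    """Hypothetical stand-in for the real model class."""
    def __init__(self):
        self.n_seen = 0  # toy state that should accumulate across batches

    def partial_fit(self, batch):
        self.n_seen += len(batch)

# Declared ONCE in driver scope, outside the per-batch function.
mixture_model = MixtureModel()

def update_model(batch):
    # The closure captures the driver-side model; it is NOT re-declared
    # on every batch, so its state accumulates in real time.
    mixture_model.partial_fit(batch)

# Simulating three micro-batches arriving over time (stand-ins for RDDs).
for batch in [[1, 2], [3], [4, 5, 6]]:
    update_model(batch)
```

Declaring the model inside `update_model`, as the question describes, would reset `n_seen` to zero on every batch; hoisting it to the enclosing scope is what lets the state survive between foreachRDD invocations.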