I have just started using Spark Streaming, and I have a few questions about checkpointing:
- There are two types of checkpoint: a metadata checkpoint for the driver and a data checkpoint. The manual says that data checkpoints are written only if you use stateful transformations. So if I don't use any stateful transformation, is the data checkpoint still written? If it is not, where are lost RDDs recovered from when I restart the application?
- In a batch job I can choose which RDD to checkpoint, or at which step to write the checkpoint. So in streaming, do I need to use `foreachRDD { rdd => rdd.checkpoint() }`?
- If I don't explicitly call `rdd.checkpoint()` in Spark Streaming, how does Spark decide which RDDs should be written to the data checkpoint files?
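
To make the questions concrete, here is a minimal sketch of the kind of job I have in mind. The application name, the socket source, the checkpoint path, and the intervals are all placeholders chosen for illustration, and I am not sure this is the right way to set things up; it only uses calls I found in the API (`StreamingContext.getOrCreate`, `ssc.checkpoint`, `DStream.checkpoint`).

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object CheckpointQuestion {
  // Hypothetical checkpoint location; in production this would be a
  // fault-tolerant store such as HDFS.
  val checkpointDir = "/tmp/streaming-checkpoint"

  // Builds a fresh context; only called when no checkpoint exists yet.
  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("CheckpointQuestion") // placeholder name
      .setMaster("local[2]")            // assumption: local test run
    val ssc = new StreamingContext(conf, Seconds(5))

    // Set the directory where checkpoint files are written.
    ssc.checkpoint(checkpointDir)

    // Placeholder source: text lines from a local socket.
    val lines = ssc.socketTextStream("localhost", 9999)

    // Only stateless transformations here (no updateStateByKey, no windows).
    val counts = lines.flatMap(_.split(" ")).map((_, 1)).reduceByKey(_ + _)

    // Explicit checkpoint interval on the DStream (a multiple of the batch
    // interval). Is this the streaming counterpart of rdd.checkpoint()?
    counts.checkpoint(Seconds(50))

    counts.print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover the context from the checkpoint directory if one exists,
    // otherwise create a new one.
    val ssc = StreamingContext.getOrCreate(checkpointDir, createContext _)
    ssc.start()
    ssc.awaitTermination()
  }
}
```

In this sketch I call `counts.checkpoint(...)` on the DStream rather than `rdd.checkpoint()` inside `foreachRDD`; whether either call is needed for a stateless pipeline like this one is exactly what I am unsure about.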