Recently I started working with Kafka, and I want to pull data from Kafka every half hour for updating and learning. A friend recommended using Flume to implement half-hour log rolling. The configuration is as follows:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.zookeeperConnect = host
a1.sources.r1.kafka.topics = topic
# Use a file channel which buffers events on disk
a1.channels.c1.type = file
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /home/flume_log
a1.sinks.k1.sink.rollInterval=1800
A log file is successfully rolled every 1800 s, but I don't know why the file is still empty even when requests are coming in. Is something wrong with the configuration?
In addition, instead of going through Flume, can I read the data directly from Kafka every half hour? That is, buffer the lines received within the half hour in memory and write them out to a file as soon as the half hour is up?
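As a rough illustration of the second idea, here is a minimal, stdlib-only sketch of the time-based buffering logic. The Kafka consumer itself is left out; in practice something like kafka-python's `KafkaConsumer` would feed each message into `add()`. The class name, file-naming scheme, and injectable clock are my own assumptions for the sketch, not a tested production setup.

```python
import time

class RollingBuffer:
    """Buffer incoming lines in memory and flush them to a new
    timestamped file every `interval` seconds (1800 s = half an hour)."""

    def __init__(self, directory=".", interval=1800, clock=time.time):
        self.directory = directory
        self.interval = interval
        self.clock = clock            # injectable so the logic can be tested
        self.lines = []
        self.window_start = clock()

    def add(self, line):
        """Append one record; flush first if the current window has expired."""
        if self.clock() - self.window_start >= self.interval:
            self.flush()
        self.lines.append(line)

    def flush(self):
        """Write buffered lines to a file named after the window start time."""
        if self.lines:
            path = "%s/log-%d.txt" % (self.directory, int(self.window_start))
            with open(path, "w") as f:
                f.write("\n".join(self.lines) + "\n")
        self.lines = []
        self.window_start = self.clock()
```

A real consumer loop would poll Kafka, call `add()` for each message, and call `flush()` on shutdown. Note one caveat of this naive scheme: a window is only closed when the next message arrives, so on a quiet topic you may also want a timer thread that forces a periodic `flush()`.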