1. The JSON data looks like this:
{"id": 11, "data": [{"package": "com.browser1", "activetime": 60000}, {"package": "com.browser6", "activetime": 1205000}, {"package": "com.browser7", "activetime": 1205000}]}
{"id": 12, "data": [{"package": "com.browser1", "activetime": 60000}, {"package": "com.browser6", "activetime": 1205000}]}
The activetime field is the active time of each app, and the goal is to compute the total active time per app.
I use Spark SQL to parse the JSON:
val sqlContext = sc.sqlContext
val behavior = sqlContext.read.json("behavior-json.log")
behavior.cache()
behavior.createOrReplaceTempView("behavior")
val appActiveTime = sqlContext.sql("SELECT data FROM behavior") // sql
appActiveTime.show(100,100) // dataFrame
appActiveTime.rdd.foreach(println) // rdd
But the printed DataFrame looks like this:
+----------------------------------------------------------------------+
| data|
+----------------------------------------------------------------------+
| [[60000,com.browser1], [12870000,com.browser]]|
| [[60000,com.browser1], [120000,com.browser]]|
| [[60000,com.browser1], [120000,com.browser]]|
| [[60000,com.browser1], [1207000,com.browser]]|
| [[120000,com.browser]]|
| [[60000,com.browser1], [1204000,com.browser5]]|
| [[60000,com.browser1], [12075000,com.browser]]|
| [[60000,com.browser1], [120000,com.browser]]|
| [[60000,com.browser1], [1204000,com.browser]]|
| [[60000,com.browser1], [120000,com.browser]]|
| [[60000,com.browser1], [1201000,com.browser]]|
| [[1200400,com.browser5]]|
| [[60000,com.browser1], [1200400,com.browser]]|
|[[60000,com.browser1], [1205000,com.browser6], [1205000,com.browser7]]|
and the RDD looks like this:
[WrappedArray([60000,com.browser1], [60000,com.browser1])]
[WrappedArray([120000,com.browser])]
[WrappedArray([60000,com.browser1], [1204000,com.browser5])]
[WrappedArray([12075000,com.browser], [12075000,com.browser])]
I want to convert the data into:
com.browser1 60000
com.browser1 60000
com.browser 12075000
com.browser 12075000
.......
In other words, I want each array element of every row in the RDD to become its own row. Any other structure that is easy to analyze would also be fine.
I am a beginner in Spark and Scala and have tried for a long time without success, so I hope you can guide me.
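
One possible way to get one row per array element is the built-in explode function, followed by a groupBy to sum the active time per package. The following is only a minimal sketch: it assumes Spark 2.2 or later with a SparkSession, a local master, and the behavior-json.log file from above. The object name AppActiveTime, the helper names flatten and totals, and the column alias total_activetime are made up for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, explode, sum}

// Hypothetical helper object; the names are illustrative only.
object AppActiveTime {

  // explode() emits one row per element of the `data` array.
  // Each element is a struct, so its fields are selected as columns.
  def flatten(behavior: DataFrame): DataFrame =
    behavior
      .select(explode(col("data")).as("app"))
      .select(
        col("app.package").as("package"),
        col("app.activetime").as("activetime"))

  // Total active time per package.
  def totals(flat: DataFrame): DataFrame =
    flat.groupBy("package").agg(sum("activetime").as("total_activetime"))

  def main(args: Array[String]): Unit = {
    // Assumption: running locally; adjust master and path for a cluster.
    val spark = SparkSession.builder()
      .appName("AppActiveTime")
      .master("local[*]")
      .getOrCreate()

    val behavior = spark.read.json("behavior-json.log")
    val flat = flatten(behavior)

    flat.show(100, false)         // one (package, activetime) pair per row
    totals(flat).show(100, false) // summed active time per package

    spark.stop()
  }
}
```

If an RDD is preferred, flatten(behavior).rdd gives an RDD of Row objects with one (package, activetime) pair each, which can then be mapped or reduced as usual.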