Spark 2.1: caching a temporary table does not seem to work

problem description

The temporary table created from the DataFrame is queried many times, so I cached it, but the cache does not seem to take effect.
The data volume is fairly large (about 10 billion), so query efficiency needs to be optimized.

the environmental background of the problems and what methods you have tried

Before this, I also tried submitting the Spark jobs repeatedly from multiple threads, again with the temporary table cached, but that did not seem to help either.

related codes

This is the normal (single-threaded) way of submitting the Spark job. The code is as follows:

val data: DataFrame = sparkSession.read.parquet("XXX") 

data.createOrReplaceTempView("table_info")
sparkSession.catalog.cacheTable("table_info")

// build the SQL for each feature combination
val featureArray: ArrayBuffer[String] = StrsUtils.generateFeatures("type,`from`,page,value,source")

for(i <- 0 to 3) {

    val all_type = featureArray(i)

    val sql_merge =
        s"""
          | SELECT
          |      appid, soft_version, id, event_type, type, `from`, page, value, source,
          |      count(distinct cuid) as uv,
          |      sum(pv) AS pv,
          |      sum(duration) AS duration
          | FROM(
          |      SELECT cuid, appid, id, event_type, soft_version, duration, pv, event_day, $all_type
          |      FROM table_info
          | )tmp
          | GROUP BY appid, soft_version, id, event_type, type, `from`, page, value, source
          |
        """.stripMargin

    logger.info("merge_sql: " + sql_merge)
    sparkSession.sql(sql_merge).repartition(1).write.mode("overwrite").parquet(s"XXX/result/$event_day/" + i )
}

// release the cache
sparkSession.catalog.clearCache()
sparkSession.stop()
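
One thing worth ruling out with the code above: `catalog.cacheTable` is lazy, so nothing is stored until an action scans the table. Below is a minimal sketch, not a confirmed fix, that persists the DataFrame with an explicit storage level and forces it to materialize before the loop. It assumes a local session and a tiny made-up stand-in for the parquet input; the names `EagerCacheSketch` and `cacheAndMaterialize` and the sample rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.storage.StorageLevel

object EagerCacheSketch {
  // Persist `df`, register it as a temp view, and force the cache to fill.
  // Returns the number of rows that were scanned into the cache.
  def cacheAndMaterialize(df: DataFrame, viewName: String): Long = {
    // MEMORY_AND_DISK spills partitions that do not fit in memory to disk.
    df.persist(StorageLevel.MEMORY_AND_DISK)
    df.createOrReplaceTempView(viewName)
    // count() is an action that touches every partition, so the whole table is cached.
    df.count()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("eager-cache-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for sparkSession.read.parquet("XXX").
    val data = Seq(("c1", "app", 1L), ("c2", "app", 2L)).toDF("cuid", "appid", "pv")

    val rows = cacheAndMaterialize(data, "table_info")
    println(s"cached rows: $rows")

    // Later queries against the view can now be served from the cache.
    spark.sql("SELECT appid, sum(pv) AS pv FROM table_info GROUP BY appid").show()

    spark.catalog.clearCache()
    spark.stop()
  }
}
```

With the real data, the first `count()` pays the full cost of reading the parquet files once. Whether the later loop iterations get faster depends on how much of the table fits in executor memory and local disk.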

what result do you expect? What is the error message actually seen?

(Screenshots of the Stage list and the Stage DAG were attached here.)

This Stage takes about 3-4 minutes. I expected the first iteration of the loop to populate the cache and the second and third iterations to read directly from it. However, the DAG of the later iterations looks the same as the first one and they take about the same time, so I suspect that caching the temporary table is not taking effect.
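
To check this suspicion directly rather than infer it from timings, one option is to see whether the query plan actually reads from a cached relation. Below is a rough sketch, assuming a local session and made-up sample rows (`CachePlanCheck` and `readsFromCache` are hypothetical names). It looks for an `InMemoryRelation` node in the plan after Spark has substituted cached fragments.

```scala
import org.apache.spark.sql.SparkSession

object CachePlanCheck {
  // True when the plan of `query`, after cached fragments are substituted,
  // contains an InMemoryRelation node, i.e. the query would scan the cache.
  def readsFromCache(spark: SparkSession, query: String): Boolean =
    spark.sql(query).queryExecution.withCachedData.toString.contains("InMemoryRelation")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cache-plan-check")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical stand-in for the real parquet data.
    Seq(("c1", "app"), ("c2", "app")).toDF("cuid", "appid")
      .createOrReplaceTempView("table_info")

    val query = "SELECT appid, count(DISTINCT cuid) AS uv FROM table_info GROUP BY appid"

    println(s"before cacheTable: ${readsFromCache(spark, query)}")

    // Registers the view for caching; the data itself is stored on the first scan.
    spark.catalog.cacheTable("table_info")

    // isCached reports that the view is registered, not that it is fully materialized.
    println(s"isCached: ${spark.catalog.isCached("table_info")}")
    println(s"after cacheTable: ${readsFromCache(spark, query)}")

    // The physical plan printed here should mention InMemoryTableScan when the cache is used.
    spark.sql(query).explain()

    spark.catalog.clearCache()
    spark.stop()
  }
}
```

If the plan does read from the cache, the unchanged stage time may come from the aggregation itself (for example `count(distinct cuid)`) rather than from re-reading the parquet files. That is only a guess and would need to be confirmed in the Storage tab of the Spark UI.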

Scala

Jun.16,2022