A question about a MySQL multi-table query approach
Problem description
There are two tables in the database, a and b. Both have id and created_at columns, and the goal is to count the rows of both tables per day.
My previous attempt was to use MySQL's UNION ALL to merge the data of the two tables, use the result as a subquery, and then group and count. The SQL is as follows:
select DATE_FORMAT(created_at, "%Y-%m-%d") as day, count(*) from (
select id, created_at from a UNION ALL select id, created_at from b
) as c group by day;
+------------+----------+
| day        | count(*) |
+------------+----------+
| 2018-12-24 |        2 |
| 2018-12-21 |        1 |
| 2018-12-20 |        1 |
+------------+----------+
This returns the expected data.
My question is: will this approach cause problems when the amount of data is large? And is there a better way?
mysql> desc a;
+-------+---------+------+-----+---------+-------+
| Field | Type    | Null | Key | Default | Extra |
+-------+---------+------+-----+---------+-------+
| id    | int(11) | YES  |     | NULL    |       |
| t     | date    | YES  | MUL | NULL    |       |
+-------+---------+------+-----+---------+-------+
For both a and b, add an index on t:
alter table a add key(t);
alter table b add key(t);
select c.t,count(*) from (
select a.t,count(*) from a a group by a.t
union all
select b.t,count(*) from b b group by b.t
) c
group by c.t;
This way, the index on the t column can be used.
Your SQL does not use any indexes, so both tables a and b get a full table scan. That is not a problem when the amount of data is small, but once the tables exceed about 1 million rows the performance problem will become noticeable.
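One way to check this claim on your own data is EXPLAIN. The following is only a sketch: it assumes the asker's tables with created_at, plus the indexed date column t from the answer above. In the EXPLAIN output, a type of ALL for a table indicates a full table scan, while a non-NULL key column shows which index was chosen.

```sql
-- Plan for the original query: DATE_FORMAT() is applied to every row
-- produced by the UNION ALL subquery.
EXPLAIN
SELECT DATE_FORMAT(created_at, '%Y-%m-%d') AS day, COUNT(*)
FROM (
    SELECT id, created_at FROM a
    UNION ALL
    SELECT id, created_at FROM b
) AS c
GROUP BY day;

-- Plan for grouping directly on the indexed date column t
-- (assumes "alter table a add key(t)" has been run).
EXPLAIN
SELECT a.t, COUNT(*) FROM a GROUP BY a.t;
```

The exact plan depends on the MySQL version and the table statistics, so compare the two outputs rather than relying on a fixed expectation.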
It is not clear what type your created_at field is, but judging from your code, DATE_FORMAT(created_at, "%Y-%m-%d"), it should be DATETIME or TIMESTAMP, which is not well suited to your use case. If it were me, I might add a DATE column, created_at_date, to solve this problem:
ALTER TABLE `a` ADD COLUMN `created_at_date` DATE;
ALTER TABLE `b` ADD COLUMN `created_at_date` DATE;
ALTER TABLE `a` ADD INDEX `created_at_date` (`created_at_date`);
ALTER TABLE `b` ADD INDEX `created_at_date` (`created_at_date`);
For existing data in the table, set created_at_date to DATE(created_at):
UPDATE `a` SET `created_at_date` = DATE(`created_at`);
UPDATE `b` SET `created_at_date` = DATE(`created_at`);
If you don't want to change the application-layer code, you can create a TRIGGER to set this field automatically.
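A minimal sketch of such triggers for table a is shown here (the same pair would be needed for b). The trigger names are hypothetical, and the sketch assumes created_at already holds its value when the row is written.

```sql
-- Hypothetical trigger names; repeat both triggers for table b.
-- Keep created_at_date in sync when a row is inserted.
CREATE TRIGGER a_set_created_at_date_ins
BEFORE INSERT ON a
FOR EACH ROW
SET NEW.created_at_date = DATE(NEW.created_at);

-- Keep created_at_date in sync when created_at is changed later.
CREATE TRIGGER a_set_created_at_date_upd
BEFORE UPDATE ON a
FOR EACH ROW
SET NEW.created_at_date = DATE(NEW.created_at);
```

Because each trigger body is a single SET statement, no BEGIN ... END block or DELIMITER change is needed.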
Secondly, I'm not sure about your exact requirement: should a date be counted every time it appears, or only once per table as long as it exists in that table? If it is the second case, the answer above is actually fine, and it can use the index on the DATE column:
select c.t,count(*) from (
select a.t,count(*) from a a group by a.t
union all
select b.t,count(*) from b b group by b.t
) c
group by c.t;
If it is the first case, which seems likely judging from the SQL you wrote, then you need to take the per-date counts from the two derived tables, FULL OUTER JOIN them on the DATE column, and compute a.count + b.count. However, MySQL does not support FULL OUTER JOIN. You can work around this by combining a LEFT OUTER JOIN and a RIGHT OUTER JOIN with UNION. The complete SQL is as follows (for reference only):
SELECT ta.`created_at_date`, ta.c + IFNULL(tb.c,0) cnt FROM
(
SELECT `created_at_date`, COUNT(*) c FROM a GROUP BY `created_at_date`
) AS ta LEFT OUTER JOIN
(
SELECT `created_at_date`, COUNT(*) c FROM b GROUP BY `created_at_date`
) AS tb ON ta.`created_at_date` = tb.`created_at_date`
UNION
SELECT tb.`created_at_date`, tb.c cnt FROM
(
SELECT `created_at_date`, COUNT(*) c FROM a GROUP BY `created_at_date`
) AS ta RIGHT OUTER JOIN
(
SELECT `created_at_date`, COUNT(*) c FROM b GROUP BY `created_at_date`
) AS tb ON ta.`created_at_date` = tb.`created_at_date`
WHERE ta.c IS NULL
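A possibly simpler alternative for the first case, offered only as a sketch: aggregate each table separately, combine the partial counts with UNION ALL, and add them up with SUM. It assumes the created_at_date column and indexes introduced above; the aliases d, c, per_table and cnt are illustrative.

```sql
-- Per-day totals across both tables without emulating FULL OUTER JOIN.
SELECT d AS created_at_date, SUM(c) AS cnt
FROM (
    -- Partial counts per day for each table; each inner GROUP BY
    -- can use the index on created_at_date.
    SELECT created_at_date AS d, COUNT(*) AS c FROM a GROUP BY created_at_date
    UNION ALL
    SELECT created_at_date AS d, COUNT(*) AS c FROM b GROUP BY created_at_date
) AS per_table
GROUP BY d;
```

The outer query only processes one row per day per table, so it stays small even when a and b are large. Note that it uses SUM rather than COUNT(*) in the outer query, since COUNT(*) there would count how many tables contain the date rather than the total number of rows.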