There are two data sets, each with 500 million URLs, but only 4 GB of memory. How do you find the URLs that appear in both data sets?

This is an Ali interview question that has been bothering me for a long time. Please help.
There are two data sets, each with 500 million URLs, but only 4 GB of memory. How do you find the URLs that appear in both data sets?

Jul.12,2021

Ali interview, big data question: given two files a and b, each storing 5 billion URLs, where each URL occupies 64 bytes and the memory limit is 4 GB, find the URLs common to files a and b.


A similar question exists; it is solved with a divide-and-conquer approach.
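One common way to read "divide and conquer" here is hash partitioning; the following is a minimal sketch under that assumption, not necessarily what the linked question does. With the figures from the similar question, one file is about 5 billion × 64 bytes = 320 GB, so it cannot be loaded at once. Hashing every URL into one of, say, 1000 buckets gives roughly 320 MB of raw URL data per bucket, and identical URLs always land in the same bucket number in both files, so only matching bucket pairs need to be compared. The function names, the one-URL-per-line file layout, and the `num_buckets` value are all hypothetical.

```python
import hashlib
import os


def bucket_of(url: str, num_buckets: int) -> int:
    """Map a URL to a bucket index with a hash that is stable across runs."""
    digest = hashlib.md5(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def partition(src_path: str, out_dir: str, prefix: str, num_buckets: int) -> list:
    """Split src_path (assumed: one URL per line) into num_buckets smaller files."""
    paths = [os.path.join(out_dir, f"{prefix}_{i}.txt") for i in range(num_buckets)]
    # Note: this keeps num_buckets files open at once, which may exceed
    # the operating system's open-file limit for large bucket counts.
    outs = [open(p, "w", encoding="utf-8") for p in paths]
    try:
        with open(src_path, encoding="utf-8") as src:
            for line in src:
                url = line.strip()
                if url:
                    outs[bucket_of(url, num_buckets)].write(url + "\n")
    finally:
        for f in outs:
            f.close()
    return paths


def common_urls(path_a: str, path_b: str, work_dir: str, num_buckets: int = 1000):
    """Yield each URL that occurs in both files, one bucket pair at a time."""
    parts_a = partition(path_a, work_dir, "a", num_buckets)
    parts_b = partition(path_b, work_dir, "b", num_buckets)
    for part_a, part_b in zip(parts_a, parts_b):
        # Only one bucket of file a is held in memory at any moment.
        with open(part_a, encoding="utf-8") as fa:
            seen = {line.rstrip("\n") for line in fa}
        with open(part_b, encoding="utf-8") as fb:
            for line in fb:
                url = line.rstrip("\n")
                if url in seen:
                    seen.discard(url)  # report each common URL only once
                    yield url
```

The bucket count is a trade-off: it has to be large enough that one bucket of file a, plus the overhead of the in-memory set, fits within the memory limit. If a bucket still turns out too large, it can be partitioned again with a different hash.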


Just a guess: sort first, then split into blocks?
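If this guess is read as an external sort followed by a merge pass, the comparison step could look like the sketch below. It assumes both files have already been sorted (for example by sorting memory-sized blocks and merging them, which is not shown here) and that each input yields URLs in ascending order. The function name `sorted_intersection` is hypothetical.

```python
def sorted_intersection(sorted_a, sorted_b):
    """Yield values present in both ascending-sorted iterables, each once."""
    it_a, it_b = iter(sorted_a), iter(sorted_b)
    a = next(it_a, None)
    b = next(it_b, None)
    last = None  # last value reported, used to skip duplicates
    while a is not None and b is not None:
        if a < b:
            a = next(it_a, None)      # a is too small, advance the first stream
        elif a > b:
            b = next(it_b, None)      # b is too small, advance the second stream
        else:
            if a != last:
                yield a
                last = a
            a = next(it_a, None)
            b = next(it_b, None)


# Usage with small in-memory lists standing in for the two sorted files:
common = list(sorted_intersection(["a", "b", "b", "d"], ["b", "b", "c", "d"]))
print(common)  # ['b', 'd']
```

Because only the current element of each stream is held, memory use during this pass is constant; the expensive part is the sort that has to happen first.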


Dozens of problems like this come up every year, but big companies really are different. Tsk.
