Not feasible.
- Appending to a list is no different from reading everything in at once: it keeps all of the data in memory.
- The set difference only works as set - list, not list - set.
The conditions given are not sufficient. If there are 10 million recorded addresses and only a few duplicates, what comes to mind is to sort them first and then do a map-reduce-style pass. If there are 10 million records of which 9 million are duplicates, a hash table solves it.
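A minimal sketch of the hash-table variant, assuming the addresses fit in memory as strings (the file name ips.txt and one-address-per-line layout are placeholder assumptions):

```typescript
import { readFileSync } from "node:fs";

// Count every address, then report the ones seen more than once.
const counts = new Map<string, number>();
for (const ip of readFileSync("ips.txt", "utf8").split("\n")) {
  if (ip) counts.set(ip, (counts.get(ip) ?? 0) + 1);
}
for (const [ip, n] of counts) {
  if (n > 1) console.log(`${ip} appears ${n} times`);
}
```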
IPv4? There are only 4Gi addresses in total. Dig a hole in memory for every possible address and wait for each IP to land in its hole. The wasteful version uses an int8 per address, which is 4 GB of memory; the frugal version uses one bit per address, which is only 512 MB. The thinking can be more flexible here; in fact, I suspect the problem restricts itself to IP addresses precisely to reduce the difficulty.
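A sketch of the one-bit-per-address version (2^32 bits = 512 MB); the ipToInt helper and the surrounding harness are illustrative, not from the original:

```typescript
// One bit per possible IPv4 address: 2^32 bits = 512 MiB.
const bitmap = new Uint8Array(1 << 29); // 2^32 / 8 bytes

// "192.168.3.4" -> 3232236292, as an unsigned 32-bit value.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// Returns true if this IP's hole was already filled, i.e. a duplicate.
function sawBefore(ip: string): boolean {
  const n = ipToInt(ip);
  const byte = n >>> 3;
  const mask = 1 << (n & 7);
  const seen = (bitmap[byte] & mask) !== 0;
  bitmap[byte] |= mask; // mark the hole as filled
  return seen;
}
```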
IPv6? You may have to divide and conquer. The basic idea: look at the first few bits of each address, as many as memory can bear, throw the addresses that collide on that prefix into the same bucket, and then go through the buckets one by one looking for duplicates; a bucket that is still too big can be split further. DBMSs in fact do this all day.
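A sketch of the bucketing idea, assuming the IPv6 addresses arrive as normalized strings. Buckets live in memory here for brevity; in practice each bucket would be a temporary file on disk that is scanned, or split again, on its own:

```typescript
// Partition addresses by their leading bits, then look for duplicates
// inside each (much smaller) bucket.
function bucketOf(ip6: string, prefixHexChars = 2): string {
  return ip6.slice(0, prefixHexChars); // first 8 bits of the address
}

function findDuplicates(addrs: string[]): string[] {
  const buckets = new Map<string, string[]>();
  for (const a of addrs) {
    const k = bucketOf(a);
    let bucket = buckets.get(k);
    if (!bucket) buckets.set(k, (bucket = []));
    bucket.push(a);
  }
  const dups: string[] = [];
  for (const bucket of buckets.values()) {
    const seen = new Set<string>();
    for (const a of bucket) {
      if (seen.has(a)) dups.push(a);
      else seen.add(a);
    }
  }
  return dups;
}
```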
First convert the IP into an int (there are plenty of algorithms for this online; Baidu them), then make a large array. For example, 192.168.3.4 converts to 3232236292, and the count goes into int array[3232236292 - 1]. A sketch follows.
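Here is that idea with the conversion written out. Note that a full 2^32-entry counter array needs about 4 GB even at one byte per slot and may exceed a JS engine's typed-array limits, so treat this as the shape of the approach rather than something to run as-is:

```typescript
// "192.168.3.4" -> 3232236292, as an unsigned 32-bit value.
function ipToInt(ip: string): number {
  return ip.split(".").reduce((acc, octet) => acc * 256 + Number(octet), 0);
}

// One counter per possible IPv4 address. CAUTION: ~4 GB at full scale;
// shrink the range if you want to experiment locally.
const counts = new Uint8Array(2 ** 32);

function record(ip: string): void {
  const n = ipToInt(ip);
  if (counts[n] < 255) counts[n]++;     // saturate rather than wrap around
  if (counts[n] === 2) console.log(ip); // report on the second sighting
}
```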
This can be simplified again: the array would have to be pre-allocated, so use a map here instead. First generate a map containing every IP, e.g. map<ip,1>, which is only 10 million entries (this can be optimized). Then read from the file one record at a time; after each read, take v = map.get(ip). If v == 1, set map.put(ip, 0); otherwise console.log(ip). That way, any IP that already repeats gets read out by this code.
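A sketch of that two-pass map approach; the records array stands in for reading the file line by line:

```typescript
// First pass builds map<ip, 1>; the second pass flips an entry to 0 on
// its first occurrence and prints every occurrence after that.
function printDuplicates(records: string[]): void {
  const map = new Map<string, number>();
  for (const ip of records) map.set(ip, 1); // map<ip,1>, ~10M entries

  for (const ip of records) {
    if (map.get(ip) === 1) map.set(ip, 0); // first time: mark as seen
    else console.log(ip);                  // seen before: it repeats
  }
}

printDuplicates(["1.2.3.4", "5.6.7.8", "1.2.3.4"]); // logs 1.2.3.4
```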