I don't understand a paragraph in the introduction to GNU parallel on the official website:
For better parallelism GNU parallel can distribute the arguments between all the parallel jobs when end of file is met.
Below GNU parallel reads the last argument when generating the second job. When GNU parallel reads the last argument, it spreads all the arguments for the second job over 4 jobs instead, as 4 parallel jobs are requested.
The first job will be the same as the --xargs example above, but the second job will be split into 4 evenly sized jobs, resulting in a total of 5 jobs:
cat num30000 | parallel --jobs 4 -m echo | wc -l
Output (if you run this under Bash on GNU/Linux):
5
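To make my reading of the quoted paragraph concrete, here is a minimal sketch of how I understand the description. It is only a toy model, not GNU parallel's actual implementation: the function name simulate_m, the max_chars limits, the "one extra character per argument" cost, and the assumption that num30000 holds the numbers 1 to 30000 one per line are all my own guesses.

```python
def simulate_m(args, max_chars, jobs):
    """Toy model of the quoted description; NOT GNU parallel's real code.

    Fill each command line up to max_chars (an assumed limit). When the
    input ends, spread the arguments of the unfinished job over `jobs` jobs.
    Returns the number of arguments given to each simulated job.
    """
    counts = []   # arguments per finished job
    current = 0   # arguments in the job being built
    used = 0      # characters used by the job being built
    for arg in args:
        cost = len(arg) + 1  # assumed cost: the argument plus one space
        if current and used + cost > max_chars:
            counts.append(current)  # the line is full, so emit it as a job
            current, used = 0, 0
        current += 1
        used += cost
    # End of input: split the last, unfinished job as evenly as possible.
    base, extra = divmod(current, jobs)
    sizes = [base + (1 if i < extra else 0) for i in range(jobs)]
    counts.extend(size for size in sizes if size > 0)
    return counts


if __name__ == "__main__":
    # Assumed content of num30000: the numbers 1..30000, one per line.
    numbers = [str(n) for n in range(1, 30001)]
    for limit in (131000, 65000):  # hypothetical line-length limits
        result = simulate_m(numbers, max_chars=limit, jobs=4)
        print(limit, len(result), result)
```

In this model the number of output lines is the number of full command lines plus 4, so it depends on the assumed line-length limit: a smaller limit produces more full jobs before the final split.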
The input above is clearly divided among 4 jobs, so why does the result have 5 lines?
Secondly, according to the statement above, does parallel read the whole file first and then assign its contents as arguments to each job? If the file is very large, won't it take a lot of time to read the file and redistribute it? For example, when just counting the lines of a large file, reading the file before allocating tasks should take longer than running wc -l on it directly.
What's even weirder is that running it on my computer gives 6 lines:
[10:01 sxuan@hulab ~]$ cat num30000 | parallel --jobs 4 -m echo | wc -l
6
Thank you!