The database has about 100,000 records, which may grow to 200,000 within half a year, but should not exceed 1,000,000 in the end.
Server configuration:
Python 3.6, Celery + RabbitMQ
CVM, Ubuntu 16.04, 1 GB RAM, 1 core
PostgreSQL 10, with a limit of 100 connections
The structure of the table is as follows:
The last_update field is the time of the last request (each record needs to be updated at least once per hour, with a tolerance of 10 minutes).
The uuid field determines the parameter passed to the other party's API when the request is made.
The last_update of each record may differ, depending on when the record was added; this field changes each time the record is updated.
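To make the sketches below concrete, here is a simplified stand-in for the table, not the real schema: only uuid and last_update come from the description above; the table name, the id key and the data column are placeholders I'm using for illustration.

```python
# Simplified stand-in for the table (say, models.py) -- not the real DDL.
from sqlalchemy import Column, DateTime, Integer, JSON, String
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()

class Record(Base):
    __tablename__ = 'records'  # placeholder name

    id = Column(Integer, primary_key=True)
    uuid = Column(String(36), unique=True, nullable=False)      # parameter sent to the other party's API
    last_update = Column(DateTime, nullable=False, index=True)  # time of the last request
    data = Column(JSON)                                         # placeholder for the payload written back
```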
The idea of our current program is:
Create a task A in Celery, which runs every hour.
It queries all records whose update time is more than 1 hour ago,
then, in a for loop, splices a URL for each queried record and sends the spliced URL to asynchronous task B.
Task B is simple: it requests data from the received URL, writes the result to the database, and updates the last_update field.
This way only 2 Celery tasks need to be created, but it doesn't feel very robust. Roughly, the design looks like the sketch below.
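A simplified sketch of that two-task design, assuming this file is tasks.py and using the stand-in Record model from above; the broker URL, database DSN, API_BASE and the data column are placeholders, not our real code:

```python
from datetime import datetime, timedelta

import requests
from celery import Celery
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Record  # the stand-in model sketched above

app = Celery('tasks', broker='amqp://localhost//')               # RabbitMQ
engine = create_engine('postgresql://user:pass@localhost/mydb')  # placeholder DSN
Session = sessionmaker(bind=engine)
API_BASE = 'https://example.com/api/'                            # placeholder endpoint

@app.task
def task_a():
    """Hourly sweep: find records not updated within the last hour and fan them out to task B."""
    session = Session()
    cutoff = datetime.utcnow() - timedelta(hours=1)
    stale = session.query(Record.id, Record.uuid).filter(Record.last_update < cutoff).all()
    session.close()
    for record_id, uuid in stale:
        task_b.delay(API_BASE + uuid, record_id)   # splice the URL and hand it off

@app.task
def task_b(url, record_id):
    """Fetch one URL, write the result to the database, and bump last_update."""
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    session = Session()
    record = session.query(Record).get(record_id)
    record.data = response.json()                  # placeholder column for the fetched payload
    record.last_update = datetime.utcnow()
    session.commit()
    session.close()

# Run task A every hour via celery beat.
app.conf.beat_schedule = {
    'refresh-stale-records': {'task': 'tasks.task_a', 'schedule': 3600.0},
}
```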
It is said on the Internet that Celery can handle millions of tasks, so I am considering whether to create a Celery task for each record instead.
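If I understand that idea correctly, it would look something like the sketch below, where each record refreshes itself and then re-enqueues its own next run instead of relying on an hourly sweep; this is only one possible reading, with the same placeholder names as above.

```python
from datetime import datetime

import requests
from celery import Celery
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

from models import Record  # the stand-in model sketched earlier

app = Celery('tasks', broker='amqp://localhost//')
engine = create_engine('postgresql://user:pass@localhost/mydb')  # placeholder DSN
Session = sessionmaker(bind=engine)
API_BASE = 'https://example.com/api/'                            # placeholder endpoint

@app.task(bind=True, max_retries=3)
def refresh_record(self, record_id):
    """Refresh a single record, then schedule its own next run about an hour later."""
    session = Session()
    try:
        record = session.query(Record).get(record_id)
        response = requests.get(API_BASE + record.uuid, timeout=30)
        response.raise_for_status()
        record.data = response.json()              # placeholder column for the fetched payload
        record.last_update = datetime.utcnow()
        session.commit()
    except Exception as exc:
        # retry a few times so one bad URL does not silently drop the record
        raise self.retry(exc=exc, countdown=60)
    finally:
        session.close()
    # re-enqueue this record's own next refresh; 3600 s keeps it inside the 1 h window
    refresh_record.apply_async(args=[record_id], countdown=3600)
```

One caveat with this reading: if a record's task fails permanently, nothing re-schedules it, so something like the hourly sweep in the first sketch would still be needed as a safety net.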
I venture to post this to ask the more experienced for help. In my case, which approach is better? Do you have any suggestions for improvement?
Thank you very much