How can a data analysis project be versioned effectively?
Which parts need version control and which do not?
How should the charts generated in a data analysis project be managed?
Basically, my plan is to use Jupyter Notebook: put some intermediate results (stored as pickle files) and the functions used by the pipeline into a tool module, indicate the version through the numeric label in each notebook's name, and finally use git for version control. For example:
-- project
   |__ data
   |   |__ SQL
   |   |__ pickle
   |__ src
   |   |__ notebooks
   |   |   |__ 0.0 contents and introduction.ipynb
   |   |   |__ 1.0 EDA.ipynb
   |   |   |__ 1.1 .ipynb
   |   |   |__ 1.2 .ipynb
   |   |   |__ 2.0 EDA.ipynb
   |   |   |__ ...
   |   |   |__ end.0 .ipynb
   |   |__ temp_module
   |__ README
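
To make the "intermediate results in pickle, helpers in the tool module" part of the plan concrete, here is a minimal sketch of what such a helper might look like. It is only an illustration under assumptions: the function names `save_result` and `load_result`, the `data/pickle` default location, and the `<label>_<name>.pkl` naming scheme are all hypothetical and not part of any existing library.

```python
import pickle
from pathlib import Path

# Hypothetical default location, mirroring a data/pickle folder in the project.
PICKLE_DIR = Path("data") / "pickle"


def save_result(obj, name, label, base_dir=PICKLE_DIR):
    """Pickle `obj` as '<label>_<name>.pkl' and return the path written.

    `label` is the notebook's version label (e.g. "1.0"), so the file name
    records which notebook produced the intermediate result.
    """
    base_dir = Path(base_dir)
    base_dir.mkdir(parents=True, exist_ok=True)
    path = base_dir / f"{label}_{name}.pkl"
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    return path


def load_result(name, label, base_dir=PICKLE_DIR):
    """Load the object stored by save_result for the same name and label."""
    path = Path(base_dir) / f"{label}_{name}.pkl"
    with path.open("rb") as fh:
        return pickle.load(fh)
```

A notebook such as "1.0 EDA" could then call `save_result(summary, "summary", "1.0")`, and a later notebook could read it back with `load_result("summary", "1.0")`. Whether the pickle folder itself is committed to git or left out of version control is exactly the open question above; the sketch does not decide that.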