Small tips to improve your packages on PyPI (for beginners)

I am a big fan of plots. Not for their beauty, but for how densely they pack information. Since I record time series of PyPI downloads for miscellaneous packages, here are my preliminary findings (they need validation, since my sample is too small to be significant):

  1. After the first release you have 8 days to find your public;
  2. make packages with practical use;
  3. while new releases will boost your exposure again, if you have already found your adopters they will not work miracles (don't spam PyPI, it is useless);
  4. the README might be the most relevant criterion for early adoption;
  5. put some actual, relevant usage code in your README.

Here is some fun stuff I have no way (and not enough knowledge) to check, and that I dream of having an answer for:
  1. the impact of the quality of the setup (have you properly filled in your setup and your trove classifiers?);
  2. a quality measure for documentation and its impact on downloads (I like pathlib's doc better than mine, because it sticks to the facts);
  3. the snowball effect due to the reputation of the packager;
  4. is there an optimal template for docs? (can we correlate a doc structure with better adoption?)
  5. the impact of documentation presence (either on PyPI or rtd) on package adoption (this one seems obvious to me);
  6. the impact of alpha/beta/stable tagging on adoption;
  7. the impact of a source code link in the README, and of a valid home page;
  8. which kind of home page increases adoption?
  9. what is the optimal number of features (number of classes/methods) for a package (is sparse better than dense, simple better than complex)?
  10. which metrics are the most significant?

The purpose of the exercise is not to show how smart I am, but to daydream of some feedback from QA in the packaging guide or in distribute (like an enhanced python setup.py check). I lack the skills to run all the aforementioned tests, so I am mainly sending a message in a bottle, hoping it drifts one day onto the shore of the PyPI/distribute team :)

PS: I miss make test from Perl. I can't figure out a way to make my unit tests mandatory before my package is installed, nor to get automated feedback when pip install fails so that I can improve my packaging. As a result, I would be delighted if PyPI stated the ratio of failed installs to successful installs.

Fixed window of ~1 week for finding your public



If we consider all the new packages I monitored (gof, archery, weirddict), you'll notice a charging law of the form DLmax * (1 - e^(-time/4d)) + adoption_rate * time, where 4d is a time constant of 4 days.
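To make the shape of that law concrete, here is a minimal sketch of it in Python. The function name, the parameter names and the numbers in the example are mine and purely illustrative; only the formula itself comes from the observation above.

```python
import math

def expected_downloads(days, dl_max, adoption_rate, tau_days=4.0):
    """Cumulative downloads `days` after the first release.

    dl_max        : size of the launch burst (hypothetical parameter name)
    adoption_rate : steady downloads per day once the burst is over
    tau_days      : time constant of the burst, 4 days in my observations
    """
    burst = dl_max * (1.0 - math.exp(-days / tau_days))
    return burst + adoption_rate * days

# Illustrative values only: a burst of 300 downloads and no steady adoption.
for day in (4, 8, 16):
    print(day, round(expected_downloads(day, dl_max=300, adoption_rate=0.0)))
```

With adoption_rate set to 0 the curve flattens out after roughly two time constants, which is where the 8 days in the heading come from.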


Possible Explanation: 



RSS feeds will propagate, and for this period your package will be visible in various places and on sites such as https://pythonpackages.com/.

How to validate



For any new package: the ratio of downloads after 4 days to downloads after 8 days should be the same from one package to the next. And the growth between 8 and 16 days should be far smaller than the growth between 4 and 8 days. Most non-adopted packages should have a flat download curve.


Impact


What if packages older than x years without adopters were cleaned off PyPI?

A package that meets its public will have an almost linear growth




Explanation



Well on this one it is just observation :)

How to validate


For any download curve that doesn't follow DLmax * (1 - e^(-t/4d)), take the differences after 16 days over a period without releases, then check that the difference is constant within ±10%.
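Here is a small sketch of that test, assuming again a list of cumulative downloads with one point per day. The function name and its defaults are my own; the 16 days and the 10% tolerance are the values stated above.

```python
def is_linear_growth(cumulative, start_day=16, tolerance=0.10):
    """True if the per-step downloads after `start_day` stay within
    `tolerance` (as a fraction) of their mean, i.e. the curve is ~linear."""
    tail = cumulative[start_day:]
    steps = [later - earlier for earlier, later in zip(tail, tail[1:])]
    if not steps:
        raise ValueError("need at least two points after start_day")
    mean = sum(steps) / len(steps)
    if mean == 0:
        return all(step == 0 for step in steps)
    return all(abs(step - mean) <= tolerance * abs(mean) for step in steps)
```

Daily counts are noisy, so in practice it is probably better to feed this weekly totals rather than daily ones; the check itself does not care about the sampling period.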


A new release will give you a new chance in adoption


A new release gives you a second chance, but it is needless to spam PyPI with new releases since your rate of adoption won't normally change.

Explanation



On one hand your new release will encourage migration anyway, but on the other hand, if you have met your demand, your «adoption market» is already saturated.

Validation


Do a linear fit of the curve over the 10 days before and over the 10 days after a release of a package, and check that the growth rate is the same.
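This comparison can be sketched with a plain least-squares slope on each side of the release. The helper names are hypothetical, and I assume a daily cumulative series plus the index of the release day.

```python
def slope(values):
    """Least-squares slope of `values` against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a slope")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den

def growth_ratio_around_release(cumulative, release_day, window=10):
    """Slope after the release divided by slope before it (1.0 = unchanged)."""
    if release_day - window < 0 or release_day + window > len(cumulative):
        raise ValueError("not enough data around the release")
    before = slope(cumulative[release_day - window:release_day])
    after = slope(cumulative[release_day:release_day + window])
    if before == 0:
        raise ValueError("no growth before the release, ratio is undefined")
    return after / before
```

A ratio that stays close to 1.0 across releases would support the idea that extra releases do not change the adoption rate.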

Practicality beats purity



Given two packages, VectorDict (I recommend archery instead), a dict with addition, and pypi-stat (used for these graphs), the practical package has a better chance of adoption than the more abstract one.









 

As you can see, VectorDict adoption is mostly due to pypi-stat adoption (200dl/4weeks).



Explanation



Your fellow coders search PyPI for actual solutions to their problems, not for a tool in search of a problem. Coders understand practical uses better than conceptual ones.


Validation


If you have a setup with one «abstract» package and one or several «practical» packages depending on it, subtract the download curve of the practical package from the «abstract» one. I bet the remaining curve of the abstract package grows less than the practical one.
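As a rough sketch of that subtraction, assuming both curves are cumulative downloads sampled on the same days. The function names and the numbers in the example are made up for illustration, they are not my measurements.

```python
def own_adoption(abstract_cumulative, practical_cumulative):
    """Downloads of the abstract package not explained by the practical one."""
    if len(abstract_cumulative) != len(practical_cumulative):
        raise ValueError("both curves must cover the same days")
    return [a - p for a, p in zip(abstract_cumulative, practical_cumulative)]

def growth(curve):
    """Total growth of a cumulative curve over the observed period."""
    return curve[-1] - curve[0]

# Made-up weekly totals for an abstract package and its practical dependant.
abstract = [100, 160, 220, 280]
practical = [0, 50, 100, 150]
residual = own_adoption(abstract, practical)
print(growth(residual), "vs", growth(practical))
```

If the bet is right, the first number printed should be smaller than the second for most such pairs of packages.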



Put some practical code in your README


The new curve for archery shows better growth. The difference is that I put some usage code in the README. My intuition is that putting actual code in the README helps.

Validation



Parse READMEs and measure downloads 16 days after the first release in two groups: the packages with code in their README, and the ones without. There should be a bimodal distribution if I am right.
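A minimal sketch of the grouping step. The way I detect code here is a crude heuristic of my own (a doctest prompt, a reStructuredText literal block introduced by "::", or a fenced block), and the input format is an assumption: pairs of README text and download count after 16 days.

```python
import re
from statistics import median

# Heuristic markers of code in a README (my assumption, not a standard):
# a ">>> " prompt, a line ending in "::", or a line starting a ``` fence.
CODE_MARKERS = re.compile(r"^\s*>>> |::[ \t]*$|^\s*```", re.MULTILINE)

def has_code(readme_text):
    """True if the README text looks like it contains a code sample."""
    return bool(CODE_MARKERS.search(readme_text))

def split_by_code(packages):
    """Split (readme_text, downloads_after_16_days) pairs into two groups."""
    with_code, without_code = [], []
    for readme_text, downloads in packages:
        group = with_code if has_code(readme_text) else without_code
        group.append(downloads)
    return with_code, without_code

def compare_groups(packages):
    """Median downloads of each group, or None for an empty group."""
    with_code, without_code = split_by_code(packages)
    return (
        median(with_code) if with_code else None,
        median(without_code) if without_code else None,
    )
```

Comparing medians is only a first look; to see the bimodal distribution, one would plot a histogram of the downloads of all packages and check whether the two groups form separate bumps.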
