Last year at and around PDW, a bunch of people were talking about doing software discovery for software projects – especially Python-related – in planetary science. We were hoping to get information that might guide development efforts and help make believable claims about software needs. The project as a whole didn’t end up going as far as we’d liked, but we did conduct a survey. I wrote a summary and analysis report in October that I’ve just been sitting on, and I figured I might as well release it in case it’s of any use to anyone. It, along with the raw text of the survey, is on GitHub here: https://github.com/MillionConcepts/software-survey-2019. I’m also going to briefly discuss some of these results at the OP lunch next week.
Because the project wasn’t finalized, we didn’t collect formal authorship information, but I believe the major contributors to survey design were Michael Aye, Angeline Burrell, Trent Hare, Jason Laura, Andrew Annex, MaryJon Barrineau, Chase Million, and me. I should note that Andrew conducted a software usage survey a few years prior that was part of the motivating impulse for this project. I should also note that the report is mine and does not necessarily represent the opinions of any of those contributors or their affiliated organizations. Let me know if I’m missing any names.
Hi! Thanks for the report.
I have a first comment on the survey itself (though I probably should have spoken up earlier): the survey questions look very “planetary surfaces”-centric to me. The tools and Python libraries I use for planetary magnetospheric data analysis are not listed in the survey.
I’m also a bit puzzled by one point in the report: “Several non-Python users also listed issues of this type (such as reading and writing CDF files), presumably as reasons they avoid Python”. I wonder what leads to this. We have to listen to what is said here, but in my personal experience, I switched to Python specifically because managing CDF data (reading and writing) is so much simpler in Python than in IDL or Matlab… (a nice pythonic API versus a C-like low-level API).
Point 1: I agree. In retrospect, the survey reveals some subdisciplinary biases, both towards ‘surfaces’ and more broadly towards imaging and remote sensing. If I work on more projects of this type in the future, I’m going to try to make them more inclusive of planetary science as a whole.
Point 2: Most of the write-in responses that mentioned specific data formats were fairly terse. I would note, though, that user perception that existing tools / capabilities are inadequate can easily coexist with the availability of excellent tools. For instance, in the presence of excellent CDF handling tools, we might interpret a complaint about CDF handling as “actually” being about discoverability or documentation of those tools. In that case, we might want to consider whether improved education, advertising, documentation, or software integration work might improve the utility of those tools; developing new CDF handling tools would probably be counterproductive.
On point 2: yes, I agree.
There is an effort by PyHC (the Python in Heliophysics Community, http://heliopython.org, which also covers planetary plasmas) to organise the community around core Python packages (as we are doing in planetary science). They noted that there were at least three packages for interfacing with CDF data, which is not good. One of them (spacepy.pycdf) is very pythonic and is used a lot, but the others have more IDL- or Matlab-like interfaces (e.g., cdfpy), and people transitioning from those languages usually prefer these less-pythonic alternatives…
At some point we could try to coordinate and exchange good practices between the Planetary Science Python community and the Heliophysics one. After all, both communities use FITS, CDF, NetCDF, SPICE, coordinate systems, projections, spectral analysis, etc.
Totally agree with your points. This effort was in part inspired by my interactions with the Heliophysics community (I am somewhat involved with PyHC, for example, through my work developing SpiceyPy). Planetary Science is extremely diverse and has many subfields that don’t communicate, so it is understandable that some things may have been left out. But this is also the great thing about the survey: we are learning how “the community” uses Python or not, and why!
One of my hopes for the survey is that it will help begin to educate the community about this need to coordinate!
You can add another to your list: https://github.com/MAVENSDC/cdflib
This seemed to work well for me, and has the advantage of being a pure python implementation. Easy installation & maintenance, hopefully.
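Because pycdf needs the NASA CDF C library while cdflib is pure Python, scripts meant to run on many machines sometimes check which reader is actually importable. Here is a minimal sketch of that check. It assumes only the module names spacepy.pycdf and cdflib; the function name available_cdf_backends and the preference order are hypothetical. In at least some spacepy versions, importing spacepy.pycdf raises a plain Exception (not just ImportError) when the C library can't be found, hence the broad except.

```python
import importlib


def available_cdf_backends():
    """Hypothetical helper: list the CDF reader modules that import cleanly here."""
    found = []
    # spacepy.pycdf locates the CDF C library at import time; if it can't,
    # the import may raise a generic Exception, so catch broadly.
    try:
        importlib.import_module("spacepy.pycdf")
        found.append("spacepy.pycdf")
    except Exception:
        pass
    # cdflib is pure Python, so a missing package is the only expected failure.
    try:
        importlib.import_module("cdflib")
        found.append("cdflib")
    except ImportError:
        pass
    return found
```

A script could then pick the first entry of available_cdf_backends() and fall back to a clear error message when the list is empty.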
Hi Dave, yes, that’s one of the handful of Python CDF modules. It has the nice feature of being pure Python (no “painful” install of the CDF C library; it’s not always that painful, it’s just not provided out of the box). However, its interface is still close to the IDL or Matlab one. When I mentioned people transitioning from IDL or Matlab to Python and looking for CDF support, I was thinking of that module. At this point I find the cdflib interface not as pythonic as the pycdf one. One example from the
import cdflib
cdf_file = cdflib.CDF('/path/to/cdf_file.cdf')
x = cdf_file.varget("NameOfVariable", startrec=0, endrec=150)
whereas, with pycdf:
from spacepy import pycdf
cdf_file = pycdf.CDF('/path/to/cdf_file.cdf')
x = cdf_file['NameOfVariable'][0:150]
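To make the interface difference concrete, here is a minimal sketch of the kind of thin adapter a common interface might involve: pycdf-style file['var'][start:stop] indexing layered on a cdflib-style varget(name, startrec=..., endrec=...) call. The class names PycdfStyleFile, SliceableVariable and ListReader are hypothetical, and the sketch assumes that varget's endrec is inclusive, as cdflib's docstring describes in the versions I know of (worth checking against the installed version). ListReader is only a stand-in so the example runs without a CDF file.

```python
class SliceableVariable:
    """Hypothetical adapter: pycdf-style var[start:stop] on top of a varget() call."""

    def __init__(self, reader, name):
        self._reader = reader  # any object with varget(name, startrec=..., endrec=...)
        self._name = name

    def __getitem__(self, key):
        if not isinstance(key, slice):
            raise TypeError("this sketch only supports slices, e.g. var[0:150]")
        start = 0 if key.start is None else key.start
        stop, step = key.stop, key.step
        if start < 0 or (stop is not None and stop < 0):
            raise ValueError("negative indices need the record count; not supported here")
        if step is not None and step <= 0:
            raise ValueError("only positive steps are supported here")
        if stop is not None and stop <= start:
            raise ValueError("empty slices are not supported here")
        # Python slice stops are exclusive; varget's endrec is assumed inclusive.
        endrec = None if stop is None else stop - 1
        data = self._reader.varget(self._name, startrec=start, endrec=endrec)
        # Read the contiguous record range, then apply any step locally.
        return data if step is None else data[::step]


class PycdfStyleFile:
    """Hypothetical wrapper so f['NameOfVariable'][0:150] works on a varget-style reader."""

    def __init__(self, reader):
        self._reader = reader

    def __getitem__(self, name):
        return SliceableVariable(self._reader, name)


class ListReader:
    """Stand-in for cdflib.CDF so the sketch runs without a file; mimics an inclusive endrec."""

    def __init__(self, variables):
        self._variables = variables

    def varget(self, variable, startrec=0, endrec=None):
        records = self._variables[variable]
        stop = len(records) if endrec is None else endrec + 1
        return records[startrec:stop]
```

With a real file, the wrapper would be used as PycdfStyleFile(cdflib.CDF('/path/to/cdf_file.cdf'))['NameOfVariable'][0:150], mirroring the pycdf call above. The one real design decision is translating Python's exclusive slice stop into an inclusive endrec; getting that off by one is exactly the kind of mismatch that makes switching between the two libraries error-prone.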
Within PyHC, there is a discussion about converging towards a common interface, combining the pure-Python implementation of cdflib and the more standard Python interface of