What guidelines should I follow for simulation software projects?
An answer to this question on the Scientific Computing Stack Exchange.
Question
I am not sure whether this question belongs here, but I would like to give it a try and benefit from the experience of the people at scicomp.SE.
In my experience, software quality in computational science often leaves a bit to be desired. Clearly, established software projects such as PETSc or OpenFOAM are exceptions to that statement. But often there is some old self-crafted code from the 80s that is supposed to serve as a basis for new research topics, although no one really understands what it does. I have the impression that while there are trends and guidelines to make simulation software projects more sustainable, often the developers lack the skills to implement quality assurance measures. I am not blaming anyone: this is an interdisciplinary field, and those skills are not taught in a physics or engineering degree.
- Are there some established guidelines that one can follow?
Thanks to contributors from other channels and my own research I can provide this list:
Are there further candidates?
- Are there any template projects that provide a clean basis?
In order to make things easier for developers of simulation software projects, I thought it would be helpful to have some boilerplate project that provides a clean start. Is there already such a thing?
So far I have collected several components the boilerplate project should contain:
- Version control system (git)
- Collaboration tool (gitlab)
- Continuous integration (gitlab-ci) for automatic build tests
- Extract documentation from source code (doxygen)
- Automated build system (CMake)
- Framework for unit tests (Google Test or Catch2)
- Code formatting tool (clang-format)
- ...
but I fear that the list is not exhaustive and would be grateful for your experiences and/or pointers to literature.
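To make the unit-test item in the list above a bit more concrete, here is a minimal sketch of what a tested, documented numerical routine could look like. Everything in it is an illustrative assumption: the function `trapezoid` and the two test functions are hypothetical and not taken from any existing project, and the checks return plain booleans instead of using Catch2 or Google Test so that the snippet has no dependencies. In a real project the same checks would live in the test framework's test cases, and the framework would supply the entry point.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>

/// Approximates the integral of f over [a, b] with the composite
/// trapezoidal rule using n equal subintervals (requires n >= 1).
/// Hypothetical example routine, written Doxygen-style.
double trapezoid(const std::function<double(double)>& f,
                 double a, double b, std::size_t n)
{
    assert(n >= 1);
    const double h = (b - a) / static_cast<double>(n);
    double sum = 0.5 * (f(a) + f(b));
    for (std::size_t i = 1; i < n; ++i) {
        sum += f(a + static_cast<double>(i) * h);
    }
    return sum * h;
}

/// Test 1: the trapezoidal rule is exact (up to rounding) for linear
/// integrands. The integral of 2x + 1 over [0, 3] is 12.
bool test_linear_is_exact()
{
    const double result =
        trapezoid([](double x) { return 2.0 * x + 1.0; }, 0.0, 3.0, 7);
    return std::abs(result - 12.0) < 1e-12;
}

/// Test 2: the error should shrink by roughly a factor of 4 when the
/// number of subintervals is doubled (second-order convergence).
/// The integrand x^2 on [0, 1] has the exact integral 1/3.
bool test_second_order_convergence()
{
    const auto f = [](double x) { return x * x; };
    const double exact = 1.0 / 3.0;
    const double err_coarse = std::abs(trapezoid(f, 0.0, 1.0, 100) - exact);
    const double err_fine = std::abs(trapezoid(f, 0.0, 1.0, 200) - exact);
    return std::abs(err_coarse / err_fine - 4.0) < 1e-3;
}
```

The second test is the kind of check that tends to be specific to simulation codes: it verifies the expected convergence order against a known analytic solution rather than a single hard-coded number. A continuous integration job would then simply build and run such tests on every commit.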
EDIT: The second question has led to a project: Check out bertha, the skeleton [1, 2]! The goal is to install this project as a template in GitLab, so that a single click generates an instance of a working C++ project that follows the best practices for simulation software projects. Of course, it is also possible to copy the files manually. Either way, the skeleton project provides a solid base, so there is no need to start from scratch. Alternatively, one can cherry-pick things from it if there is already an established software project.
At the moment, bertha features an automated multi-platform build system, automatic documentation generation, and supports unit tests using the Catch2 framework. It is about to be extended, so if you have any input do not hesitate to contact me or raise an issue on GitLab.
[1] https://gitlab.com/cph-tum/bertha
[2] https://arxiv.org/abs/1912.01640
Answer
"developers lack the skills".
Maybe.
I think it's much more likely that the developers lack the incentives. Making solid code is difficult and expensive and, in academia, comes with minimal-to-negative reward. You're asking for a list of guidelines, but all of your examples address the technical situation, not the social situation. That's asking for trouble.
One way to get good software is to change the incentives. In my work as an editor, I send papers back if the authors haven't released their source code or if that source code doesn't meet my (admittedly self-defined) standards. Some journals, such as JOSS, take this further and have guidelines for what they expect to see. If you find yourself in a position of power as a reviewer or editor, use that influence to help move your field into the 21st century.
If you're a student, or mentor students, you should know that it's hard to get tenure. A reasonable person will therefore seek to develop diversified skills during their PhD. They say that GitHub is the new resume. Having solid, unit-tested, documented code is a valuable indicator for alternative academic tracks (research programmer) as well as for industry and government. Use this as a carrot for yourself and others.
As the JOSS guidelines say, you should have contributor guidelines for your project and maybe a PR template. If you want good code, you need to make it easy for people to help you build it. If you're in a senior position, you also need a way of educating your mentees and, especially, yourself. Programs like Software Carpentry can help with this.
In short, with some remarkable exceptions, software is only as good as the incentives which produce it.
I also highly recommend the paper "Good enough practices in scientific computing".