Research Software Reactor

Supporting researchers, research software engineers, and technical specialists in the application of cloud computing for the research community.

DevOps for better software and reproducible research
9-10 January 2020
The Alan Turing Institute, London

24 October 2019

Devops

DevOps for better software and reproducible research

TL;DR

  • What: An open event for RSEs, researchers and those interested in making research software more robust through DevOps, CI/CD, testing, and automation among other practices. This is an open community event for the community.
  • When: 9th-10th January 2020
  • Where: The Alan Turing Institute, British Library, 96 Euston Road, London NW1 2DB https://www.turing.ac.uk/
  • Registration: External registration on Eventbrite

Background

Lets face it, developing software for science and engineering is really hard. First there are the garden variety bugs that you find in all software. As these are so common are many developer tools that help identify these errors. However, for science and engineering software there are a whole host of additional ways software can be incorrect - the physics might be messed up (Mars Climate Orbiter), errors may be present in the underlying mathematical methods or an inappropriate numerical method used in a given context, performance bugs, ML components that sometimes give unexpected output, etc. A single piece of software may cut across all these domains of expertise so it is incredibly difficult for a single individual to have enough time and knowledge to even understand all the different ways that the software might give incorrect results. The vital importance and complexity of code testing and verification is such that it needs to be an integral part of software development itself and a shared responsibility among all researchers involved.

Rigorous research software testing is often considered far too labor intensive, expensive and thankless of an activity. The whole incentive structure within academic research is heavily weighted towards maximizing the number of journal publications. There is rarely any real motivation for putting effort into making sure that research software is reusable by others (where ‘others’ may be their future selves). At the same time one of the most infuriating statements researchers learn to hate is ‘…but it works on my laptop’, and researchers can even struggling to reproduce their own results when we are revising a paper submitted 6 months before. This is not only bad for research but also greatly limits the potential social, and economic impact of the research.

Thankfully the rapid growth in digitalization and Cloud computing in recent years has brought with it a greatly enhanced software development ecosystem. Today fully automated software testing is for the most part free and relatively simple to set up. The technology barrier has been greatly lowed to the extent that it is now hard to justify not adding automated testing to your research software. The challenge now is to increase awareness throughout the research community that quality research software testing and reproducibility is shifted from being an idealistic goal to being an integral part of research that increases quality and productivity.

That is not to say software testing and reliability is a solved problem. We can always do more and better testing - and this means better, more reliable and reproducible research. Some areas present specific challenges for code verification and validation which will continue to be difficult topics. However, this does not detract from the fact that some level of automated testing and deployment can always be achieved as the outcome will be faster research rather than spending your days chasing down bugs or indeed trying to figure out why it only ‘works on my laptop’.


The overall aim of this sprint is to increase the reliability of software and reproducible data and computational research through the use of modern DevOps practices and identify how DevOps needs to be improved to better support the research community. —

Sprint objectives

We will be adopting an “unconference style” for this event. Meaning that we do not have a fixed agenda of the projects we will be working on but the community will decide the topics and tools they are more interested in discussing/exploring. Some topics that could be explored are:

  • Training (Docker, CI/CD, testing).
  • Adding CI/CD to existing research software projects.
  • Pushing the frontier of CI/CD (multiple architectures, HPC).
  • Adopting DevOps practises for scientific computing.

Bring your own project

Feel free to bring your own code to work on or join a team working on a code you are interested in. If you want to propose your own project/code then please open an issue on GitHub giving details of the software and what you would like help with. Similarly, browse github issues to see what existing projects you would like to participate in. The sprint will begin with an elevator pitch from each proposed project to recruit their dev team for the sprint.

Who should attend?

Folks interested in improving the reliability of research software or interested in Continuous Integration, Continuous delivery and DevOps practices. You will have some coding experience and should be prepared to get involved in the sessions and discussion including leading topics.

What should you bring?

Make sure to bring a laptop.

Tea, cofee and lunch will be provided on both days.

Agenda

Day 1

  • 08:30 - Welcome coffee
  • 09:00 - Introduction
  • 09:10 - Project pitches and setup teams
  • 09:30 - Sprint
  • 12:30 - Status update
  • 13:00 - Lunch
  • 14:00 - Sprint
  • 16:30 - Status update
  • 17:00 - Social

Day 2

  • 08:30 - Welcome coffee
  • 09:00 - Sprint
  • 12:30 - Status update
  • 13:00 - Lunch
  • 14:00 - Sprint
  • 16:30 - Status update and lessons learned
  • 17:00 - Social

Registration

Registration has not yet been opened. Please use the contact form on this site to register your interest.

Code of conduct

All the attendees and organisers should abide by The Research Softwaree Reactor Code of Conduct which can be found on this site.

Preparation work

Background training material and talks that will help you prepare for the day are listed below. Don’t worry if you don’t feel like a DevOps guru before you arrive as there will be lots of opportunities to ask questions of fellow practitioners and Azure experts over the course of the sprint. Depending on demand we may have break out sessions to walk through specific topics.

Tutorials and documentation:

  • DevOps for the busy data scientist
  • Git: http://swcarpentry.github.io/git-novice/
  • Test framework based on programming language, e.g.
    • Python - pytest - https://docs.pytest.org/en/latest/
    • C/C++ - googletest - https://github.com/google/googletest
  • Docker and containerizing an application
    • https://docs.docker.com/get-started/
    • https://docs.docker.com/get-started/part2/
  • Azure Pipelines
    • https://azure.microsoft.com/en-gb/services/devops/pipelines/

Suggested reading on reproducibility in computational research:


Organisers

Sponsors

Logo