Research Computing Teams Link Roundup, 4 Sept 2020
Hi!
Our last AMA (Ask Managers Anything) question was:
For non-embedded teams, what do you do to keep researcher clients / stakeholders up to date on progress of work?
We received one answer:
Our communication with stakeholders - leadership, projects we were supporting, research community members - was always a lot more structured than our internal team communications. So everyone working from home wasn’t as big a deal. For our team we’ve had to be a lot more deliberate in creating communications channels to replace the loss of “water cooler” interactions. But we have always maintained pretty scheduled meetings and emails with stakeholders. Walk-ins were pretty rare. So we haven’t lost much. We have a couple of projects that had dashboards of different kinds and those have certainly taken on new importance for having other parties feel like they are “in the loop”.
I think the AMAs have been pretty successful! We got a big initial burst of questions which we’ve gone through. I think I’ll give it a few months and try this again.
Are there any other recurring features you’d like to see in the newsletter?
For now, on to the link roundup.
Managing Teams
There’s a big difference between a team and a working group - Tim Leslie
The distinguishing feature of a real team is mutual accountability.
This 10-word quote nicely summarizes the article. A team isn’t just a collection of individuals with related tasks; it’s a group of people who feel they can rely on each other’s contributions and hold each other accountable to that. When that is set up and working well, the team is an entity in and of itself.
My own managerial style tends to be a bit more reductionist, and I tend to interject myself into peer interactions more often than is helpful. When I remember to let go a little bit and encourage team members to rely on each other (and give them the space to work up to relying on each other for increasingly big tasks), the team-formation process can begin. (This article by Warren Lynch is a good introduction to the Tuckman “forming, storming, norming, and performing” model of team formation.)
Simple Burnout Triage - Ben McCormick
McCormick suggests one simple question for your team members to make sure they’re not edging towards burnout:
If you take the pace & quality of the last 2 months of your life and repeated it again and again, how long would you be able to sustain it?
If you get an answer ranging from “I could make this work, but..” to “I can’t go on like this”, then that raises increasingly serious red flags. The only non-worrying answer to this question is something along the lines of a genuine “oh no, this is good, I can do this indefinitely”.
Managing Your Own Career
Emotional Resilience In Leadership 2020 Report - Jonny Miller & Jan Chipchase
It’s been a long six months or so, and even if their teams are doing well, a lot of managers are feeling exhausted. Leadership is lonely and tiring at the best of times, but trying to manage a newly distributed team while keeping things on track and juggling the new challenges in our own lives makes it even more so. And if we’re not careful, that can lead to burnout. It is a lot harder and more time consuming to recover from burnout than it is to avoid it.
This is a long read, but if you’re feeling more and more exhausted and stressed it’s worth it; and even if you’re not, just the first section (a couple of pages of a Google Doc) is worth spending some time with. Some of the key points are:
- We tend to recognize the importance of big and sudden external stressors (“this new project just got dumped on me”), but the low-grade ongoing stressors, external or internal, will get to you just as much.
- Like “sleep debt” - if you’re not sleeping enough you’ll be overtired, and it takes more than a couple of normal good nights’ sleep to catch up - emotional/stress debt piles up too and has to be paid off before you get back to “normal”.
- Being stressed out ripples outwards and can bounce back (think of being in a bad mood and so getting into an argument with a colleague that then makes work tense). This can cause avoidable spirals of stress.
When we know we’re stressed and tired we know what to do, but low-grade ongoing stressors can sneak by our defences. Just being aware of them and knowing we can take action to short-circuit spiralling consequences of stress, in ourselves, our team members, or our close ones, can help a lot.
Scaling yourself as an engineering manager - Sally Lait
Speaking of new projects being dumped on you…
When our responsibilities grow, we need to grow too. That means focussing on the truly important, not doing the things that simply don’t make the cut of the priority list, and getting the help you need. Not discussed in this article, though it’s at least as important, is delegating tasks and efforts you know how to do well and were doing previously to your team members, helping them grow as well.
This article also gives some time to two items that don’t get discussed enough. First is that the processes you’ve built that were serving you well, whether your own processes or processes with your teams (how you were doing staff meetings, etc.), may need to change; these should always be up for reconsideration.
The second is that you’ll need to communicate what’s changing and why to your entire team. You may be less available, temporarily or not, and a team member who could previously chat with you easily is going to assume the worst if suddenly you seem aloof and less communicative. Any changes you make should be communicated clearly and probably repeatedly to your team (and any other affected stakeholders).
What’s it like as a Senior Engineer? - Zain Rizvi
Rizvi, who has been a senior+ developer at Google, Microsoft, and Stripe, talks about what being a senior technical contributor is like in tech.
This is relevant to us because I don’t think we appreciate how those of us who work with digital research infrastructure (software, data, systems) have had to develop skills that are fairly high up the career ladder in industry. Having to be quite self-directed, working on open-ended ill-defined problems, balancing risk and reward, and building consensus around solutions is pretty much table stakes in the world of research computing. I think this leads us to undervalue our own skills.
In terms of new hires it also means we underestimate the amount of coaching that quite technically talented new team members from industry will require in these areas.
Product Management and Working with Research Communities
Ten simple rules to increase computational skills among biologists with Code Clubs - Ada K. Hagan et al.
Bootcamp-style training can be very useful for getting research trainees “over the hump” and starting to be effective with developing software for their own use. But it’s pretty well understood that retention of that material fades quickly unless it’s in regular use. For the majority of attendees who don’t regularly use what they’ve learned afterwards, the benefits of the bootcamp can quickly fade away.
In this article, the authors describe their approach to “Code Clubs” (think journal clubs) to give research trainees ongoing practice with writing personal research software. Sessions can be “BYOC sessions”, where attendees rotate bringing their own code or problem and presenting it; the facilitator breaks the attendees into sub-teams with a very specific goal (refactoring the code to make it more generalizable and more DRY is an example). They can also be more tutorial-style sessions, where again there is a hands-on component but it follows a presentation on a new package or technique.
These ongoing sessions are known to be more effective at building longer term skills, and can follow a bootcamp. The authors give ten rules for facilitators thinking of running such sessions.
Research Software Development
Brittleness and Bureaucracy: Software as a Material for Science - Matt Spencer, Perspectives on Science
This is a paper from 2015 but was recently mentioned on the Society of Research Software Engineering Slack. It’s an interesting view of a major software transition for the fluid simulation code Fluidity; the author watched and interviewed the team over a year and a half, during which there was a rewrite due to the original software becoming brittle. This required not just a rewrite but a change of how the team operated:
Fluidity’s robustness was increased by the re-write. But it would be a mistake to think about manipulability [including maintainability/extensibility - LJD] solely as a property of the software itself. Everything depends on working practices. There is no straightforward way to isolate the technology from the wider ecosystem of techniques through which it is brought into use.
There’s lots of great stuff in here, including familiar issues of different members of the community having different ideas of what the long-term goal of the software effort was, and the buildup of technical issues which finally results in wholesale change. It’s a short and clear read from an informed outsider about the process.
Implementing Shape-Up - Nolan Phillip
I’ve written before about shape up, the development process out of Basecamp that has longer cycles (6 weeks) than typical agile, and focusses on pitching competing efforts for the next 6-week cycle. As you can likely tell, I’m interested in this approach for research software development, as an attempt to balance the medium-term planning cycles needed when you’re genuinely in somewhat uncharted territory with short bursts of execution. (My own default is to focus on the longer term, and I sometimes need to be dragged kicking and screaming back down to the day-to-day and week-to-week focus of execution.)
This is a description of how shape up was implemented at one company. In this case it was added entirely on top of a weekly sprint cycle. The first week focussed on planning and shaping the goals for the upcoming 6-week effort, weeks 2-7 focussed on execution, and week 8 was a cool-down week/preparations to begin again.
Introducing GitHub Container Registry - Kayla Ngan
You’ve no doubt already heard about this suspiciously well-timed announcement from GitHub, following as it did on the heels of Docker Hub’s announcement that they would no longer host and serve container images indefinitely for free.
GitHub Actions, GitHub Packages, and now GitHub Container Registry make for an increasingly compelling solution for testing, building, and making software available for deployment. Have you integrated GitHub Actions into your team’s workflow yet, or have you played with GitHub Container Registry’s public beta already?
Emerging Data & Infrastructure Tools
We Replaced an SSD with Storage Class Memory. Here is What We Learned - Sasha Fedorova
The MongoDB team has been playing with Optane technologies for a bit - we wrote about an earlier experiment with Optane as SSDs. Here they compare using Optane as an SSD vs using an Optane Persistent Memory (PM) module. The underlying non-volatile memory hardware is basically identical, so the difference here is between a device which is sitting on the memory bus vs one which is PCI-attached and using a file system interface.
The takeaway here is that writes are still pretty low-bandwidth (compared to memory). Reads are quite high bandwidth - but DRAM caching can mask that difference quite effectively. The big win is in the latency of new reads:
Latency, and not bandwidth, is where SCM [Storage Class Memory] can shine. In contrast to bandwidth, the latency of reading a block of data from an Optane PM is two orders of magnitude shorter than reading it from an Optane SSD: 1 microsecond vs 100-200 microseconds.
So random read-heavy workloads are where this could make a big difference.
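As a rough back-of-envelope sketch of what that latency gap means: assume the latencies quoted above (1 microsecond for Optane PM, 100-200 microseconds for Optane SSD) and a hypothetical workload of uncached random reads issued strictly one after another, so that each read waits on the previous one. The read count and the function name are illustrative assumptions, not figures from the article, and real workloads that keep many requests in flight would narrow the gap.

```python
# Back-of-envelope: time spent waiting on dependent (serial) random block reads.
# Latencies are the figures quoted above; the read count is a made-up example.

PM_LATENCY_US = 1.0               # Optane PM, per block read
SSD_LATENCY_US = (100.0, 200.0)   # Optane SSD, quoted range


def serial_read_time_s(n_reads: int, latency_us: float) -> float:
    """Seconds to finish n_reads when each read waits for the previous one."""
    return n_reads * latency_us / 1e6


if __name__ == "__main__":
    n = 1_000_000  # hypothetical number of uncached random reads
    pm = serial_read_time_s(n, PM_LATENCY_US)
    ssd_lo, ssd_hi = (serial_read_time_s(n, lat) for lat in SSD_LATENCY_US)
    print(f"PM:  about {pm:.0f} s")
    print(f"SSD: about {ssd_lo:.0f}-{ssd_hi:.0f} s")
```

Under these assumptions a million dependent reads take on the order of a second on PM versus a couple of minutes or more on the SSD, which is why latency-bound, random read-heavy workloads are the interesting case.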
Lightweight Kubernetes - Rancher
K3s, Rancher’s new lightweight Kubernetes, has made a bit of a splash since it just recently got certified as a Kubernetes distribution. It’s a highly stripped-down Kubernetes and it bills itself as:
K3s is a highly available, certified Kubernetes distribution designed for production workloads in unattended, resource-constrained, remote locations or inside IoT appliances.
While it’s pitched at edge/IoT applications, its compact nature and aim towards unattended running could potentially make it useful for deploying researcher applications that need something more than a VM plus an Ansible script or Docker Compose, but for which a full Kubernetes would be overkill and too much management overhead.
I’ll be watching this to see how it plays out in research computing.
Flume - The Flume Project
Crepe - The Crepe Project
Alternative programming models like data flow and declarative programming are becoming more and more accessible, and each can play roles in different areas of research computing. Flume is a project that allows your team to readily create easy-to-use DAG data flow diagrams such as workflows (“Let users code with type safety in your own visual programming language”), while Crepe provides relatively easy access to Datalog-style programming for declarative calculations in Rust.
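To give a flavour of the Datalog style for anyone who hasn’t seen it, here is a minimal sketch of the classic “reachability” rules evaluated naively to a fixpoint. It is written in plain Python purely for illustration; it is not Crepe’s API or syntax, and the function name and example graph are made up.

```python
# Datalog-style rules, evaluated naively until no new facts appear:
#   reachable(x, y) :- edge(x, y).
#   reachable(x, z) :- edge(x, y), reachable(y, z).

def reachable(edges: set[tuple[str, str]]) -> set[tuple[str, str]]:
    """Return every (start, end) pair connected by a path of one or more edges."""
    facts = set(edges)                       # rule 1: every edge is reachable
    while True:
        derived = {(x, z)
                   for (x, y) in edges
                   for (y2, z) in facts
                   if y == y2}               # rule 2: prepend one edge to a known path
        new = derived - facts
        if not new:                          # nothing new was derived: fixpoint
            return facts
        facts |= new


if __name__ == "__main__":
    graph = {("a", "b"), ("b", "c"), ("c", "d")}  # hypothetical example graph
    for pair in sorted(reachable(graph)):
        print(pair)
```

The point of the declarative style is that you state only the rules; the evaluation loop (here a deliberately naive one) is the engine’s job, and a real Datalog implementation would use a much more efficient strategy.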
Events
Executable Research Article (ERA): Enrich a research paper with code and data - Dr. Emmy Tsang, eLife - September 9 14:00 – 15:00 UTC
Next up in the SORSE programme, Dr. Tsang presents her work with eLife on supporting executable research articles:
We published our first demo ERA in February 2019. Over the past year, we have been working closely with our collaborator Stencila to build an open tool stack that would enable our authors and production team to easily publish ERAs at scale. In this talk, we hope to showcase the potential of ERAs with examples and walk through how authors can enrich their traditional eLife paper using Stencila Hub.
Calls for Proposals
Call for Posters: Minisymposterium on Software Productivity and Sustainability for Computational Science and Engineering - Poster Proposals due Sept 14 for event Mar 1-4 2021
Colleagues at Better Scientific Software are advertising a call for 1500-word abstracts of poster submissions for a minisymposium on software productivity and sustainability. The minisymposium is part of SIAM Computational Science and Engineering 2021, which will be held in Fort Worth TX, COVID-19 permitting, but remote participation will be available as well.
Random
HTTP status codes came from a protocol for submitting batch programs to computers in the early 70s, by way of FTP.
Finally - I can work with spreadsheets in the terminal and pretend I’m still a developer. sc-im is an ncurses-based spreadsheet program.
The thermodynamics of Turing machines as a fundamental connection between computation and physics.
Not new, but becoming increasingly mainstream - cgroups2 is slowly replacing the original cgroups as Linux OSs get updated, which will improve the usability of “rootless containers” and container fleet management.
SDSC’s CloudBank is now operational. This commercial-cloud brokering play is an interesting experiment for research computing and I’m curious to see how it turns out.
Here’s a really slick-looking VS Code debugging visualizer for complex data structures that works with several languages of interest to us here (JavaScript, Go, Python, Java, C++, Rust).
Automate Postgres audit logging using triggers.
Mozilla’s financial woes and cutbacks remind us that research isn’t the only part of important digital infrastructure that has no sustainable funding model.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
Highlights below; full listing available on the job board.
Head of Research Software Engineering - University of Cambridge, Cambridge UK
You will have:
- A MSc or PhD degree in a Computer science or related discipline or significant relevant experience.
- Experience managing a highly technical team
- Significant experience writing and maintaining high-performance application code.
- Fluency in the one or more of the key languages commonly used in scientific computing i.e. C, C++, Fortran or Python.
- Experience with the frameworks used to exploit large, modern parallel computers such as MPI, OpenMP, CUDA, OpenACC or PGAS.
- Ideally, experience or knowledge in the areas of machine learning and data science.
Informatics Project Manager - CK Group, Cambridge UK
In this role, you will be part of a dynamic team in CGR’s multidisciplinary genomics research environment. Be responsible for coordinating efforts related to the Centre’s growing and evolving informatics and data analytics capabilities. Work alongside bioinformaticians, genome analysts and software scientists to ensure that project activities are aligned to deliver the expected business value within agreed timelines.
Scientific Product Manager: ELIXIR Human Genomics and Translational Data - European Molecular Biology Laboratory, Hinxton UK
The main purpose of this role is to provide coordination and support to deliver and manage key infrastructure elements and products being developed within Human Genomics and Translational Data to enable an ELIXIR-wide comprehensive approach to the management, archival, and responsible sharing of human data consented for reuse in scientific research.
Computer Engineer, Deputy Manager of the Earth Science Data and Information System Project (ESDIS) Science Operations Office (SOO) - NASA Goddard Space Flight Center, Greenbelt MD USA
You will lead a team of engineers to develop and operate the Earth Observing System Data and Information System (EOSDIS) Distributed Active Archive Centers (DAACs). Direct and manage interdisciplinary science data systems development, design, and integration activities in a broad range of technologies. Lead ESDIS science operations engineering team and provide plans and direction to accomplish the work of the ESDIS science operations segment and ensure that the operation and maintenance of software data systems are on time and within budget.
High Performance Computing (HPC) Operations Manager - NIWA - New Zealand eScience Infrastructure, Wellington NZ
We are seeking an Operations Manager to manage the HPC infrastructure (in Wellington and Auckland) and associated services, and the team responsible for delivering these HPC services. As well as these operational management responsibilities, this position contributes to the development of the current HPC systems, informing and implementing the policies under which they operate, supporting future developments and procurement, and developing and delivering the services that underpin NIWA research, forecasting operations, and commercial services.
Custom HPC Projects Manager - HPE, Telework Various US
Manages the implementation and utilization of Global Method Project Management (GMPM) methodology. Participates in the development and implementation of worldwide, region, sub-region and/or business unit policies, practices, and systems to support the program/projects business. May provides strategic direction to maximize success in the industry or solution area. May enhances solution offering set. Identifies and sponsors the ongoing development and improvement of leading edge processes and approaches for creating and leveraging intellectual capital. Provides an environment in which the program/project business is predictable and profitable.
Senior Manager Data Scientist - GSK, Rixensart BE
As a Senior Manager Data Scientist, you will have the opportunity to ensure that data analytics and bioinformatics deliverables for a portfolio of Vaccines R&D projects are at the top of industry standards with respect to scientific excellence, quality and timelines. Overseeing and coordinating Data Science and Digital Innovation (DSDI) activities on multiple Vaccines R&D projects in partnership with Project teams and Pre-clinical laboratories. Contribute to define the Personalized Vaccinology scientific strategy, processes and objectives in alignment with the GSK Vaccines R&D strategy.