Research Computing Teams Link Roundup, 10 July 2021
Hi, everyone!
I hope you’re having a good weekend, and that the past week was good for you and your team.
I’m working (with some help) on tidying up and improving the material from the hiring and performance management series from the newsletter at the start of the year; I’d like to turn those into expanded and more coherent writeups that can help other research computing team leaders (or those interested in becoming one). Other topics that came up when last I asked for suggestions were:
- Strategy
- Product management
- Peer mentorship
- Levelling & promotion in research computing
Let me know if there are others you’d like to see, or questions or input you have - just hit reply, or email me at jonathan@researchcomputingteams.org.
And now, the roundup:
Managing Teams
Why “start bringing solutions, not just problems” doesn’t work - Lara Hogan, Wherewithall
We’ve talked often about why it’s best for you and your team members for you to avoid trying to solve problems directly for them, and instead help coach them to find a solution yourself. But how do you do that well, in a way that makes them comfortable coming to you raising issues?
Hogan has some suggestions for questions to ask:
- What could we try?
- If you could wave a magic wand, what one thing would you change?
- What, deep-down, do you truly want?
- What do you need?
- What’s in the way?
- What’s holding you back?
and suggests giving feedback on approaches if that doesn’t work. She also cautions against trying to help the team member come to a solution if they’re just venting about something; venting is natural, we all do it sometimes, it’s even good in a way if they’re comfortable venting with you, so you want to allow it but without necessarily letting it spiral out of control or happen regularly.
Hogan also includes some approaches for if your manager tells you something similar, including looking for other people to raise the issue with.
Good Meetings - Sara Drasner
- The purpose of the meeting is clear
- There’s an agenda (we’ll dive in to the complexity of this in a moment)
- There are the right people in the room. Not too many where communication is overly complicated, not too few where the people you need to move forward aren’t there.
- There’s some order. People aren’t dropping in and out, talking over each other, or being generally inconsiderate
- There’s a clear decision, outcome, and next steps at the end
Some points Drasner makes that I think are particularly useful in our context include the fact that purpose isn’t the agenda, the agenda is a tool that serves the purpose; that different kinds of meetings need to be run different ways; that too many people in the meeting is a common failure mode; and that respectful conflicts are good, and that people not saying things that need to be said are bad.
She also suggests meetings can be started and finished with sentences that sum up:
- The shared purpose
- What you’re doing about getting to that outcome
- Who is owning what
- How
- When, what are the timelines
Three Core Ideas to Make Remote Work, Work - Cate Huston
Huston has been working remote for over five years, and for those of us getting ready to continue working remote for real and without a pandemic driving it, suggests three key approaches:
- Embrace async.
- Enable autonomy.
- Build connection.
Crucially, these are all team-enhancers for in person teams, too:
- Even for team members all going to the same office, meetings can be hard to schedule, unnecessary meetings are bad, and having written documents outlining how decisions came to be are good and helpful
- Autonomous teams where individuals have significant control can respond faster and more robustly, especially where there’s documentation to support that
- Office culture helps build connection but even there being intentional about how its done makes the connections stronger
Managing Your Own Career
Research managers are essential to a healthy research culture - Nature Editorial
Ours isn’t the only somewhat undervalued research supporting profession starting to get more organized and advocate for more recognition and resources. Research managers, often found in the sponsored research office under a VPR, are also beginning more consistent advocacy work. Such managers are also a somewhat under-appreciated resource at many institutions, and can help with agreements and planning.
Research Software Development
Software for research communities - UKRI
Ouvrir la science - French Ministry of Research
Some exciting things going on in the funding and recognition of research software in the UK and in France.
In the first item, a modestly scoped initiative but with real money available, UKRI has issued a funding call (totalling £4.5m) for research software quite generally, for projects within the remit of ESPRC: funding can be used for
- development and re-engineering of existing software
- maintenance activities
- activities that widen participation in development and maintenance.
Rather than just being a one-off, this appears to be the result of extensive groundwork by the society for RSE and ESPRC, so hopefully this is the beginning of regular calls (and calls involving other councils like NERC).
In France, a more ambitious program but one so far which is only a document. The Ministry is announcing a significant push towards "Open Science" quite generally; the third of four themes concern software, in particular recommendations exist to:
- Showcase and support the dissemination under an open license of source code from publicly funded research programmes.
- Highlight the production of source code from higher education, research and innovation.
- Define and promote an open software policy.
This includes strongly recommending open source license for funded research software, producing recommendations for funding agencies to better support software development, funding open software prizes, providing greater recognition for researchers who develop software, and building a catalogue of such software.
Julia Evans, who produces consistently excellent software examples covering a wide range of topics for her blog, describes her approach to creating meaningful examples that don’t seem contrived - start with real code and simplify.
Ownership of one’s work is a good thing, but individual “ownership” of a code base in a team can get out of hand, as Britton Broderick points out. It's easy to see when it's happening, but avoiding it takes a delicate balance between taking advantage of specialization and what can evolve into gatekeeping/"hands off my stuff" behaviour.
ExCALIBUR, the UK's exascale effort, released a review on RSE (Research Software Engineering) skills for HPC, and what will be needed for training efforts in the future.
Research Data Management and Analysis
Write a time-series database engine from scratch - Ryo Nakao
You probably shouldn’t, of course, write a time-series database (or any other kind of database) from scratch. But when someone does, to meet their own very specific needs, reading about it is a great opportunity to learn from their experience. For time series databases in particular (which have a lot of applications for research - e.g. for sensor readings or system metrics), if you’re not familiar with them, this is a great way to get a sense of how they work.
Time-series databases are high-volume, append-only, time-indexed data stores where the most common query is to look up values in a window of time - and for many use cases recent windows will be queried much more often than windows further in the past. (More advanced use cases, like anomaly detection, aren’t covered in this post). Because input is typically happening somewhat regularly, and (hopefully!) data is being sampled faster than the data is changing, there are a lot of opportunities for compressing the data stream as well.
In this article, Nakao walks us through the basics of his implementation, tstorage, and how he ensured that data points arriving out of order are handled, the use of a write-ahead log for persistence, metadata handling, and a compressed encoding.
Research Computing Systems
USENIX LISA2021 Computing Performance: On the Horizon - Brendan Gregg
Gregg, who appears pretty often on the newsletter, gave a talk on his predictions for the future of computing performance - and monitoring and improving performance on new architectures (like ARM, new memory technologies like DDR5/HBM/persistent memory, accelerators, interconnects, operating system features (Linux’s kyber scheduler, io_uring), lightweight VMs and observability. The talk in both slides and video recording form are available embedded on the page.
It’s a pretty comprehensive talk for its 41 minutes running length, and probably a pretty good place to get a sense of where one pretty knowledgeable practitioner thinks things are going. Some noteworthy predictions from my point of view:
- Extra tiers of memory are coming too late to be useful as new memory and storage technologies remove the huge gap between main memory and storage devices
- BPF as a kernel “JIT” for more adaptive performance tuning
OpenZFS 2.1 is out—let’s talk about its brand-new dRAID vdevs - Jim Salter, Ars Technica
Ars’ technical articles are typically detailed deep-dives, well researched and clearly written, and this article is no exception. Salter
Back in #36 we talked about OpenZFS’s vdevs - virtual storage devices consisting of potentially multiple disks - pools of vdevs, and RAID-Z, a more dynamic version of RAID-5 (RAID with parity) except that the parity data is dynamically distributed per-write rather than in a precomputed way across disks, in a way that requires and benefits being integrated with the filesystem.
That’s quite performant with writes, and RAID-Z’s additional checksumming has the advantage of being able to detect silent data corruption; but it makes for extremely slow rebuilding (“resilvering”) of the RAID.
dRAID - integrating logical hot spare “disks” (spare blocks) and moving back to fixed raid stripe widths - has the write-time advantages of RAID-Z but much faster resilver times.
This is a complex beast - OpenZFS has been working on this since 2016 and it just came out now in OpenZFS 2.1 - and is aimed at large (~90 disk) storage systems. Salter doesn’t recommend adopting dRAIDs right away, but it’s worth keeping an eye one.
Emerging Technologies and Practices
Running GATK workflows on AWS: a user-friendly solution - Michael DeRan, Chris Friedline, Jenna Lang, Netsanet Gebremedhin, and Lee Pang, AWS for Industries Blog
The nice thing about being a late-early adopter or early-early-majority of a technology is that a lot of legwork has already been done. We’re arguably on the third generation of “running genomics pipelines in the cloud” attempts; each step is an increase in usability and often a decrease in cost, and with infrastructure-as-code, groups are generously contributing scripts to start up their entire described infrastructure from scratch at the click of a button.
AWS has a reference “run a best-practices GATK genomics pipeline using Cromwell and AWS batch” setup and I can say from experience that it’s incredibly brittle and finicky.
In this post, the authors describe (and provide code for) running short-variant and joint-genotyping workflows on AWS with Nextflow and AWS Batch. They also provide relative costing and runtime for on-demand and spot images, broken down by instance type (c5/m5/r5) and runtime storage type (Elastic Block Store and FSx for Lustre). In either case, data is staged out of S3 to the runtime storage, and then the Nextflow pipeline breaks the data up into shards and runs the GATK pipeline in containers on AWS batch, whence the data is staged back out.
Some fun results I hadn’t expected:
- FSx is pricey, but the short variant calling workload is compute-time bound (in terms of dollar cost), so the increase in speed provided by FSx dramatically reduces cost for on-demand instances (though its a wash for spot instances). For joint genotyping, it’s less compute bound so the pricier FSx shows up in the cost.
- Even though it’s compute bound, the instance types make less of an effect than I had expected, especially for spot instances
IoT for Beginners - A Curriculum - Jen Fox, Jen Looper, Jim Bennett, Nitya Narasimhan, Microsoft Learn
As readers will know, I think IoT infrastructure will eventually be useful for certain research applications, letting researchers collect sensor data more easily, leaving fewer components that have to be created from scratch by the project’s team members.
Here, Fox et al have a 12-week, 24-lesson course on IoT involving six different projects. You can buy ~$100USD hardware kits to implement the hardware yourself as you follow along, or use provided “virtual hardware” on your regular development machine.
Calls for Submissions
Workshop on Programming and Performance Visualization Tools (ProTools 21), part of SC21, - Submissions due 20 Aug
Some SC21 workshops still coming together:
The Workshop on Programming and Performance Visualization Tools (ProTools) intends to bring together HPC application developers, tool developers, and researchers from the visualization, performance, and program analysis fields for an exchange of new approaches to assist developers in analyzing, understanding, and optimizing programs for extreme-scale platforms.
Events: Conferences, Training
Supercomputing Frontiers Europe 2021, 19-23 July, Free
A pretty wide range of HPC topics from the traditional (weather and materials modelling) to self-service HPC and edge computing.
EuroPython 2021 Trainings - 26 & 27 July, €195 as part of EuroPython 2021
Several interesting sessions here, including Climate data analysis with xarray and cartopy, data analysis with pandas, introduction to property-based testing, and python packaging demystified.
Strange Loop - 30 Sept - 2 Oct, In person in St Louis but $175 for virtual attendees
Strange Loop is a widely known eclectic conference that covers software development pretty broadly including programming languages, databases, distributed systems, machine learning, security, observability, and other topics. It’s never been something I could justify going to in person, but at $175 for virtual sessions may be of interest.
Random
A hello world-type deployment written in kubernetes yaml which is also hello world in AT&T syntax x86 assembler, somehow.
A counterpoint to some copyright and licensing concerns about GitHub Copilot from last issue - if copyright really prohibited Copilot it would also prohibit a lot of things we don’t want prohibited (like data mining the medical literature).
Tuplex is a Dask or Spark type Python based data science platform that does JIT compiling to optimized LLVM byte code(!). The paper in SIGMOD’21 looks pretty interesting.
It turns out that git rebase —onto is surprisingly powerful.
An awk script for generating diagrams visualizing iptables chains.
In praise of the underappreciated architecture which greatly influenced both DOS/Windows and UNIX systems - the Vax.
Astro, a static website generator that feels like writing a modern node-based webpage.
SQL is ubiquitous, powerful, and has seen off dozens if not hundreds of would-be SQL replacements; and yet it’s unambiguously not a great programming language. Jamie Brandon gives the case against SQL.
You’ve almost certainly seen the instructions for using VQGAN+CLIP to generate images on Google Colab using a notebook by Katherine Crowson. Below are results for a couple of variations om “research computing team”. I will refrain from making either the logo for the newsletter.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 177 jobs is, as ever, available on the job board.
Lead Research Software Developer - McMaster University, Hamilton ON CA
We are seeking candidates for a Lead Research Software Developer to provide leadership for research software projects supporting McMaster’s research enterprise and research community. As a part of Research & High-Performance Computing Support (RHPCS), you will work with a team to implement software projects related to McMaster’s research support infrastructure, including research information and facility management systems (McMaster Experts, MacLIMS) and various administrative databases. The ideal candidate will have extensive software development experience acquired in both small and large-scale, complex projects, and will have progressive experience leading a team in a research or academic environment.
Unicellular sequencing bioinformatics analysis platform manager - Centre de recherche du CHU Sainte-Justine, Montreal QC CA
The single-cell sequencing bioinformatics platform is a new initiative dedicated to paediatric haemato-oncology to meet the needs of researchers and to develop strong local expertise. The candidate must be able to carry out different analyses appropriate for the most common approaches in single cell genomics, and whole exome, transcriptome, and genome sequencing. They must have excellent communication skills, help train team members, and participate in the scientific life of the research centre. They will manage and co-ordinate bioinformatics work on the platform.
Project Manager, Advanced Research Computing - Simon Fraser University, Burnaby BC CA
The Project Manager oversees a variety of national initiatives supporting research including the procurement and implementation of four new Advanced Research Computing (ARC) systems at four institutions across Canada. The Project Manager leads project teams at the four institutions in the development of operational partnerships for computation and data storage. The Project Manager works closely with Compute Canada's Executive staff and the various Project Leads at each of the institutions, and with partner organizations, to support and coordinate various ARC and cyberinfrastructure activities. The incumbent directs and coordinates cross-functional project teams and manages inter-project dependencies and communications. The Project Manager provides regular status reports to Compute Canada's CTO and Project Leads and escalates issues as required.
Director, Institute of Scientific Information (ISI) - Clarivate, London UK
Lead a team of 5-10 data scientists, data analysts and product specialists. Carry out original research that showcases the value of Web of Science data, demonstrates novel use-cases, and provides advice regarding use and interpretation of our analytical data. Research topics include data categorization, field-normalization, fractional counting, novelty detection, subject diversity, interdisciplinarity, visualization, research assessment, and research integrity. Maintain active engagement with the research community: attend and present at conferences, publish original research articles in scholarly journals, brief internal stakeholders on significant findings, and monitor the literature for mentions of Clarivate and competitor datasets.
Senior Research Software Engineer - Oak Ridge National Laboratory, Oak Ridge TN USA
ur diverse capabilities span scientific and engineering disciplines, enabling the Laboratory to explore fundamental science challenges and to carry out the research needed to accelerate the delivery of solutions to the marketplace.
We invite applications for the position of Senior Research Software Engineer in the Software Engineering Group within the Computer Science and Mathematics Division. The Software Engineering group focuses on engineering the next generation of high-quality scientific software. Our group innovates and inspires the next generation of cutting-edge scientific software, thus enabling Oak Ridge National Laboratory (ORNL) to host the world’s premier scientific software engineering group and transform science with software-defined solutions that are reliable, usable, and trustworthy.
Director of Institutional Research and Application Development (IRAD) - University of Pennsylvania, Philadelphia PA USA
The Director manages a team of senior application developers, data analysts, and project managers to meet all of School of Arts and Sciences (SAS)’ data and application needs in support of the academic mission and the efficient administration of the school. The team’s work spans everything from reporting and visualization to integrations with numerous third-party platforms to custom application development. The IRAD team must regularly partner with the SAS Dean’s office and executive leadership in every division to advance data-driven decision making and create or integrate new technologies. Given the high demand for IRAD’s work, the Director must work with the CIO and Dean’s office to triage and establish priorities, making the most effective use of the team’s expertise while ensuring a manageable workload is maintained.
Director, Data Engineering and Data Platform - University of Toronto, Toronto ON CA
Under the general guidance of the Executive Director of the Institutional Research and Data Governance Office, the Director, Data Engineering and Data Platform contributes to the University’s institutional data strategy by advancing its practices, processes and technologies to meet the evolving data management and analytics needs of the institution. The Director is responsible for establishing and delivering a data and analytics platform roadmap that will migrate the University from its on-premises solution for data warehousing to a cloud-based platform. In collaboration with key cross-functional partners, the Director will be responsible for designing, managing, and modernizing the University’s data and analytics platform and tools that will serve divisions across the institution.
Senior Engineer - HPC Infrastructure - Genentech, Remote CA USA
As a Senior Engineer in High Performance Computing (HPC) Infrastructure, you are a deeply skilled Infrastructure Engineer who thrives in solving a variety of complex business problems through practical and ingenious application of technologies. They are senior T-shape engineers. They have in-depth knowledge in at least two technology spaces within their infrastructure technology area and a working knowledge in a wide range of infrastructure technologies. They have a detailed understanding of how the IT infrastructure in their areas of responsibility impacts respective Roche business processes. They contribute to and technically lead challenging projects which require deep technical knowledge and infrastructure engineering skills.