Research Computing Teams Link Roundup, 7 May 2021
Hi, all:
I hope you’re enjoying the change of seasons. The end of April here brings an onslaught of deadlines, meetings, and events, but things are settling back down now.
Details of the basecamp fiasco continue to come out. As the raw signal folks point out, discomfort as a leader isn’t inherently bad or a signal of a problem. It’s not our lot as leaders to be comfortable about everything. Listen to your team, even - especially! - when you don’t love what you’re hearing.
As always, let me know if you have questions, suggestions, or any other feedback - I always like to hear from you.
On to the roundup!
Managing Teams
Three Feedback Models - Jacob Kaplan-Moss
A quick overview and comparison of three feedback models, similar to what we covered when we were talking about performance communications in , but includes one I had forgotten, Lara Hogan’s Feedback Equation:
Lara Hogan, an author, public speaker, and coach for managers, has a Feedback Equation that is quite simple:
Observation of behavior, e.g. “In your review of Jane’s pull request, you gave her clear advice on test coverage…” Impact of the behavior, e.g. “… this helped her improve her code, which helps with our team’s goal of better-tested code.” Question or Request, e.g. “thank you; more like that!” (for positive feedback) or “in the future, can you … instead?” (for negative feedback).
This one is a lot like Centre for Creative Leadership’s Situation-Behaviour-Impact, but with the question at the end.
There’s a lot of data to support the Manager-Tools model, but if Hogan’s or the SBI model are easier for you to start with, they are perfectly decent approaches, and much better than not giving feedback. The most important thing is to give lots of feedback, mostly positive, promptly while focusing on behaviour and impact. Everything else can be tweaked over time.
Managing Your Own Career
What Good Leaders Do When Replacing Bad Leaders - Andrew Blum
At some point in your career, you’re going to step into a role as a leader where the previous leader made a hash of it. They weren’t necessarily a bad person or incompetent, but for whatever reason what they were doing wasn’t working. Blum talks about how to manage that transition.
A key point for me is an early sentence:
Good leaders create a separation between the past and the future.
Creating that rupture between “that was then” and “now we’re moving forward” is key. People want things to work well but it’s pretty easy to stay caught up in the dysfunction of what was happening before. Blum outlines three steps:
- Acknowledge the contributions of the previous leader
- Enable a vision for the future
- What about how we have worked and operated do we want to maintain?
- What do we want to leave behind?
- What do we want to create anew? and
- Seek to understand your employees’ experiences [LJD: one-on-ones are great forums for that!]
Product Management and Working with Research Communities
The case of the 50ms Request - Julia Evans
Notes on building debugging puzzles - Julia Evans
We do a lot of excellent training in our community, but as a community when it comes to creating online trainings we’re not super creative - the tendancy is just to put a bunch of videos, slides, and text on a website. Even going as far as to add MOOC-like solution checking for homework is relatively rare.
While not suitable to every kind of training, this is a fun different approach; “The Case of the 50ms Request” is an online debugging puzzle in the form of a Choose-Your-Own-Adventure book, and Evans writes about putting together such a puzzle (with more planned) in “Notes on building debugging puzzles.”
Research Software Development
RSE Administration Tool - RSE-Sheffield
The Research Software Engineering Sheffield team has released their project management and time tracking tool, RSEAdmin - documentation can be read here, and there’s a test instance running you can log in to. It tracks proposed projects through acceptance and funding stages, then tracks staff time on the projects.
Having a Healthy Pull Request Process for Teams - Alex Kitchens
This is a longer read on setting up a pull request process, both for the authors of the PR and the reviewers. Other processes could be healthy too, but any healthy process will have clear and explicit expectations.
Kitchens spells out the responsibilities for an author - they fall under making PRs easier to review:
- Make the PR Description Clear and Digestible
- Explain Unexpected Changes
- Keep the Size of Pull Requests Small (When Possible)
- Make Sure the Team Knows the Problem Before Opening a PR to Address It
- Listen to and Take in the Feedback Received on PRs
and for reviewers, under making the process easier for and thus unblocking the author:
- Give an Explicit Approval/Disapproval with Explicit Reasons
- When Not Approving, Distinguish Which Comments are Blocking
- Express full thoughts rather than shortening sentences
- Know When to Reach Out
There are good sidebars there on some possible approaches to “chunking” larger PRs, and examples of what is discussed.
Kitchens suggestion for explicit blocking or nonblocking comments, sounds sort of like Netlify’s boulder-to-pebble spectrum for comments we discussed way back in #15.
Testability = Modularity - Massimo Nazaria
A good reminder that, whether within a code base or across a larger architecture, a good guide towards what is a well-decomposed modular architecture is how testable it is. This also suggests a larger and somewhat deeper point - modularity isn’t some intrinsic property of a set of concepts, but instead depends on how you are going to use combinations of the underlying functionality.
Research Data Management and Analysis
First Steps #5: Pluto.jl - Josh Day, Julia for Data Science
A very quick overview of Pluto.jl, a reactive notebook for Julia. We mentioned Pluto in #48, and it’s continued to mature. It avoids one of the two big problems of Jupyter, hidden state in the order of execution of cells, by being reactive - update a variable in one place and it propagates to the other cells, like a spreadsheet. The maybe larger issue of notebooks (as opposed to environments like RStudio) remains - that snippets are written in an environment with no real off-ramp into version-controlled, tested, productionizable code other than copying-and-pasting relevant bits. For things like that I’m keeping an eye on the development of VSCode for Juypter notebooks.
UNION ALL, pt II - polymorphic data - Alexey Makhotkin
Makhotkin has a new this year newsletter on relational data modelling; he started with a long series on a very simple problem, expressing either/or data (“Do you have any symptoms of COVID-19 from the list below? If no, tick here: [ ], If yes, mark all that apply: [ ] cold symptoms; [ ] coughing; [ ] elevated temperature”) and described modelling the problem in JSON, the inevitable extensions, the storage considerations, and resulting migrations.
In this series he takes a similarly simple construct, the UNION operator, talks in part I about how UNION ALL can be much faster than UNION, and in this part connects it to object oriented-style polymorphism, expressing four narrow table “subclasses” into a single view.
Hosting SQLite databases on Github Pages - phiresky
Really cool JS and WASM work - hosting a (SQLite) database-backed web page completely statically by compiling SQLite to wasm, then implementing a virtual file system to access the statically hosted sqlite file.
Not only is this useful for making trivially hosted web applications for data analysis, it also opens up quite a number of options for creating online tutorials for SQL and data analysis (or visualization).
Research Computing Systems
Balancing Performance, Capacity, and Budget for AI Training - Timothy Prickett Morgan, The Next Platform
Using Lambda Lab’s GPU cluster systems as a launching point, Morgan goes through the current NVIDIA DL/HPC product catalogue of Ampere GPUs and suggests that the A6000 is likely a good compromise for many use cases between capacity (especially on-GPU memory), performance, and budget, and may be significantly cheaper than the Voltas that the cloud providers have previously stocked up on. Obviously the details of particular workloads matters, and some may need the 80GB A100.
This is a useful article for those making immediate-term buying decisions, but it’s a bit infurating that Morgan needed to do so much tea leaf-reading for this, particularly for prices. Unfortunately that’s where we are right now.
Manageable On-Call for Companies without Money Printers - Utsav Shah, Software at Scale
A lot of information out there about running on-call, or more advanced practices like SRE, assume that you’re a large organization with 24/7 uptime targets. These can apply to research computing, but more often don’t.
Teams sometimes respond to the inability to have 5-nines uptime support and 24/7 oncall with a shrug and just keep things vague; “will respond promptly during working hours, with best-effort responses outside of those times”. But that lack of clarity isn’t great for either staff, who don’t know what’s expected of them, or researchers, who don’t know what they can and can’t count on. It’s also a missed opportunity to lay out priorities and define successes. How can you tell if you’re doing well if there’s no goal to reach?
Shaw suggests laying out managable service level objectives (SLOs - internal measures) and slightly less stringent commitments (SLAs - agreements, external measures), and designing on-call processes and policies about that, iterating on the process as needed. For some kinds of research computing, especially availability of compute, SLAs can be quite low and still keep the researchers happy; data availability is likely somewhat higher. Choosing some target services, a manageable SLA, and fixing a manageable on-call process consistent with that will help researchers and staff know what to expect and make priorities explicit.
Backblaze Drive Stats for Q1 2021 - Andy Klein, Backblaze
The quarterly update for drive failures by model backblaze does is always interesting. In this update, they are starting to report overall SSD failure rates along with HDDs; they’re relatively new (2 years) to using SSDs, and its only now that the statistics are starting to be good enough to report. There isn’t a lot of overlap yet in years of service between their SSD fleet and their HDDs, so their results may change, but so far they’re seeing about a 20x different in annual failure rate between HDDs and SSDs used for storage, and 10x when used as boot drives. I guess I’m impressed that the difference is that small; they also report that the AFRs for hard drives continues to fall, pointing to significant increase in reliability of ol’ spinning rust.
Emerging Data & Infrastructure Tools
Object Storage Makes a Push into HPC - Jeffrey Burt, The Next Platform
What HPC POSIX-like file system makes you happiest? Lustre? BeeGFS? GPFS? Isilon? Hahaha of course I was joking. POSIX file systems have lovely semantics for local systems, but those same semantics make it extremely difficult to scale well, resulting in systems that are either ruinously expensive (in $ or in people time), very brittle, or both. Many (most?) HPC use cases don’t inherently need POSIX APIs for running, and none require them for warm- or cold- storage.
Burt gives an overview of Cloudian, a ten-year old company working to make an S3-compatible object storage system attractive for HPC and high performance data analysis, using the same approaches (flash, tiered storage, improved networking) as the parallel posix file systems are using.
Making the Internet more secure one signed container at a time - Priya Wadhwa, Jake Sanders, Google Security Blog
As dependencies multiply and there are so many ways to pull down packages, the security of the “software supply chain” - making sure that the software you’re using is the software it’s supposed to be- is increasingly important
Sigstore is a new Linux Foundation effort to provide verifiable signatures from authoritative sources for a lot of kinds of builds:
We are initially targeting generic release artifacts such as tarballs, compiled binaries and container images. Later we will explore other formats (such as jars) and manifest signing, such as SBOM etc. We also open to collaborate with package managers and ease the adoption of signing for their communities.
Google is building on this with cosign for signing and verifying the signature of container images, and incorporating that into CI but also CD workflows.
Calls for Submissions
Prace Autumn School 2021 - 11-15 October 2021, In Person, Finland, Applications due 9 Aug
For EU participants. There are three tracks one can apply for:
- Introduction to deep learning
- Introduction to GPU programming with OpenMP
- Hackathon: Optimizing GPU applications on LUMI supercomputer.
From the site:
The course consists of lectures and hands-on training on modern, GPU-accelerated high-performance computing: GPU programming and GPU code optimization at scale, as well as understanding and applying machine learning methods. The course includes also scientific case studies about using GPUs in various disciplines. The tutors and lecturers are experts in these fields providing various interesting aspects in the course topics.
Events: Conferences, Training
Automated Fortran–C++ Bindings for Large-Scale Scientific Applications - 12 May, Online, Free
The monthly Best Practices for HPC Software Developers talk series continues, with a look at using SWIG-Fortran to build modern, MPI-aware C++ interfaces to Fortran code.
Random
Formal methods, SAT solvers, etc aren’t widely used in research computing even though they’re useful for making sure all cases of an algorithm are covered or whittling down combinations. Here’s a collab notebook for one such package, Z3, with Python.
A cheat sheet for the still newish Go modules.
Small but huge thing - on Github, absent conflicts, you can now fast-forward commits from an upstream repo into your fork with one click from the web UI.
A summary of and map to a massive blog series on C++ coroutines.
Berkshire Hathaway stock prices, in the usual units of 1/100 of a cent, (almost) exceeded the size of an unsigned 32-bit integer and so broke the stockmarket.
EdgteDB, a Postgres-based database that avoids SQL in favour of ReST and GraphQL APIs, with schema migrations built in from the start.
SQLite running directly on an object store (Ceph).
HATETRIS - tetris that always gives you the worst possible piece. In case management isn’t enough frustration for you.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 148 jobs is, as ever, available on the job board.
Principal Clinical Data Scientist Lead - PRA Health Sciences, US or CA
Leads end-to-end data review activities performed on a clinical trial. Accountable for achieving clinical data management deliverables
on-time, with high quality and to agreed financial metrics. Responsible for applying advanced analytics to centrally aggregate and
analyze data from disparate sources to identify risks and issues impacting data integrity, patient safety and/or regulatory compliance. Triages and assigns data review findings to the appropriate project team role for follow-up and resolution. Communicates trending analyses and a summary of findings to internal and external stakeholders to support the on-time delivery of data fit for analysis.
Research Project Manager 1 - Bioinformatics - BC Cancer Agency, Vancouver BC CA
Establish detailed research project charter, plans and objectives to outline timelines and project deliverables. Execute project plan according to project methodologies, ensures successful and coordinated completion of project components, consult with stakeholders as needed and ensure readiness for project implementation.
The research focus of this posting is in translational research and clinical trials for gastrointestinal cancers, with a particular focus on colorectal cancer. Analysis led by the successful candidate will support development of future clinical trials.
Senior Software Project Manager- R&D - Sciex, Vaughan ON CA
You will play a meaningful role in driving software product development efforts spanning our software platform and vertical applications for our scientific analytical product lines. Our software projects follow Agile development methodologies and build products for MS-Windows and cloud-based systems. Some projects also integrate with hardware projects developing our Mass Spectrometer instruments and peripherals. The individual will provide oversight to a multi-functional team of software, verification, validation, user experience and technical writing professionals, and report to the R&D Project Management Office.
Biostatistician Manager - Emmes, Anywhere CA
The Biostatistician Manager oversees statistical activities and deliverables across a project or platform of related clinical research studies, leads a team of Statisticians and SAS Programmers and ensures that statistical deliverables are completed in a timely manner and with high quality. Serves as a lead statistician on multiple clinical research studies, from the initial design stage all the way to final report writing and manuscript preparation.
Lead Data Scientist - Storm4, Toronto ON CA
Are you a Toronto based Data Scientist looking for their next role as Lead Data Scientist for a growing Greentech business focussed on building a better energy future. Lead the development of machine learning models, such as energy demand prediction, to provide insights and automation for our customers. Lead analytics projects to optimize the accuracy of predictions. Implement cloud-native data science processes that are secure, reliable and scalable by design
Senior Manager, Research Computing - Nuance, Burlington MA USA
The Sr. Manager will manage and lead the R&D Research Compute Engineering team, collaborate with corporate leadership, Research and Engineering leads, Security, and IT. You will take ownership of the machine learning (#ML) and data science environment, the design, and deployment of the high-performance infrastructure, with consideration of future technological needs and trends. You will directly spearhead and shape the environment that is utilized for the development of latest AI technology and Nuance’s Conversational AI, such as Dragon Ambient Experience (DAX), Agent AI and many more.
Chercheur Scientifique Senior - Senior Research Scientist - Nuance, Montreal QC CA
This Senior Research Scientist position is opened within the Core Technology Engines group, which is responsible for the development of AI engines within Nuance. The person will be part of an energetic team developing cutting-edge multi-platform Natural Language Processing technology that powers our Virtual Assistant-oriented applications. She/He will work with interdisciplinary teams of scientists, engineers and architects to push the boundaries of machine learning technology for NLP in areas including algorithms, MLOps and performance optimization.
Applied Research Manager, Computer Vision - NVIDIA, Santa Clara CA USA
NVIDIA is searching for outstanding applied research manager to spearhead our Computer Vision efforts. Computer Vision is a core component of NVIDIAs platforms from Drive AV to Isaac autonomous Robotics to Clara for Healthcare to Maxine streaming and broadcast and beyond. The ideal candidate is an established thought leader in academic or industry with deep experience in vision, human understanding, publishing and presenting at industry and academic conferences, and hands-on implementation. This role requires high-level knowledge of multiple areas within production workflow and tools, Deep Learning, network optimization, and vision systems. Come join a diverse applied research group that works on hard and meaningful problems; that consistently publishes at top venues in graphics, computer vision, and artificial intelligence; and that values real impact on NVIDIA products and the industry at large.
Team Lead, Research Software Development - Simon Fraser University, Burnaby BC CA
The Lead, Research Software Development will lead a lean team of software developers and program analysts responsible for supporting the development, implementation, and evaluation of research computing applications at Simon Fraser University. This team, comprised of technical specialists, will work closely with individual researchers and research groups to design and implement solutions to specific computing needs while also providing general education and outreach to researchers from all disciplines across the University about the tools available to support and enhance their research activities.