Research Computing Teams - Performance Communication is Expectations Communication, and Link Roundup, 12 March 2021
Hi, everyone -
Here in Toronto we’re enjoying the traditional season of “False Spring”, where a burst of warm weather and blue skies lures the unwary into putting away winter coats - but even so it’s cheering and encouraging, and spirits everywhere are notably lifted.
I’m going to write a little bit about performance management in the next three newsletters. I have a brief introduction to my preferred philosophy of performance management below (or skip to the roundup); in the next issue we’ll talk a bit about short-term performance management (feedback), in the following we’ll talk about longer-term performance (goal-setting and review), and finally we’ll talk about what happens if people are or aren’t consistently meeting expectations.
When talking about performance management, especially with new managers, I generally prefer to avoid the word performance as much as possible at the beginning, instead talking about communicating expectations. I find this clarifying for a number of reasons:
- It avoids unproductive framings such as worrying that these conversations are about blaming the employee for “doing a poor job” or praising them for “doing a good job”. “Good” and “bad” are highly charged words that nonetheless don’t really mean anything absent context. Efficiently and capably doing beautiful work on the wrong things is a performance problem.
- It takes the focus away from the past and what was or wasn’t done in that now-immutable time, and towards expectations of the future.
- It connects quite different things - immediate short-term nudges on tasks and long-term goal setting - into a framework of talking about expectations.
- It provides a framework for thinking about accountability conversations between peers.
- It places the burden, properly, on us managers for explicitly communicating expectations, and for having reasonable expectations in the first place.
- It correctly equates not giving feedback or having other performance conversations to withholding knowledge of your expectations from the team member.
The thing is, almost all of the team members you will ever work with want to meet your expectations, and are capable or can become capable of meeting them.
New team members, or team members new to particular responsibilities, won’t know what those expectations are. Even experienced team members who do know all the relevant expectations might not know which of several competing expectations are the most important in some particular situations.
As a manager you certainly communicate expectations when assigning goals - but you won’t be able to list every single relevant expectation for every single task every single time someone does something. Those expectations are still there, though, and they need to be consistently met over time. So in addition to talking about expectations beforehand it’s important to be able to talk about them afterwards too. In both cases, it’s about guiding future behaviour.
Short-term feedback, and long-term goal-setting and review, are fundamentally about describing your expectations, and recognizing team members’ efforts as meeting or not meeting those expectations. The resulting conversations are about helping your team members meet your expectations in the future. In some cases - especially as you find yourself working in new areas - it will also be in part about recalibrating your expectations.
Next time we’ll talk about a couple of common formulations for having those conversations around immediate and short-term expectations - giving feedback.
Managing Teams
Thoughts on Engineering Management - Ben McCormick
McCormick writes a series of five articles on technical computing management. The broad topics won’t be a huge surprise to readers, but they’re good reads in that they’re all in a very coherent voice and philosophy:
- What EMs do - focussing on facilitating information flow, driving progress forward, and building a sustainably productive team - and that all of those are time consuming
- Choosing what to work on - which does a good job of separating the focus from the “rhythms” of certain kinds of routine but important work like one-on-ones, sprint rituals, etc. that keep things running smoothly, and then prioritizing the rest on high-leverage work that has the most impact. “Stay Sane, Keep Your Integrity, and Remember That People Matter The Most” is probably as good a single line of management advice as any I’ve heard;
- Handling Accountability as a manager, which in our first managerial jobs is very different than the kinds of accountabilities we’re used to
- How to delegate - and
- How to give feedback - emphasizing candor and the “situation, behaviour, impact” model.
Does it KLAPS? An OKR Rubric - Debra Roberts
You’ve almost certainly heard of SMART goals - a rubric for defining good goals for work. They’re variously defined as some combination of Specific, Measurable, Achievable/Acceptable/Assignable, Relevant/Realistic and Time-based. The fact that A and R’s meanings change from telling to telling suggests, correctly, that they’re not essential. The important thing is - is the goal specific enough that a third party could assess whether it’s met or not? Can you actually measure it, and if so, what’s the metric? And is there a deadline? A goal without a deadline and a way of checking to see if you made it is merely an aspiration.
OKRs - Objectives and Key Results - are like goals for organizations; and they cascade down, to keep sizable organizations aligned and all moving in the same direction. There’s a clearly written (Specific) Objective for the quarter or year (Time-based) with three or four quantitative (Measurable) Key Results that let the organization know if they’re getting there. Higher-ups set the OKR for the organization, and then teams below set their own OKRs so as to support the OKRs above, and so on.
Roberts suggests a rubric for OKRs similar to SMART for goals. Many of us work in teams that are small enough we don’t need OKRs for alignment, but much of what Roberts suggests is useful even for team-based metrics, which we’ve talked about before. Roberts’ rubric is:
- Knowable - launching a new OKR with a newly created metric or objective that no one’s seen or used before will cause confusion, not clarity
- Leading - is this a leading indicator? Does it show us where we’re going, or where we’ve been?
- Autonomous - is it fully within the control of our team, or is it something that others can block our progress on?
- Progressive - can we see our progress, or is it a binary where it’s no until it finally becomes yes?
- Scoped - is it a tightly scoped objective/metric?
Even just for metrics for individual teams, knowable, autonomous, and progressive are important; leading is good too although lagging has its place.
Product Management and Working With Research Communities
Digital skills for FAIR and open science: Report from the EOSC Skills and Training Working Group - EU Secretariat
For years, the focus for digital research infrastructure was on stuff, not people. Stuff - computing, storage - definitely is needed, but none of it matters without the teams to operate it, to write the software, to train and support the researchers, to curate the data, etc.
The RSE (Research Software Engineer) movement has successfully raised the profile of software developers and the need for hiring and training them, and for them to have meaningful career ladders; and that’s all to the good. But a recurring theme of the newsletter is that focussing on one aspect of research computing and data while omitting the others is a mistake. The various sub-disciplines - software, systems, data management - are interlocking and mutually reinforcing, while boundaries between the disciplines are blurring if they were ever sharp at all.
This EOSC report looks at skills and training needed by staff, trainees, and researchers for the European Open Science Cloud as a whole, with a focus on FAIR and Open science in particular. It takes a look at the set of skills needed in exactly that interlocking and mutually reinforcing way, involving a wide range of different roles and actors. While I don’t think the recommendations made would surprise most readers, it’s nice to see the framework of actors (the figure above) explicitly laid out.
Research Software Development
Rust vs. Go: Why They’re Better Together - Jonathan Turner and Steve Francia, The New Stack
Rust and Go are both relatively new languages that are being increasingly adopted by the research software development community. As is often the case in tech, there’s a lot of unproductive “X is better than Y” bickering among adherents.
In this article, a member of the Rust team (Turner) and of the Go team (Francia) come together and make it clear that the core teams of the two languages do not see each other as opponents, but rather as complementary. They interviewed three companies (and others) who use both languages, and use quotes to illustrate their points. This is a fun approach to such a blog post and one I haven’t seen before.
Turner and Francia point out that both languages are:
- Well suited to large teams and code review, with modern tooling including formatting and linters,
- Possessed of package managers,
- Type-safe,
- Concurrent,
- Portable,
- Performant, but in different ways - Go is pretty fast out of the box, but what you get is what you get, while Rust gives you the fine-grained control to get blazing speed, and
- Memory-safe, but in different ways - garbage collection vs borrow-checking.
The big differences, meanwhile, are around learnability and fast iterations (Go) vs fine-grained control and rewarding long iterations with high performance (Rust).
I’ve never really understood the Rust vs Go arguments - they’re clearly aimed at disjoint use cases. Go is going to carve out a big swath of what had been Java’s domain - enterprise-type development, web services, or distributed systems (using static binaries rather than jars and the JVM), having both garbage collection and opinionated build and dependency management setups - but it does all that with a nice small language and strong typing.
Rust is going to do real damage to C++’s “market share”, gaining adoption in systems programming or the building of individual tools where it makes sense to lovingly hand-craft the fastest code humanly possible - but will address those target uses with memory safety and without being weighed down by 40 years of backwards-compatible baggage.
Both use cases are relevant in research computing. Both languages have enormous advantages over what they’ll end up displacing. But I have a really hard time imagining any intended application for which both languages would simultaneously be obvious contenders.
Accelerated Signal Processing with cuSignal - Adam Thompson, NVIDIA Developer’s Blog
It takes about 1/3 of the article before we learn the big news:
> cuSignal GPU-accelerates the popular SciPy Signal library with CuPy (GPU-accelerated NumPy) and custom Numba CUDA kernels.
Obviously there have been CUDA packages relevant to signal processing for a while - for FFTs, sparse solvers, etc - but this allows researchers to, with some care, code against the familiar SciPy Signal interface and get GPU acceleration. And by using CuPy and Numba it allows zero-copy interfacing to tools like PyTorch for machine learning.
Research Computing Systems
Why Dropbox’s Exascale Strategy is Long-Term, On-Prem, Disk - Nicole Hemsoth
Hemsoth writes another well-reported, interesting article - this time on Dropbox’s storage needs.
A lot - way too many - articles get written about compute and on-prem vs commercial cloud, but the much more interesting and nuanced discussions are around data and storage. Dropbox was an early adopter of AWS S3, but at its scale and for its use case, it’s sticking with on-premises:
> “Cost isn’t number one, but it’s in the top three—and there are three things we look at when we make a decision about hardware. Is it more cost efficient to stay on prem or use the public cloud? We are operating at scale where customization is cost-efficient. And also, can we innovative faster than the public cloud? And further, is our workload unique compared to what the public cloud supports?”
Although I never really thought of it before this article, it makes a lot of sense. Dropbox is large enough that it is a hyperscaler itself of a (niche) sort - so buying services from another hyperscaler doesn’t really make business sense for them. Having said that, Dropbox isn’t dogmatic about it - for some specialized workflows and for some international locations they do use commercial cloud services.
Maybe even more interesting is that even though performance is important to the company, it is not moving to Flash or even SSDs in any big way; they see current and upcoming improvements in good-ol’ disk as being enough for their needs. Hemsoth has a nice summary (all new material to me) of emerging disk technologies looking out the next four years or so that makes Dropbox pretty happy with staying on spinning disk.
Practiced Humility in Retrospectives - Will Gallego
A nice article from a seasoned practitioner on retrospectives after an incident, and the importance of humility in running them.
One of the first humility lessons for Gallego was that he saw and ran successful retrospectives at Etsy - but when he went to another organization with different goals and culture, the same approach didn’t work. Other lessons included:
- Retrospectives shouldn’t be top down
- They should prioritize learning before fixing, and that means saying “I don’t know”
- Reduce use of hindsight
- Review what is assumed to be true
- All participants should be equal, and
- The “right” approach changes.
Emerging Data & Infrastructure Tools
Goodbye, Minikube - Nicolas Fränkel
Using Podman and Docker Compose - Brent Baude
It’s a great sign the container-orchestration “market” is getting increasingly mature when people are writing “Goodbye, X”-type posts about early small-k8s distributions, or when alternatives to simpler solutions like docker + docker-compose come about.
Fränkel’s post talks about leaving minikube for the newer kind. Long-term readers will know that I’m pretty optimistic about various small kubernetii for at-the-edge type installations for some research computing applications - say complex applications that involve collecting and processing sensor data before sending it on. So news that that ecosystem is strengthening is good news.
In rootless-container news, Red Hat’s podman now supports docker-compose, allowing for orchestration of these rootless containers - something that’s been lacking (at least with any robustness) with the singularity/podman type solutions. Since you still need root to run docker compose, wire up networks, etc, I’m not sure what the use case here is, though? Anyone know?
Calls for Proposals
PRACE-ICEI CALLS FOR PROPOSALS - CALL #5 - Submission Deadline 16 April
PRACE’s call for proposals for resources on the EU federated Fenix infrastructure is out. EU researchers who need the following are eligible to apply:
- Scalable computing services
- Interactive computing services
- Virtual Machine services
- Archival data repository
- Active data repository
Events: Conferences, Training
On the evaluation of research software: the CDUR procedure - Dr. Teresa Gomez-Diaz, Prof. Tomas Recio, SORSE talk, 16 Mar 14:00 UTC, Free
How do we evaluate if a research software development effort was successful? The authors will describe their method, looking at Citation, Dissemination, Use (usability), and (impact on) Research.
Building Research Software Communities: How to increase engagement in your community - Jeremy Cohen, Dr. Michelle Barker, Daniel Nüst, SORSE talk, 17 March 9-1 UTC, repeated on 18 March at the same time, free
A workshop on “building, developing, and sustaining research software communities”.
International Workshop on Performance Analysis of Machine Learning Systems - 28 March
One full day of meetings (9-5 Eastern US) on performance analysis of AI/ML systems - covering workload characterization, acceleration of compute, memory, I/O and network, co-design, ML systems optimized for specific domains, and more.
Random
Does a trainee or team member need to learn Git? Introduce them to Oh my Git!, the online strategy card game.
Continuing a recent recurring theme of the details of linkers and loaders - how to execute an object file if that’s all you’ve got.
We teach people to write code when they’re starting out, but reading code is at least as important and scandalously under-emphasized.
Internet Archive Scholar indexes 25 million open-access research articles and other scholarly items.
Advice on using conda in docker containers.
Netflix’s new workflow+serverless platform for e.g. video transcoding, Cosmos. This sort of platform could easily be useful for large scale production deployment of data analysis workflows as in genomics.
Has anyone played with dapr, which just reached 1.0, yet? It’s Microsoft’s multi-language microservices framework. It seems potentially useful but I don’t know anyone who has done anything with it.
Even after years doing this work, it’s always surprising how long “temporary” solutions last. MacOS’s “poof” animation was a placeholder that lasted for years.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
Highlights below; full listing available on the job board.
Director, Research and High Performance Computing Support (RHPCS) - McMaster University, Hamilton ON CA
We are seeking a Director to provide strategic leadership and technical expertise to RHPCS senior technology staff and managers engaged in the planning, implementation, and management of multiple large and complex information systems to support long-term strategic goals for digital research infrastructure and support. The Director will encourage collaboration and lead in a manner that inspires partnerships with internal and external partners, and the values of teamwork, service excellence, respect, integrity, accountability, and inclusivity.
Research And Compliance Cloud Architect - Colorado School of Mines, Golden CO USA
The Cloud Architect is a member of the Infrastructure and Research Computing Solutions Team within the Information and Technology Solutions (ITS) Department and works collaboratively with the Information Security Team in support of security compliance efforts, ITS staff, and other university departments to further the research and compliance missions of Mines. With a thorough understanding of cloud architecture and cloud platforms, this individual will design, develop, and manage robust, secure, scalable, highly available, and dynamic solutions to drive the research, teaching and learning missions of Mines.
Senior Research Software Engineer - University of Oxford, Oxford UK
To fulfil our opportunity for impact we need a Senior Research Software Engineer that practices robust, well maintained, well documented, well tested, well managed and ambitious software development. As a Senior Software Engineer, you will co-lead the Visual AI software development directly supporting the dissemination and scientific objectives of the project. This involves maintaining an ‘integrator platform’ (the Visual Transformer) in collaboration with our postdoctoral researchers and students, as well as leading the development of smaller projects focused on partner-specific collaborations. Responsibilities include developing innovative software solutions to complex problems, managing and integrating large scientific data repositories, and driving concepts through to practical realisation.
Senior Computational Scientist - Pittsburgh Supercomputing Center, Pittsburgh PA USA
We are seeking a Senior Computational Scientist to join our team. In this role, you will have the opportunity to support research applications, solving problems with internal and external users as they accomplish discovery on modern high-performance computing platforms. You will provide in-depth consultation and may lead a team to help address users’ research and support questions. Your services may also include proposal development and technical training to ensure that allocations are managed, and outputs are working accurately. You will ensure the satisfaction of our customers by advising them on configurations, practices and policies that may favorably impact software performance.
Research and Scientific Computing Sr. Program Manager - General Dynamics, for the FDA, Silver Spring MD USA
The incumbent should have demonstrated experience successfully delivering large complex programs involving cross-functional teams that span IT and computational sciences (e.g. computational scientists, bio-informaticians, data scientists, statisticians, systems administrators/engineers, networking engineers, database administrators, and software engineers). The ideal candidate will possess an advanced degree with background and experience in physical (e.g. biology, chemistry, physics) and computational sciences combined with the ability to align scientific and technical needs with broader business goals and objectives. In this role you will be responsible for overall program execution (technical, financial, staffing, and reporting) while providing strategic oversight and subject matter expertise to promote a culture of collaboration and continuous improvement across multiple task orders, stakeholders, and their associated teams. The ideal candidate will have a proven record of working with diverse stakeholders often with competing requirements in developing efficient solutions to address advanced research and scientific computing initiatives.
Senior Data Engineer - Memorial Sloan Kettering Cancer Center, New York NY USA
As Senior Data Engineer, you will collaborate with the Engineering Manager, and you will be responsible for the full development life cycle including collecting, organizing, transforming and analyzing data to make it more useful and accessible. Focuses on sophisticated projects requiring integration to other systems including analysis, design, programming, implementation and support. Superb communicator, excited to mentor across diverse teams with varied technical backgrounds.
Research Software Engineering Team Lead - Imperial College London, London UK
Contribute and advise on strategy and architecture. Manage the team, providing planning and performance management. Identify new and emerging technologies, products, services and techniques. Plan software development in collaboration with the team and researchers. Engage with partners to facilitate open communication, acting as a single point of contact by developing, maintaining and working to strategies and plans.
High Performance Computing Department Manager - Idaho National Laboratory, Idaho Falls ID USA
Ensure that INL’s scientific programs/projects are adequately supported and provided with computational resources needed to ensure notable scientific outcomes. Advise senior management on issues related to scientific computing strategy, facilities, and software tools. Provide leadership for a team of senior scientific and computing professionals in accomplishing INL’s scientific computing mission and associated objectives.
Manager, Research Computing Applications and Operations - Harvard University, Boston MA USA
Coordinate the planning of and conduct advanced research computing engineering duties. Implement current and develop new RC solutions to keep up with the pace of complex research problems. Work independently to build, monitor, and maintain the integrity of RC systems. Provide technical expertise to teams and projects alongside research programs. Be a key contributor to multiple projects simultaneously.
Director, High Performance Computing - University of Miami, Coral Gables FL USA
The Director, High Performance Computing manages the design, development, and delivery of a cost-effective mix of services, systems, applications, and operations. Additionally, the incumbent supports research computing, including shared high-performance computing resources, data-analysis platforms, storage systems, and visualization tools and platforms across the University of Miami campus. The Director, High Performance Computing identifies, engages, and supports the needs of faculty, postdoctoral researchers, and graduate students to ensure projects related to their research are successful.
Data Management - Team Manager - Bank of England, London UK
Reliable, well managed and well governed data is critical to the Bank of England in order to achieve its overall mission. This position provides a chance to people manage a team with a broad remit, including the activities related to bank-wide data cataloguing, data governance of data on the Bank’s data lake, bank-wide data management policy communication and implementation, and activities related to the bank-wide data strategy and data projects.
Research Scientist - Quantum/Advanced Computing - Carnegie Mellon University, Pittsburgh PA USA
Collaboration: You’ll actively participate on teams of software developers, researchers, designers, and technical leads. You’ll build relationships and collaborate with researchers, government customers, and other stakeholders to understand challenges, needs, possible solutions, and research directions. Mentoring: You’ll contribute to improving the overall technical capabilities of the Center by mentoring and teaching others, participating in design (software and otherwise) sessions, and sharing insights and wisdom across the SEI Emerging Technology Center team.
IT Architect - San Diego Supercomputer Center, San Diego CA USA
The incumbent will act as an architectural lead for large scale projects requiring significant cross functional design. Such designs will include all aspects of science gateways and scientific applications. Designs may involve green field projects starting fresh or re-architecting existing major software projects and will include implementation language, composable service decisions, service interfaces, database architectures and designs (standard, non-SQL, and custom), design for fast performance at scale, message queueing, middleware services, large data import facilities, enabling effective UX design with underlying architectures that support a dynamic UX. S/he will be responsible for ongoing HUBzero architectural changes and those of major HUBzero applications. Architectural designs also include planning appropriate implementation technologies, including Python, PHP, React.js, Clojure, and other supporting languages and technologies. Additionally the incumbent will act as a liaison between experts in scientific simulation, HPC, middleware, web design, UX, and data science to define architectures that combine all such perspectives to deliver effective client solutions.