Research Computing Teams #115, 26 Mar 2022
Hi!
Well, it finally happened. The time came for something I’ve been dreading since taking a job with a vendor; something which held me back from taking such a job earlier.
I had my first meetings with former colleagues from the vendor side of the table.
And… it was fine! It wasn’t weird at all.
In retrospect, this was an odd thing for me to have gotten worked up about, because I’ve been through almost exactly this transition before. When I went from being a postdoc in a research department to staff at a research computing centre, the shift from “researcher” to “service provider” happened the same way. Internally, it felt like a huge identity crisis going from “academic” to “staff”! But that was all stuff going on in my head. It certainly didn’t come from interactions with other postdocs or profs or grad students, where if anything I became involved with more projects than before. I was still a colleague; only the things and expectations I brought to the table changed.
An enormous and under-appreciated benefit of our line of work is that whatever our role and whoever is paying us, we’re all still collaborators. Research, pushing forward the frontiers of human knowledge, is a hecking big job! It takes lab techs and sponsored research officers and PIs and trainees and computing staff and, yes, vendors. We’re facing off, not amongst each other, but against the unknown.
There can be rivalries, which can turn unhealthy, between teams competing to get the answer first, or get the grant/prize/faculty line. And god knows there’s ridiculous levels of gatekeeping, disproportionately arrayed against the same groups of people that face racism and bias elsewhere. We can and must fight against all of that toxic nonsense.
But we’re also trying to push forward against research problems that are bigger than any one of us. It takes a village, and a broad range of expertise and capabilities, to make any progress answering the necessary research questions that are being posed. I’m really excited about what my new job enables me to bring to bear against problems faced by research groups and those that support them! Yes, I’ll be playing a different role, but a necessary one. As always in this discipline, the work will involve a mix of deeply understanding the problem, helping design and plan possible solutions, concierge-ing between people and efforts, project-managing the resulting work, and learning along the way to be better ready to tackle the next question. The different roles just focus on different facets in different ways.
With that, on to the roundup!
Managing Teams
Make changes easy for your team - Jade Rubick
Miscommunication Plan - Aviv Ben-Yosef
Rubick has a plan for making changes manageable for your team:
- Always be listening
- Keep a management backlog (!!)
- How to make a change: problem, diagnosis, remedy
- Socialize your plan
- Communicate the change
  - As an experiment
  - As reversible, if true
- Watch your iteration speed
and Ben-Yosef emphasizes the importance of bookending the change with lots of communication, which too often we deeply specialized people tend to forget. “The old way was bad, this way’s clearly better, what more is there to say?”
As managers we’re all always keeping an eye out for potential problems, as Rubick suggests, but keeping a management backlog is a terrific idea. Keeping track of apparent problems is an excellent way to guide further listening and evidence gathering. That’s how you figure out which problems continue to make themselves apparent, and which were transitory or imaginary, and so hone your mental model of what’s going on with the team.
One thing I’d change with Rubick’s approach is to start communicating about the problem and diagnosis first, to see if people agree. That also gives you an opportunity to solicit possible remedies maybe you haven’t thought of. Even if you have a remedy in your back pocket, and end up going with it, having people contribute to the change process will make everything easier.
Both authors talk about the huge challenge in actually pre-communicating the change and then following through. Even if people didn’t like the problem, change is uncomfortable. It takes a lot of talking and people-work to make process changes. (A Google search for “change management” in quotes returns 113 million results.)
I’ve never fired anyone for technical incompetence - Jonathan Hall
A good reminder from Hall that we tend to over-emphasize super-specific technical skills when we’re hiring, and under-emphasize “soft skills”. We do that even though the technical skills are more readily learned, and even though we know at some level that if we regret the hire it’s more likely to be because of the soft skills.
As technologists and people immersed in the research world, it’s easy to retreat into sort of a false but comfortable objectivity when facing uncertainty. And hiring is all about uncertainty! It’s easy to say “they flunked the [programming language X] test”, and yes that’s probably a semi-objective measure. But is day-1 fluency in language X, under the unnatural conditions of an interview, really a core requirement for success in the job? Or is it something they can learn in parallel with learning the code base if they’ve already demonstrated success in language Y?
It’s not that we shouldn’t probe for demonstrated technical success in the relevant areas - we absolutely should continue to do that! But that’s the easy part. The much thornier management situations come from hiring someone who’s technically fine-to-excellent but causes real team problems because of non-technical behaviours. In hiring we need to probe those areas at least as much.
Managing Your Own Career
Energy Management for Newer Managers - Cate Huston
One of the first things new managers discover is that they have to abruptly switch from having relatively few deep tasks they’re working on to many tasks, many of them fairly small. So there’s a lot of discussion about task management and task management tools.
(Note “task management”, not “time management”. It’s not a power given to us mere mortals to manage time. We manage tasks. One of the key skills new managers have to learn to apply to tasks is gracefully declining them.)
Not all of the tasks are equal. Some might take longer, true, and so that’s the usual focus with task management. It’s quantitative and easy to track.
But some tasks are just harder or more draining, especially when you’re starting out. They may take the same time, but they take more of you. Depending on your inclinations, a common one to find tiring is having a performance or expectation discussion with a team member. It might only take a few minutes, or kick off a larger discussion which consumes an entire one-on-one. And it could distract or otherwise drain you for the rest of the afternoon. Or the task might not be hard or draining exactly, but require something particular of you. Maybe you have to be particularly alert to learning some challenging materials, or to pay full attention during some one-on-ones.
It gets better! Like with physical tasks, hard management tasks can get easier as you develop certain “muscles”.
But as Huston points out, being aware of this and adjusting your task lists accordingly is important when you’re starting out in a new role. That role might be a first-time person or project manager gig, or a director job, or even taking on substantial new responsibilities within a given role. You’ll be doing new things, and it’s worth paying attention to what drains your energy and what pumps you up, and trying to schedule accordingly while you’re getting used to things.
Unstuck yourself from the ideas that go nowhere - Candost Dagdeviren
Falsify yourself - Jonas Lundberg
It’s really, really hard to let go of an idea you came up with. We built the entire scientific method around that fact!
And so outside of science and the rigour of hypothesis testing and unfriendly reviewers, it’s too easy to cling to ideas that clearly aren’t going anywhere. This is especially true in people or project management, where you lack the immediate feedback that comes with doing hands-on technical work. The resulting sensory deprivation can be disorienting, allowing you to believe all sorts of nonsense. (In new contexts where you haven’t experienced feedback yet, this can be true too: e.g. “it will be weird to interact with once-colleagues from the vendor side of the table”).
Dagdeviren reminds us that continually being curious, and asking open questions of others, can help us with this. Lundberg says that you don’t need to wait for others; you can try the scientific approach and attempt falsification yourself. That can involve probing and questioning your assumptions, trying “pre-mortem”s, listing tradeoffs, and trying to construct other possible solutions.
Product Management and Working with Research Communities
Facilitating an online participation-rich workshop in Gatherly - Adrian Segar, Conferences That Work
I don’t really understand Zoom or Teams videoconferencing hate. They’re fine tools for certain kinds of meetings.
And yet I agree that they’re definitely not as interactive as being in person, and especially being in person around a whiteboard or pad of paper. Something about the almost-imperceptible lags and reduced amount of body language makes back-and-forth harder - I’ve said “no, you go ahead” more times in the last two years than I have in in-person conversations over my entire life.
I don’t think this is an insurmountable problem of online meetings. Even slight nudges to the tooling make things much more interactive. Slack’s in-image chat and emojis help quite a bit, as does Whereby’s tooling. It’s pretty noticeable! Just moving our old team’s online meetings from Zoom to Slack videoconferencing made a significant difference in collegiality and chattiness (which admittedly isn’t the outcome you want for every meeting). Same people, same time, same calendar invite, better meetings for that purpose.
Segar talks about hosting a 2.5 hour workshop in Gatherly, which I’ve not used but I’ve heard a few good things about. It seems like it has a promising mix of the “broadcast” zoom-like mode, the room-like mode that spatial chat applications like gather.town have made popular, easy switching between them, and interactivity.
I really should try making more use of things like Zoom “apps” to bring whiteboarding and task trackers or whatever directly into the meeting; I think that could help. But being able to easily self-organize breakout sessions, where everyone has a top-level view, could be good for other meetings.
What tools and techniques have you used that have helped make remote meetings more interactive? Anything I and other readers should try, whether it’s software or method? Let me know - hit reply or email jonathan@researchcomputingteams.org.
Research Software Development
Workflows Community Initiative - Workflows Community
Ten simple rules for making a software tool workflow-ready - Brack et al., PLOS Computational Biology
Research data analysis pipelines have always been complex, and oft-times kind of awkward fits to the usual queueing systems used in research computing systems. But until recently they were a relatively small part of the workload portfolio.
Today data analysis and data science continue their explosive growth, and even simulation workflows are getting complex. I hope we’ve found the peak of the exploratory, everyone-makes-their-own-workflow-engine, Cambrian explosion phase of discovering what works and what doesn’t. Next up is developing and promulgating some best practices.
So I’m excited to see a community start up around workflows for research computing and data. It looks to be about a year old, and still nascent, but with real ambitions about training, developing standards and sharing. (It also had a workshop in November about “Tightening the integration between computing facilities and scientific workflows”. Is this work Research Software Development? Systems? If it’s for automated data processing workflows is it part of Research Data Management? It’s almost as if modern research computing and data is growing beyond the boundaries of these 90s-era silos…)
Meanwhile, Brack et al. have a nice short article about making a tool workflow-ready. Those of us who started developing in Unix-y environments, where scriptability goes almost without saying, won’t learn much from that article. But a lot of our trainees or juniors very much have not had that experience. This would be a good article to send them to nudge them in the right direction to make their tools automatable and thus, almost incidentally, suitable for inclusion into larger workflows.
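For readers who haven’t lived in that Unix-y world, here is a minimal sketch of what “scriptable” tends to mean in practice. It is not taken from the Brack et al. article; the tool name `count-records`, the `--min-length` option, and the `run` function are all hypothetical. The habits it tries to show are: take input from a file or stdin, write results to stdout in a parseable form, send diagnostics to stderr, never prompt, and report success or failure through the exit status.

```python
import argparse
import io
import sys


def run(argv=None, stdin=None, stdout=None, stderr=None):
    """Hypothetical tool: count input lines with at least --min-length characters.

    Returns the exit status instead of calling sys.exit, so it can be tested.
    """
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    stderr = sys.stderr if stderr is None else stderr

    parser = argparse.ArgumentParser(prog="count-records")
    parser.add_argument("input", nargs="?", default="-",
                        help="input file, or - for stdin (the default)")
    parser.add_argument("--min-length", type=int, default=1,
                        help="ignore lines shorter than this after stripping")
    args = parser.parse_args(argv)

    try:
        if args.input == "-":
            lines = stdin.readlines()
        else:
            with open(args.input, encoding="utf-8") as handle:
                lines = handle.readlines()
    except OSError as exc:
        # Diagnostics go to stderr so they never pollute the data on stdout.
        print(f"count-records: {exc}", file=stderr)
        return 1  # non-zero status lets a workflow engine detect the failure

    count = sum(1 for line in lines if len(line.strip()) >= args.min_length)
    # Tab-separated output is easy for the next step in a pipeline to parse.
    print(f"records\t{count}", file=stdout)
    return 0


# Demonstration with in-memory streams; a real entry point would instead
# call sys.exit(run()).
out = io.StringIO()
status = run(["--min-length", "3"], stdin=io.StringIO("abc\n\nde\nfghi\n"), stdout=out)
print(status, repr(out.getvalue()))
```

Keeping the logic in a function that accepts streams and returns a status is one design choice among several; it just makes the tool easy to test and easy to call from a larger workflow.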
C Isn’t A Programming Language Anymore - Aria Beingessner
There’s been a lot of discussion in blogs and software development twitter lately about C’s ABI (or lack thereof!), brought on in part by Aarch64 now becoming prevalent. As long as the ecosystem was overwhelmingly x86_64 and others accepted they had to jump through weird hoops, some issues could sort of be papered over, but now long-standing issues are coming to the fore.
If you don’t program in C or directly interact with C libraries this may seem a pretty obscure topic, but Beingessner sums up the issue more cogently than I’ve seen discussed anywhere else:
Oops! Now C is the lingua franca of programming. Oops! Now C isn’t just a programming language, it’s a protocol.
“C’s ABI as a protocol” is an excellent and succinct distillation of the problem. It’s not just how programming languages interact with the OS on Unix-y systems, it’s how programming languages interact with each other. Want to call (and send some data to) a Rust library from Go? You’re going to need to use the C FFI on each side, even if no C code is actually involved.
And a protocol with ill-defined types and conventions is an unreliable protocol.
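To make “ill-defined types” a little more concrete, here is a small sketch using Python’s standard `ctypes` module, which has to mirror whatever the local C ABI says. The two record types are hypothetical and exist only for illustration. The first uses C’s `long`, whose size depends on the platform (for example 8 bytes on 64-bit Linux but 4 bytes on 64-bit Windows), so its layout changes with where you run it. The second uses fixed-width fields, which on mainstream platforms gives the same layout everywhere.

```python
import ctypes


class PlatformRecord(ctypes.Structure):
    # Hypothetical record using C's "long": its size and alignment are
    # whatever the local platform's C ABI says they are.
    _fields_ = [("tag", ctypes.c_char), ("value", ctypes.c_long)]


class FixedRecord(ctypes.Structure):
    # Same idea with fixed-width fields: one byte, padding, then 32 bits.
    _fields_ = [("tag", ctypes.c_uint8), ("value", ctypes.c_uint32)]


def describe(struct_type):
    """Report the total size and the byte offset of the 'value' field."""
    return {
        "size": ctypes.sizeof(struct_type),
        "value_offset": struct_type.value.offset,
    }


print("long is", ctypes.sizeof(ctypes.c_long), "bytes on this platform")
print("PlatformRecord:", describe(PlatformRecord))
print("FixedRecord:   ", describe(FixedRecord))
```

Two languages exchanging a `PlatformRecord` through a C interface both have to agree on those numbers, and neither the C source nor the header alone tells you what they are.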
So what’s to be done? Honestly, who knows. It’s easy to say let’s change things, but the 2014 glibc ABI break for the System/390 mainframe(!!) is still something people talk about, and you could probably comfortably convene a quorum of S/390 system programmers in a large conference meeting room to decide on a path forward. Changing one of the de facto ABIs like x86_64-linux-gnu would break unfathomably many things in who knows how many places.
But at least a clear description of the problem advances the discussion.
Good overview of C++23 range adaptors (C++ finally gets zip and join!), and a tutorial introduction to the long-awaited Go Generics.
A GitHub Action that pings people to update their PRs if there are updates to important branches.
Research Computing Systems
Shrinkwrap: Taming dynamic shared objects - Farid Zakaria
If you’re using (say) Spack, which is lovely, to manage builds of tools, you may have a lot of resulting binaries with N dynamic library dependencies and a RUNPATH of M path entries. That means there are O(NM) file system operations needed each time the executable is loaded, on each process. That can really hammer parallel file systems.
Shrinkwrap is a package which freezes the dependencies (which IMHO is what you want in a modules-type system, for reproducibility if nothing else) in the DT_NEEDED section of the binary, reducing it to O(N) file system accesses.
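The difference in lookup counts can be sketched with a toy model. This is not Shrinkwrap’s code or the real dynamic loader: the file system is just a Python set, the directory and library names are made up, and “freezing” is assumed to mean recording each dependency’s already-resolved path so no directory search is needed at load time.

```python
def resolve_by_search(needed, runpath, files):
    """Probe each RUNPATH directory in order for each library, counting probes."""
    probes = 0
    resolved = {}
    for lib in needed:
        for directory in runpath:
            probes += 1  # one file system lookup per candidate path
            candidate = f"{directory}/{lib}"
            if candidate in files:
                resolved[lib] = candidate
                break
    return resolved, probes


def resolve_frozen(frozen_paths, files):
    """With resolved paths recorded up front, each library costs one lookup."""
    probes = 0
    resolved = {}
    for path in frozen_paths:
        probes += 1
        if path in files:
            resolved[path.rsplit("/", 1)[-1]] = path
    return resolved, probes


# Hypothetical worst case: M = 50 directories, N = 20 libraries, and every
# library lives in the last directory searched.
runpath = [f"/opt/pkg{i}/lib" for i in range(50)]
needed = [f"libdep{j}.so" for j in range(20)]
files = {f"{runpath[-1]}/{lib}" for lib in needed}

searched, search_probes = resolve_by_search(needed, runpath, files)
frozen, frozen_probes = resolve_frozen(list(searched.values()), files)
print(search_probes, frozen_probes)  # prints: 1000 20
```

Multiply the first number by the number of processes in a parallel job start-up and it is easy to see why a shared file system’s metadata server notices.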
Emerging Technologies and Practices
“Milan-X” 3D Vertical Cache Yields Epyc HPC Bang for the Buck Boost - Timothy Prickett Morgan, The Next Platform
Morgan has a typically thorough article about AMD’s Milan-X and its vertical cache.
It’s been great to see AMD’s Milan and Milan-X absolutely killing it on real-world HPC applications. It’s particularly clear in various “bake-offs” on AWS, where they’re part of Hpc6a instances; they’re competitive-to-leading on raw performance, and just smoking the other CPU offerings on price-performance.
I find it fascinating that we’ve clearly reached the point where CPU hardware optimizations are increasingly workload-specific, even within HPC applications. AMD has wisely created a number of Milan-X options with varying caches and core counts. The tradeoffs really matter, even favouring one or the other of EDA, CFD, and FEA. I think that’s remarkable! Even two years ago, in presentations or reports, all three of those workloads would have fallen under the same undifferentiated “HPC” category.
With the disclaimer that I’m at NVIDIA now, Morgan also has articles on the Grace and Hopper announcements this week, and his take on what they may mean.
GitHub had problems on and off for the past 10 days, and even GitHub is having problems understanding them. Their writeup by Keith Ballinger is a pretty frank discussion of the problems and the fact that they’re not resolved, and makes use of the fact that they’ve already written quite a bit about how they’re growing their back end to support their growth. I think it’s a good example of clear and honest communication about an ongoing problem, even if there’s no resolution. Certainly our researchers deserve the same kind of honesty and transparency that free-tier GitHub users do.
This talk by Andrew Helwer of Microsoft Research is a really nice introduction to quantum computing for computer scientists.
Random
A receipt printer that automatically prints out new GitHub issues as soon as they’re created. Cool and demoralizing!
Miss the olden days of talk for in-terminal interactive chat? No interest in making “long distance” calls to those outside your L2 broadcast domain, stranded on the other side of a router? Why not (besides the many obvious reasons, of course) write a chat app entirely based on ARP packets?
60% of university staff and grad students in the UK are planning to leave the higher ed sector in the coming years.
Having a new computer for work has made me think about keyboards quite a bit - Ars Technica has a nice overview of mechanical keyboards for newbs like me. They don’t all have to go CLICKY CLICKITY!! CLICK all the time.
Understanding TLS by implementing a toy version of it.
A simple in-browser IDE for interactively writing Graphviz diagrams - Edotor.
Drawing a circle without using floating point numbers.
A high end RISC-V CPU you can run on your favourite cloud provider FPGA node - VRoom.
GCC has released gcobol. No, it’s still March; this is for real. You can even build it on OpenBSD, kind of.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 134 jobs is, as ever, available on the job board.
Aquatic Informatics - Team Lead, Data Science - Remote - Hach, Various CA
Are you passionate about data science, mentoring others, and research and development? This role is an excellent opportunity to join our Predictive Analytics team to support Aquatic Informatics’ suite of digital twin and decision support products. These core growth products provide municipal drinking and wastewater customers with the predictive and prescriptive insights they need to operate optimally. This leader will have the opportunity to directly contribute and cultivate a dynamic, multi-disciplinary team of engineers and data scientists applying cutting-edge technology to deliver solutions that have a tangible impact on the environment and human health.
Program Manager, Responsible AI Health Research - Google, London UK
Work closely with research scientists, engineers, product managers, and business development to deliver impactful projects with internal and external partners. Lead team planning for a set of established projects in the team’s portfolio, including Objectives and Key Results and longer-term roadmapping.
Use technical skills to understand and make project trade offs, prioritize partner needs, challenge assumptions, and evaluate project work and risks.
Coordinate cross-organization efforts to improve visibility, including running weekly technical presentations, organizing events, etc. Communicate both vision and details (such as project status and risks to various levels of the organization); persuasion through reasoning and data; and resolution of conflicts over goals, priorities, and approach.
Program Manager - Data Sharing Services - AWS, Various EU or UK
You will own a strategy, in liaison with tech teams, on how to develop data spaces that will connect fragmented and dispersed data sets from various ecosystems, from the private and public sectors. You will liaise with highly visible customers and external stakeholders such as Gaia-x, the International Data Spaces Business Association (IDSA), the Eclipse Foundation, Fraunhofer research institutes and more.
Applied Science Manager, Spoken Language Understanding - Amazon, Toronto ON CA
As an Applied Science Manager for the Alexa AI, Edge NU Science team you will be responsible for leading a team of machine learning scientists in the field of NLU and Edge computing. The responsibilities include the strategic aspects of setting long term research directions as well as the tactical aspects of addressing customer pain points, setting priorities, and driving the design, development, and deployment of NLU and SLU technologies. A successful candidate will have an established background in developing customer-facing experiences, a strong technical ability, excellent project management skills, great communication skills, and a motivation to achieve results in a fast-paced environment. You will hire and develop your team, build customer-facing experiences, and manage your own projects.
Research and Enterprise Manager, Centre for Advanced Research Computing - University College London, London UK
We are an innovative hybrid: a professional services department that delivers reliable and secure infrastructure and services to UCL research groups, and a laboratory for research and innovation in the application of advanced computational and data-intensive research methods, working in partnership with academics from all fields. Principally, the post-holder will provide high-level strategic and operational support for research and enterprise activities within a department and be the primary contact for academics, industry professionals and researchers interested in these undertakings.
Research Computing Architect and Solutions Developer - North Carolina State University, Raleigh NC USA
This Research Computing Architect and Solutions Developer position will work closely with interdisciplinary research teams to design IT systems to support data acquisition, management, and analysis requirements, and support complex interdisciplinary projects within and across university academies, centers, initiatives, and institutes. In collaboration with project Principal Investigators (PIs) and OIT’s Assistant Vice Chancellors for Communications Technologies and Shared Services (who serve as the virtual Chief Technology Officer for OIT), this position provides design options for personnel and computing architectures needed to support large interdisciplinary research project objectives.
Senior Software Development Manager, HPC Test Infrastructure, HPC - AWS, various USA
The AWS High Performance Computing (HPC) organization is looking for a Senior Software Development Manager (Sr. SDM), Seattle to lead a new Test Infrastructure for High Performance Computing (HPC). HPC teams are focused on distributed computing, high-speed networking, and Linux systems programming. The AWS HPC engineering group develops services and solutions that span the full stack of our offerings - from our AWS ParallelCluster cluster orchestration tool to our high-speed, low-latency, kernel-bypass networking Elastic Fabric Adapter (EFA). We contribute to open-source distributed computing frameworks like Libfabric and Open MPI. We enable a broad set of applications for computational fluid dynamics, weather modeling, molecular dynamics, seismic modeling, and machine learning.
Software Development Manager - Quantum Computing, Amazon Braket - AWS, Seattle WA USA
We are looking for a Software Development Manager to build upon one of AWS’s newest, cutting-edge services. You will manage a team of engineers to define, develop, and deploy critical Amazon Braket infrastructure components such as compilation, optimization, translation, and control interfaces. You will create and maintain foundational service functions central to quantum computing in the NISQ period and beyond. You must enjoy working on complex problems at the intersection of quantum computing, machine learning and computer science. You are familiar with AWS tools and services, and best practices governing their use.