The good news is that my team, and the larger organization I’m a part of, is going to be growing substantially in the coming year. That’s also the bad news. We have to hire.
Hiring team members is a time-consuming, exhausting job - and probably rightly so, since it’s the most important thing we do. A lot of the planning, organizational, and process mistakes we make as managers can be mitigated if we’ve helped assemble a terrific team; on the other hand, there’s only so much that pulling on those same levers can do if we’ve made poor hiring choices. Your research computing team members are the people who do the work of supporting research with working code, data curation/analysis/systems, or computing systems. Putting in the time and effort that makes hiring time-consuming and tiring is absolutely appropriate.
Hiring, like anything else in management, is a practice that you can improve on by having a process you go through that you learn from and improve each time through. That means being a lot more deliberate about hiring (or really any other aspect of management) than we usually are in academia-adjacent areas. It’s also, to be honest, more work. But hiring is the most important decision you’ll make as a manager. Decisions you make about new team members will last even after you leave. A good hiring choice will make everyone’s job easier, and a poor hiring choice will make everyone’s job worse.
The larger organization I’m part of is setting up a proper hiring pipeline/funnel, so the whole process has been on my mind this week. And every research computing team leader I’ve spoken to in the past couple of years - including one I just met this week - has described the same issues with hiring. So for the next several weeks I’ll write up how we approach hiring, and what we learn as we proceed with our hiring pipeline. The topics will look like:
The most important thing I’ve learned about hiring after leaving academia I first heard on Manager-Tools - here’s one of the relevant podcasts - but once you see it you see it in all kinds of advice, including in advice on how to reduce hiring bias.
The purpose of the screening and interviewing process is to find a reason to say no; to discover ways in which the candidate and the job would not be a good match
In academia-adjacent fields, we tend to reject this. We’ve been trained in an environment which emphasizes “meritocracy”, and for students and postdocs and junior faculty, everyone is looking for the “smartest” and “best” candidate with “the most potential”.
But none of that is right for us. It’s not great for academic positions; it’s definitely wrong for research computing positions.
We’re hiring a new person because we have a problem; the team has things that need doing. It doesn’t matter who’s “smartest” or “best”, even if we knew how to reliably assess those things, which we absolutely do not. What matters is how well a job candidate can do the things we need them to do, and how well the tasks we have match what the candidate is looking for. The “smartest” candidate who has skills the team already has in abundance, but lacks the things we really need in this new position, can’t help us. The “best” candidate who is looking to do things that are radically different from what we need them to do is going to resent the job and leave it as soon as possible if we do hire them.
In academia we still tend to reject this, either because we’re still looking for “the best potential” like we’ve been trained to do, or for the opposite reason - because we see it as toxic and biased (which it is) and we don’t want to do anything which looks for negatives, which smacks of “winnowing out”, “raising the bar”, “keeping candidate quality high”, etc. But that’s not what’s going on here. We need to think about it as scientists.
Once we’ve selected people for the next step, whatever that step is, the hypothesis is that they are suitable candidates for the job. That must be the hypothesis, or we wouldn’t be wasting their and our time. And so, as scientists, the right way to proceed is to attempt to find evidence that contradicts the hypothesis; evidence that they wouldn’t succeed in the job. We are not searching for evidence that a person would be a good hire; there is nothing that is easier than fooling yourself into confirming a hypothesis! Confirmation bias and the halo effect are the enemy here. They allow all sorts of other biases to sneak in (and not necessarily even nefarious biases, like the classic “looks like/thinks like me”; it could just be unconsciously over/under-weighting certain parts of the job based on experiences with previous hires). We want to find evidence that this person would not succeed in the day-to-day technical work and teamwork we’d be asking of them.
In French class in school, I used to do dictées, where the teacher would read aloud and we’d transcribe into written French. We’d start with 100% and lose points for every mistake. That’s the approach we’re taking here: we have a list of must-have skills or approaches - technical but also team-related - and we look for evidence that the candidate does not have what we need them to have. Except here, a candidate loses all the points the moment they’ve failed to demonstrate a must-have requirement for the job.
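The dictée rule can be sketched in a few lines of entirely illustrative code - the requirement names below are made up, not a real rubric, but the pass/fail logic is the point: any must-have for which the interviews produced no evidence disqualifies, no matter how strong the rest of the file looks.

```python
# A minimal sketch of "lose all the points" screening: a candidate passes
# only if there is evidence for every must-have requirement.
# These requirement names are hypothetical, for illustration only.

MUST_HAVES = {
    "version control fluency",
    "can explain technical tradeoffs",
    "has worked directly with researchers",
}

def passes_screen(evidence_seen: set) -> bool:
    """Pass only if every must-have was demonstrated; any gap fails."""
    missing = MUST_HAVES - evidence_seen
    return not missing

# A candidate impressive on two of the three must-haves still fails:
print(passes_screen({"version control fluency",
                     "can explain technical tradeoffs"}))   # → False
```

Note there’s no scoring or ranking here at all - no notion of “best” - which is exactly the shift away from looking for reasons to say yes.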
You can absolutely still be biased - your bias against a candidate can be so strong as to blind you to the fact that they have in fact demonstrated such-and-such a requirement. But it’s harder - not impossible; harder - for us humans to deny evidence that’s clearly in front of us than it is to choose to see evidence that a candidate does have the skill, or at least has shown the potential for it. That’s why the scientific method is the way it is: you find evidence to disprove, not to support, the hypothesis.
Once you’ve gotten used to the idea that the hiring process is one where the purpose is for each side of the potential match to find reasons to say no, a lot of things become clearer: what the process might be, what job ads should say, and what approach you should take to criteria and questions. We’ll talk about criteria and questions next time.
Make Boring Plans - Camille Fournier
Riffing off of Choose Boring Technology, Fournier advocates making boring, plodding plans - well thought out, well communicated, based on trying things out, making sure they work, and implementing incrementally.
While it’s absolutely true that teams can be very motivated by audacious, ambitious goals, the plans for getting there should be nice and boring. This is especially true when the excitement is already coming from somewhere else: “Novel Technology Deserves Boring Plans”.
4 Lessons Learned in 2020 as an Engineering Manager - Luca Canducci
I’ve been avoiding 2020 retrospective articles - honestly, who really wants to look back - but Canducci’s lessons and descriptions here are good:
There’s a common, dumb argument that there can’t be sustained discrimination in a competitive hiring marketplace, because competition would have gotten rid of any such inefficiencies.
Needless to say, trying to negate actual reality with pure thought doesn’t work well, and Luu’s article puts the argument to rest. It’s an older article, but it’s extremely relevant to the hiring-process discussion above: you aren’t trying to hire some “best” candidate out there - and even if you were, taking steps to ensure you have a diverse candidate pool and making a point of hiring candidates who face discrimination is the very opposite of “lowering the bar”, with the worse outcomes that phrase implies.
Questionable Advice: “How do I feel Worthwhile as a Manager when My People are Doing all the Implementing?” - Charity Majors
The Non-psychopath’s Guide to Managing an Open-source Project - Kode Vicious, ACM Queue
Majors’ article is a good reminder for new managers that it’s really hard to recalibrate job satisfaction or the feeling of accomplishment when you’ve moved into management. All you can do is focus on the big, long timeline stuff while still taking joy in the little moments, and make sure that you’re a source of calm not chaos in a crisis.
Vicious takes on the same topic but from the point of view of a new open source maintainer, which is managing a software development team in hard mode. Most of the same rules apply.
Newcastle University Research Software Engineering 2020 (PDF) - Newcastle Research Software Group
BEAR - Advanced Research Computing Research Software Group 2020 Report (PDF) - Birmingham Research Software Group
These two reports on the 2020 activities of the research software development groups at Newcastle and Birmingham are extremely interesting if you run a research software development core facility-type operation, and very interesting even if you don’t, in terms of the clear product and strategy mindset (and communications efforts) behind the groups. In Newcastle’s, we get some interesting overviews of what 2020 held:
Their service offerings are discussed - offering MSc computing projects supervised by faculty and the RSE group, and offering coding services with fractional FTEs written into grants. They’re also pretty transparent about where they’re going: they’ll be changing to a simpler, easier-to-administer model, but one with a little less certainty - rather than allocating (say) 40% of a staff person to a project, they’ll be charging day rates like a core facility. This will greatly simplify taking on shorter-term projects.
They’re also getting into higher-level services, like architecture/design training and consulting rather than doing the hands on work, and trying to tie into the institutional open data repository.
Birmingham’s is just as interesting, and reflects an organization with a different focus. The team has up to this point been a small number of centrally-provided RSEs plus dedicated RSEs for different colleges/schools, all sitting together; rather than longer-term contract software development, they provided free, time-limited advice, coaching, coding, and mentoring. (The coaching is particularly interesting to me, as I hadn’t heard of that before; they won’t be hands-on-keyboard, but will sit with and advise you as you take on projects - or even review your code.)
They’re also responsible for software maintenance on the main Birmingham research computing systems, and perform training.
They have started moving, however, to including grant-funded research software developers for longer-term projects, allowing researchers to “hire” a fractional software developer without having to recruit people with expertise they may not be able to judge, and have that intact whole person sitting in a team of colleagues.
These are really interesting documents, and our jobs as managers would be easier if more teams routinely wrote up such descriptions of their operations.
Strengths, weaknesses, opportunities, and threats facing the GNU Autotools - Zachary Weinberg
Another very transparent, product-focused assessment: a simple but thorough SWOT analysis of the current GNU Autotools stack, which hasn’t been updated in some time (which itself makes updates harder, since the entire process is “rusty”), and which has enormous legacy baggage, but still has opportunities.
Recognizing the value of software: a software citation guide - The FORCE11 Software Citation Team, F1000 Research
A style guide for citations of research software; following the American Psychological Association style guide, something like this:
> Coon, E., Berndt, M., Jan, A., Svyatsky, D., Atchley, A., Kikinzon, E., Harp, D., Manzini, G., Shelef, E., Lipnikov, K., Garimella, R., Xu, C., Moulton, D., Karra, S., Painter, S., Jafarov, E., & Molins, S. (2020, March 25). Advanced Terrestrial Simulator (ATS) v0.88 (Version 0.88) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.3727209
is recommended, with the obvious implications for what software authors should make available. There are variations on the link URL for some software archives.
(I have complex feelings about this, due to my entirely heretical belief that much of research software - and necessarily the most-used research software - isn’t in and of itself a research output.)
Creating a Risk-Adjusted Backlog - Mike Griffiths
Here’s an example of a concept that I think research software development teams probably “get”, if implicitly, more than teams in other environments.
Research software development spends much more time further down the technology readiness ladder; we spend a lot more time asking the question “can this even work” than we do “when will this feature ship”. The risks are higher, because most promising research ideas simply don’t pan out. So we spend a lot of time prototyping, fully aware that the answer to “will this work” could well be “no”.
Griffiths’ theme is that even as you march up the technology readiness ladder to more and more production code, you should still be explicitly prioritizing risk-mitigation tasks on the backlog rather than just prioritizing the most valuable feature. That might be code cleanup, it might be doing research on uncertain new steps way earlier than seemingly necessary, it might be documentation - it depends on the risks you’re worried about. In our context it might include going to conferences and giving talks about your tool, if the risk is lack of adoption.
Number Parsing at a Gigabyte per Second - Daniel Lemire
And here’s a reminder that efficient deserialization of floating-point numerical data from text is still an open area of research. Even with the 3-4x performance improvement enabled by the algorithm referred to in this blog post (with links to this paper by Lemire), pulling numerical data off of disk is still dominated by the deserialization, not the disk I/O, at least on SSDs.
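It’s easy to check the shape of this claim yourself. Here’s a toy sketch (not the paper’s benchmark - the file size and format are made up for illustration) that compares the cost of reading a text file of floats against the cost of converting that text into floats; on a warm page cache, the conversion dominates by a wide margin:

```python
import os
import time

# Write a million floats as text (illustrative size, ~12 MB on disk).
N = 1_000_000
path = "floats.txt"
with open(path, "w") as f:
    for i in range(N):
        f.write(f"{i * 0.1234567:.7f}\n")

# Time the raw read (the file was just written, so this is likely
# served from the page cache - analogous to a fast SSD read).
t0 = time.perf_counter()
with open(path) as f:
    text = f.read()
t_read = time.perf_counter() - t0

# Time the text -> float64 conversion of the same data.
t0 = time.perf_counter()
values = [float(tok) for tok in text.split()]
t_parse = time.perf_counter() - t0

os.remove(path)
print(f"read: {t_read * 1000:.1f} ms   parse: {t_parse * 1000:.1f} ms")
```

Python’s `float()` is of course far slower than the parsers Lemire is competing with, but the imbalance survives even in fast compiled implementations, which is why the algorithmic work matters.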
The Next Gen Database Servers Powering Let’s Encrypt - Josh Aas and James Renken
Let’s Encrypt, which provides certificates for 235 million websites, is powered by a single MariaDB server with 64 cores, 2TB of RAM, and 24 NVMe SSDs. There’s a tendency in tech to want to use the latest distributed DB for things, or in research computing/HPC to complain “why don’t they just write it with MPI”, but if you can have a simpler solution running on one fat node that does the job, that’s usually going to be the right choice.
Firecracker: start a VM in less than a second - Julia Evans
63-Node EKS Cluster Running on a Single Instance with Firecracker - Álvaro Hernández, OnGres
Firecracker is a lightweight, stripped-down VM engine that uses hardware support for isolation, and it’s intended for use as a host for small or short-lived processes (like Amazon Lambda functions) that don’t require support for a million devices like a virtual desktop would.
Evans has a use case that could well be relevant for research computing - interactive analysis or training, where each user gets their own VM - and walks through the process of creating a Firecracker image and starting it up. It’s also (as is often the case with Evans’ posts) a nice collection of related resources.
Hernández has another use case - spinning up a many-“node” Kubernetes cluster for devel/testing/messing-around-with purposes on a single actual node. In his blog post (and associated GitLab repo) he shows how to use Ansible, Firecracker, and EKS-D - an open-source, AWS EKS-compatible distribution of Kubernetes - to run the cluster on a single, admittedly very large, AWS instance.
Scaling Kubernetes to 7,500 Nodes - OpenAI
If your workflow is just running batch jobs, there are much simpler tools than Kubernetes. But if you want to schedule some batch jobs while also spinning web services up and down, have batch jobs connected both ways to web services, and have dynamic software-defined storage and networking… then Slurm’s going to be a challenge.
The downside for all of that flexibility is complexity and challenges pushing to new scales. Here OpenAI walks through the changes they had to make to networking, deployment of API servers, monitoring tools, and health checks to handle what in simpler, more uniform HPC clusters would be a healthy but not exceptional number of nodes.
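If you haven’t seen Kubernetes’ batch side, a batch task is declared as a `Job` manifest, handled by the same API and scheduler as everything else; here’s a minimal, hypothetical example (the names and image are illustrative):

```yaml
# Minimal hypothetical Kubernetes Job: run one batch task to completion.
apiVersion: batch/v1
kind: Job
metadata:
  name: example-analysis        # illustrative name
spec:
  completions: 1                # run the task once
  backoffLimit: 3               # retry up to 3 times on failure
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: python:3.11-slim # any image containing your code
        command: ["python", "-c", "print('analysis step done')"]
```

That uniformity - batch jobs, web services, storage, and networking all as declarative objects in one control plane - is exactly the flexibility whose cost, in complexity and scaling pain, the OpenAI post describes.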
PEARC’21 - Full/short papers due 9 Mar/13 Apr, virtual conference 18-22 July
PEARC is a great conference about research computing and providing research computing quite broadly. Tracks include
There are also tutorial and workshop proposals due 9 Feb.
Events & Conferences in 2021 - Research Data - Springer Nature
Rather than a single event, this is an updated resource of research data events; some that look interesting are Open Science conferences in February and June, and the RDA plenary in April.
A Compaq AlphaServer emulator running OpenVMS on your Linux box.
A beginner’s guide to capture-the-flag events.
Interested in trying Nix, but not so much that you’re willing to wipe a whole machine? dev-env is a local nix env “on training wheels”.
I’ve been looking for better markdown-presentation tools for a while. Marp looks really promising.
Research computing, and simulation, has a history spanning over half a century. Here’s video and oral history of the field.
We live in an age of static-analysis wonders; the Enzyme package will autodifferentiate functions at the LLVM-IR level, meaning not only does it work with a large number of languages but it will autodifferentiate optimized code, making for potentially substantially faster results.
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Highlights below; full listing available on the job board.
Senior Scientific Engineer, Scientific Computing - Bioteam, remote USA
The ideal candidate will find deep satisfaction in solving complex problems and being a part of the development of custom solutions to meet our clients’ unique needs. Our engagements result in innovative and unique solutions; to be successful, you should have a deep curiosity and the desire to learn about “bleeding-edge” advances in biomedical science.
Senior Scientific Consultant - Bioteam, remote USA
Senior Scientific Cloud Engineer - Bioteam, remote USA
Senior HPC Consultant and Technologist - Bioteam, remote USA
Senior Data Science Consultant - Bioteam, remote US
Research Project Manager - Genomics England, London UK
We are seeking a highly talented individual with a scientific background to help manage and coordinate our scientific R&D activities. This role will offer you the unique opportunity to use your skills in cohort population studies to advance personalised genomic medicine in the UK. This role offers remote working, with the expectation of attending our offices fortnightly - either in Hinxton, Cambs or London. It’s important that you have experience in cohort population studies as you’ll manage the onboarding and submission of new sample cohorts, ensuring legal agreements are in place, consent reviews undertaken & systems set up.
Associate Director, Advanced Analytics - Canadian Blood Services, Canada
Reporting to Director, Integrated Supply Chain Business Systems & Analytics, the Associate Director, Advanced Analytics will be primarily responsible for driving the advanced analytics agenda at Canadian Blood Services (CBS), a core capability required to successfully deliver donor relations and supply chain strategies and drive best-in-class integrated supply chain performance. You will also be responsible for leadership and product management of technology and digitalization opportunities for Integrated Supply Chain, including enhancement efforts and new capability analysis.
Scientific Computing Manager - Fred Hutchinson Cancer Research Center, Seattle WA USA
The Scientific Computing Manager will direct and oversee all facets of scientific computing including managing a highly capable engineer team, engineering functions such as design, development, installation, and maintenance of hardware and software, and customer service and support, for the organization. The manager will partner with faculty and stakeholders in the scientific community to test and integrate new analysis pipelines, determine innovative technologies and software to support their research, evangelize new and current solutions, and foster technology and data storage best practices. The manager will participate in strategic planning and align projects and resources for implementation in partnership with the Project Management Office and Business Operations, and other departments.
Faculty Computing Manager, Natural & Mathematical Sciences - King’s College London, London UK
The ideal candidate will have a broad background of technical experience covering infrastructure and end user technologies, in particular Linux and open-source solutions. The Faculty is growing and seeks someone with a strategic vision that allows technical innovation and scalable operations. They should be an excellent communicator able to effectively work with technically savvy academics and inexperienced users alike.
The NMS Computing Team is responsible for providing end user support for education and research within the departments. It also includes a small development team which maintain a number of in-house applications and infrastructure platforms. The role will also work directly with colleagues in King’s central IT and e-Research functions to deliver the Faculty’s technical and functional requirements from central platforms and services.
Technical Program / Product Manager, Technology Platforms - Sage Bionetworks, Seattle WA USA but remote currently
As a Technical Program / Product Manager, you will help scientists attack some of the most challenging problems in the world. You will work with our design, engineering, and research teams to manage our development process, understand requirements, design solutions, and ensure a high-quality product. You will be passionate about helping our technology team be efficient in the delivery of software solutions needed by internal and external users of our platform. You are technical and able to write, update, and disseminate detailed specifications as well as help manage an agile development methodology.
Project Manager, Scientific Coordination and Community Engagement - Sage Bionetworks, Seattle WA USA but remote currently
Senior Research Scientist, Computational Biology - Sage Bionetworks, Seattle WA USA but remote currently
Technical Project Manager – Tool Development & Method Reliability - Sage Bionetworks, Seattle WA USA but currently remote
Director Of Research Computing - University of Colorado Boulder, Boulder CO USA
The Director of Research Computing fosters scholarship, discovery, and innovation, supporting CU Boulder faculty needs for a continually advancing IT research environment in order to build and accelerate CU Boulder’s research competitiveness. Responsibilities span three areas: leadership of the campus cyberinfrastructure; campus leadership in data-intensive research and teaching, as Co-Executive Director of the Center for Research Data and Digital Scholarship (CRDDS); and bringing to the campus CI a leading national and regional presence in academic, federal, and private sectors.
Chief Technology Officer - Open Air Quality, Remote
Lead the iteration, growth, and future development of several production systems. Oversee essential architectural evolution and the development of a comprehensive platform and infrastructure strategy for OpenAQ’s technical ecosystem and tooling. Along with the Executive Director and team, develop an innovation pipeline of platform tools, based on new data use cases, community engagement and feedback, in line with and in support of OpenAQ’s strategic plan.