Research Computing Teams Link Roundup, 22 Oct 2021
Hi!
So, two final aspects of our recent hiring situation that I haven’t had room to mention earlier: we’re giving out interview questions ahead of time, and interacting with peer teams is still hard in this hybrid/remote world.
The first of the two is likely more surprising and/or controversial. We’ve changed how we’re hiring a bit, including sending out some key interview questions ahead of time. This was initiated by a team member who was interviewing co-op students. I was initially pretty skeptical, until I attended the resulting interviews; the discussions were so much better, and went so much deeper, that I wanted to keep trying it. Our initial attempts with interns went well enough that we’ve hired our first contract staff member using this approach too (and sent a pretty detailed candidate packet).
So far this seems to work particularly well for behavioural-type questions, or questions of a similar type (“Tell me about your ‘favourite’ bug you found and had to fix”, etc.). Because candidates know which questions are coming, they’ve had time to come up with good initial answers and have them fresh in mind; then the follow-up questions - where the real work of the interview is done anyway - come much faster, and so can go much deeper in the same amount of time. It also feels (on both sides, seemingly) much more like a conversation than an interrogation.
We’ve talked in the newsletter about how useful it is to be explicit ahead of time with the candidate about the interviewing and hiring process (e.g. #84, talking about candidate packets); now the question, for each part of the process and given what we want to accomplish, is how detailed that explicit information should be. Obviously there are some kinds of technical assessment that can’t be done with full up-front disclosure, but even there we can be more detailed than we’ve been in the past.
We’re still early on in this process, especially for permanent staff, but the results seem to be really promising for interns; they are so far succeeding at least as well as those hired the old way, the acceptance rate appears to be markedly improved, and everyone seems happier with the interview process (although of course candidates - maybe even especially successful candidates - are unlikely to tell us our process is terrible). Doubtless we’ll find situations where it doesn’t work; I suspect in particular that for leadership positions where the responsibilities are more wide-ranging and assessment criteria are less clear, doing this would be difficult.
Second, while we’ve pretty much got our own internal team communications down in this new remote/hybrid world, interacting with peer teams is still an unsolved problem for us. We were caught off guard by one of our peer teams wanting the same student - this would have come up much earlier when we were all sharing a work space.
I think in general, most successful teams have got the hang of remote and hybrid communications now in situations where (a) people work closely together - within teams, or between very closely collaborating teams; and (b) people interact seldom - external collaborators, or resources at other institutions, where “reaching out” always took a fair bit of effort.
What seems like an unsolved problem for us is the in-between state - people we would interact with occasionally and with relatively little effort. Peer teams working nearby; groups giving colloquia that we occasionally attend. For the peer teams, we’ve tried to set up some internal events to help maintain the communication flows, but that hasn’t worked so far. I could expand my peer one-on-ones, but that would make the managers a bottleneck (and often the right topics wouldn’t come up). We can’t all just chatter on the same Slack - that’d become an unwieldy number of people. Something like an all-of-organization Donut-style random pairing could work, but it seems like a big ask of team members when most of the time nothing would come of it.
What do you think - have you tried sharing interview questions or other aspects of interviews ahead of time, or do you think it’s a terrible idea? Have you found something that does work for replacing what used to be occasional interactions in the workplace? Let me know, and if you give permission I’ll share with the readers.
But now that’s the last, I promise, of stuff about our co-op hiring mini fiasco. On to the roundup!
Managing Teams
How New Managers Fail Individual Contributors - Camille Fournier
Fournier has coached a lot of managers, and she shares some common failure modes of new managers:
- Doing all the technical design work yourself
- Doing all the project management yourself - your team members will need to learn those skills as they advance
- Neglecting to give feedback
- Hoarding information - intentionally or unintentionally
- Focussing on your own output rather than the team’s
How to write (actually) good job descriptions - Aline Lerner
In an incredibly hot tech job market, it’s hard even to attract the attention of candidates who would be amazing additions to our team. Active recruiting can help; your other tool is your job ad. Lerner reminds us about writing good job ads:
- Focus on attracting good matches
- Have an “about us” section
- Have an “About you & what you’ll do here” section - including specific and memorable descriptions of what they’ll do.
Managing Your Own Career
How to find engineering leadership roles - Will Larson
Larson describes how to look for lead-of-lead type roles (20+ computing staff) in his world (tech industry), but I think it applies in our world, too:
- Hearing from peers - part of what I want to do starting with this newsletter is create a community of practice of RCT managers where this can more readily happen
- From listings - Ditto! - although Larson suggests here trying to find a referral of some sort rather than just a cold apply
- Search firms - pretty unlikely in our line of work
- Crowd-sourced searches - basically hearing from a broader range of peers
- Sharing that you’re looking online - which obviously has downsides
Product Management and Working with Research Communities
Construction Kit: a review journal for research tools and data services in the humanities - Construction Kit Editorial Team
This is cool - a peer-reviewed, open-access journal specifically for research tools and data services in the humanities:
CKIT understands computational work as an essential infrastructural and intellectual part of humanities research within and outside of the digital humanities field.
They’re actively looking for papers:
The editors of CKIT invite authors to submit a review of a research tool or data service in the Humanities. Reviews should have a length of 1500 to 4000 words and can be written in English or German. Each review should address the underlying concept of the research tool or data service, the domain specific research questions it addresses, and the engineering practices it implements.
The Actual Next Million Cloud Customers - Corey Quinn
This is about cloud and AWS, but it’s really about early vs middle adopters of technological solutions, and how organizations can best serve the median potential client. I think there’s a lesson for Research Computing and Data (RCD) teams, too.
AWS famously has approximately four kazillion services that act as building blocks for - well, almost everything. And for tech-savvy customers - the first million cloud customers - that’s exactly what they want! But to start making inroads into the next million - and the million after that, and… they’ll need something different. And that would be a big change:
The entire AWS sales and technical field teams would need to learn to have very different conversations with customers. But consider how unworkable the alternative is. If today’s sales and marketing motions continue doing what they’re doing, customers […] growing sense that AWS is and remains an “infrastructure company” instead of a company that builds services that empower its customers but can’t articulate how it might do so.
As RCD teams start trying to make inroads into broader ranges of research and research groups - something that is absolutely essential if RCD is to unleash the full potential of research - there’s going to have to be a decreasing focus on “selling” bytes and flops and disk space and undifferentiated “software development”, and more of a focus on services that make sense and are obviously valuable to a non-big-compute-savvy wet-lab biologist or historian or pure mathematician. And what services make sense and are obviously valuable will be very different for those three folks.
Research Software Development
Tests aren’t enough: Case study after adding type hints to urllib3 - Seth Larson
I love a good migration story, and this is a pretty big one, both in terms of change size (eventually touching every function in the code base) and package importance (about 5,500 packages - and like 2/3 of a million repositories - depend on urllib3).
Python has a lot of drawbacks, as everything in computing does, but one thing I really like about it for research software development is that it enables an incremental path to maturity. Python type hints let you add type annotations to code (the typing module arrived in Python 3.5, with variable annotations following in 3.6), and a lot of tooling (IDEs and static checkers like mypy) can then highlight problems for you.
In this article, Larson describes the process, and leads with why it was worth the cost (“hundreds of engineer hours across several months”): even with 1800 tests and 100% line coverage, the process still identified code quality issues and bugs they hadn’t been aware of.
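To make the coverage point concrete, here is a minimal, hypothetical sketch (invented functions, not urllib3 code). The one test executes every line, so line coverage is 100% and the test passes, yet a static checker such as mypy would flag the `.upper()` call, because `get_header()` is annotated as possibly returning `None`.

```python
from typing import Dict, Optional


def get_header(headers: Dict[str, str], name: str) -> Optional[str]:
    """Hypothetical helper: case-insensitive header lookup, None if absent.

    Assumes the dict keys are already lower-cased.
    """
    return headers.get(name.lower())


def content_type_upper(headers: Dict[str, str]) -> str:
    # A type checker such as mypy flags this line: get_header() is annotated
    # as possibly returning None, and None has no .upper() method.
    return get_header(headers, "Content-Type").upper()


# This test executes every line above (100% line coverage) and passes...
assert content_type_upper({"content-type": "text/html"}) == "TEXT/HTML"

# ...yet the untested "header missing" case still fails at runtime.
try:
    content_type_upper({})
except AttributeError as exc:
    print("runtime failure:", exc)
```

The annotation is what lets the checker reason about the path no test exercised.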
The scope of the effort made it tricky. It couldn’t be done all at once, but doing it incrementally was also hard, because typing (or the lack of it) “spreads” across files as they import one another. Their approach involved:
- Keeping a list of known-annotated files, running mypy on them, and filtering out errors from files not on the list
- Developing conventions about temporarily disabling warnings on some functions, with comments as to why (because the type mismatches get flagged with too little context to see immediately the reason) and being explicit about which errors are being ignored, both for tighter checking and as a form of documentation
- Adding typing to the test suite, which in principle isn’t necessary for the correctness of the library, but helped them find issues with the library’s type annotations
- Resisting the easy out of “typing.Any”
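The first technique in that list, running mypy broadly but only acting on errors in files already known to be annotated, can be sketched in a few lines. This is a hypothetical illustration, not urllib3’s actual tooling: it assumes mypy’s default one-error-per-line output of the form `path:line: error: message`, and the file names in the allow-list are made up.

```python
import sys
from typing import Iterable, List, Set

# Hypothetical allow-list of files whose annotations are considered complete;
# in practice this list grows as more of the code base gets annotated.
ANNOTATED_FILES: Set[str] = {
    "src/pkg/util.py",
    "src/pkg/response.py",
}


def filter_mypy_output(lines: Iterable[str], annotated: Set[str]) -> List[str]:
    """Keep only mypy error lines that point into an annotated file."""
    kept = []
    for line in lines:
        # Assumes mypy's default format: "path/to/file.py:LINE: error: message"
        path, sep, rest = line.partition(":")
        if sep and path in annotated and ": error:" in rest:
            kept.append(line.rstrip("\n"))
    return kept


def main() -> int:
    """Read mypy output on stdin; return non-zero only if annotated files have errors."""
    errors = filter_mypy_output(sys.stdin, ANNOTATED_FILES)
    for err in errors:
        print(err)
    return 1 if errors else 0
```

In a CI job this might be wired up as a small script ending in `sys.exit(main())` and run as `mypy src/ | python filter_mypy.py`; the script name and wiring are illustrative. Notes and summary lines are dropped, so only real errors in already-annotated files fail the build.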
TIL: 1.4 Million Jupyter Notebooks - Vincent Warmerdam
A Large-scale Study about Quality and Reproducibility of Jupyter Notebooks - João Felipe Pimentel, Leonardo Murta, Vanessa Braganholo, and Juliana Freire
Researchers and analysts need something like notebooks, but notebooks are by and large pretty terrible for software engineering.
Warmerdam gives an overview of the paper by Pimentel et al., describing their massive effort of downloading 1,400,000 Jupyter notebooks from GitHub and trying to run them - only 24.1% ran at all, and only 4% reproduced the same results. Quoting Warmerdam quoting the paper:
The most common causes of failures were related to missing dependencies, the presence of hidden states and out-of-order executions, and data accessibility.
The paper itself also gives a number of suggestions for improving the quality of published notebooks.
I’m not really sure what to do about this. RStudio does a nice job of providing a really accessible interactive environment and publishing notebooks, while also having really strong off-ramps for getting code into libraries and under version control. Julia’s Pluto notebooks are reactive and so at least avoid out-of-order execution. But I don’t know if there’s any set of small changes to something like Jupyter that could improve matters.
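One of the failure causes above, out-of-order execution, leaves a visible trace in the saved .ipynb file: each code cell stores an `execution_count`. As a rough heuristic (a sketch, not the paper’s exact methodology), if the non-empty code cells’ counts don’t strictly increase from top to bottom, or some never ran, the notebook probably wasn’t executed cleanly in order. This uses only the standard library and assumes the nbformat v4 JSON layout: a top-level `cells` list whose entries carry `cell_type`, `source`, and `execution_count`.

```python
import json
from typing import List, Optional


def _source_text(cell: dict) -> str:
    """nbformat stores cell source either as a string or as a list of strings."""
    src = cell.get("source", "")
    return "".join(src) if isinstance(src, list) else src


def execution_counts(notebook: dict) -> List[Optional[int]]:
    """execution_count of each non-empty code cell, in top-to-bottom order."""
    return [
        cell.get("execution_count")
        for cell in notebook.get("cells", [])
        if cell.get("cell_type") == "code" and _source_text(cell).strip()
    ]


def looks_out_of_order(notebook: dict) -> bool:
    """Heuristic: True if a non-empty code cell never ran, or cells ran out of order."""
    counts = execution_counts(notebook)
    if any(c is None for c in counts):
        return True  # some code cell was never executed
    ints = [c for c in counts if c is not None]
    return any(later <= earlier for earlier, later in zip(ints, ints[1:]))


def check_file(path: str) -> bool:
    """Load an .ipynb file (plain JSON) and apply the heuristic."""
    with open(path, encoding="utf-8") as f:
        return looks_out_of_order(json.load(f))
```

A check like this can’t detect hidden state on its own, but it is cheap enough to run over a repository before publishing notebooks.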
Research Data Management and Analysis
Janssen’s Data Ecosystem and the Role of Data Managers - Allison Proffitt, BioIT World
Maintaining an RCT leader job board gives one a good view of the state of the research computing team job market. I’ve written before, in a post that would be worth updating, that research data management has never been more employable - firms, especially but not exclusively in regulated areas like biomedical and finance, are hiring data management teams or even staffing entire data management organizations.
In this article, Proffitt writes about pharmaceutical company Janssen and the role of data managers in their R&D organization:
Janssen used to be a highly decentralized organization, Stojmirovic explained, with research data acquisition, storage, and analysis all centered on therapeutic area. The approach, he said, led to missing data management workflows, a variety of metadata schema, and inconsistent data curation and storage. As a result, data were not traceable up or downstream of an individual team and there was no way to search or compare data across therapeutic areas.
After years of this, they’ve created a central data ecosystem which has become key to the entire R&D organization, with data managers (along with, it sounds like, data engineers) stewarding the data and interacting with teams across the company:
“Data managers—and the Data Management team as a whole—are really at the center of this ecosystem, corresponding with everyone,” Stojmirovic explained.
Clickhouse Local - clickhouse
Well, this could be handy - use SQL to query and analyze CSVs, piped data, JSON files, Parquet files, and other formats and structures using Clickhouse Local, based on the Clickhouse OLAP SQL engine. (h/t to a friend of the newsletter)
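Clickhouse Local does this with its own engine and its own syntax. Just to illustrate the underlying idea of treating a flat file as a SQL table, here is a sketch using only Python’s standard library. This is a swapped-in technique (an in-memory SQLite table), not Clickhouse, and the column names and data are hypothetical.

```python
import csv
import io
import sqlite3
from typing import List, Tuple


def query_csv(csv_text: str, table: str, sql: str) -> List[Tuple]:
    """Load CSV text (first row = header) into an in-memory SQLite table, then run sql.

    Table and column names are interpolated into SQL, so only use trusted input.
    """
    header, *data = list(csv.reader(io.StringIO(csv_text)))
    conn = sqlite3.connect(":memory:")
    try:
        cols = ", ".join(f'"{name}"' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE "{table}" ({cols})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', data)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


# Hypothetical usage data: CPU hours per job
jobs_csv = """job,owner,cpu_hours
1,alice,12.5
2,bob,3.0
3,alice,7.5
"""
print(query_csv(
    jobs_csv,
    "jobs",
    "SELECT owner, SUM(CAST(cpu_hours AS REAL)) FROM jobs GROUP BY owner ORDER BY owner",
))
```

Every CSV field arrives as text, hence the explicit CAST; a dedicated engine like Clickhouse infers types and handles far larger files without loading them into a separate database first.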
Research Computing Systems
How BSD Authentication Works - Dante Catalfamo
I’ve worked at a couple of places where research computing systems ran Linux, but with a few *BSD systems in various places - for running particular stacks, or even as login nodes. Unlike Linux, or even most other BSDs, OpenBSD doesn’t use PAM (Pluggable Authentication Modules) for authentication; it has its own module system, called BSD Authentication. Catalfamo walks us through how authentication modules work on OpenBSD.
Emerging Technologies and Practices
Extract the Ensembl gene catalog to simple tables - Daniel Himmelstein
Nice, practical example of GitHub Actions and GitHub Flat Data (#75) for research data - the GitHub Actions in this repo keep track of new releases of the Ensembl catalog for a variety of species, and automatically generate summary gene tables using a series of SQL queries. The tables are output (in TSV and JSON format) in branches corresponding to species and release.
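The final step of such a pipeline, writing one query result out as both TSV and JSON, might look something like the sketch below. It is not the repository’s code; SQLite stands in for whatever database the real workflow queries, and the table, columns, and placeholder rows are hypothetical (not real Ensembl data).

```python
import csv
import json
import sqlite3
import tempfile
from pathlib import Path
from typing import List


def export_query(conn: sqlite3.Connection, sql: str, out_stem: Path) -> List[dict]:
    """Run sql and write the result to <out_stem>.tsv and <out_stem>.json."""
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    records = [dict(zip(columns, row)) for row in cursor.fetchall()]

    # Append suffixes rather than using with_suffix(), so stems like
    # "homo_sapiens.104" keep their dots.
    tsv_path = out_stem.parent / (out_stem.name + ".tsv")
    with open(tsv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=columns, delimiter="\t")
        writer.writeheader()
        writer.writerows(records)

    json_path = out_stem.parent / (out_stem.name + ".json")
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(records, f, indent=2)

    return records


# Hypothetical usage with placeholder rows
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (gene_id TEXT, symbol TEXT, biotype TEXT)")
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?)",
    [("GENE_A", "SYM1", "protein_coding"), ("GENE_B", "SYM2", "lncRNA")],
)
with tempfile.TemporaryDirectory() as tmp:
    rows = export_query(conn, "SELECT * FROM gene ORDER BY gene_id", Path(tmp) / "genes")
    print(sorted(p.name for p in Path(tmp).iterdir()), len(rows))
```

Committing both formats means spreadsheet users and scripts can each read whichever suits them.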
Incident Review and Postmortem Best Practices - Gergely Orosz
If your team is thinking of starting incident reviews & postmortems - which I recommend if relevant to your work - this is a good place to start. Orosz reports on a survey and discussions with 60+ teams doing incident responses, and finds that most have a pretty common pattern:
- An outage is detected
- An outage is declared
- The incident is being mitigated
- The incident has been mitigated
- Decompression period (often comparatively short)
- Incident analysis / post mortem / root cause analysis - often aiming for within 36-48 hours of the incident
- Incident review
- Action items tracked
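For a team just starting out, even a very simple record of when each stage happened makes the later review easier. Here is a hypothetical sketch: the stage names loosely follow the list above (the decompression period isn’t tracked), the 48-hour analysis target is the upper end of the range Orosz reports, and measuring that target from mitigation rather than from detection is an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from enum import Enum
from typing import Dict, Optional


class Stage(Enum):
    DETECTED = 1
    DECLARED = 2
    MITIGATING = 3
    MITIGATED = 4
    ANALYSIS_DONE = 5
    REVIEWED = 6


# Assumption: analysis due within 48 hours, counted from mitigation.
ANALYSIS_TARGET = timedelta(hours=48)


@dataclass
class Incident:
    title: str
    times: Dict[Stage, datetime] = field(default_factory=dict)

    def advance(self, stage: Stage, when: datetime) -> None:
        """Record a stage; stages may be skipped but never revisited."""
        if self.times:
            latest = max(self.times, key=lambda s: s.value)
            if stage.value <= latest.value:
                raise ValueError(f"cannot move from {latest.name} to {stage.name}")
        self.times[stage] = when

    def time_to_mitigate(self) -> Optional[timedelta]:
        if Stage.DETECTED in self.times and Stage.MITIGATED in self.times:
            return self.times[Stage.MITIGATED] - self.times[Stage.DETECTED]
        return None

    def analysis_on_time(self) -> Optional[bool]:
        if Stage.MITIGATED not in self.times or Stage.ANALYSIS_DONE not in self.times:
            return None
        elapsed = self.times[Stage.ANALYSIS_DONE] - self.times[Stage.MITIGATED]
        return elapsed <= ANALYSIS_TARGET


# Hypothetical usage
t0 = datetime(2021, 10, 18, 9, 0)
incident = Incident("Scratch filesystem unavailable")
incident.advance(Stage.DETECTED, t0)
incident.advance(Stage.DECLARED, t0 + timedelta(minutes=10))
incident.advance(Stage.MITIGATED, t0 + timedelta(hours=2))
incident.advance(Stage.ANALYSIS_DONE, t0 + timedelta(hours=30))
print(incident.time_to_mitigate(), incident.analysis_on_time())
```

A shared spreadsheet with the same columns would do the job just as well; the point is capturing timestamps as they happen rather than reconstructing them afterwards.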
Current best practices seem to be:
- Encourage raising incidents, even when in doubt
- Be clear on roles during incidents
- Define severity levels ahead of time
- Have playbooks ready
- Make time for staff to work on the review
- Dig deep when looking into causes
- Share analysis fairly broadly
- Find or build tools to support incident handling
He then goes into details of conversations with teams that are going beyond best practices, such as Honeycomb, which provides tracing for other teams’ stacks and so has very high uptime requirements (they publicly released an outage report for a 5-minute outage!).
A long article but worth a read.
How to make your microservice architecture end-to-end confidential - Edgeless Systems
As research computing and data expands into web applications and sensitive data - like health, or social sciences - there’s interest in building complex applications that keep that data confidential end to end.
It’s interesting to see the growing tool set for doing this. Here the Edgeless Systems team, who write open-source software in this space, give an example of a simple but nontrivial, complete, secure Kubernetes application built on their open-source stack - a database, a secure service mesh, an SGX-app control plane, and an application written in Go with their EGo library.
Events: Conferences, Training
Academic Data Science Alliance Annual Meeting - 10 Nov, Free, Zoom
The ADSA is having its annual meeting, with the topic of “Data science for social impact in university-based programs”.
Random
You can play Doom on all sorts of things now, but what medium would really be the best pairing for the mindlessly violent, nuance-free game… Oh, of course. You can now play Doom via Twitter.
On a happier note: fluid simulations of ducklings or goslings swimming behind their mother show that, spaced correctly, the ducklings ride waves and experience negative wave drag, getting little pushes forward.
A full JSON parser in awk.
The Perl community wants you to know it’s not dead, yet.
A teaching implementation of an x86-64 assembler.
An app store from the ’80s that sold software on diskettes.
A programmatic CAD - really a programmatic interface to multiple CAD engines - with an IDE: cadhub.
Hmm… Calyx looks like it’s aiming to be an LLVM for FPGAs, specifically supporting higher-level languages for designing compute accelerators.
A summary of the larger-than-usual set of updates in PostgreSQL v14.
I’m already getting used to github.dev for a vscode-in-browser-directly-atop-the-repo experience; now vscode.dev (at least on Chrome, and presumably Edge) lets you have the same zero-install vscode experience on local files and repos.
Searching for the original FORTRAN compiler.
There’s a feasible proposal to get rid of the Python global interpreter lock; it’s a good article to read if you want a thoughtful but accessible explanation of why the GIL is still there.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 140 jobs is, as ever, available on the job board.
Software Manager - Deepcell, Mountain View CA USA
Our company is focused on developing a new technology to improve biological research and, ultimately, health outcomes, across all of biology, enabling previously impossible applications. We deliver detailed breakdowns on a cell-by-cell basis, by using deep learning to classify cells based on their morphology. We've launched an internal product and are working to develop a commercially releasable version. Come join Deepcell and make a difference! As a software manager you will: coach and mentor a team of infrastructure software engineers. Recruit, interview, and hire a growing team of back-end, front-end, data engineering/ ML Ops, and full stack software engineers. Lead development of an automated deep learning model training application. Lead a team to enable unspecialized scientists to develop deep learning models on their data. Have team responsibility for implementation and bring-up of neural network classifier in hardware, a GPU hardware implementation. This position will require ~50% individual coding contributions initially.
Manager of Bioinformatic Analysis Platform in Single-Cell Sequencing - Sainte-Justine University Hospital Center, Montreal QC CA
The Single-cell Sequencing Bioinformatics Platform is a new scientific research platform from the Research Department of CHUSJ, an initiative dedicated to pediatric hemato-oncology, which will meet the needs of researchers in terms of innovative technology in order to develop strong local expertise in this rapidly expanding sector, by setting up an analysis platform. This platform is looking for a computational biologist highly qualified in bioinformatics, whose main mandate will be to support the researchers of the Research Center in their analyses. The candidate must be able to carry out different types of analyses related to the most common approaches in single cell genomics (expression, ATAC, V(D)J, CITE-seq, mutations, etc.) and in “bulk” approaches including but not limited to the exome, the transcriptome and the whole genome.
Project Manager Research Data - McGill University, Montreal QC CA
The position Project Manager, Research Data is based in the Office of the Vice-Principal, Research and Innovation (VP-RI). The position reports to the Director, Research Software and works in close collaboration with colleagues in the Digital Research Services (DRS) team, a cross-sectoral group with staff from the office of VP-RI, McGill Libraries, and IT (Information Technology) Services. The Project Manager, Research Data will be responsible for leading the planning and organizing to develop and implement an Institutional Strategy for Research Data Management which is a requirement of the main Federal funding agencies, the Tri-Agency (NSERC, CIHR, and SSHRC), as of 2022/23 (Tri-Agency RDM Policy).
Director, Platform Operations - Planet, remote
Planet designs, builds, and operates the largest constellation of imaging satellites in history. This constellation delivers an unprecedented dataset of empirical information via a revolutionary cloud-based platform to authoritative figures in commercial, environmental, and humanitarian sectors. We are both a space company and data company all rolled into one. Planet is looking for an exceptional Director of Platform Operations with a proven background in hiring, mentoring, and leading high-functioning engineering teams.
Digital Twin Manager - British Antarctic Survey, Cambridge UK
The British Antarctic Survey’s Artificial Intelligence Lab is looking for a Software Engineer, to join a team working on the initial stages of a Digital Twin of the Royal Research Ship Sir David Attenborough. The initial focus will be to work with the marine engineers and data managers responsible for instrumentation on board the ship, to identify the available data streams and storage, and make them accessible to the AI researchers.
Quantum Strategic Initiative Development Lead - University of Toronto, Toronto ON CA
The Center for Quantum Information and Quantum Control (CQIQC) at the University of Toronto (U of T) supports quantum technologies research at the University of Toronto. CQIQC aims to establish U of T as the premier quantum research institute in Canada and a leader in the world by fostering and facilitating high-impact interdisciplinary research, innovative training programs, and national and international academic and industry partnerships. CQIQC includes about 25 faculty members and associate members, roughly 2/3 in the Faculty of Arts & Science and 1/3 from the Faculty of Applied Science and Engineering. You will work closely with the Director of the CQIQC and its faculty members to develop and execute a strategic plan to enhance and raise the profile of quantum research at UofT with government, industry, and donors.
Associate Director, Research Informatics Systems Engineering - Bristol Myers Squibb, Cambridge MA USA
The Associate Director, Research Informatics Systems Engineering will be responsible for executing an ambitious data strategy to support BMS’s Informatics and Predictive Science’s data needs in the Research and Early Development space. The successful candidate will bridge strategy, scientific, technology and design teams to shape and build the future of BMS’ Research Data Commons. The role will provide leadership guidance and accountability for the direction of our data commons applications. This role will fit someone that thinks strategically about where the latest hardware, software and cloud technologies fit within pharmaceutical research data contexts.
Senior/Principal Engineer, Research Informatics - Tango Therapeutics, Cambridge MA USA
Tango is an oncology biotechnology company focused on exploiting synthetic lethal interactions to discover and develop new breakthrough cancer therapies. Reporting into the Head of Research Informatics, the Senior/Principal Engineer of Research Informatics will lead development and deployment of applications for drug discovery by internal users. This is a tremendous growth opportunity for an experienced, technically strong, and motivated software engineer to be a product owner of our data warehouse. At Tango, our efficient software development cycle reaches the end users (Scientists) very expediently, helping to accelerate the discovery of precision, anti-cancer drugs.
Machine Learning Engineer, Tech Lead - Narrative Science, Remote USA
This role's primary focus is on Data Science and Machine Learning Modeling techniques. You will be expected to deliver moderate and advanced models that will be used in production to support multiple different customer goals. Being knowledgeable of MLOPs best practices and working with engineers to build out architecture that takes models through development and training to production deployment will be a focus. As a Tech Lead you will be working with our product and development teams to improve the analysis and content in our stories. In this role, you will have a solid mix of greenfield projects and platform improvements, along with plenty of opportunities to prototype and build lasting features and enhancements. You will be expected to grow and mentor junior developers as they work with you to deliver enterprise-grade solutions. You will work directly with Engineering leaders as you drive out designs and break down ambiguous customer problems into actionable tickets for the team to deliver.
Engineering Manager, Bioinformatics - Invitae, San Francisco CA or Remote USA
The cfDNA bioinformatics team’s mission is the development of scalable, high-quality noninvasive prenatal screening (NIPS) products that provide precise and actionable information to pregnant women. We are involved in the entire lifecycle of product development, from proof-of-concept R&D, to scaling and tech transfer, to validation and launch. As a part of our team, you will be a key contributor to developing innovative instruments, assays, informatics, and products that will make improved reproductive healthcare accessible to women around the world. We are looking for an individual to join our group that can provide team leadership, articulating and aligning around a team vision, managing the development of complex products to be used at scale, and further developing and supporting the growth and development of outstanding team members.
Senior Manager Research Computing & Cyberinfrastructure - Montana State University, Bozeman MT USA
The Lead for Research Cyberinfrastructure will collaborate closely with research faculty across Montana State University; oversee research cyberinfrastructure efforts at MSU, leveraging large-scale data and computational systems; supporting custom software, hardware, and workflows; facilitating engagements between researchers and specialists; and working with administration and community partners to provide sustainable support structures for all aspects of research IT needs. Partners with multiple IT teams, campus-wide research groups, and grant-funded initiatives. Assists with strategic planning and designs research cyberinfrastructure systems.
Research Computing Manager - George Washington University, Washington DC USA
This is a key leadership position with responsibility for leading a team of High Performance Computing and Cyberinfrastructure professionals supporting GW’s research technology services. This role works closely with the Director, Research Technology Services in everything from day to day operations and system design to strategic decisions around the deployment of and investments in high performance computing and advanced cyberinfrastructure solutions to enable and support current and future research needs. Working closely with the Director, this position develops and implements team objectives, metrics, policies and procedures, disaster recovery plans, operational and project budgets, and resource allocation plans. As a leader in research technology services, this position builds and maintains the relationship with faculty researchers and technical leads as well as building strong cross-disciplinary and collaborative partnerships with other related teams around networking, cloud and platform services and cybersecurity.