Research Computing Teams Link Roundup, 30 July 2021
Hi there:
We’re going into a long weekend here in Toronto - the second-to-last one of the summer - and it’s very much needed. We have a number of pretty ambitious efforts we’re working on, and it’s been a long year already. I hope that you and your team are taking care of yourselves, and that you in particular as manager or team lead are taking some time off. There’s an article below on the importance - both for you and your team - of taking some time off, to recharge yourself and give your team the opportunity to step up.
Also, there was some interest in the AWS ARM HPC hackathon that was in the roundup last week. I know that a number of readers are, like me, in the genomics space right now. Let me know if you think you or your team might be interested in participating in a similar week-long hackathon for ARM specifically around genomics; as always, just hit “reply” if you get this in your email, or email jonathan@researchcomputingteams.org if you want to talk about that or about anything that comes up in the newsletter.
And on to the roundup:
Managing Teams
When Do We Actually Need to Meet in Person? - Rae Ringel
In the past 17 months of having to work and communicate in new ways, we’ve learned to be thoughtful in planning how we communicate and work together. With teams starting to be able to meet in person again, there’s no need to discard that thoughtfulness! The right approach for a meeting will depend on its goals and purpose.
Here Ringel offers a simple framework for thinking about when a meeting benefits from being in-person. Complex goals and building/maintaining relationships push towards favouring in-person meetings, while simple goals and working on tasks favour hybrid or asynchronous meetings. (Incidentally, those are also the meetings where very strong meeting facilitation skills are the most necessary).
Relatedly, a lot of managers are starting to think of ice-breaker/team-building activities to get people used to working together in person again, particularly when new members have joined the team while it was purely distributed. Lots of people are suggesting games like Zip Zap Boing - what sorts of things have people tried?
Writing Better Job Ads - Eli Weinstock-Herman
This is a nice lengthy post on writing job ads. And given what I see scanning job ads for research computing team managers, the advice is needed!
There’s too much for me to completely summarize, but some key points:
A Job Ad is a Landing Page… A job ad is marketing. An advertisement.
I can’t agree with this enough. Even if what you have to post on your institutional jobs website is constrained to include all kinds of meaningless boilerplate and a dry list of job requirements - and at universities and hospitals there’s definitely some of that - there’s little to nothing stopping you from posting a job ad elsewhere, on your team’s website or on external job boards. You can direct people to the dry-as-dust “official” posting to apply.
What’s worse, most of the stuff we tend to put into job descriptions and job ads is… well:
[…] I’m more and more looking at “5+ years of (skill)” as an intellectually lazy statement. […] I wrote a job ad for a fungible human gear.
God yes. Even if “5 years of C++” (or whatever) were a meaningful measure - as if any given 12-month period of experience working with C++ were interchangeable with any other - it’s an input. A person with that laundry list of inputs might, if you’ve done your job well, become a capable team member, but what you care about are the outputs - the results the new team member helps the team achieve. And other combinations of inputs might help the new team member accomplish those things just as well or better.
Weinstock-Herman makes the following suggestions for a process:
- Start with the end in mind (always a good focus)
- Create the core of the job ad first:
  - What will the candidate achieve?
  - What are the expectations of a team member in this role?
  - What are the specific tools/processes in use?
  - What does the team do, why is it interesting, what’s the impact?
  - What do compensation and benefits look like?
- Boil it down to a pitch
- Work on tone, length, engagement
- Test, test, test
- Post thoughtfully
We have huge advantages in research for hiring. We’re helping advance the frontier of human knowledge. We’re doing meaningful work, not trying to drive up click rates on advertisements. We offer the possibility of moving between multiple quite different projects, learning both new tech and new science along the way, and the possibility of outsized impact. Why do so many of our job ads read like working in our field is a chore that could easily be done by anyone with 3 years’ experience in Linux and 4 years in “a scripting language”?
Managing Your Own Career
Questions for potential employers - Carter Baxter
My questions for prospective employers (Director/VP roles) - Jacob Kaplan-Moss
We do a lot of discussion of hiring from the hiring manager side of the table in the newsletter, but when thinking of our own career prospects it’s worth considering what we should ask when we’re the candidate, too.
Asking questions about how the position came to be free, the goals of the organization, the goals of the position, what six-month success looks like, how much autonomy the role has, travel requirements - these are all important things to know before you take a job offer.
Out of Office Alert: Managers Need Vacations Too! - Samantha Rae Ayoub, Fellow
It’s important to take time off to recharge, even though as managers we’re often not great at this. It’s a little too easy to convince ourselves that our firm hand on the tiller is too important to completely let go… and that’s a self-fulfilling prophecy. You’re robbing yourself of needed R&R, and your team members of the chance to step up in your absence, by not completely checking out. And the more often you completely step away, the easier it gets for you and the team.
Ayoub lays out ten steps - the key ones to my mind are:
- Prep a “While I’m Away” list
- Put one person in charge
- Ask your team to keep a collaborative set of notes
- Turn on your Out of Office Alert
- Do not reply to your email or voicemails
- Carve out 2 hours in the morning when you get back to get caught up
One really clever suggestion I don’t think I’ve read before is to make that “while I’m away” document a shared writable document, so it’s not only a checklist of things to do but somewhere people keep notes of what was done, what happened at the meeting with Prof X, etc. - so you come back to a briefing document to catch you up.
The other really crucial thing is to put one person in charge while you’re away - or at the very least to have a very clear decision-making process. Decisions will have to be made in your absence, and the team needs to know how to make them. You can rotate this between people, but it should be someone who has a pretty good big-picture view of the work of the team.
Cool Research Computing Projects
CLIMB-COVID: continuous integration supporting decentralised sequencing for SARS-CoV-2 genomic surveillance - Samuel M. Nicholls et al.
How UK Scientists Developed Transformative, HPC-Powered Coronavirus Sequencing System - HPC Wire
The UK has led the world in sequencing and surveilling the evolution of SARS-CoV-2, the virus that causes COVID-19; roughly a quarter of the world’s SARS-CoV-2 genomics data has passed through COVID-19 Genomics UK’s (COG-UK) CLIMB-COVID infrastructure, which is described in this paper and HPC Wire article.
While COG-UK’s sequencing efforts are distributed, the Cloud Infrastructure for Microbial Bioinformatics (CLIMB) takes a hub model for integrating the genomic and epidemiological data.
The heart of CLIMB-COVID is Majora, a Django-based application with both web and command-line UIs, but the team also had to create simple sample-naming schemes, Nextflow pipelines, and MQTT messaging between pipelines and Majora, as well as online phylogenetics analysis, cluster identification, and visualisation.
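Just to give a flavour of that pipelines-to-Majora messaging pattern, here’s a minimal sketch of the general approach using the paho-mqtt client - the topic names, payload schema, and broker here are hypothetical, not CLIMB-COVID’s actual ones, which the paper describes:

```python
import json
import paho.mqtt.publish as publish

# Hypothetical completion message from a sequencing pipeline;
# the real CLIMB-COVID topics and payload schema differ.
payload = json.dumps({
    "pipeline": "inbound-elan",
    "status": "finished",
    "artifacts": ["consensus.fasta", "alignment.bam"],
})

publish.single(
    topic="infrastructure/pipelines/inbound-elan/status",  # hypothetical topic
    payload=payload,
    hostname="mqtt.example.org",  # hypothetical broker
)
```

The nice property of this pattern is that downstream analyses subscribing to the topic can kick off automatically as soon as new data lands, rather than polling for it.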
Research Software Development
How Herbie Happened - Pavel Panchekha
Herbie is an automatic rewriter of numerical expressions that attempts to find equivalent but more numerically accurate expressions, available for local use but also through a web interface. I don’t think it’s as well known in technical computing circles as it ought to be; I only learned of it as my own use of methods that required careful numerical analysis was winding down, even though it’s been in development since 2013 or so.
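If you haven’t seen Herbie before, a classic example of the kind of rewrite it automates is avoiding catastrophic cancellation. Here’s a minimal Python sketch of the idea - the rewritten form is the sort of thing Herbie suggests, not its literal output:

```python
import math

def naive(x):
    # For large x, sqrt(x + 1) and sqrt(x) agree in almost all of
    # their digits, so the subtraction cancels catastrophically.
    return math.sqrt(x + 1) - math.sqrt(x)

def rewritten(x):
    # Algebraically identical (multiply by the conjugate), but
    # with no cancellation - accurate across the whole range.
    return 1.0 / (math.sqrt(x + 1) + math.sqrt(x))

x = 1e16
print(naive(x))      # 0.0 - every significant digit lost
print(rewritten(x))  # 5e-09 - correct to full precision
```

Herbie finds transformations like this automatically, and reports the estimated accuracy improvement over the expression’s input range.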
Panchekha’s article - an older one, but one which is circulating again - gives an overview of the story of Herbie’s development, beginning when he was a grad student. It started with the recognition that not much CS or programming-language work was being done on the quite important task of improving numerical accuracy, so even modest progress would be significant. The lessons he took from the Herbie effort are:
- Tackle ambitious problems
- Know how you work best - he doesn’t work well alone
- Good benchmarks guide research
- Generating reports from search processes is great for debugging
- You never know what will be important
- Some small things matter a lot
- Don’t submit papers too early
- If you keep rewriting something, think deeper
- Make a demo
One of the great things about this article is the bracing honesty with which he writes about false starts and dead ends. Also, the online demo, which was originally meant just to have something interesting on the project’s web page, turned out to be very important both for communicating what Herbie does and for getting feedback from users. All in all this is a nice behind-the-scenes writeup of a research computing project.
Research Data Management and Analysis
Life on the diagonal — adventures in 2-D time - Luke Plant
It’s pretty common in research computing to manage not just data but also the history of the data - how it’s changed over time. There are solutions like temporal tables in SQL:2011 (or other approaches, lumped together under the term of art “slowly changing dimension”) for viewing what the data values looked like at some earlier time/version. In addition, there are now a number of newer “git, for data” solutions which make it easy for people to collaboratively update the data while maintaining history. It’s all very cool stuff, and if those tools match any of your use cases, you should absolutely use them; it’s not something you have to implement yourself any more.
But when the data that’s being updated at different times is itself a timeline, all of this can get kind of hard to think about. Plant walks us through a mental model for this: two-dimensional time, with an event time (when something actually happened) and a knowledge time (when it was recorded in the database). He makes the analogy that we experience life on the diagonal of this 2-D time - when something happens and when we learn of it roughly coincide.
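To make the two axes concrete, here’s a minimal sketch (all names hypothetical) of a bitemporal lookup - every fact carries both times, and an “as of” query has to pin down both:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Fact:
    key: str
    value: float
    event_time: date      # when the thing actually happened
    knowledge_time: date  # when it was recorded in the database

facts = [
    # On Jan 10 we recorded that the Jan 5 reading was 3.2...
    Fact("sensor-A", 3.2, date(2021, 1, 5), date(2021, 1, 10)),
    # ...and on Feb 1 a correction arrived: it was really 3.7.
    Fact("sensor-A", 3.7, date(2021, 1, 5), date(2021, 2, 1)),
]

def value_as_of(key: str, event_time: date, knowledge_time: date):
    """What did we believe, as of knowledge_time, the value at
    event_time was? Both axes are needed to answer."""
    candidates = [
        f for f in facts
        if f.key == key
        and f.event_time == event_time
        and f.knowledge_time <= knowledge_time
    ]
    if not candidates:
        return None
    # The most recently recorded belief wins.
    return max(candidates, key=lambda f: f.knowledge_time).value

# In mid-January, we believed the Jan 5 reading was 3.2:
print(value_as_of("sensor-A", date(2021, 1, 5), date(2021, 1, 15)))  # 3.2
# By March, we know it was really 3.7:
print(value_as_of("sensor-A", date(2021, 1, 5), date(2021, 3, 1)))   # 3.7
```

Auditing questions (“what did we report in January, and why?”) query along the knowledge axis; analysis questions (“what actually happened in January?”) query along the event axis.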
The fundamental reason this is hard to think about is that “time” means two things here, which is why introducing “event time” vs “knowledge time” as terms is so valuable. Incidentally, I’ve had two hours of meetings this month trying to come to a technical solution for a data modelling problem, only for us to realize as we were hanging up on the second meeting that we were using “dataset” to mean two slightly different things, and that was causing the problem. Naming things is important!
Postgres Full-Text Search: A Search Engine in a Database - Kat Batuigas
While the biggest story in databases over the past 15 years has been divergence and specialization into a diverse range of capabilities, the second biggest has been partial convergence - NoSQL databases developing partial ACID capabilities, and stalwarts like PostgreSQL and MariaDB developing sophisticated JSON handling and text indexing.
If your use case is principally full-text search you’d be better off with Elasticsearch or its moral equivalent, of course, and if all you needed were JSON objects you’d go with MongoDB; but if you need some full-text capability or some unstructured JSON support in something that’s already using a relational database, there’s less and less reason to introduce another data store. In this article, Batuigas walks us through what full-text indexing can and can’t do in Postgres.
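As a taste of what that looks like in practice, here’s a minimal sketch using psycopg2 - the database, table, and column names are all hypothetical:

```python
import psycopg2  # assumes a reachable Postgres instance

conn = psycopg2.connect("dbname=papers")  # hypothetical database
cur = conn.cursor()

# A GIN index over the tsvector makes searches fast; without it,
# Postgres recomputes to_tsvector() for every row on every query.
cur.execute("""
    CREATE INDEX IF NOT EXISTS abstracts_fts_idx
    ON abstracts USING GIN (to_tsvector('english', body))
""")

# plainto_tsquery handles stemming and stop words, so a search
# for 'sequencing viruses' also matches 'sequenced' and 'virus'.
query = "sequencing viruses"
cur.execute("""
    SELECT title,
           ts_rank(to_tsvector('english', body),
                   plainto_tsquery('english', %s)) AS rank
    FROM abstracts
    WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)
    ORDER BY rank DESC
    LIMIT 10
""", (query, query))

for title, rank in cur.fetchall():
    print(f"{rank:.3f}  {title}")
```

For anything beyond occasional searches you’d typically store the tsvector in its own (generated) column rather than recomputing it in each query, but the operators and ranking functions are the same.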
Digital Humanities Project Charters and Data Management Plans - Marie Léger-St-Jean
Léger-St-Jean posted on Twitter her breakdown of (so far) four different project charters and one data management plan for digital humanities projects, to inform those planning similar documents for other projects.
Emerging Technologies and Practices
biowasm - Robert Aboukhalil
Genome Ribbon - Maria Nattestad, Chen-Shan Chin, Michael C. Schatz
I know I’ve been on a bit of a WebAssembly kick here lately, and that it may seem odd to readers coming from (say) HPC. But here’s a lovely example of a package of real bioinformatics tools and libraries (biowasm) distributed as WebAssembly packages ready to be run interactively in the browser - and, with Web Workers, able to load files from the local file system. And Genome Ribbon is an early example of the kind of complex applications that can be built this way - visualization of complex genomic rearrangements in the browser, without any of the data ever leaving your computer.
A Linux Kernel Implementation of the Homa Transport Protocol - John Ousterhout, USENIX ATC ’21
TCP/IP is an amazing technological achievement and makes the internet possible. But the internet began as a wide-area network; individual data centres with tens or hundreds of thousands of nodes are a very different environment, and TCP isn’t great within a datacentre or large cluster. Google has described Snap, the user-space TCP replacement between services that it’s been using since 2016.
In this paper and slide deck, Ousterhout describes a Linux kernel implementation of the Homa protocol, which its developers feel addresses the many failings of TCP within a datacentre:
- Connection oriented - high space and time overheads
- Stream oriented - but most within-datacentre communication is more like remote procedure calls, and stream-oriented approaches cause head-of-line blocking
- Fair sharing of bandwidth increases latency for short messages
- Sender-driven congestion control requires buffers to detect congestion
- In-order packet delivery makes load balancing very difficult
It’s interesting to compare this approach with AWS’s Scalable Reliable Datagram, which we covered in #80, particularly the shared concern with congestion and tail latency. Homa focuses very much on latency, with receiver-driven congestion control and prioritization of messages by shortest remaining processing time first. For all workloads and message sizes tested, and across a variety of other network traffic happening at the same time, Homa had a one-to-two order of magnitude advantage in P99 latency, and 2.7-7.5x improvements in median latency.
Calls for Submissions
Several more SC21 workshops have calls:
Fifth International Workshop on Software Correctness for HPC Applications (Correctness 2021) - Papers due 9 Aug
16th Workshop on Workflows in Support of Large-Scale Science (WORKS21) - Papers due 15 Aug
ScalA21: 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems - Papers due 27 Aug
MCHPC’21: Workshop on Memory Centric High Performance Computing - Submissions due 31 Aug
IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (BDCAT) - 6-9 Dec, Leicester UK, papers due 15 Aug
Topics include Big Data Science, Infrastructure and Platforms, Applications, Visualization, and Trends and Challenges.
8th International Workshop on Large-scale HPC Application Modernization (LHAM) - Abstracts due 27 Aug, papers due 1 Sept
From the call:
The International Workshop on Large-scale HPC Application Modernization offers an opportunity to share practices and experiences of modernizing practical HPC application codes, and also discuss ideas and future directions for supporting the modernization.
Topics include
- Programming models, languages and frameworks for facilitating HPC software evolution and refactoring.
- Algorithms and implementation methodologies for future-generation computing systems, including manycores and accelerators (GPUs, Xeon Phi, etc).
- Automatic performance tuning techniques, runtime systems and domain-specific languages for hiding the complexity of underlying system architectures.
- Practices and experiences on porting of legacy applications and libraries.
Call for Papers - [Electronics] Special Issue on Program Analysis and Optimizing Compilers for High-Performance Computing - Papers Due 1 Sept
From the call:
The recent technical trend toward extreme heterogeneity in processors, accelerators, memory hierarchies, on-chip interconnect networks, storage, etc., makes current and future computing systems more complex and diverse. This technical trend exposes significant challenges in programming and optimizing applications onto heterogeneous systems. The purpose of this Special Issue is to bring together application developers, compilers and other tool developers, and researchers working on various program analysis and performance optimization techniques for an exchange of experiences and new approaches to achieve performance portability in the era of extremely heterogeneous computing.
Topics include:
- Program analysis tools and methodologies to understand program behavior and resource requirements;
- Efficient profiling and instrumentation techniques to characterize applications and target systems;
- Code generation, translation, transformation, and optimization techniques to achieve performance portability;
- Optimizing compiler design, practice, and experience;
- Methodologies for performance engineering
Events: Conferences, Training
Software Engineering Challenges and Best Practices for Multi-Institutional Scientific Software Development, Keith Beattie, LBNL - 4 Aug, 1pm EDT, free registration required
Part of the Best Practices for HPC Software Development webinar series:
In this webinar we present the challenges faced in leading the development of scientific software across a distributed, multi-institutional team of contributors, and we describe a set of best-practices we have found to be effective in producing impactful and trustworthy scientific software.
National Center for Women & Information Technology - US-RSE DEI-WG Speaker Series - 12 Aug, 4pm ET
This presentation explores why diversity matters to innovation, how implicit biases play out in technical work cultures, and what actions individuals can take to create more inclusive technical cultures. Attendees will learn key features of strategic, research-based approaches to address the biases and barriers that limit diverse participation in computing.
5th EAGE Workshop on High Performance Computing for Upstream - Heterogeneous HPC: Challenges, Current and Future Trends - 6-8 Sept, €175-370
“Upstream” here, for those not in the industry, means exploration for oil & gas, but a lot of the talks have pretty broad HPC applicability - matrix-free optimization, seismic wave simulation, accelerated computing, HPC modernization for cloud, workload management, DPC++, etc.
17th Int’l Workshop on OpenMP - 13-16 Sept, Zoom, Univ of Bristol, £70/£90
The first day is OpenMPCon, focusing on vendors (including LLVM) and their updated support for OpenMP; the next three days focus on using OpenMP (such as a report on an OpenMP hackathon, or building a portable GPU runtime atop OpenMP) and extending OpenMP (hardware transactional memory, extending the tasking model).
Random
Interested in Digital Signal Processing? Steven W. Smith has a huge book, “The Scientist and Engineer’s Guide to Digital Signal Processing”, available for free online.
Also available as a pre-production draft: the book Small Summaries for Big Data, covering sketches of large datasets.
A lovely science communication example of buoyancy forces, stability, and ship design with interactive diagrams and illustrations.
An overview of netcat and variants.
121 questions for managers and ICs for one-on-ones.
If you’ve been using Google Drive for a while, note that links shared before 2017 will break shortly.
LaTeX and GFM Markdown table generators.
A deep dive into how python imports work.
A more useful “how to think about Git” tutorial that tries to actually convey meaning rather than just trying to sound smart.
In praise of “baking data in” to application or software deployments.
ConnectorX is a new package that loads DB data into pandas dataframes quickly and with little memory overhead (see the sketch just after this list).
Before containers became ubiquitous I was pretty sure unikernels were going to take over (I really liked the Blue Gene architectures), and I still think they have a lot of promise. Of course, I also thought the WWW was a fad and that gopher was going to be the future, so… anyway, there’s a new unikernel ecosystem out now, nanos.
An overview of HTTP security headers.
Getting started with a bullet journal.
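As promised above, a minimal sketch of ConnectorX usage - the connection string, table, and partitioning column here are hypothetical:

```python
import connectorx as cx

# Reads query results straight into a pandas DataFrame, in
# parallel across partitions, without a Python row-by-row loop.
df = cx.read_sql(
    "postgresql://user:password@localhost:5432/mydb",  # hypothetical
    "SELECT * FROM samples",
    partition_on="id",  # numeric column used to split the query
    partition_num=4,    # number of parallel workers
)
print(df.head())
```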
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, just not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 179 jobs is, as ever, available on the job board.
Scientific Data Architect - Berkeley Lab, Berkeley CA USA
Assure that NERSC systems provide a high performing and usable software stack for data analytics, machine learning, data management, or data transfer workloads. Provide expert guidance on user support, including one-on-one support, training, and documentation on data and analytics services at NERSC. Stay abreast of new and emerging data analytics and AI research and trends, through R&D collaborations, literature, workshops, and conferences; translate these new directions into actionable opportunities for NERSC or NERSC users. Mentor early career staff members in data analytics, data transfer, data management, and AI/ML techniques.
Programme Delivery Manager - Data Collection and Ingestion - UK Government, Office for National Statistics, Newport or Fareham or WFH UK
As a programme delivery manager, you will be accountable for the delivery of survey tooling and integrations that are being delivered by multiple teams and you will understand the high technical and political risk of these. You will manage dependencies of varying complexity, potentially planning and feeding into larger programmes and portfolios. You will be experienced in removing blockers and managing risks, commercials, budgets and people. You will get the opportunity to be part of our evolution, be a change maker as well as working with a highly capable digitally skilled workforce.
Manager, Data Science, NLP - Thomson Reuters, London UK
As a Manager, Data Science in Labs, you will be part of a global interdisciplinary team of experts. We hire specialists across a variety of AI research areas, as well as Engineering and Design, to drive the company’s digital transformation. TR Labs is known for consistently delivering Artificial Intelligence projects that advance the state of the art in support of high growth products and serve Thomson Reuters customers in new and exciting ways.
Senior Product Manager - Data Science - Data Cloud - Veeva Systems, remote
As the Senior Product Manager for Data Science on the Veeva Data Cloud team, you will own the design and execution for major components of our Projected Data Products. This is a great opportunity for someone who is excited about product design from the ground up and working to solve complex problems to deliver reliable and accurate projected data to customers.
Lead Data Scientist - Northeastern University, Boston MA USA
To execute on this vision, we seek a seasoned data scientist who will be responsible for conducting labor market research, modeling complex problems, and discovering insights through the use of advanced statistical, algorithmic, data mining and visualization techniques in order to contribute to the selection of market relevant learning products and high quality learning experiences with meaningful outcomes – for both the learner as well as our network of partners.
Senior Research Scientist, Computational Chemistry - NVIDIA, various or remote NY or CA USA
NVIDIA is using the power of GPU computing and computational chemistry to accelerate digital biology. We are seeking hardworking individuals to help us realize our mission. As a Sr. Computational Chemistry Researcher, you will join a team passionate about research and development using molecular simulation and machine learning. Together, we will advance NVIDIA’s capacity to build digital biology solutions. You’ll build large scale molecular dynamics simulations. Collaborate with multiple AI research, high performance computing, and digital biology teams.
Senior Deep Learning Bioinformatics Scientist - NVIDIA, various or remote NY or CA USA
As a deep learning bioinformatics scientist, you will join a team passionate about bioinformatics, AI, and high performance computing. This position provides the opportunity to research, implement, productize, and deliver deep learning based algorithms for a wide range of biotechnologies. Together, we will advance NVIDIA’s capacity to build digital biology solutions. Prototype and build data processing pipelines and deep learning algorithms for biological data. Drive the testing and maintenance of the algorithms and software modules.
Quantum Computing Specialist - CMC Microsystems, Sherbrooke QC CA
The successful candidate will oversee the development of quantum algorithms that can be processed by the IBM Q computing platform, as well as the application of these algorithms to solve specific research problems for our members. The individual will have to demonstrate the ability to connect scientific solutions with academic and industrial challenges in various fields: physics, machine learning, chemistry, finance, etc. She/he must therefore be able to create good working relationships with researchers from different technical fields, as well as with scientists and managers from industrial sectors. Demonstrate leadership in project management and supervision of students and interns.
Lead of Data Science and Machine Learning - Respira Labs, Mountain View CA USA
As Lead of Data Science, you will oversee the development of algorithms that will enable detection, prediction, and prevention of respiratory disease. Your algorithms will help decipher and interpret patterns in our acoustic and resonance data, and map them to physiological variables. You will oversee the design and implementation of data processing algorithms for raw data running in embedded systems, mobile applications and back-end servers. You will work directly with the back-end team to design the data storage and processing infrastructure, including provisions for health-related data, and enabling data processing pipelines for classical data analysis and AI paradigms. You will work directly with the front-end team to inform data presentation strategies and UX designs for patients and medical professionals. You will leverage your deep understanding of both data science and physiology to bridge communication between our engineering and clinical teams.