A reader from a U15 research institution [Canadian research-intensive universities; think Russell Group in the UK, or R1 universities in the US] writes in to describe expanding their research computing team with unconventional roles:
First, a grant advisor, specifically to assist PIs in writing sane tech inclusions in their grants. You may have reviewed grant proposals where the medical science, particle physics, quantum chemistry, etc. is very clear, and then the explanation of the computational aspects and the equipment justification sounds like Dilbert's boss wrote it. That is precisely what this position is intended to improve; the advisor will also sit on internal panels that judge the maturity of Innovation Fund proposals before they're allowed to apply, etc.

Second, a Communities of Interest Coordinator, who will foster and support research communities of like-minded graduate students, PDFs, etc. around research fields making use of computation - bioinformatics, AI, digital humanities - or around digital research tools - R, Julia, MATLAB, Gaussian, etc. By supporting communities of interest, these groups can become shared knowledge hubs, where newbies can find guidance or "the ropes," and experienced-but-stuck researchers can find inspiration or "an ear" that might help them get unstuck.

Both positions have been filled internally and start in December. More traditional ARC job descriptions are being written up now as part of a further expansion.
I love this! It's a long-standing tenet of this newsletter that research computing is much more than just technology. It's teams, it's communities, it's product management - it's people. Connecting researchers more directly to the computation, software, and data resources that can advance their work - whether that means help with grant writing or capacity building within a practitioner community - is very much part of our broad remit in research computing and data.
One of the things I wrestle with in this newsletter is how to make things easier for readers. With the discipline defined by the community of readers being so broad, the range (and volume) of material that gets covered here every week is… well, it's a lot. On the other hand, I don't know how to distill or partition things further without losing these very important dimensions of our profession. As always, if you have any ideas or suggestions, or want to share your own team's accomplishments and stories - or even just want to talk about research computing teams - hit "reply" or email me at [email protected]
For now, the roundup!
Don’t Soften Feedback - Lara Hogan
Reader, I’m not proud to say that I’m actually pretty rubbish at this. I tend to very much want to soften negative feedback, which is easier for me but is in the long term worse for the team member and the team as a whole.
What’s worse, people are not uniformly affected by this. Women, Black, Asian, and Hispanic team members tend to get softer and less-actionable feedback, especially but not only from male managers, which holds back their growth - how can they grow effectively if they aren’t being told what to work on?
Hogan here tells us things we've talked about before, but that we - at least I - need periodic reminders of. Make the feedback easier and more constructive to give by linking it to desirable outcomes for the team, make the feedback succinct and to the point, and distinguish facts from assumptions. There are also cautions here about peer feedback and potential bias.
The Best Leaders are Feedback Magnets — Here’s How to Become One - Shivani Berry
Relatedly, if we want to grow, we need good, actionable feedback. In our industry, a lot of our directors are pretty hands-off, which certainly has advantages but means we don't get the guidance we'd benefit from. Berry has two broad categories of recommendations for how to get more feedback and accelerate your growth.
Building confidence in a decision - Martin Tingley with Wenjing Zheng, Simon Ejdemyr, Stephanie Lane, Michael Lindon, and Colin McFarland, Netflix Technology Blog
I honestly believe that having a science background can be a huge advantage for leaders and managers, if we engage that part of our training in making management decisions. Data collection, experimentation, understanding that we don't know everything, accepting that an approach has been disproven - these are all pretty fundamental skills, but it's sometimes easy to compartmentalize them, treating them as things we only use when studying something as part of our work but not for studying how we work.
This Netflix tech blog describes making product decisions at Netflix, using a data- and experimentation-based approach that should be extremely familiar to those of us from the sciences. We have had these skills drilled into us for years, and have practiced and honed them - our problem is not that we are still too much scientists but often too little, not taking the same rigorous professional approach to managing teams and products that we did in our academic careers.
Real-time alert system heralds new era in fast radio burst research - McGill Science Blog
Some of the most exciting research computing and data projects today really demonstrate the breadth of RCD. They tie together software, systems, and data, in a way that doesn’t make any sense to break up into silos.
CHIME, a really cool radio telescope, constantly scans the sky (well, the sky constantly scans over it) and it maps 21cm emission from hydrogen gas from the early Universe to better understand cosmological structure.
But of course, if you're looking at large swaths of the sky for a long time in radio, you're going to see a lot of other things too. One of them is fast radio bursts: transient (very transient - a few milliseconds) powerful radio bursts, mostly of seemingly extragalactic but unknown origin, first reported in 2007. Some repeat, most apparently don't. 500+ have been detected, but they're still a mystery.
CHIME has "seen" a number of these events, but to actually identify what causes them, you'd need more than just the single radio signal - you'd want to point a number of other kinds of telescopes at the event as fast as possible to see if you can see other signatures of a big transient event happening there. They're thought to possibly be merging black holes or neutron stars, supernovae, or even more exotic events (Dark-matter induced collapse of pulsars! Decays of axion clusters! Absent data, the mind will wander to all sorts of weird and wonderful possibilities).
The problem? CHIME's data pipeline, designed for cosmology, wasn't really aimed at finding these things. It could detect them, sure enough - but only months after the fact, way too slow to arrange follow-up observations.
This press release from McGill describes the very cool work of a team, including Andrew Zwaniga and Emily Petroff, in building the CHIME/FRB VOEvent Service, which issues alerts in a standard format (VOEvent) within a minute of the event to subscribers who can arrange real-time follow-up observation. Since CHIME is now expected to see ~1000 of these a year, there's an excellent chance that an event can soon be caught "in the act," helping us understand what causes this phenomenon.
Needless to say, to go from months to a minute requires aligning the system, data processing, and alerting software carefully. It’s a super cool project, and hats off to the team.
The one code review method to rule them all - Jonathan Hall
Hall describes the pros and cons of pair programming vs pull requests for code review. Either serves the basic needs of knowledge transfer and another pair of eyes, but the benefits and costs differ: PRs allow scaling to more pairs of eyes, leave documentation automatically, and don't require synchronization, while pair programming is real-time, fast, and provides more natural opportunities for mentoring.
Which is better? The frustrating thing for us technical experts is that there's no objectively best answer; it depends entirely on the needs of the people system that is your team and organization. Either is great! Teams and orgs succeed using either of them - and it doesn't even have to be 100% one or the other. The important thing is choosing which meets your team's needs, setting clear expectations, and revisiting the choice periodically.
Reframing tech debt - Leemay Nassery, Increment
A Rubric for Evaluating Team Members’ Contributions to a Maintainable Code Base - Chelsea Troy
Once a software product is high enough on the technical readiness ladder - once it's actually being used by communities - technical debt becomes an issue. The problem isn't awareness - we all know code should be maintainable, well documented, etc. - the issue is having the people systems that support individual developers in deciding to put time into that work.
Nassery describes a rebranding that might help a lot in some of the circumstances research software teams find themselves in: rather than talking about reducing or eliminating tech debt, talk about building tech wealth, and what that new asset allows - faster development, fewer bugs, etc.
This sounds like a goofy rhetorical game, but I think there's some value in it. Particularly in our line of work, a lot of the people you'd be pitching the wealth-building to were involved, technically or managerially, in creating the tech debt in the first place. Not talking about "debt," which sounds like fixing bad stuff that happened in the past, makes it easier for them to support these efforts.
If Nassery's article is more about getting buy-in, Troy's article is more about implementing activities to get it done. The argument is simple and hard to disagree with: if you want the code to be increasingly maintainable and documented, you need to build those code stewardship priorities into the incentive structure. That means reviewing code against criteria that advance maintainability - flexibility, documentation, discoverability, and transfer of knowledge and context - but also evaluating and giving feedback to the developers on those same criteria.
A vision for extensibility to GPU & distributed support for SciPy, scikit-learn, scikit-image and beyond - Ivan Yashchuk, Ralf Gommers, Quansight Labs
The Quansight Labs team has been doing a lot of great work on data analysis software, particularly but not exclusively in the python data ecosystem.
This article is a nice example of an architecture design document for a substantial proposed change to an ecosystem - enough concrete detail to get feedback from stakeholders/sponsors (such as, in this case, AMD) and a sense of feasibility from developers, while also serving as a good quick overview of the current state of array handling in the python data ecosystem. The proposal is to augment SciPy with a backend and dispatch system, so that it isn't relying exclusively on numpy for its work.
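For a rough sense of the mechanism being proposed - dispatching an operation to a registered backend based on the type of the array it receives - here's a minimal sketch in plain Python. This is not the actual API from the proposal, which builds on more sophisticated machinery; the function and registry names here are made up purely for illustration:

```python
# Illustrative sketch of dispatch-by-array-type; not the SciPy proposal's
# real API. Names (register_backend, _backends) are invented for this example.
_backends = {}

def register_backend(array_type, implementations):
    """Associate an array type (e.g. a GPU array class) with a dict
    of function implementations specialized for that type."""
    _backends[array_type] = implementations

def mean(x):
    """Dispatch to the backend registered for type(x), if any,
    otherwise fall back to a default implementation."""
    impl = _backends.get(type(x), {}).get("mean")
    if impl is not None:
        return impl(x)
    return sum(x) / len(x)  # default fallback (stand-in for the numpy path)

# A stand-in "alternate array type" with its own registered mean():
class LoggingList(list):
    pass

register_backend(LoggingList, {"mean": lambda x: sum(x) / len(x)})
```

The real proposal has to solve much harder versions of this problem (covering hundreds of functions, compiled code paths, and arrays the backend may not be able to convert), but the core idea - route the call based on what kind of array you were handed - is the same.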
Best Public Datasets for Public Health Data Science Projects - Andrea Hobby
There's obviously a lot more interest in and motivation for doing work with public health data these days. Whether it's for data analysis work or for training courses, Hobby provides links to a number of health data resources that are available.
HPC and the Lab Manager - Carlo Graziani
We normally talk about data management in the context of experimental or observational data, but the generation of large simulation data sets also requires diligent management of the "data" (simulation outputs) and of the metadata about the simulations. You see some of this happening explicitly in large climate simulation data set collaborations, for instance, but it's less often called out in other areas.
Graziani writes about his experience (at a centre I worked at a couple of years before he arrived - go team FLASH) being part of large simulation campaigns, and the need for some kind of Operations Manager-type role, the equivalent of a data czar or lab manager in observational or experimental labs.
AWS Data Wrangler - AWS
It's been a long time coming, but the importance of effective glue code connecting data between services seems like it's finally becoming more apparent to a wider community. This week I learned that AWS has an open source tool, Data Wrangler, which seems like it might be useful more broadly than just within AWS, connecting pandas data frames with any of a number of data sources. I'm a bit skeptical about the product management of the Apache Arrow project, but it is that effort that powers tools like this.
From the github page:
Pandas on AWS - Easy integration with Athena, Glue, Redshift, Timestream, QuickSight, Chime, CloudWatchLogs, DynamoDB, EMR, SecretManager, PostgreSQL, MySQL, SQLServer and S3 (Parquet, CSV, JSON and EXCEL).
Downloading Satellite Images Made “Easy” - Aaron Geller, Northwestern Univ. Research Computing Services blog
Geller, an RSE at Northwestern, walks us through the process of downloading satellite data from Earth Engine using the python API.
Interesting if you're getting started in GIS, but also a reminder that in research computing we're constantly teaching ourselves to do things for which there's no great documentation - and simply putting a writeup on your team's blog once you've figured it out is a pretty simple and effective way to contribute to the broader community (and to make sure you remember how to do it next time!)
Using the Slurm REST API to integrate with distributed architectures on AWS - Josiah Bjorgaard, AWS HPC Blog
AWS's ParallelCluster 3 (like offerings from other providers) now has a REST API, more readily allowing you to control clusters programmatically, which is cool - but Slurm has a REST API too, allowing you to create, run, and control jobs within the cluster. Slurm's REST API is a little nascent and not yet meant for external use, but with some extra components this lets you spin up clusters, fire off jobs, and monitor them, all through REST APIs.
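To give a flavour of the shape of it, here's a sketch of assembling a job-submission request for slurmrestd. Heavy caveats: the endpoint version, field names, and auth details vary by Slurm release, and the specific path and values below are illustrative assumptions, not a verified recipe - check your site's slurmrestd documentation. Actually sending this also requires a running slurmrestd and a valid JWT token:

```python
import json

# Hypothetical sketch of a job submission via slurmrestd's JSON API.
# Endpoint versions (v0.0.xx) and field names vary by Slurm release;
# treat everything here as an assumption to verify against your site's docs.
def build_submit_request(user, token, script, partition="compute"):
    headers = {
        "X-SLURM-USER-NAME": user,    # slurmrestd's JWT-based auth headers
        "X-SLURM-USER-TOKEN": token,
        "Content-Type": "application/json",
    }
    payload = {
        "script": script,             # the batch script, shebang included
        "job": {
            "partition": partition,
            "tasks": 1,
            "current_working_directory": "/tmp",
            "environment": {"PATH": "/usr/bin:/bin"},
        },
    }
    # With a live server, you'd POST this to something like
    # http://headnode:6820/slurm/v0.0.37/job/submit
    return headers, json.dumps(payload)
```

From there it's a small step to, say, a notebook cell or a CI job that submits work and polls the matching job-query endpoint for status.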
I've got this under Systems because I think a lot of teams could build interesting tooling based on Slurm's REST API, or similar capabilities, but I haven't seen very much beyond some monitoring capabilities. On the other hand, I've been out of this space for some years. What teams out there have built cool job submission or job notification tooling on top of functionality like Slurm's API?
Security scanners for Python and Docker: from code to dependencies - Itamar Turner-Trauring
With yet more malicious code being found in packages - npm last issue, pypi this one - making sure your software is secure from the container image down to all the dependencies becomes vital. Here Turner-Trauring walks us through a number of scanners - Bandit, Safety, Jake and Trivy - for scanning our code, our dependencies, and our docker base images. It’s sad that it’s come to this, but here we are.
All 229(!) videos from this event earlier this year are up - many of them will have some relevance to this community, whether it’s on general topics like monitoring, observability, and security, or a few (like this one) on specifically HPC topics.
UK National GPU Hackathon - 28 Feb, 7-9 Mar; Application Deadline 10 Jan. Free to accepted participants; online
Looks like a fun event - mentored hackathon with NVIDIA and OpenACC participating.
CNCF Research End User Group: HPC/HTC End User Landscape - 1 Dec, Free
The Cloud Native Computing Foundation research end-user group presents an overview of HPC and HTC use of cloud-native technologies like Kubernetes.
US-RSE Annual General Meeting 2021 - 3 Dec, 1-3PM ET, Free for US-RSE members
For all US-RSE members, the organization’s AGM featuring reports on the activities of the society.
A one-liner in Python that creates an infinitely nested dictionary, and what that tells us about how Python handles assignment.
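(For the curious: one version of the trick - which may or may not be the article's exact one-liner - relies on the fact that the name inside a lambda is looked up when the lambda is called, by which time the assignment has already bound it:)

```python
from collections import defaultdict

# A commonly-cited version of the trick (possibly not the article's exact
# one-liner): the lambda's reference to the name d is resolved only when
# the lambda is called - after the assignment has bound d to the defaultdict.
d = defaultdict(lambda: d)

# Any missing key now returns d itself, so the dict "nests" infinitely:
# d["a"]["b"]["c"] is the same object as d.
```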
Using systemd to set up automatic backup to an external disk when it’s plugged in.
When a breakpoint is set to a function, debuggers stop after the prologue to the function, not at the start of the function itself. Here’s why.
Debugging story - stack corruption in a Windows game.
Multiple kinds of inheritance in perl, if you in turn have inherited some perl.
OAuth2 on a static website, using workers/lambdas.
CLI autocomplete (and, eventually, more) for iTerm, Terminal.app, Hyper, VSCode with Fig.
In tech we all love a good story about someone else’s catastrophe. This one won’t disappoint. Cascade of doom: JIT, and how a Postgres update led to a 70% failure on a critical national service.
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though: working in research collaborations has taught us the advanced management skills, just not the basics.
This newsletter focusses on providing new and experienced research computing and data managers, team leads, and those interested in taking on leadership with the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
This week’s new-listing highlights are below; the full listing of 161 jobs is, as ever, available on the job board.
High Performance Computing (HPC) Researcher - Communications Security Establishment, Ottawa ON CA
Communications Security Establishment (CSE) is Canada's national cryptologic agency. Unique within Canada's security and intelligence community, we employ code-makers, code-breakers, and secure system creators to provide the Government of Canada with foreign signals intelligence (SIGINT), as well as cyber security services (Canadian Centre for Cyber Security). CSE is the national hub for cyber operations to defend Canada and advance national interests. We provide technical and operational assistance to federal law enforcement and security agencies. The High Performance Computing (HPC) team is currently looking to fill several positions at the UNI-7 and UNI-8 levels. Responsibilities include: providing HPC expertise to clients in the form of training, consultation, code optimization, porting and parallelization, and other HPC programming techniques; evaluating, specifying, deploying, and integrating new HPC tools and technologies (hardware and software) in order to support client requirements; and supporting management of the HPC estate, including support to procurement, data center management, infrastructure design, system deployment, and systems administration (including lifecycle management and policy compliance).
Research Computing Infrastructure Architect - University of Birmingham, Birmingham, UK
Advanced Research Computing (ARC) at the University of Birmingham is seeking to appoint a Research Computing Infrastructure Architect on a permanent contract. As an award-winning team, ARC has earned a national and international profile for delivering high quality, underpinning research computing services.
As the lead of the Architecture, Infrastructure and Systems (AIS) group, you will be responsible for the design, planning, installation and running of hardware systems and associated services that underpin advanced research computing. This includes both University level and national level services. You will be responsible for managing the budget within ARC associated with hardware and services and, working with the ARC leadership team, will have design authority over the systems and services offered.
The team currently operates at-scale research storage systems, supercomputing (including HPC and HTC), high speed networking and private cloud infrastructure. You will respond to new technology developments, be responsible for purchasing systems and working with vendors to obtain best value for the University.
You will join Advanced Research Computing and will lead the Architecture, Infrastructure and Systems group. You will be responsible for running and inspiring the highly skilled, technical team operating services within ARC. You will also join the leadership team of ARC to provide strategic input into the development of ARC. You will work in a helpful and collaborative environment, interacting with other team members, supporting, guiding and sharing knowhow on a daily basis.
ARC welcomes applicants who are looking for flexible and hybrid work patterns, but with the expectation that the post holder will spend some time on the University’s campus each week.
Head of Data Management, Clinical Trials Unit - University of Southampton, Southampton UK
In this role you will be working within an industry-leading trials unit, with expertise in the design, conduct and publication of multicentre, interventional clinical trials and other well-designed studies. This role will see you lead a well-established team with a reputation for clinical data excellence and a vast experience of working with site teams and industry partners to ensure the integrity and authenticity of trial data. You will join an experienced leadership team with a passion to drive the unit forward and cement our position as one of the leading CTUs in the UK.
Senior Research Computing Systems Engineer - University of Southampton, Southampton UK
You will be helping to support the innovative research being carried out using the University of Southampton's High Performance Computing facilities. You will be joining a team of dedicated research computing engineers who support the current systems and their use in a range of real-world problem-solving research topics: from quantum chemistry simulations and AI modelling for future technologies, to climate modelling of the impact of climate change, to medical imaging to support advances in medical diagnosis, and COVID-19 research. Come join the team as we refresh our on-premises clusters and start looking at using cloud HPC to explore new architectures. You will also deliver training programmes to research students and mentor colleagues.
Engineering Manager - Sofar, San Francisco CA USA
At Sofar we connect the world’s oceans. We build the technology to create global awareness of ocean weather, climate, and ocean health. Our unique ocean data provides insights to science and society, and our products make ocean industries more sustainable. We are looking for a product-driven Software Engineering Manager to join our team as we build a new software product for the maritime transport industry. Our Wayfinder product runs on board the largest shipping vessels in the world, providing critical decision making context for sea captains as they navigate the world’s oceans. Building a great product will require you to have a deep understanding of our customers, their workflows, and the data they work with every day. You will collaborate with a top-notch team of engineers, ocean scientists and product designers to translate these user needs into high quality software that empowers our users with data to get their work done more efficiently.
Project Manager, The Center for Computational Biomedicine - Harvard University, Boston MA USA
The Center for Computational Biomedicine (CCB) provides computational, bioinformatics, and data science support for students, postdocs, research staff, and faculty across Harvard Medical School through education, project collaboration, and professional development initiatives. The scope of CCB projects includes four groups: Data & Analytics Platforms, Ontologies, Functional Genomics, and Image Analysis. CCB is seeking a highly motivated Project Manager to organize project operations across our 4 working groups. A background in Computational Science, Bioinformatics, or Data Science is required for this position. The CCB Project Manager (PM) will work directly with and report to the Center Administrator, helping to organize, launch, and manage project collaborations as well as CCB’s Education and Professional Development initiatives.
Associate Chief Technical Officer, Hardware and Operations, SciNet - University of Toronto, Toronto ON CA
Under direction of the Chief Technical Officer (CTO), the Associate CTO provides leadership for and direction of the hardware team as well as key aspects of the operation of the SciNet computational facilities and the data centre. Based on in-depth knowledge and understanding of scientific computing, the Associate CTO is responsible for developing strategic and tactical plans for SciNet, in collaboration with the CTO and the Academic Director; for developing procedures and determining resources to reflect client and service needs, projects and priorities. The Associate CTO leads and manages a team of highly-skilled systems administrators in the provision of scientific research computing services, develops operational plans and work processes for the unit, including planning for new systems software and upgrades; developing technical solutions; and for monitoring related financial resources. The Associate CTO provides input into the design, development and construction of the facility in liaison with internal and external entities.
Open Science Services Manager - UK Research and Innovation, Didcot UK
SCD provides systems and services supporting the deposit, management, access and analysis of research outputs such as documents, data, software and other information, to make them available to the research community in order to enhance the value of scientific results. Two examples of these open science services are the Physical Sciences Data-science Service (www.psds.ac.uk), which provides UK academics with access to some state-of-the-art chemistry databases, and the Data and Analytics Facility for National Infrastructure (www.dafni.ac.uk), which provides data and computational resources supporting research and planning related to the UK's physical infrastructure (transport, power, built environment etc). The Open Science Services Manager will join a team developing and supporting open-science systems and services to ensure their quality and delivery to the scientific community. The post holder will work with a team of developers and system administrators in supporting users, maintaining quality of service, and developing a roadmap of service improvement, and will also work with team leadership in developing an open science roadmap to provide additional services.
Data & AI Project Manager - AstraZeneca, Cambridge UK
At AstraZeneca we are treating Scientific Computing as a strategic asset underpinning our advances in science. Groundbreaking research strategies critically depend on best in class Computing capabilities. We are looking for a highly motivated, ambitious and independently working Scientific Computing Platform aligned Project Manager to join our global team. AstraZeneca R&D IT is building an outstanding organisation at the forefront of the digital revolution in healthcare. We’re applying technologies such as AI, machine learning, software and data engineering and analytics to provide critical insights. Our goal is to engage with our partners as a strategic partner in delivering life-changing medicines to our patients.
Lead Data Scientist (Python, R) - OpenText, Richmond Hill ON CA
The AI Data Scientist will focus on business analysis, data analysis and AI/ML model building to support business partners across OpenText. This individual will be a professional that is self-motivated and driven to accomplish company goals and who is comfortable multitasking in a fast paced, dynamic environment. Candidates should be comfortable optimizing outcomes while taking into account ethical and regulatory considerations. The team is global and very collaborative. We are a fast-paced organization and there are numerous projects in progress at any one time.
Group Lead, Software Platforms - Pacific Northwest National Laboratory, Richland or Seattle WA USA
The Group Leader (GL) of Software Platforms is responsible for excellence in the delivery of software solutions and platforms to the laboratory. Software engineering is prevalent across all our mission areas and the demand for production-ready software and services has never been higher. The position reports to the Director of Research Computing and will be a key member of the Research Computing Leadership Team.
Group Lead, Computational Platforms - Pacific Northwest National Laboratory, Richland or Seattle WA USA
The Group Leader (GL) of Computational Platforms is responsible for excellence in the design and deployment of computational architectures to include high-performance computing, machine learning, quantum computing, cloud computing and research data management systems. In addition, they lead multiple teams of subject-matter experts that provide their support and expertise to the users of these platforms, while contributing to the laboratory computing community. The position reports to the Director of Research Computing and will be a key member of the Research Computing Leadership Team.
Project Manager II - Science and Technology - Frederick National Laboratory for Cancer Research, Rockville MD USA
The Biomedical Informatics and Data Science (BIDS) directorate works collaboratively and helps to fulfill the mission of Frederick National Laboratory in the areas of biomedical informatics and data science by developing and applying world leading data science and computing technologies to basic and applied biomedical research challenges, supporting critical operations, developing and delivering national data resources, and employing leading-edge software and data science to effectively enable and advance clinical translation. The Strategic and Data Sciences Initiatives (SDSI) group in BIDS works collaboratively to accelerate cancer research with a focus on building effective use of innovative data and scalable computing and guiding development of future computational infrastructure and workforce capabilities needed to address key cancer research challenges including predicting treatment outcome, reducing racial and socioeconomic disparities and shortening the time it takes for new treatments to be developed.
Project Manager, Secure Research Computing (FISMA/FedRamp) - Queen Consulting Group, Boston MA USA
The project manager will lead the stakeholder communications program; coordinating directly with functional process and data owners to understand and incorporate multiple functions and user perspectives into the project and plan for successful user acceptance testing and rollout to meet end user needs and school objectives. The project manager will seek out and coordinate with other IT leaders and project managers, technologists, and support staff to ensure projects are planned for successful transition to production support, and to improve the maturity of the IT Project Management processes and toolsets for enhanced collaboration across IT.
Inference Data Scientist (Staff Level) - Mozilla, Halifax NS CA
As a Data Scientist within the Data Org, you will work as part of a cross-functional team that is responsible for understanding and empowering the future of Mozilla. Apply a variety of statistical methods including causal inference to understand the intricate ecosystem of our users, products, partners, revenue. Build key data sets to empower operational and exploratory analysis
Senior Manager, AMD Research - AMD, Bellevue WA USA
AMD Research is an entrepreneurial research organization with a superb track record of driving research innovations into AMD products. We generate innovations in processor architectures, graphics, interconnects, memory technologies, and software to create new business opportunities for AMD. AMD Research seeks a passionate, collaborative leader with strong technical skills and the initiative to motivate an expert team. You will lead world-class researchers and technical teams to create the next generation of computing and graphics technologies. You have demonstrated leadership skills, excellent communication skills, and a proven history of driving others to create new business opportunities. You have managed and mentored a top-tier group of researchers spanning multiple technical disciplines and various levels of experience. You are energized to bring together ideas and teams to work on hard problems with high potential value.