Research Computing Teams Link Roundup, 23 April 2021
Hi, everyone!
I have a somewhat short newsletter for you this week; exciting things are happening at work with product adoption and hiring, both of which are taking up a lot of time but are unmistakably good. In addition, we’re having increasingly lovely weather, dear friends are getting vaccinated, and actions and decisions made months ago are finally appearing in outcomes that are beginning to come together in a pleasingly coherent manner. I hope you, yours, and your team are doing equally well.
For now, on to the roundup!
Managing Teams
Don’t hire top talent; hire for weaknesses - Benji Weber
Why you should invest in undervalued people - Joe Emison
We need to talk about your Q3 roadmap - Wherewithall
Two common paths into research computing management - from academic research, or from computing and tech - tend to lead to the same mistake in hiring. In both cases, we’re used to looking for the “smartest”, “best”, “greatest promise”, “top” of the applicants, whether for postdoc positions or new senior devs.
But for most of us, this isn’t the right approach. Maybe you are supervising a collection of individuals who will each do independent work; mostly, though, we hire individuals to build a team. For that we need to hire for the gaps in and weaknesses of the team at least as much as for the strengths of the individuals. Or as Weber says:
Instead of “how can we find the smartest people?” think about “how can we find people who will make our team stronger?”
I don’t agree with everything in Weber’s article - we still do need consistent processes, and to look for did-not-demonstrate-needed-skills. But the basic idea of building the team rather than a portfolio of “top” individuals is a vital one. Some of the basic skills you’ll be hiring for will change over time as your team’s gaps and weaknesses shift.
The good news about this approach is that it makes it easier to see areas that can be filled by still-junior people, as advocated for by Emison; in #68 we talked about the work that hiring junior staff does take, but it makes for more resilient teams, it’s a move in the direction of equity, and it is, frankly, significantly easier to hire undervalued staff.
But the suggestion that we should be hiring for gaps sits a bit uncomfortably with this week’s reminder from the Wherewithall newsletter that, as we heard last week in #70, there are going to be a lot of people leaving or at least taking extended periods of time off as soon as things start getting back to some kind of normal. The adrenaline has long since worn off, and people are looking for change after an exhausting year.
It’s a bit more work to be prepared to hire for gaps when your list of gaps might change suddenly if people leave. Your “intermediate OpenStack administrator” or “senior research data curator” or “junior research software developer” positions won’t be cookie-cutter postings filled by fungible people. The roles will have a consistent set of base-level requirements for being a successful and productive member of your team, but a shifting set of skills and behaviours you need to make your team stronger. Understanding your current team’s strengths and gaps, and the full package each of your current team members brings to the team - and documenting all of this! - will help you be ready.
How to have meetings that don't suck (as much) - Danielle Leong
More and more collaboration occurs asynchronously these days, but meetings are vital for coordinating that collaboration. Meetings are also routinely done really poorly, and academia is (or should be) famous for how poorly they’re done. Whether we’re having a meeting to make a decision, solve a problem, gather input, share information, or point everyone in the same direction, Leong calls out some things that should be crystal clear:
- Who is leading this meeting?
- Why are we having this meeting?
- What is the purpose?
- What is the agenda?
- What are the action items?
The middle item, “what is the purpose”, is badly under-used. I used to think that having an agenda was enough, but having a really crisply defined purpose, especially for recurring meetings, is in the long term even more important. You can’t evaluate whether a meeting was effective or not unless there’s a goal or purpose in mind. An agenda should serve the purpose, and it often implies the purpose, but having the purpose explicitly stated makes it much easier to make a meeting better.
Having a purpose also makes it clear when a meeting should be multiple, smaller meetings. If a meeting turns into a grab-bag of purposes, it should be split up. Leong has a list of suggested (short) lengths of time for many different kinds of meetings.
There’s a lot of other stuff in Leong’s article, including links to other good resources like designing useful meetings.
Product Management and Working with Research Communities
COVID-19: One year on for medical research charities - Association of Medical Research Charities
Another reminder that even with the importance of health research never more salient, charity-driven health research foundations have been pounded by the pandemic, and other sources of funding for research in general are in some danger of retrenching. For many of the researchers we support, it’s going to be a tough year or two to secure funding.
Commission seeks further views on microcredentials - Ben Upton, Research Professional News
The Decline of the Master’s Degree - Alex Usher, Higher Ed Strategy Associates
There was a lot of breathless punditry around MOOCs, and then microcredentials, last decade. Post-hype, I think we’ve learned that such things are indeed potentially pretty useful when aimed at people who have already been trained to learn on their own (so more at the graduate level than undergraduate) as a way to pick up new skills.
And I think there’s opportunity here for academic research computing groups. As Upton writes, the EC is looking into microcredentials, to see what role they could play in the future of learning and skills development in Europe. Other governments and organizations elsewhere are starting to take a more nuanced look again, too.
There’s an interesting pairing here with an article from the start of the year. Usher writes about a potential future decline of the Master’s degree. Universities love to set up professional Master’s degree programs, because they can charge whatever they want, set them up in any of a number of ways, and churn out students with new skills and desirable credentials. There’s nothing wrong with that - everyone’s happy! But the very pliable and anything-goes nature of professional master’s programs suggests that the value is in the training and experience, not in the precise structure of the packaging.
Some research computing groups have had success in setting up master’s programs in HPC, or Data Science, or something similar. But it’s a big commitment for typically pretty small groups, and it can easily tie up a lot of resources (or get taken over by a CS department or similar). As microcredentials begin to be taken more seriously, research computing teams - with skills in data analysis and management, systems operations and architecture, and software development - may have something very useful to offer.
Research Software Development
The Initial Preview of GUI app support is now available for the Windows Subsystem for Linux - Craig Loewen, Windows Command Line Blog
Microsoft enables Linux GUI apps on Windows 10 for developers - Tom Warren, The Verge
I find this all very disorienting, having come of computing age during the era when Microsoft was actively trying to kill Linux, but Windows seems to be an increasingly plausible development environment for research software, even for tools that will largely be deployed on Linux systems. In particular, WSL2 now has preview support for Linux GUI apps.
Heterogeneous Processing Requires Data Parallelization: SYCL and DPC++ are a Good Start - James Reinders, The New Stack
Reinders writes in favour of the SYCL standard (#44, #49) and DPC++, an Intel-supported implementation of SYCL whose extensions have helped lead the development of the standard.
Reinders writes convincingly of the need for a common programming language for expressing parallel algorithms across GPU/CPU/FPGA/DSP/etc. I’m less convinced that such a model is possible, or that SYCL or DPC++ is it, but Intel has put real resources into DPC++, and FPGA maker Xilinx actively participates in SYCL; there’s certainly real momentum there. Do any readers have experience with either of these two technologies?
Research Computing Systems
Deep Dive into Intel’s Ice Lake Xeon SP Architecture - Timothy Prickett Morgan
Intel has some catching up to do against AMD. The long-awaited Ice Lake architecture is both a “tick” and a “tock”, a process change and a new microarchitecture, and there’s a lot going on here. Though details and benchmarks are still scarce, Morgan gives one of his usual context-rich descriptions of what the new chips are like. There’s been significant improvement in per-core parallelism, with notably higher average numbers of instructions executed per clock cycle (basically the only way to get higher per-core performance for CPU-bound loads these days), and a lot of enhancements for modern data centres. Encryption is a big one, with new instructions for modern encryption methods, full-memory encryption support, and updated secure enclaves; there’s also improved memory and cache performance, particularly better and more predictable latencies and NUMA latencies.
Can You Trust Floating-Point Arithmetic on Apple Silicon? - hoakley
Yes, it turns out you can. This post is in research computing systems because I think we’re going to keep seeing increasingly exotic CPUs, and this article is a nice example of using known-hard problems to find out how, if at all, the M1’s floating-point math differs from, say, Intel’s on nontrivial calculations. The author took three pathological problems from the Handbook of Floating-Point Arithmetic and ran them on Intel and M1 Macs, getting exactly the same answers.
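If you want to run this kind of check yourself, a classic problem in the same spirit (not necessarily one of the three the author used - that’s my stand-in example) is Muller’s recurrence, which converges to 6 in exact arithmetic but is driven to 100 by rounding error in IEEE-754 doubles. Getting identical wrong answers, digit for digit, on two different chips is evidence their floating-point arithmetic behaves the same:

```python
# Muller's recurrence: x_{n+1} = 111 - 1130/x_n + 3000/(x_n * x_{n-1}),
# starting from x_0 = 4, x_1 = 4.25. The exact limit is 6, but rounding
# error pushes IEEE-754 double precision to the spurious fixed point 100.
def muller_sequence(n):
    x_prev, x_curr = 4.0, 4.25
    for _ in range(n):
        x_prev, x_curr = x_curr, 111 - 1130 / x_curr + 3000 / (x_curr * x_prev)
    return x_curr

# Run on an M1 and an x86 machine and compare the output digit-for-digit.
for n in (5, 10, 15, 20, 25, 30):
    print(f"n={n:2d}  x_n={muller_sequence(n):.12f}")
```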
Emerging Data & Infrastructure Tools
Microsoft to Provide World’s Most Powerful Weather & Climate Supercomputer for UK’s Met Office - Oliver Peckham, HPC Wire
Met Office and Microsoft to build climate supercomputer - Cody Godwin, BBC
Microsoft brings Azure supercomputing to UK Met Office - Erin Chapple, Microsoft
Weather service computing is an interesting mix of data integration plus ensembles of modestly large simulations, all with very high SLOs for production runs - the weather forecasts need to be made on time! - alongside model development, climate simulations, and joint analyses of past weather data with the simulation predictions.
This is, as far as I know, the first weather service that’s proposing to partner on its production environment with a cloud provider. Way back in #6 we talked about ECMWF moving its big-data analysis and data sharing environment into the cloud with the HiDALGO project, but this is quite different. It sounds like the production workhorse will be a dedicated HPE/Cray system, but I can’t tell whether it’ll be essentially hosted in an Azure datacenter (an offering they’ve had for a while) or whether it will be an on-prem hybrid arrangement. Either way, the workhorse system is specialized and has redundancy built in, but it sounds like it will be integrated with Azure offerings via Azure Arc and will use the cloudy services for things like data analytics, and maybe also for model development or testing production runs.
If any readers know more about this project, I’d be very interested in hearing more!
Not Your Usual Supply Chain Hack: The Codecov Bash Uploader Blunder - Steven J. Vaughan-Nichols, The New Stack
Don’t leak your Docker image’s build secrets - Itamar Turner-Trauring
Every time a new method of distributing software is adopted, we start realizing that we need to be careful not to include anything in the new format - like keys or credentials - that shouldn’t be there.
Vaughan-Nichols describes what happened with the very nice and useful Codecov service, and how teams and tools that used the bash uploader tool at any point in 2021 likely had all of their environment variables stolen, which in this context probably included CI/CD credentials. This happened because someone was able to maliciously replace the bash uploader tool, using a Codecov credential that had been leaked in one of their Docker images.
As Turner-Trauring points out, that can happen in a couple of non-obvious ways. A credential could be included in a base layer and then “deleted” in a later layer of the image - but OCI images keep those lower layers, and so a credential may appear to be missing but can still be extracted. In addition, build arguments are baked into the image, so even if you never copy a credential explicitly into the image, it may be there. Best practice with Docker is to use BuildKit and explicit build secrets, which are exposed to the running build step but are not saved inside the image.
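As a minimal sketch of that pattern (the secret id, index URL, and package name below are invented for illustration), the secret is mounted for a single build step and never written to a layer:

```dockerfile
# syntax=docker/dockerfile:1
# Build with BuildKit, supplying the secret from a local file:
#   DOCKER_BUILDKIT=1 docker build --secret id=deploy_token,src=token.txt .
FROM python:3.9-slim

# The secret is readable at /run/secrets/deploy_token only while this RUN
# step executes; unlike a COPY or a build ARG, it never lands in an image layer.
RUN --mount=type=secret,id=deploy_token \
    pip install --index-url "https://user:$(cat /run/secrets/deploy_token)@pypi.example.com/simple/" private-package
```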
Random
Using HTTP range queries to list the files in a remote zip archive without downloading the whole thing.
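The trick, roughly: zip files keep their table of contents (the central directory) at the end of the archive, so two small Range requests are enough to list it. A sketch, assuming the server honours Range requests and the archive is a plain single-part, non-zip64 zip (the URL is a placeholder):

```python
import struct
import urllib.request

def fetch_range(url, start, end):
    """Fetch bytes start..end (inclusive) of a remote file via an HTTP Range request."""
    req = urllib.request.Request(url, headers={"Range": f"bytes={start}-{end}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def remote_size(url):
    """Total remote file size, from a HEAD request's Content-Length."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return int(resp.headers["Content-Length"])

def list_remote_zip(url):
    size = remote_size(url)
    # The end-of-central-directory record sits in the file's last <= 65557 bytes.
    tail = fetch_range(url, max(0, size - 65557), size - 1)
    eocd = tail.rfind(b"PK\x05\x06")
    _, _, _, _, n_entries, cd_size, cd_offset, _ = struct.unpack(
        "<IHHHHIIH", tail[eocd:eocd + 22])
    # Fetch just the central directory and walk its 46-byte fixed headers.
    cd = fetch_range(url, cd_offset, cd_offset + cd_size - 1)
    names, pos = [], 0
    for _ in range(n_entries):
        name_len, extra_len, comment_len = struct.unpack("<HHH", cd[pos + 28:pos + 34])
        names.append(cd[pos + 46:pos + 46 + name_len].decode("utf-8", "replace"))
        pos += 46 + name_len + extra_len + comment_len
    return names

# print(list_remote_zip("https://example.com/archive.zip"))  # placeholder URL
```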
You know the rule by now - embedded databases always make the roundup. Mongita is SQLite for MongoDB.
We talk a lot about legacy codebases here, but as data gets more important we also need to be able to learn how to understand and work with legacy data models. Here’s how to get started with 250 tables and no documentation.
Git-xargs supports running git commands against a list of GitHub repos.
Not a big surprise, but jobs in tech are increasingly being offered as remote-friendly - roughly 1/3 of devops jobs in one sample are being offered as remote - and if we hope to compete we’re going to have to figure out how to do that too (at this point, for many of us the institutional barriers are the problem).
Materials for an image rendering course - the Graphics Codex.
I’ll be adding search capability for newsletter back issues shortly using stork-search, which seems pretty nice. If you have a static website with a lot of (say) documentation on it, this is a pretty painless way to generate a speedy full-text search.
Some hard-won lessons from working with very large (billions of rows) databases in PostgreSQL.
COVID has changed a lot. One good thing - we’re never going back to old practices that tacitly enabled researchers who were reluctant to share data. One million coronavirus sequences: popular genome site hits mega milestone.
Written communication is a huge part of what we do in research computing and in managing teams. Coursera has a writing course which people seem quite happy with.
Legate-numpy looks like a very interesting NVIDIA-supported effort to build a drop-in replacement for numpy which supports GPUs but also distributed GPU computing, using the runtime from the PGAS language Legion.
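The pitch is that porting is a one-line change. A sketch of what that looks like - the import path and the separate `legate` launcher are my reading of the project’s README, so treat them as assumptions:

```python
# Hypothetical drop-in swap, per the project README; everything below
# the import is ordinary numpy code.
import legate.numpy as np   # instead of: import numpy as np

x = np.random.rand(10_000_000)
y = np.random.rand(10_000_000)
print(np.dot(x, y))  # executed by the Legion runtime, potentially across GPUs
```

The README suggests launching such scripts through the `legate` driver rather than plain `python`.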
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 134 jobs is, as ever, available on the job board.
Assistant Director, Population Health Data Sciences - Cancer Institute of New Jersey, New Brunswick NJ USA
Under the direction of the Associate Director of Population Science and Community Outreach, directs and manages the data and analytic services that support population science research, community outreach and engagement including ScreenNJ activities, and catchment area responsibilities.
Technical Program Manager - Seven Bridges, Remote friendly (?) USA
As a Technical Program Manager, you will ensure Seven Bridges leadership in the global standards community (i.e. CWL, GA4GH, FHIR, etc.). You will develop technical strategies for empowering the use of standards by our platform users and representing Seven Bridges’ interest at community standard meetings, guiding development through participation in development/oversight committees. You will help develop an action plan that identifies immediate areas to address following a review of Public-Program’s activities and the Program Roadmap. Guide and promote the use of the CWL specifications across international analysis efforts and continue to evolve the standard to match the needs of our users and collaborators.
Math Libraries Software Engineering Manager, Dense Linear Algebra - NVIDIA, Santa Clara CA USA
As the engineering manager, you will not only lead and mentor, but also be responsible for planning, scheduling, execution, quality, and performance of the cuBLAS library. Join our diverse and dynamic team and help us build and improve high performance GPU accelerated software libraries that are used by applications around the world and support NVIDIA's vision and growth. If you are a leader with a strong background in HPC and linear algebra then this role is a great fit for you and we would love to learn more about you!
Senior Research Software Engineer - Oak Ridge National Labs, Oak Ridge TN USA
The Software Engineering group focuses on engineering the next generation of high-quality scientific software. Our group innovates and inspires the next generation of cutting-edge scientific software, thus enabling Oak Ridge National Laboratory (ORNL) to host the world’s premier scientific software engineering group and transform science with software-defined solutions that are reliable, usable, and trustworthy.
Director, Data Insights, Analytics, & Continuous Improvement - World Vision International, Mississauga ON CA
As World Vision Canada evolves into an insight-driven organization by ensuring everyone is acutely aware of our performance to make rapid high-quality data-driven decisions to accelerate learning, increase efficiency, and deliver optimal results. The Data Insights, Analytics, & Continuous Improvement teams will lead this evolution through developing and implementing tools, capabilities, and an environment that enables a data-led performance-driven culture. The Director, Data Insights, Analytics, & Continuous Improvement is a strategic leader who provides oversight and guidance of all data collection, management, analytic, and process improvement activities including recommendations and insights to improve the internal and external experience for stakeholders and supporters. She/he works with World Vision Canada partners and stakeholders to ensure insights are shared, actioned, and aligned with enterprise Objectives and Key Results (OKRs).
Manager, Data Management - Labcorp, Ottawa ON CA
The selected candidate will be responsible for the overall management, administration and organization of Data Management in Pharmacometrics. Contributes to the development and implementation of data management business strategy. Responsible for ensuring the efficient utilization of all human and physical resources.
Scientific Software Lead - Unknown, London
This is a great opportunity for a Senior Software Developer with a well-established research institute in London who are world leaders in identifying cancer genes and discovering cancer drugs. This role will sit within the Scientific Computing team which provides a number of key services to researchers across the institution including; High Performance Computing (HPC), Research Data Storage (RDS), Research Data Management (RDM) and Scientific Software. This role does allow remote working in the UK but travel to their London sites a few times a month is required. This role will be providing support for third party research software and development of new scientific applications and pipelines.
Deputy Director, Research Computing Service Centre - University of New Hampshire, Durham NH USA
This includes direct oversight of RCC Service Line Leaders, budget accountability, and filling in for the Orchestrator as necessary. The Deputy Director will engage regularly with the Orchestrator on all administrative and organizational needs while focusing primarily on those services directly relating to research across all USNH campuses. The Deputy Director coordinates the research centric elements of the following service lines: Data Center Operations, Software Development, Research Data Management Services.
Medical Data and Systems Manager - GSK, Mississauga ON CA
The DDA & Systems function within MEE enables Medical Engagement/Medical Information and Content Management globally through the provision of business processes, best practices and technology platforms. As GSK’s technology strategy matures, there is a clear opportunity and need for harmonisation and integration between platforms, particularly with the continued deployment and development of Software-as-a-Service, (SaaS), platforms such as those provided by Veeva Systems, along with a clear Data Strategy (ingestion, storage, curation and governance of data used by Medical).
Consulting Manager - Laboratory Informatics - Accenture, London ON CA
Accenture Scientific Informatics Services (ASIS) works with “best in breed” Laboratory Informatics software products to provide total solutions for our clients. ASIS are experts in architecting, configuring, deploying, validating and supporting laboratory systems. These systems include LIMS, ELN, SDMS, and CDS, among others. This domain where laboratory science intersects with information technology is our passion, and unlike some other companies providing similar services, it is our singular focus!
Program Manager - Machine Learning and High Performance Compute - AMD, Markham ON CA
As program manager in AMD’s machine learning software engineering team, you will drive end-to-end delivery of leading-edge technology in high performance GPU-accelerated compute and machine learning for the Radeon Open Compute software stack. You will learn about how the power of open-source software can be applied to solve real-world problems. You will interact with product management, customers, software and hardware engineering teams, quality assurance and operations in a new and growing team.
Technical Project Manager - Exact Sciences, Oxford UK
This an opportunity for an experienced Technical Project Manager to work at the forefront of science bringing a revolutionary technology to market by driving forward both research projects and product development activities. This role is responsible for the day-to-day running of projects, ensuring development milestones are completed in time and on budget. Working with a highly skilled multidisciplinary team covering all aspects of our technology, the successful candidate will be able to build on their experience of managing large scientific projects using agile and waterfall methodology, whilst expanding their knowledge and skills in an innovative fast moving scientific environment.
Product Manager – Bioinformatics - Wellcome Trust Sanger Institute, Cambridge UK
Do you want to play a key part in advancing the world’s leading cancer genomics resource? COSMIC, the Catalogue of Somatic Mutations of Cancer is looking for that person. We are seeking to recruit an experienced Product Manager with a background in Genomics or Bioinformatics as well as good understanding of the technical processes that are used to turn bioinformatic data into information via both website visual representations and data downloads. Customer focus, and demonstrable pro-activeness, facilitation, prioritisation and communication skills are a must.
Engineering Manager - Core Infrastructure - Benchsci, Toronto ON CA
We are currently seeking an Engineering Manager to join our rapidly growing Core Infrastructure team. Reporting to the Director of Engineering, Data & DevOps, you will be responsible for planning and delivery, as well as the mentoring and coaching of engineers. You will work closely with multiple teams and our Principal Engineer to build and execute a long-term roadmap as our infrastructure and security needs rapidly grow.
Research Community Specialist / Manager - Monash University, Melbourne VIC AU
An exciting and innovative opportunity exists within our team as a Research Community Specialist / Manager where you will combine your project management skills with your community engagement experience to help deliver high-quality research computing services for our researchers. You will be responsible for project managing the Australian Characterisation Commons at Scale (ACCS), a 3-year national-scale initiative that is building and deploying digital infrastructure across Australia. You will enjoy the variety of working with individual researchers, University organisational units and institutes, national partners, and organised research communities. The ACCS involves staff at 7 organisations, therefore strong communication and project management skills are essential.