Research Computing Teams Link Roundup, 11 Dec 2020
Hi, everyone!
I don’t have anything of my own to share with you this issue, but it’s been an interesting week in research computing and so there are lots of nuggets in the link roundup.
As always, if you find anything particularly interesting, or if there are topics you’d like covered, please let me know! Feedback is a gift, and even though I’ve been shamefully slow getting back to a couple of readers this week I really value your thoughts. I enjoy getting email back, even about things that you don’t like or disagree with.
So on to the roundup!
Managing Teams
42 Employee Review Questions Every Manager Should Ask - Fellow.app
There are always lots of “annual review” posts towards the end of the year, but even if you don’t do annual reviews, this is a good set of topics to make sure you raise with your team members periodically - on goals, strengths, what they like best/least about their current role, what new challenges they’d like, how your working relationship with them is going, and how the working relationships within the team are going.
Starting in research computing (bioinformatics) as a non-academic - Mick Watson on Twitter
An important reminder for those of us in research - and especially in academia - that the world of research is intimidating, unfamiliar, and as a result very stressful to team members who weren’t trained in that area. In Watson’s case, he eventually became enormously successful anyway, but it was a pretty trying experience. Although he doesn’t say it, my guess is that with a little extra mentorship and support, he likely could have flourished a bit earlier and with a lot less anguish.
Managing Your Own Career
Where Are the Career Paths for Staff on Campus? - Lee Skallerup Bessette, Chronicle of Higher Education
In research computing we frequently talk about the lack of career paths for ourselves as managers and - to an even greater degree - our team members as individual contributors. Bessette’s article reminds us that this is an issue at research institutions quite broadly - in traditional support units like HR and finance as well as in newer ones like our own or, at Universities, educational design (a field that is booming with recent interest in virtual and hybrid education). In all of those areas Bessette notes consequences quite familiar to us - salary compression, a feeling of stagnation even if responsibilities are growing, people hopping between institutions or projects even when they’d otherwise be quite happy to remain just to try to get some kind of vertical trajectory.
So the bad news is that the problem is bigger than our units, but the good news is that we may very well have allies in trying to set out career ladders for IT and management tracks.
Product Management and Working with Research Communities
Visualizing Objects, Places, and Spaces: A Digital Project Handbook - Beth Fischer and Hannah Jacob, Wired!, Duke University
Fischer and Jacob are starting off on what looks like a really exciting project, assembling a handbook for starting digital projects in the humanities. It will be interesting to see what comes of this effort - and if your team has helped support such a project, they’re seeking contributions.
Cancer Research UK forced into £45m ‘dramatic cut’ to research - Mićo Tatalović, Research Professional News (paywall)
The headline is enough here - those of our researcher colleagues who perform charity-funded research face a bleak couple of years. Such funds depend heavily on donations which are always sensitive to the economy, and don’t bounce back as quickly as the economy does.
Research Software Development
Finding Critical Open-Source Projects - Abhishek Arya, Kim Lewandowski, Dan Lorenc and Julia Ferraioli – Google Open Source
One reason for the push to cite research software is that citation is one of the few ways we have to give credit to the work, and to show that the software is widely used as a justification for funding to improve/maintain it. Software citation remains an uphill struggle, especially for software that a research user might not directly interact with (like key libraries). This is a problem in areas other than research, of course - maintenance of crucial pieces of internet infrastructure software is famously under-funded. In this article, the authors describe how Google is trying to address this by creating a metric to evaluate how critical open source projects are (by tracking activity, issues, etc., as well as use as a dependency) to make these tools’ importance more visible.
As more and more research software - especially that software which becomes widely used as a dependency - moves to online code repositories, it’s hard not to imagine that this approach would be increasingly feasible in our community, and that we might be able to use such metrics to justify funding.
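To make the idea of such a metric concrete, here is a minimal sketch in Python of one way to combine project signals into a single score between 0 and 1: a weighted average of log-scaled signals, each capped at a threshold. This is an illustration only, not necessarily the formula the Google authors use; the signal names, weights, and thresholds are all made up for the example.

```python
import math

# Hypothetical signals: name -> (weight, threshold at which the signal saturates).
# These names and numbers are illustrative assumptions, not values from the article.
PARAMS = {
    "contributors": (2.0, 5000),
    "dependents": (2.0, 500000),
    "commits_per_week": (1.0, 1000),
}


def criticality(signals, params=PARAMS):
    """Weighted average of log-scaled signals; each term lies in [0, 1]."""
    total_weight = sum(weight for weight, _ in params.values())
    score = 0.0
    for name, (weight, threshold) in params.items():
        value = signals.get(name, 0)
        # Log scaling damps huge values; the max() keeps each term at or below 1.
        score += weight * math.log(1 + value) / math.log(1 + max(value, threshold))
    return score / total_weight


# A small library with few contributors but many downstream dependents.
print(round(criticality({"contributors": 12, "dependents": 40000, "commits_per_week": 3}), 3))
```

The log scaling means a project with a handful of contributors but a very large number of dependents, the typical profile of a quietly crucial library, still scores well on the dependency term.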
Command Line Interface Guidelines - Aanand Prasad, Ben Firshman, Carl Tashian, Eva Parish
We create a lot of command line tools in research computing, and we tend to unconsciously mimic common Linux tools when doing so - but those were designed a long time ago, and their interfaces were frequently optimized for use in scripts rather than interactive use by people. This set of command line interface guidelines formulates a consistent philosophy for modern use cases, integrates advice from a number of different style guides, has a short and opinionated list of tooling for Go, Python, Node, and Ruby which mostly supports their guidelines, and has suggestions for how to distribute the resulting command line tools. It’s a good and not very long read.
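As a small illustration of the kind of conventions such guidelines recommend, here is a minimal sketch in Python using the standard argparse module. The tool itself (countlines) is hypothetical; the sketch just shows an argument parser with generated help, results on stdout, messages on stderr, "-" for standard input, a quiet flag, and meaningful exit codes.

```python
import argparse
import sys


def build_parser():
    # argparse generates -h/--help output from these descriptions.
    parser = argparse.ArgumentParser(
        prog="countlines",  # hypothetical tool name, for illustration only
        description="Count the lines in a text file.",
    )
    parser.add_argument("path", help="file to read, or '-' for standard input")
    parser.add_argument("-q", "--quiet", action="store_true",
                        help="suppress progress messages")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    try:
        if args.path == "-":
            count = sum(1 for _ in sys.stdin)
        else:
            with open(args.path) as handle:
                count = sum(1 for _ in handle)
    except OSError as err:
        # Errors go to stderr, with a non-zero exit code for scripts to check.
        print(f"countlines: cannot read {args.path}: {err.strerror}", file=sys.stderr)
        return 1
    if not args.quiet:
        # Human-oriented messaging goes to stderr so it never pollutes pipes.
        print(f"read {args.path}", file=sys.stderr)
    print(count)  # the actual result goes to stdout
    return 0

# In a real script, finish with: sys.exit(main())
```

Keeping the result on stdout and everything else on stderr means the same tool works both interactively and in a pipeline.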
How We Built Scalable Spatial Indexing in CockroachDB - Sumeer Bhola, CockroachDB
Geospatial data is a growing part of research computing, and this is a nice overview of how one database project implements spatial indexing to implement efficient spatial queries (things like, “Find all objects in this region”).
This is neat in and of itself, but it’s also a great reminder of the power and transferability of research computing expertise. The same kind of data structure and concept - space filling curves - that helps power geospatial database indexing also helps with certain kinds of adaptive mesh refinement for high-speed computational fluid dynamics, or even some kinds of particle tracking methods. The investments we as teams make in deeply understanding one kind of approach often have implications for very different kinds of problems.
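For readers who haven’t met space filling curves, here is a minimal sketch in Python of one of the simplest, the Z-order (Morton) curve, which maps 2D integer coordinates to a single integer by interleaving their bits. It is chosen here only because it is short; it is not necessarily the curve any particular database or mesh code uses, and the function names are just for this example.

```python
def morton_encode(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into one integer (Z-order)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)        # x bits land on even positions
        code |= ((y >> i) & 1) << (2 * i + 1)    # y bits land on odd positions
    return code


def morton_decode(code, bits=16):
    """Invert morton_encode, recovering (x, y)."""
    x = y = 0
    for i in range(bits):
        x |= ((code >> (2 * i)) & 1) << i
        y |= ((code >> (2 * i + 1)) & 1) << i
    return x, y


# Sorting by Morton code walks the grid in a Z pattern, so points that are
# close in 2D tend to end up close together in the 1D ordering.
points = [(1, 1), (0, 0), (0, 1), (1, 0)]
print(sorted(points, key=lambda p: morton_encode(*p)))
# prints [(0, 0), (1, 0), (0, 1), (1, 1)]
```

Because nearby points mostly get nearby codes, an ordinary sorted index over the codes can answer region queries by scanning a few ranges, and the same ordering can be used to partition mesh cells or particles so that neighbours tend to stay together.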
Research Computing Systems
CentOS Project shifts focus to CentOS Stream - The CentOS Project
The future of Linux distributions in the age of docker and k8s - Joe Landman
Some big news this week, long feared after the announcement of CentOS Stream: stable CentOS is being replaced. CentOS Stream, rather than being a clone of the stable, long-term, supported (but costly) RHEL distribution, is a continuously updated distro showcasing what will appear in RHEL. Since the main reason for using CentOS in research computing is its stability, this has caused some concern - people are wondering what to switch to, and a new distro, Rocky, is being started by some of the same original CentOS folks, with the goals of the original CentOS.
Landman’s article is interesting to read in that context. As we start packaging things in terms of applications rather than libraries, the relationship of research computing (and of computing in general) to OS distributions is changing - maybe surprisingly slowly, all things considered. In HPC environments it’s long been common already to have multiple alternate versions of all key libraries, compilers etc - essentially entirely parallel “distributions” - built and maintained. Even on personal systems, outside-the-distro package management systems like Homebrew/Linuxbrew are commonplace. Linux distributions already exist that are very slim and optimized to be nothing more than a container host. Is that the way forward for shared research computing systems, even those not using containers?
ZFS: You should use mirror vdevs, not RAIDZ - Jim Salter, JRS Systems
Salter offers a deep look into ZFS, virtual devices (vdevs), pools, and RAIDZ, and explains why nowadays mirroring vdevs is generally a better choice than RAIDZ.
Emerging Data & Infrastructure Tools
Don’t Panic: Kubernetes and Docker - Jorge Castro et al., Kubernetes Blog
If this directly affects you, you’ve probably already seen this or related discussions - yes, Kubernetes is dropping support for Docker, and it’s not necessarily a big deal.
Since Docker (or, recently, Singularity) is the main way most of us create and interact with OCI-compliant container images, we tend to conflate that particular runtime and toolset with containers in general. But Kubernetes dropping Docker from its supported runtimes may not change anything about your interactions with Kubernetes at all - using Dockerfiles and docker build to make images is fine. On the other hand, local tooling that uses the Docker runtime - things like docker ps - to interact with images in the Kubernetes cluster will have to change.
Hewlett Packard Enterprise […] high performance computing […] HPE GreenLake - HPE Press Release
(Good product management tip here, too - if you want an announcement to get some press, make sure people can make puns in the headline, e.g. HPE “floats” HPC-as-a-Service with GreenLake Cloud, HPC Does a Cannonball into HPE’s Greenlake).
Interesting to see still other partners getting into the HPC-as-a-service game, with HPE.
Events: Conferences, Training
Christmas SORSE Event: When Spreadsheets Attack! (and other maths disasters.) - 17 Dec, 14:45 UTC, Free online
From the webpage:
Are you already looking forward to the festive break for some relaxation? Yes, so are we! But just before you close your computer down for possibly the last time in 2020, join us for a lighthearted look at When Spreadsheets Attack! with the hilarious, well-known standup comedian and mathematician, Matt Parker, followed by some tales from the community.
Qwiklabs Kubernetes training - 30 days free if you sign up by 31 Dec
This Google cloud post lists a number of training opportunities; maybe most relevant for our community is this item:
If you sign up by December 31, you’ll get access to unlimited Kubernetes training and the opportunity to earn Google Cloud skill badges on Qwiklabs at no-cost for 30 days. We recommend you begin with the following quests on Qwiklabs: “Deploy to Kubernetes in Google Cloud” and “Kubernetes Solutions.”
Software Sustainability Institute Collaborations Workshop 2021 - 30 March - 1 Apr, Online, £50
From the website:
The Software Sustainability Institute’s Collaborations Workshop series brings together researchers, developers, innovators, managers, funders, publishers, policy makers, leaders and educators to explore best practices and the future of research software. Collaborations Workshop 2021 (CW21) will take place online from Tuesday, 30 March to Thursday, 1 April 2021.
Random
Speaking of modern command line tooling - fig, which adds visual applications and shortcuts for command line tools, looks really interesting.
NAND game - get all the way to building a simple processor using just NAND gates.
Good quick crash course to git stash.
Git wip, a git alias to list branches and when you last worked on them, not that any of us ever create a branch and move on and never do anything with it only to be surprised by its existence later.
A reminder you can use sockets directly from the shell. This plus named pipes and process substitution are my favourite underused shell tricks.
As cloud data engineering technologies mature, they’re going to start getting rewritten with performance in mind. Seastar, a C++ framework using concepts that would be familiar to those working on high performance multithreaded research software, has already had success doing just that with Cassandra and Memcached. Here’s a quick intro.
Vim tips for the intermediate user.
An introductory tutorial for PostgREST. PostgREST is an unusually powerful low-code way of spinning a REST API up atop a (Postgres) database.
A nice tutorial use case for GitHub Actions for a blog, using issues and pull requests plus GitHub Actions to automate some steps for a Hugo blog (and to nudge the author to get those blog post ideas out).
Cute tutorial for federated learning - a topic close to my heart - with Python.
Free pre-production PDF version of a really interesting looking book, High-Dimensional Data Analysis with Low-Dimensional Models: Principles, Computation, and Applications by John Wright and Yi Ma.
Nice debugging story/review of what happened when an Advent of Code website couldn’t keep up with demand. It’s also a really nice review of all of the things that are likely to go wrong with a simple web application, finally concluding with an extremely unlikely culprit.
LinkedIn has some neat-looking School of RSE training materials.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
Job postings are starting to slow down a bit at the end of the year, but they’re still coming in. Some highlights below; full listing available on the job board.
Senior HPC Engineer - University of Southern California, Los Angeles CA USA
The University of Southern California’s (USC’s) Information Technology Services is seeking a talented Senior High-Performance Computing (HPC) Engineer with an exceptional commitment to service excellence to join its team. As the Sr. HPC Engineer, you will be an integral member of the Center for Advanced Research Computing (CARC), collaborating with diverse and talented team members to support USC research community, improve customer experience, and generate value for our campus stakeholders across a broad base of departments and constituencies.
Machine learning / High Performance Compute Program Manager - AMD, Markham ON CA
As program manager in AMD’s machine learning software engineering team, you will drive end-to-end delivery of leading-edge technology in high performance GPU-accelerated compute and machine learning for the Radeon Open Compute software stack. You will learn about how the power of open-source software can be applied to solve real-world problems. You will interact with product management, customers, software and hardware engineering teams, quality assurance and operations in a new and growing team.
Data Hub Service Lead - UC Berkeley, Berkeley CA USA
The Data Hub Service Lead will lead and coordinate efforts around designing and scaling the Berkeley DataHub - a service that provides interactive computing environments to educators and students across campus using open source tools in the Jupyter ecosystem and beyond. This position will directly interact with Faculty, IT Support and the DevOps backend team, and provide training, communications, and resources to a range of users. It will also interface with open source communities that develop tools we use in our deployments - in particular the Jupyter community and the Jupyter team at UC Berkeley.
This role is split between two organizations: The Division of Computing, Data Science and Society (CDSS) brings together programs, schools, and departments from across campus to create rich educational opportunities and ignite groundbreaking research to meet society’s greatest challenges. Research, Teaching, and Learning (RTL) supports the teaching and research needs of Berkeley faculty, students, and academic staff by offering expert consulting, providing vetted tools and essential services. This is a full-time position, in a career appointment, jointly funded by these two organizations.
Senior Data Management Lead - Parexel, Various EU - UK, Ireland, Poland, Romania, Spain
The Senior Data Management Lead provides leadership and expertise in all aspects of Data Management. Develop and manage timelines for study data deliveries, including Go-Live, Interim Deliveries and Final DB Lock. Collaboration with the relevant functions (Clinical, Biostatistics, Database Programming, Medical, Medical Writing etc.) across all geographies. Data Management single point of contact to ensure that the contracted Data Management deliverables are being met – specifically in terms of timeliness, financial management and quality.
The Senior Data Management Lead can competently and independently lead large, complex projects and/or programs with little to no guidance from their Line Manager and/or Subject Matter Experts. Senior DMLs may act as a mentor for ADMLs, DMLs and other Senior DML peers.
Director, Researcher Engagement - Princeton University, Princeton NJ USA
The Director of Researcher Engagement has the depth of technical knowledge and proven judgment and integrity to guide their team and implement a strategy that engages academic researchers and staff, and provide high-level technological support in alignment with the mission of the organization.
The Director provides leadership and oversight of the core functions of researcher engagement to ensure service excellence, efficiency, and operational effectiveness.
We are looking for an individual who strives for high performance and someone who establishes and maintains effective customer relationships and seeks to gain their trust and respect. A creative person with strong organizational, technical, and planning skills, the Director of Researcher Engagement possesses strong interpersonal and communication skills. The Director also has a strong commitment to service, teamwork and collaboration.
R&D Master and Reference Data Director - AstraZeneca, Gothenburg SE or Cambridge UK
Own the development of the Master and Reference Data Governance Framework for the in scope subject area(s) and supply to development of the business area goals and roadmap.
Build, own, curate and lead the pipeline for reference data requests (crowdsourced + strategic priorities) on priority base approach and in alignment with existing/emerging data use cases.
Apply deep domain expertise to drive solutions to complex business issues and steer the triage and demand routing to the appropriate senior partners to influence strategies.
Lead the direction of and handle Master and Reference Data forums and multi-functional forums.
Define, manage the reporting on performance of Master and Reference data processes, standards, quality and compliance within AZ functions and act upon outcomes.