Research Computing Teams Link Roundup, 13 Mar 2020 (well, 16 Mar 2020)
Hi everyone -
Sorry for the lateness of this roundup. Like many of us, my team switched to pure-remote mode this week, a move that has a lot of positives (No commute! Coffee we actually like!) but real negatives too (last-minute scrambles; juggling family and work commitments; increased pressure on us to manage effectively in this new environment).
In research computing, we’re better positioned than most to handle a push to remote. And as disruptive as shifts like this can be, big changes at work can create a moment for us to try new things and reset expectations. If you’ve been toying with the idea of doing one-on-ones, giving more consistent feedback, delegating more tasks and decisions, or making other changes to how your team works, this can be a good time to make one or two of those changes. In the next week I’ll put up a blog post on getting started with one-on-ones; if people are interested I might even do a webcast.
But most of us have read more than enough on remote work or COVID-19 in the past weeks, so let’s make the rest of this late and somewhat briefer link roundup a remote and coronavirus-free zone. The theme that seems to have emerged is productivity - managing our own time, our external communications, and even some of our computing more productively and with higher performance.
Let’s get started!
Managing Teams
Too Many Things - Sven Amann
As research computing team members and managers, we all have way too many things on our plate, and the battle to be productive and effective is won by focussing relentlessly on our priorities and letting less important tasks slide.
I actually generally do a pretty decent job of that - except when workloads peak and I’m much busier than normal, which is of course exactly when I need to be best at focussing on the priorities.
In this blog post, Sven Amann describes going through a process (recommended at least as far back as 1967, and also described in this Manager-Tools episode on prioritization) of timing one’s work for a while - it doesn’t have to be very fine grained; here he put his time into one of 13 buckets - and seeing how it lined up with his most important priorities. As for most of us, the results weren’t especially pretty. Going through this once can be a bit of a sobering process, but it gives you a baseline that can be used as a point of comparison. And seeing exactly how much time is spent on tasks that could be done by someone else or done more efficiently can be a way to encourage change.
Remote Working Productivity - Bashayr Alabdullah
The title and context of this post are about remote working, but really it’s about choosing how to focus your time in a productive manner. The five suggestions are:
- Maintain daily habits and choose the environment - Find daily habits and an environment that work for you in getting your mind in gear
- Prioritize tasks - What are the priorities you have to get done today? This week? This month?
- Start a timer - Use something like the Pomodoro Technique - choose one of your priority tasks and set a timer: I’ve seen people use 20, 45, or 90 minutes, depending on the tasks - and focus on the problem exclusively for that period…
- Schedule breaks - then take a break.
- Batch questions (and answers) - schedule both the questions you need to ask of others and your answers to the questions asked of you, so as not to interrupt your flow.
The 7 best work and productivity timers for freelancers, workers, and managers - Jory MacKay, RescueTime
Whether you’re timing yourself to keep track of whether your efforts actually match your priorities, or whether you’re using the Pomodoro technique to focus on something for 20, 45, or 90 minutes, you’ll need some kind of timer.
This “Best 7” list is on the blog of RescueTime, a company that makes productivity tools, so - spoiler alert! - RescueTime makes an appearance. Even so, there are some nice free tools on here which I hadn’t known about, several of which I may end up trying.
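In a pinch, of course, a timer doesn’t need to be an app at all. Here’s a trivial sketch of a command-line countdown in Python; the 25-minute default and the terminal bell are my own assumptions, not something from the article:

```python
# Trivial sketch of a Pomodoro-style countdown timer.
# Usage: python timer.py [minutes]   (defaults to 25 minutes - an assumption)
import sys
import time

minutes = float(sys.argv[1]) if len(sys.argv) > 1 else 25.0
end = time.monotonic() + minutes * 60

while (remaining := end - time.monotonic()) > 0:
    mins, secs = divmod(int(remaining), 60)
    print(f"\r{mins:02d}:{secs:02d} remaining", end="", flush=True)
    time.sleep(1)

print("\a\nTime's up - take a break!")  # \a rings the terminal bell
```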
Buster Benson on the art of productive disagreement - Buster Benson, Brian Donohue
Why We Need to Disagree - Tim Harford
We’ve talked before about the lack of disagreement on a team (especially a technical team!) being a bad sign, and about how Google’s Project Oxygen found that psychological safety (which is very much about being willing to express disagreement) was one of the most characteristic features of successful teams.
Tim Harford’s article emphasizes the importance of disagreement, and points out that some of the most catastrophic failures happen because people weren’t willing to disagree.
Buster Benson, who’s written a book on the subject, gives five guidelines for having productive disagreements. None of the advice given is necessarily surprising, but all of it takes real effort and practice to put into place; and these are the guidelines he identified as most important in his study, so they’re likely the best bang-for-the-buck for fostering productive disagreements:
- Use friendly language
- Understand first; be genuinely curious about the points being made
- Ask honest, open questions
- Speak for yourself - don’t assume what the other person is thinking
- Use the disagreement as an opportunity to learn something new or find a better third option
Research Software Development
Use the Git History to Identify Pain Points in Any Project - Preslav Rachev
The title more or less covers the article, but it’s a simple idea worth repeating from time to time - we don’t have as much data as we’d like on where the bugs or problems are in a code base, but we do have one ready proxy - where the code changes have been made.
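If you want to try a quick-and-dirty version of this on one of your own projects, counting how often each file shows up in the commit history gets you most of the way there. Here’s a minimal sketch in Python; it assumes git is on your PATH and that you run it from inside the repository, and the top-20 cutoff is arbitrary:

```python
# Minimal sketch: rank files in a git repository by how often they've been
# changed, as a rough proxy for where the pain points are.
import subprocess
from collections import Counter

log = subprocess.run(
    ["git", "log", "--name-only", "--pretty=format:"],
    capture_output=True, text=True, check=True,
).stdout

# Each non-blank line of this output is a file path touched by some commit
churn = Counter(line.strip() for line in log.splitlines() if line.strip())

# The 20 most frequently changed files - candidates for a closer look
for path, count in churn.most_common(20):
    print(f"{count:5d}  {path}")
```

(Weighting recent commits more heavily, or normalizing by file size, are obvious refinements.)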
Linux Systems Performance — Brendan Gregg
This isn’t really about software so much as whole-system performance. Brendan Gregg works at Netflix, where he is the performance guru, and has written an enormously useful book on systems performance that he’ll be updating soon.
This post embeds a talk that he’s spent several years honing on understanding the performance of the entire system, from the highest level of the application to databases, filesystems and device drivers, down to the performance of the hardware itself. (In fact you may know Brendan Gregg better as the YouTube “yelling at hard drives” guy.) At 40 minutes it’s well worth watching, both for the encyclopedic listing of tools and for the systematic approach he outlines to finding and understanding performance issues.
Product Management and Working with Research Communities
How (some) good corporate engineering blogs are written - Dan Luu
Dan Luu makes his second appearance on the roundup, this time discussing how good development and infrastructure blogs are written by some of the big tech companies - and how some of the less successful ones aren’t.
Blogs are a really useful way to share knowledge and to increase awareness of your team’s work, but it’s hard to keep them going. The problems identified here are mostly about a stultifying corporate process for approving blog posts, which isn’t normally an issue in our teams (he says, as he’s been sitting on a team member’s blog post for a week). But he does identify one other feature of organizational blogs that are successful - an internal culture of blogging and long-form write-ups.
Does your team have internal writeups - whether presentations or documents - that you have successfully (or not!) been turning into blog posts? I’d love to hear more.
How To Run A Free Online Academic Conference - Franklin Sayre, Tisha Mentnech, Amy Riegelman, Vicky Steeves, Shirley Zhao
Successful research computing projects build a research community around them, but not always on the scale where throwing a national or international conference or workshop to bring practitioners together seems like it would make sense. And even if it might make sense, wouldn’t it be nice to be able to test the idea first, to see how it goes?
This evolving Google Doc distills what the organizers learned from putting together the virtual Librarians Building Momentum for Reproducibility conference. It’s largely a list of questions to help guide planning, including (I think crucially) an outcomes checklist, and a list of other resources.
Have you put together a virtual academic conference or workshop around a research computing tool or method - or attended one? Has it worked? Let me know, because there are some ideas I’d love to try out…
Cool Research Computing Project
Leiden astronomers discover potential near-Earth objects - University of Leiden
Hazardous Object Identifier: Supercomputer Helps to Identify Dangerous Asteroids - HPC Wire
Identifying Earth-impacting asteroids using an artificial neural network - John D. Hefele, Francesco Bortolussi, and Simon Portegies Zwart
The vast majority of Near Earth Asteroids pose no threat at all to Earth - but by the time enough observations of an asteroid’s orbit have been made to accurately predict its approach, there isn’t a lot of time to do much about the ones that are.
In this fun combination of observational data integration, simulation, and deep-learning data analysis, the authors (who are experts in orbital simulation and dynamics) took large numbers of simulations of the solar system, both purely virtual and time-reversed simulations of known impactors or near-impactors, and used synthetic observations of those paths to train a deep learning system to quickly flag asteroids as potentially hazardous.
This is a great example of a problem where the feature-extraction aspect of deep learning can be of real use - the problem space is surprisingly complex, and there are additional parameters (where the asteroids came from) which aren’t necessarily apparent in early observations. But deep domain-specific knowledge is needed to choose the kinds of data and run the simulations to generate the training set for the deep-learning approach; spraying a lot of virtual particles into a naive N-body simulation would produce garbage, no matter how sophisticated the deep learning architecture.
Emerging Data & Infrastructure Tools
Get unstuck on your infrastructure choices - Fred Ross
A good reminder that there are a lot of perfectly good technical solutions out there and worrying about which one is “the best” probably isn’t worth your time:
Decide based on the following criteria:
1. Has your company already standardized on one of these? Use what they do.
2. Do you already have experience on one of them? Use what you know.
3. Do you have a friend or colleague that knows one of them and who will help you? Use what they know.
4. Pick one at random.
The context here is VMs, orchestration, and Linux distros, but I think this applies just as well to many of the tools we use in managing research teams. Should we use Slack or Teams? Zoom or Skype? Trello, Asana, or Github Projects? Whatever. There are teams using any combination of them successfully. Pick one and start.
A Checklist For Evaluating New Technology - Gustav Wengel
In a similar vein as the above, a pragmatic way of looking at a possible new tool or technology or even methodology to adopt. The checklist items most relevant to us:
- Does it solve a problem you’re actually having right now?
- Is it easily replaced?
- Can it be evaluated quickly?
- Is the project/technology popular and well maintained?
- How mature is it?
- Can it be maintained?
Of all of them, I think “Is it easily replaced” is the one that’s most important and most undervalued.
Modelling Reaction Diffusion Systems with Julia and GPU - Thomas Moll
A nice look at where GPU computing with Julia is at the moment, with a simple differential-equation use case that is close to my heart. A lot of other languages can handle problems this regular efficiently (even Python would do well with numba), but it’s good to see how mature Julia’s GPU support is and that threading, which had long been neglected, is now solidly a part of the core language. With those two key pieces, much more complex computations would also be relatively straightforward.
Like a lot of CPU/GPU comparisons, this one isn’t especially fair to the CPU - look at all those allocations! - but the idea here isn’t to do a benchmark so much as just show the progression of the computation in Julia.
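Out of curiosity, here’s roughly what the Python-with-numba route mentioned above might look like - a minimal, hedged sketch of a Gray-Scott-style reaction-diffusion update with typical textbook parameters, not the article’s Julia code and not a benchmark:

```python
# Minimal sketch of a Gray-Scott-style reaction-diffusion update in Python
# with numba; grid size, parameters, and the explicit-Euler scheme are
# illustrative assumptions, not taken from the article.
import numpy as np
from numba import njit, prange

@njit(parallel=True)
def step(u, v, du, dv, f, k, dt):
    n, m = u.shape
    u_new, v_new = u.copy(), v.copy()
    for i in prange(1, n - 1):
        for j in range(1, m - 1):
            # 5-point Laplacian stencils
            lap_u = u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1] - 4.0 * u[i, j]
            lap_v = v[i-1, j] + v[i+1, j] + v[i, j-1] + v[i, j+1] - 4.0 * v[i, j]
            uvv = u[i, j] * v[i, j] * v[i, j]
            u_new[i, j] = u[i, j] + dt * (du * lap_u - uvv + f * (1.0 - u[i, j]))
            v_new[i, j] = v[i, j] + dt * (dv * lap_v + uvv - (f + k) * v[i, j])
    return u_new, v_new

# Typical Gray-Scott parameters; the initial square perturbation is arbitrary.
u = np.ones((256, 256))
v = np.zeros((256, 256))
u[100:150, 100:150] = 0.50
v[100:150, 100:150] = 0.25
for _ in range(1000):
    u, v = step(u, v, 0.16, 0.08, 0.035, 0.065, 1.0)
```

The inner loop is exactly the kind of regular, data-parallel work that maps naturally onto a GPU, which is what the Julia post demonstrates.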
Bottlerocket OS - AWS Open Source
So this is interesting - a Linux OS designed from the ground up to solely be a host OS for containers. Stripped down, with a focus on security and maintainability. APIs for configuration. And special admin and control containers to separate control from operations. I’m not going to start playing with this any time soon but I’ll certainly keep an eye on it.
Random
Part of being productive is automating all of the things that can be automated. Here’s a blurb on using GitHub Actions to run simple code checks on check-in - a good, simple use case for getting started with GitHub Actions if you haven’t already.
Sometimes digging for the root cause goes all the way to the floor panels.
Cloudflare provides us a history of the URL. If timing had worked out slightly differently, would we all have ended up using UUCP-style bang paths for everything?
In all the hullaballoo about whether or not commercial cloud computing can provide “real” HPC, the needs of the vast majority of research computing often get drowned out - and the flexibility of cloud can often be exactly what is needed. The Jetstream project out of Texas and IU is doing lots of really great work, both on the research side and, as described here, in providing resources for training computational biologists at small institutions.
And finally, on productivity: exploiting the fine-grained parallelism of OpenMP for higher performance - in Ada.
That’s it…
And that’s it for last week.
I hope you have good luck this week with your research computing team, and best wishes to you and yours,
Jonathan
Jobs Leading Research Computing Teams
Storage Systems Group Leader - National Energy Research Scientific Computing Center, Berkeley CA USA
NERSC is searching for a knowledgeable and inspired group leader for the Storage Systems Group (SSG) who will provide vision and guidance during this time when storage technologies for HPC systems are changing rapidly. The Storage Systems Group will be responsible for supporting NERSC’s large-scale parallel file systems and archival storage systems, with an eye towards balancing performance, stability, and usability for NERSC’s more than 7000 users.
Director, Natural Language and Text Mining - Evolent Health, Chicago IL USA
The Director of Natural Language & Text Mining (Sr Scientist) will support building AI products, in Agile fashion, that empower healthcare payers, providers, and members to quickly process medical data, make informed decisions, and reduce overall health care costs. As a research scientist/engineer on the Data Science and Artificial Intelligence team, you will work primarily with unstructured text data to build machine learning models for information retrieval applications.
Director, High Performance Scientific Computing - University of Tennessee, Knoxville TN USA
The Director of UTK High Performance Scientific Computing provides leadership and management for the staff and services for the UT Research Computing Cluster, which provides service to the faculty and students of the Knoxville campus, Institute of Agriculture, Space Institute and UT Health Science Center. This position reports directly to the CIO for the UT Knoxville campus.
Chief Information Officer / Computing Division Director - SLAC National Accelerator Laboratory, Menlo Park CA USA
SLAC National Accelerator Laboratory is seeking a Chief Information Officer (CIO) who also serves as Computing Division Director. This position provides the intellectual leadership for the laboratory’s computing organization, which provides IT infrastructure and services, business systems, and cyber security, and is the organizational home of scientific computing support resources.
Scientific Research Program Manager - Medical Science & Computing, Bethesda MD USA
MSC is searching for a Scientific Research Program Manager to provide support to the National Institutes of Health (NIH). This opportunity is a full-time position with MSC and it is on-site in Bethesda, Maryland.
XSEDE/OSG Architect - Georgia Tech, Atlanta GA USA
PACE seeks a talented systems architect to implement, maintain, and manage newly deployed research computing resources, enabling them to participate in national research computing grids such as XSEDE and OSG. This position is a research faculty (non-tenure track) position, and comes with the benefits commensurate with faculty status.
Machine learning / High Performance Compute Program Manager - AMD Systems, Santa Clara CA USA
As program manager in AMD’s machine learning software engineering team, you will drive end-to-end delivery of leading-edge technology in high performance, GPU-accelerated compute and machine learning for the Radeon Open Compute software stack. You will learn about how the power of open-source software can be applied to solve real-world problems in the domains of HPC and machine learning.