Research Computing Teams Link Roundup, 20 Mar 2020
Hi, everyone:
Even with stay-at-home orders and mandated social distancing, we're very fortunate in our line of work. Much of our research computing work can be done remotely; and here in Canada, as well as elsewhere, research funders have made it very clear that they will provide as much stability as possible, and that they're prepared to be sympathetic to hitches in research plans. While it's hard to know what the future holds, even with massive government deficits it's hard to see there being much of an appetite to slash science funding in the next couple of years. The same stability that can be infuriating when needs are growing and we can't hire or procure at the speed we would like gives us a lot of protection in tough times; colleagues and friends in other industries aren't as lucky.
But that doesn't mean what we're doing right now is easy. Even if no one we know is directly affected by COVID-19, as managers we're trying to keep research moving under really challenging circumstances.
You’ve probably already gotten over the first hump. Most of us are working remotely now to the extent possible; and we’ve gone through the initial period of adjustment with tooling and home office setups, and are starting to regain some productivity. But now that it’s becoming clear that this is going to be the new normal for a little bit, the recognition is setting in that it’s not just about tools, but about culture and process too.
We know now we probably can't just wait for us all to be back in the office to make that big architectural decision around a whiteboard - we need to figure out how to do some of these things purely remotely. It's not just big things, either; just to keep the team working well together we need more explicit and more frequent communication (not always super comfortable for those of us trained in the research world), more deliberate approaches to team activities and meetings (ditto), and to rely on long-form written communication a bit more (which actually does suit many of us just fine). In addition, some of our team members are finding the uncertainty very hard, or have family members who are directly affected, and need some extra help.
Our responsibilities in research computing extend beyond our own team, too. Communication with our research users, communities, funders, and bosses needs to be tended to in the same way; frequent communication to let them know what’s going on is needed, while being understanding if our questions and needs maybe aren’t their top priority right now.
The good news is that the skills and habits we're building now are going to make us even more capable managers and leaders of multi-institutional or international collaborations. I'll do what I can to help, by curating advice here in the link roundups and writing a bit more; on Monday I'll preview for newsletter subscribers a blog post on getting started with remote one-on-ones right away.
If you do find yourself having lulls in this newly-remote work — or just need a break from the increased communications demands of managing in this environment — there are a lot of opportunities right now to get caught up on learning about new tools and technologies for research computing, along with learning about new research computing projects. I’ll update those here; several now-virtual conferences are having their attendance opened up, and lots of training materials are being released early.
So with that, let’s get started:
Managing Teams
Stop Rushing In With Advice - Michael Bungay Stanier, MIT Sloan Management Review
Don’t Fall Into the Advice Trap - Michael Bungay Stanier and Marshall Goldsmith
One trap that’s really easy to fall into for those in either technical roles or in research — and so doubly easy for those in research computing — is rushing to give answers or advice to our team members. We got where we are by being experts in stuff, and so it’s very easy to just naturally give answers to people who are hitting issues.
Stanier has recently written a book (The Advice Trap) on this issue, and has given a number of interviews on the topic. He points out that there are three big problems with rushing to give advice:
- We may not actually be solving the real problem they have; e.g. the “XY problem”. (Worth remembering when consulting on any issue: if they could concisely state the exact problem, they’d probably already have most of the answer).
- Even if we got the problem right, our answer’s probably not great. The thing we just thought of five seconds after hearing the problem is likely a pretty unsophisticated answer to the issue they’ve been wrestling with for a while (and let’s face it, they’re unlikely to tell us that).
- Even if we got the problem right and the answer right, it’s just not good management. Shouldn’t we be teaching them to find the right answer themselves - for their own development and so we have to solve fewer problems?
Stanier has a pretty pragmatic approach to avoiding this trap - just don't be so quick to give advice, even when asked. Instead, stay curious about the problem and their approaches so far (attempted or conceived). Keep asking questions and digging deeper, find out what they've been trying, and congratulate them on ideas they have that seem like a good approach. Crucially, even if you theoretically could have come up with a better solution, theirs is still probably the best approach if (a) it develops their problem-solving skills and (b) it came from the person who is going to implement it, so they're fully committed to making it work.
How to manage one to ones - Dan Moore, Letters to a New Developer
I've shared several posts here about one-on-ones from our point of view as managers; this one is written as advice to someone starting out as an individual contributor, focusing on what they should be aiming to get out of a one-on-one. Whichever side of the conversation you're on, it's worth spending some time thinking about what the other person should be aiming to get out of these conversations! One-on-ones are about the direct report, and this gives some idea of what they might want to be hearing. Really, it's not too hard to identify with the points made here; they'd be pretty much exactly what we'd want to be covering in one-on-ones with our boss, if we had them.
Creating a Slack Writing Etiquette Guide for Your Workplace - RC Victorino, Slab
This is a great overview of using Slack well in a workplace. Like so much, whether the tool is used effectively or not comes down to setting clear expectations, and it's our job as managers to set and communicate those expectations.
The points the article makes strike me as dead on, although it took me a while to come to these realizations myself (in particular I hate hate hated Slack threads when they were first introduced, and in my dotage it took a while to get used to emojis, as reactions or otherwise). The point about using Slack not for synchronous communication but for ephemeral communication is, I think, exactly right and wildly non-obvious.
- Make sure your messages are well written and have needed context
- Don't use it for synchronous communication - use it asynchronously for ephemeral communication (stuff that doesn't need to be kept), like quick emails
- Use channels well - have channel descriptions and purge no-longer-used channels regularly
- Be sparing with group DMs - are you sure none of this conversation would be useful to someone else? You can always create a channel and purge it later
- Use threads so as to not derail a channel
- Have one or more non-work channels to keep the other channels work-focussed
- Use reactions as acknowledgements
- Use emojis to make up for lack of body language/facial expression
A Guide to Managing Remote Teams - Know Your Team
Approximately seventy-eleven thousand articles have been published on doing remote work in the last week or two. This 60 page ebook (no email required, although they do politely ask for one after the download) was one of my favourites, just because it was relatively comprehensive; there’s also a one hour workshop version of this material. Some of the sections aren’t relevant for us in this situation - compensation in remote teams, for instance - but most of the other material is quite relevant. The bit on the importance of onboarding is going to be very relevant to our team quite soon - we’ll be onboarding a long-term intern in May and it looks very much like they might start as a purely remote employee.
Research Software Development
Feedback Ladders: How We Encode Code Reviews at Netlify - Leslie Cohn-Wein, Kristen Lavavej & swyx
We had several links about code reviews and the importance of clarity around expectations two weeks ago; in this post, authors from Netlify describe a simple, emoji-encoded 5-level scheme for communicating how urgent and important the code review recommendations are. It’s kind of the code review equivalent of the paper referee’s Reject/Resubmit after Major Revisions/Accepted Pending Minor Revisions/Accepted rubric.
Read the article for the details, but the levels are: (Will the newsletter preserve emoji? Let’s see!)
- ⛰ - Mountain - blocking and requires immediate action
- 🧗♀️ - Boulder - blocking
- ⚪️ - Pebble - non-blocking but requires future action
- ⏳ - Sand - requires future consideration
- 🌫 - Dust - take it or leave it
This is the system for their UX team, so it's mostly about design, but it seems like a useful way to communicate the strength of reviewers' recommendations while also keeping them honest - if absolutely every problem they see is a mountain or a boulder, well, then, maybe they see too many mountains and boulders.
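For teams that want to bake the ladder into their own tooling, here's a minimal Python sketch of the idea - the level names and emoji follow the post, but the helper itself is hypothetical, not Netlify's actual implementation:

```python
# A minimal sketch of the "feedback ladder" levels described above.
# Hypothetical helper for illustration only; the emoji and level names
# follow the post, everything else is made up.

from dataclasses import dataclass


@dataclass(frozen=True)
class FeedbackLevel:
    emoji: str
    name: str
    blocking: bool
    meaning: str


LADDER = {
    "mountain": FeedbackLevel("⛰", "Mountain", True, "blocking and requires immediate action"),
    "boulder":  FeedbackLevel("🧗‍♀️", "Boulder", True, "blocking"),
    "pebble":   FeedbackLevel("⚪️", "Pebble", False, "non-blocking but requires future action"),
    "sand":     FeedbackLevel("⏳", "Sand", False, "requires future consideration"),
    "dust":     FeedbackLevel("🌫", "Dust", False, "take it or leave it"),
}


def format_review_comment(level: str, comment: str) -> str:
    """Prefix a review comment with its ladder level so severity is explicit."""
    lvl = LADDER[level]
    return f"{lvl.emoji} [{lvl.name}] {comment}"


if __name__ == "__main__":
    print(format_review_comment("pebble", "Consider extracting this into a helper function."))
    # ⚪️ [Pebble] Consider extracting this into a helper function.
```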
How to Grow Neat Software Architecture out of Jupyter Notebooks - Guillaume Chevalier
This is an older blogpost which just became a recent talk.
I'm coming around to the point of view that computational notebooks have real problems - obvious ones like hidden state, and maybe less obvious ones like the way the structure of notebooks actively discourages reasonable software development practices like unit testing or even version control. People even study this.
But in research computing lots of things have problems and we are kind of stuck with them anyway. Notebooks are simply too convenient for researchers to be expected to give up. This blog post and talk give a common-sense approach to taking what someone's made in a Jupyter notebook and turning it into decent software. I don't think the advice is surprising, but the key is that step one is to start ripping the code out of the notebook.
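As a toy illustration of that first step (the module, function, and test names here are hypothetical, not from the post or talk), pulling a computation out of a notebook cell into a plain module immediately makes it importable and testable:

```python
# analysis.py - a hypothetical module holding code "ripped out" of a notebook cell.
# Once the logic lives in a plain .py file, it can be imported both by the
# notebook (for interactive exploration) and by a test suite.

import numpy as np


def normalize(values: np.ndarray) -> np.ndarray:
    """Scale values to zero mean and unit variance, as the notebook cell did."""
    values = np.asarray(values, dtype=float)
    std = values.std()
    if std == 0:
        return values - values.mean()
    return (values - values.mean()) / std


# test_analysis.py - a first unit test, runnable with pytest.
def test_normalize_has_zero_mean_and_unit_variance():
    data = np.array([1.0, 2.0, 3.0, 4.0])
    result = normalize(data)
    assert abs(result.mean()) < 1e-12
    assert abs(result.std() - 1.0) < 1e-12
```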
5 coding exercises to practice refactoring Legacy Code - Nicolas Carlo
Carlo has made several appearances in the newsletter before with useful articles on the practice of handling legacy code, which is something we need to do routinely in research computing. This is a little different - five somewhat guided exercises from a number of people on refactoring legacy code, with an eye towards practicing "finding seams", inserting tests, and then refactoring. The code bases aren't research computing-related, and the exercises cover a mix of languages (from C++ to Javascript), but they seem like useful exercises to try one's hand at.
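For a flavour of what "finding a seam" can look like, here's a made-up Python toy (not one of the exercises from the post): a hard-coded dependency is turned into a parameter so a characterization test can pin down behaviour before any further refactoring:

```python
# A toy example of introducing a seam into legacy-style code so it can be
# tested before refactoring. Illustrative only; the exercises in the post
# use real codebases in several languages.

import datetime
from typing import Optional


# Before: the hard-coded call to datetime.date.today() makes this hard to test.
def legacy_invoice_label(customer: str) -> str:
    today = datetime.date.today()
    return f"{customer}-{today:%Y%m%d}"


# After: the dependency is passed in (the "seam"), with a default that
# preserves the old behaviour so existing callers don't change.
def invoice_label(customer: str, today: Optional[datetime.date] = None) -> str:
    today = today or datetime.date.today()
    return f"{customer}-{today:%Y%m%d}"


# A characterization test can now pin down current behaviour, runnable with pytest.
def test_invoice_label_formats_date():
    fixed = datetime.date(2020, 3, 20)
    assert invoice_label("acme", today=fixed) == "acme-20200320"
```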
Cool Research Computing Projects
Modelling of the tsunami from the December 22, 2018 lateral collapse of Anak Krakatau volcano in the Sunda Straits, Indonesia - Stephan T. Grilli, David R. Tappin, Steven Carey, Sebastian F. L. Watt, Steve N. Ward, Annette R. Grilli, Samantha L. Engwell, Cheng Zhang, James T. Kirby, Lauren Schambach & Muslim Muin
Comet Helps Simulate a Rare Volcanic Tsunami - HPC Wire
I’m a sucker for projects that involve careful data integration and cleaning on one side and modelling/simulation on the other.
Doing a good job of simulating events like tsunamis or even tides is surprisingly complicated - the waves propagating in the deep ocean are simple enough, but where you care about their impacts, at or near the shore, all the details of the ocean floor (bathymetry) really matter. Remote sensing satellites make getting that data a lot easier, but it still needs a lot of cleaning and filtering to make it into something usable.
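To make that concrete, here's a back-of-the-envelope sketch using textbook linear shallow-water theory and Green's law - an illustration of why depth matters so much near shore, not anything to do with the paper's actual TVD solver:

```python
# Why bathymetry matters: in linear shallow-water theory the wave speed is
# sqrt(g*h), and by Green's law the amplitude grows like h**(-1/4) as the
# depth h decreases. Textbook approximations for illustration only.

import math

g = 9.81  # gravitational acceleration, m/s^2


def wave_speed(depth_m: float) -> float:
    """Long-wave phase speed in water of depth h (m/s)."""
    return math.sqrt(g * depth_m)


def shoaled_amplitude(amp_deep_m: float, depth_deep_m: float, depth_shallow_m: float) -> float:
    """Green's law: amplitude scales as depth**(-1/4)."""
    return amp_deep_m * (depth_deep_m / depth_shallow_m) ** 0.25


if __name__ == "__main__":
    # A 0.5 m wave in 4000 m of open ocean, arriving over a 10 m deep shelf:
    print(f"open-ocean speed: {wave_speed(4000):6.1f} m/s")              # ~198 m/s (~710 km/h)
    print(f"nearshore speed:  {wave_speed(10):6.1f} m/s")                # ~10 m/s
    print(f"nearshore height: {shoaled_amplitude(0.5, 4000, 10):.2f} m") # ~2.2 m
```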
In this paper the authors use a new bathymetry dataset and test it (and their models of the event that started the tsunami) by running a large suite of simulations using state-of-the-art TVD methods and comparing their results against “post-event field survey results, tide gauge records, and eyewitness reports”.
Research computing is at its best when it’s multidisciplinary like this — involving new data sets, new techniques, and tackling problems of real importance.
(Hey, let me know about your research computing project — is there really tricky modelling? Collection of amazing data? Really cool architecture? Just reply and tell me about it.)
Emerging Data & Infrastructure Tools
Code-wise, cloud-foolish: avoiding bad technology choices - Forrest Brazeal
This article is from the start of the year, but it’s been circulating around, and it is good advice in a short article.
Everywhere in computing, but maybe worse in research computing, there's a tendency towards NIH Syndrome - "Not Invented Here". We tend to roll our own rather than lean on existing tooling, which is frankly madness, since our specialty is using computing technology to solve research problems, not inventing computing technology. We should be repurposing things left and right to push science forward.
One particular manifestation of this is using commercial cloud offerings while forsaking the cloud's managed services, "because it's vendor lock-in" or "because it's expensive". I mean, maybe you can run Postgres or Lustre on AWS cheaper than AWS/Azure/Google can, as long as your own time has no value, but probably not, right?
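For a sense of how little code "leaning on the managed service" can take, here's a hedged boto3 sketch of standing up managed Postgres on RDS - the identifiers, sizes, and credentials are placeholders, and a real deployment would also sort out networking, backups, and secrets:

```python
# A minimal sketch of provisioning a managed Postgres database on AWS RDS
# with boto3, rather than running and patching Postgres ourselves.
# All names, sizes, and credentials below are placeholders; a real deployment
# would also configure VPC/subnet groups and backups, and would pull the
# password from a secrets manager rather than hard-coding it.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="research-metadata-db",  # placeholder name
    Engine="postgres",
    DBInstanceClass="db.t3.medium",               # placeholder size
    AllocatedStorage=50,                          # GiB
    MasterUsername="dbadmin",
    MasterUserPassword="REPLACE-WITH-SECRET",     # don't do this in real code
)

# Wait until AWS has the instance up and managed for us.
waiter = rds.get_waiter("db_instance_available")
waiter.wait(DBInstanceIdentifier="research-metadata-db")
```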
Supermicro Plants a Flag at the Edge - Jeffrey Burt, The Next Platform
The Continuum from Edges to DataCentres - Jeffrey Burt, The Next Platform
AWS IoT Greengrass - AWS
Ok so yes, IoT and edge computing are super buzzy and overhyped, but I think all of the developments in this area are going to be great for research computing projects that involve lots of data collection from the field - think weather or climate science, oceanology, ecology, epidemiology, urban systems, whatever you can think of.
Fantastic work has already been done in these fields by researchers who have had to cobble together their own devices and data collection systems; it's going to be enormously easier for teams to take on increasingly ambitious projects as more sophisticated and affordable commercial off-the-shelf tools become available for them to build on.
These efforts are coming on the hardware side, with big companies that IT departments would recognize, like Supermicro, starting to offer low-power ruggedized edge hardware for outdoor environments; and on the software infrastructure side, with services like AWS Greengrass for data collection and processing. Whether or not such projects would want to use AWS, the fact that the big cloud providers are building services that support embedded device ecosystems like FreeRTOS adds additional heft and stability to such ecosystems - which is exactly what you want if you're proposing a 5- or 10-year project based on such tooling.
Amazon’s Arm-based Graviton2 Against AMD and Intel: Comparing Cloud Compute - Andrei Frumusanu, AnandTech
Stacking up ARM Server Chips Against X86 - Timothy Prickett Morgan, The Next Platform
Benchmarking the AWS Graviton2 with KeyDB – M6g up to 65% faster - Ben Schermel, KeyDB
I very much do not want this newsletter to deteriorate into the usual research computing FLOPS, Bytes, and GB/s trivia that occupies so much of our online discussions - the purpose of research computing is research, not computing. But I also want this to be a forum where people can be aware of new hardware that might affect their plans as well as new software and architecture tooling.
So for those interested, the most recent ARM servers available on AWS - the Graviton2, based on ARM's new Neoverse N1 microarchitecture, and exposed on AWS as their M6g instances - are both cheaper than the x86 instances (significantly cheaper than the Intel Xeon instances) and actually faster for some kinds of memory-intensive workloads (for the KeyDB in-memory database, ARM absolutely smokes the Intel chip). Research computing is wonderfully diverse, of course, so YMMV, but these are real alternatives for some of our use cases in a way that the first-generation A1 instances were not.
Conferences
NVidia GPU Technology Conference - Online, Mar 24-Apr 10
The GTC is an excellent GPU computing conference with both high-level overview sessions and very deep technical tutorials and workshops. This year it has gone online, and NVidia has made the lecture sessions free and is charging very modest amounts for the hands-on training sessions - but all with capped registration. There are many sessions that would be directly relevant to many of us; I've flagged a few below that jumped out at me, but there are several others (including some that haven't been rescheduled yet), so please take a look at the full list. I believe times are US PDT:
- DLI Instructor-Led Workshop - Fundamentals of Accelerated Computing with CUDA C/C++ - 9:00 a.m. Thursday, March 26
- Performance Analysis and Optimization [CWE21720] - 11:00 a.m. Thursday, March 26
- CUDA Graphs - 9:00 a.m. Monday, March 30
- NVIDIA vGPU: Virtualizing NVIDIA GPUs - 9:00 a.m. Friday, April 3
- Accelerated Data Science on GPUs using RAPIDS - 11:00 a.m. Monday, April 6 and 9:00 a.m. Friday, April 10
- Data Center Monitoring and Profiling - 11:00 a.m. Monday, April 6
- Multi-GPU Programming - 11:00 a.m. Tuesday, April 7
- Directive-Based GPU Programming with OpenACC - 9:00 a.m. Wednesday, April 8
- DLI Instructor-Led Training - High Performance Computing with Containers - 11:00 a.m. Thursday, April 9
- Containers Runtime, Orchestration and Monitoring - 2:00 p.m. Friday, April 10
Random
Sometimes we need to display interactive graphs or calculations, but standing up a Jupyter notebook on Binder or Google Colaboratory, standing up an RShiny app, or writing some D3.js code seems like ridiculous overkill. The new Idyll language might be a good solution.
A genealogy of 8945 programming languages.
If someone you work with is having trouble getting a handle on Zoom, Jennifer Polk wrote a great 16-page primer on Zoom as a Google Doc. Polk publishes a lot of great information on out-of-academia career advice for grad students on her site and on twitter, so she may be a source of resources for other people in your circle as well.
NIST has a relatively new Federated Cloud Reference Architecture whitepaper out. It doesn't really have much to do with cloud in particular; as multi-institution data and computation research projects become more common, though, the issues here are something we're more and more likely to come across. This sort of thing is my day job, and the document is quite good and clear on the issues.
I’m always on the lookout for organizational tools - Roam is kind of a… mind-map / wiki hybrid? I don’t know how to describe it. It seems to be the sort of thing that people either love or literally can’t come up with a use for.
Just a reminder that in remote communication, things go a lot more smoothly if you Assume good intent, Clarify any doubts, and Express yourself clearly (ACE).
There’s a good recent post on the importance of useful writing by Paul Graham, and he gives a good four point formula to make sure what you are writing is useful - that it’s important, novel, correct, and strongly written. But in my experience, in research computing our model seems to be the journal article and so I think we hold ourselves to too high a standard of usefulness, rather than too low. That means we don’t share blog posts, etc., often enough, even when we do have useful things to say that could save other people time.
That’s it…
And that’s it for another week.
Have a good weekend. I hope you and yours are safe, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
Head of Scientific Computing - Vertex Pharmaceuticals, Boston MA USA
Develop and align partners on a vision and roadmap for the scientific computing teams, technologies and capabilities to support and accelerate Vertex’s scientific and research goals. Lead and build a high performing team of scientific technology personnel.
Senior Specialist - Azure HPC Platform - Microsoft Azure, USA
The Microsoft Specialist Global Black Belt, Azure HPC is a senior solution sales professional within our enterprise sales organization with a special focus on driving the customer digital transformation agenda through the adoption of Azure HPC solutions in specific industries: Automotive, Manufacturing, EDA, Oil and Gas, Banking, Media, or Pharma.
Site Program Manager - Microsoft Quantum, Sydney NSW AU
Microsoft Quantum has established Microsoft Quantum Labs across the globe, with locations in Europe, Australia, and the US. The Site Program Manager will complement the scientific & engineering efforts and activities at the Lab with contract and grant support, strategic personnel management, intellectual property-related processes, the cascade of and feedback to Microsoft headquarters on policy and compliance, budget and expense accountability, and dashboarding/reporting on key performance indices rolling up into a global Site management rhythm of business.
Business Development & Operations Manager - Data Science Hub - UNSW Sydney, Sydney NSW AU
The UNSW Data Science Hub is being established as a major strategic initiative of the Faculty of Science to cultivate and promote foundational and applied research in Data Science with an applied focus in Environmental Data Science, Physical Data Science and Health Data Science. The Business Development & Operations Manager will provide strategic expert advice and input into the development of strategy, policy and planning as well as actively drive business development initiatives to position the Hub for success throughout all stages of its lifespan.
Technical Project Manager - Eaton, Dublin IE
Our Centre for Intelligent Power, located at Eaton's global headquarters in Dublin, applies data science to transform all aspects of our company. We're continuing to expand the organisation and are now recruiting a Technical Project Manager to initiate and lead innovation programs, develop business cases, manage customer relationships, and handle stakeholder management.
Technical Product Manager - IBM, Dublin IE
IBM Watson Health uses Cloud & Cognitive Computing to tackle some of healthcare’s most challenging problems. A Technical Product Manager leverages subject matter expertise both internally and externally.
Project Manager - Canadian Security Intelligence Service, Ottawa ON CA
Manage complex system development projects by planning, organizing, directing and controlling multi-disciplinary project teams. The relevant experience is in a role where the core function was to lead and/or manage IT, R&D and/or Scientific projects using Project Management methodologies.