Research Computing Teams Link Roundup, 14 Aug 2020
Hi!
Our question last week was about having difficult conversations with your team members. I got two responses from managers:
Haven't really needed to have difficult conversations in my team; I have peers who do, but my use of Manager Tools-style feedback has been effective. “May I give you some feedback? When you are on your phone during the staff meeting, it appears as disengagement and others can find that rude. Can you change that for the next meeting? Thanks.” Positive:negative feedback is about 8:1 or even more.
and
When they do come up, I’ve found having the discussion at the start of the day, when we’re both still fresh, has worked a lot better. There’s no real trick to it that I’ve found; just meet the issue head on, don’t try to guess at the “why”s, be open minded about solutions.
Thanks both for sharing your answers!
The top question on our question board now is on organizational tools:
What tools do you use to keep yourself organized? As a research software developer I had two or three tickets on a JIRA board to keep track of, now as a software development manager I have what feels like dozens of things to be on top of.
I’m going through an organizational tool revamp right now myself, so I’ll be very curious to hear what our readers share. What I’ve found works for me is keeping a lightly organized “pool” of tasks on the go, some repeating and some one-off, and then using a paper notebook to plan priorities for the week ahead and then one day at a time, pulling tasks out of the pool and feeding new ones back in. I had been using OmniFocus for the task pool, but I find it’s an awkward fit, so I’m toying with the idea of moving the pool into something like Trello.
What are other people doing? Hit reply or just email me your answer, and I’ll anonymize it (unless you tell me otherwise) and post it next week for all the readers.
And now, the link round up!
Managing Teams
7 Ways Leaders Can Ask Better Questions - L. David Marquet
One of the things I continue to have trouble with is remembering that as a manager my off-the-cuff remarks can sometimes be given an importance way out of proportion to what I intended. Questions from managers in particular are incredibly powerful, and that cuts both ways - they can show interest and help you learn things about your team members and their work, or they can set off a flurry of counterproductive effort or even end up shutting people down.
Marquet writes about seven bad ways of asking questions we can try to avoid. It's a short worthwhile read; some that stood out were:
- Question stacking - just ask one question then listen
- Why questions - this has come up before; at worst it can sound accusatory, and at best we are all really good at coming up with plausible-sounding reasons for the "why" of things on the spur of the moment; 'why' doesn't dig deep enough
- "Dirty" questions - frame the situation in an unhelpful way
- Binary questions - "Are things ready with X?" as opposed to "What more do we have to do/think about with X?"
Every now and then I catch myself doing one of these, which probably means there are more times that I don't catch. I still have work to do on watching how I speak while still speaking freely.
1-on-1s for engaged employees: How good managers run them - impraise
Another good article on one-on-ones. Most of the points are things that you, reader, will be familiar with, but it's always good to review the basics - particularly the reminders on being prepared and on active listening.
There are some good questions to ask in here too, and I'm always on the lookout for good one-on-one questions to add to my toolbox. I found these particularly helpful:
- What are your biggest time wasters right now?
- Who inspires you in the team? Why?
- Would you like to receive more feedback from other team members?
- Do you have any concerns when it comes to your role or career opportunities?
- Which part of your job do you feel is the most relevant to your long-term goals?
- How can I better support you in your job?
The Indispensable Document for the Modern Manager - First Round Review
A User Guide To Working With You - Julie Zhuo
These are two recent articles on "Manager READMEs", or as described in these two articles, "user guides" for working with a manager. In the first, Jay Desai describes his, which largely sets expectations around process (availability, one-on-ones, reporting, a new hire's first six months, how feedback works) with a little bit of troubleshooting advice ("here's how things sometimes go wrong"). In the second, Julie Zhuo focuses hers more on communication styles, values, priorities, and behaviours that have caused friction in the past.
I think going through such a process is really valuable: getting clear in our own minds about how we approach things (and thus, implicitly, that others might approach things differently), our communication styles, our expectations, and where things sometimes go awry. These two articles describe different aspects of our own communication and managerial approaches that might be worth thinking through with that level of clarity.
But posting Manager READMEs is pretty controversial. Not to put too fine a point on it, but part of the reason we're paid extra to be managers is to adapt to our team members' communication styles, rather than handing them a leaflet saying "here's how I communicate". On the other hand, the other parts - descriptions of process, clarity of expectations, and things we're actively working on improving - seem like information that should be communicated to our teams one way or another.
Some common hiring manager mistakes. - Will Larson
Most articles on hiring from the land of tech are focussed principally on problems we in research computing generally don't have. Managing sustained rapid growth (if only), maintaining a hiring pipeline (which does matter for us, but with the slower pace it's different) - these aren't everyday problems in research computing.
On the other hand, several of these mistakes are definitely things I see in research computing teams:
- Not getting to yes (or no) by being indecisive and, relatedly, Holding candidates in stasis for too long - these are big problems: they make us look unprofessional to candidates, and candidates move on
- Not knowing what you’re hiring for and, relatedly, Lack of calibration across the hiring panel. From the article: "Time spent upfront on ensuring you’re hiring for the right role will repay itself many times over. Spend more time than seems reasonable. Then, spend even more." - this is completely true. And once you've really understood what you're hiring for, make sure everyone agrees!
- Overly fluid interview loop and Just-in-time interview question - Winging it in an interview just isn't a good idea. The idea is to compare candidates first against a hire/no-hire bar, and then against each other; that comparison is made unnecessarily hard when you don't ask a consistent set of questions and then dig deeper into the answers.
Experience Doesn’t Predict a New Hire’s Success - Alison Beard, HBR
In this interview with Prof. Chad H. Van Iddekinge of Florida State, Beard discusses Van Iddekinge's meta-analysis:
But when we looked at all these studies—and we sifted through thousands to find the 81 with pertinent data—we discovered a very weak relationship between prehire experience and performance, both in training and on the job. We also found zero correlation between work experience with earlier employers and retention, or the likelihood that a person would stick with his or her new organization.
It may seem obvious, but just because someone has filled a similar role in the past - even if they did really well at it! - doesn't mean they'll succeed in your role. (And by extension, the fact that they haven't doesn't mean that they won't.) As managers we need to dig much deeper than past roles, into the actual skills and behaviours the candidate brought to bear on the problems in their previous work, and decide whether those are the skills and behaviours we need for our role.
Managing Your Own Career
Notes on RSI for Developers Who Don't Have It (Yet) - Shawn Wang
Even though many of us are no longer coding or sysadmining all day, we still do a lot of typing. The interventions needed to avoid carpal tunnel syndrome are way, way smaller than the interventions needed to mitigate and control it once it starts being a problem! A good friend of mine developed RSI and it has significantly affected his work life.
Wang spells out a number of basic ergonomic steps you can take to keep it from becoming a problem. Tilting your wrists up is bad; lots of key travel and pressing multiple keys at once, as in complex keystroke combinations, is bad (I knew it, I just knew emacs was trouble!); routine breaks and stretches are good.
Product Management and Working with Research Communities
The 10 Best Online Survey Apps in 2020 - Emily Esposito, Zapier
It's useful to have survey tools close to hand for getting feedback from our community, whether it's training assessments, prioritizations, or what have you. Some useful ones for our community:
- LimeSurvey for teams that want a self-hosted option
- SmartSurvey for building a library of questions
- YesInsights for one-click surveys in e.g. user emails.
Research Software Development
NVIDIA HPC SDK - NVIDIA
The HPC SDK from NVIDIA is now generally available. Amongst other things, there are Fortran 2003 compilers for x86, Power, and ARM, OpenACC support for GPUs, C++17 parallel algorithms support on GPUs, and some interesting libraries. It will even set up Modulefiles for you, which is a nice little touch. The licensing still seems a little unclear, though.
Improved workflows-as-applications: tips and tricks for building applications on top of snakemake - C. Titus Brown
Increasingly, research software is less about one tool and more about a number of tools chained into a workflow or pipeline. That's especially the case in data-intensive fields like bioinformatics, Brown's field, but even simulation science is moving that way, with data volumes from simulations growing to the point where the work starts to look like a data-intensive analysis pipeline with a big simulation step slapped on the front.
Here Brown walks us through lessons the Lab for Data Intensive Biology learned while shipping and maintaining five applications, each a workflow of varying complexity, with snakemake as the underlying workflow tool. Some of those lessons include: handling the complex configuration that comes from configuring not just one tool but several, by stacking config files; tools for supporting both local and cluster execution; and the benefits of a common CLI that can serve as a template for multiple such applications.
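To make the config-stacking idea concrete, here's a minimal sketch of the general pattern (not code from the article): package defaults are loaded first, then a user config file, then command-line overrides, with later layers winning. The file names and the hand-off at the end are hypothetical.

```python
# Sketch of "config stacking" for a workflow application; illustrative only.
import yaml

def load_yaml(path):
    with open(path) as f:
        return yaml.safe_load(f) or {}

def stack_configs(*layers):
    """Merge config dicts left to right; later layers override earlier ones."""
    merged = {}
    for layer in layers:
        merged.update(layer)
    return merged

defaults = load_yaml("defaults.yaml")         # shipped with the application
user_cfg = load_yaml("project-config.yaml")   # supplied by the user for this project
overrides = {"threads": 16}                   # e.g. parsed from the application's CLI flags

config = stack_configs(defaults, user_cfg, overrides)
# The merged config would then be handed to snakemake (for instance via its
# --configfile / --config mechanisms) to drive the workflow.
```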
If either the topic or the particular tools are of interest, the article is a good read and links to several other workflow resources written by the team.
Code Coverage Best Practices - Carlos Arguelles, Marko Ivanković, and Adam Bender, Google Testing Blog
A nice pragmatic overview of code coverage practice from a testing team that has tested a lot of code across a lot of projects. A representative line: "We should not be obsessing on how to get from 90% code coverage to 95%." The authors argue that while too little coverage is bad, because it certainly misses things, you're not looking for a high score: the key is to have important areas (like rapidly changing code and integration paths) well tested. An awful lot can go wrong in code whose tests hit practically every line but are all unit tests.
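To illustrate that last point with a hypothetical (and deliberately silly) example: every line below can be exercised by unit tests, giving essentially complete line coverage, while the integration between the two functions is still wrong. None of this is from the article; it's just the shape of the trap.

```python
def parse_distance(raw):
    """Parse a sensor reading; returns metres."""
    return float(raw)                 # fully covered by a unit test

def travel_time_h(distance_km, speed_kmh):
    """Travel time in hours for a distance given in kilometres."""
    return distance_km / speed_kmh    # fully covered by a unit test

def report(raw, speed_kmh):
    # Integration bug: parse_distance() returns metres but travel_time_h()
    # expects kilometres, so the result is off by a factor of 1000 - yet
    # line coverage from the unit tests above still looks excellent.
    return travel_time_h(parse_distance(raw), speed_kmh)
```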
Research Computing Systems
Improving Postmortems from Chores to Masterclass with Paul Osman - Blameless
Theory vs. Practice: Learnings from a recent Hadoop incident - Sandhya Ramu and Vasanth Rajamani, LinkedIn
Stuff happens, and when it does happen it's a lot of work and stressful. We should at least take the opportunity to make the most of these "unplanned investments", learn from them, and make use of those lessons to prevent related stuff from happening in the future.
The talk and transcript by Paul Osman is a good one on making postmortems blameless - finding out what actually happened in the complex system of your team and its computing that led to the incident. People feel defensive, other people tend to feel accusatory, and unless you've got a good system in place for getting to the root of things without pinning blame on people, it's going to be hard to learn anything meaningful.
The post from LinkedIn about what happened in an incident is a nice example of what can come out of a postmortem - a clear and well-written description of the incident and of what will happen next:
Here’s what happened: roughly 2% of the machines across a handful of racks were inadvertently reimaged. This was caused by procedural gaps in our Hadoop infrastructure’s host life cycle management. Compounding our woes, the incident happened on our business-critical production cluster.
Small but non-trivial amounts of data were lost as a result. What's really nice about this post, and the results of their incident process, is that the changes weren't small-scale tweaks to prevent exactly this error from happening again, but much larger-scale process improvements that should keep entire classes of related problems from occurring.
Emerging Data & Infrastructure Tools
A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic - Abdelfattah et al.
Research: A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic - Hartwig Anzt and Jack Dongarra, HPC Wire
With memory bandwidth an ever-greater bottleneck, increasing algorithmic sophistication in iterative methods, data-intensive research triggering interest in approximate and sketching algorithms, and of course deep learning workloads, using a range of precisions in a research computing calculation is an increasingly feasible, if subtle, option. The paper by Abdelfattah et al. is a comprehensive survey of mature and emerging numerical methods (particularly linear algebra solvers) that use mixed-precision arithmetic; it's summarized in the teaser article in HPC Wire. Using (say) single precision can easily be twice as fast as double precision (if not faster, due to better cache behaviour), and increasingly common half-precision floats can be twice as fast again, if the numerical method supports them for some fraction of the computation. It's worth keeping an eye on these methods and packages as they mature.
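For a flavour of what "mixed precision" can mean in practice, here's a minimal numpy sketch of one classic idea from this family, iterative refinement: do the expensive solve in single precision, then use cheap double-precision residual corrections to recover accuracy. This is an illustration of the general technique, not a recipe from the paper; a real implementation would factor the matrix once and reuse the factors rather than calling solve repeatedly.

```python
import numpy as np

def mixed_precision_solve(A, b, iters=3):
    # Expensive work in single precision (in practice you'd keep the LU factors).
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                    # residual in double precision
        dx = np.linalg.solve(A32, r.astype(np.float32))  # correction in single precision
        x += dx.astype(np.float64)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((500, 500)) + 500 * np.eye(500)  # well-conditioned test matrix
b = rng.standard_normal(500)
x = mixed_precision_solve(A, b)
print(np.linalg.norm(A @ x - b))  # residual shrinks toward double-precision levels
```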
Container Image Retention Policy - Docker
This was probably inevitable - DockerHub will no longer store container images for free indefinitely. Free accounts will have images removed after 6 months of inactivity (no pushes or pulls). This is both completely understandable and kind of a pain for research computing, where mature software can lie dormant for a number of years before a new wave of activity in the field reignites interest. Quay.io remains an option for now, as does self-hosting images (or just requiring users to rebuild from Dockerfiles).
Amazon Braket – Go Hands-On with Quantum Computing - Jeff Barr, AWS Blog
Amazon Braket, AWS's best-named service by far, is now generally available. It lets you fire up notebooks, experiment with quantum circuits on simulators, and then run them on real quantum hardware from multiple vendors. This may be of interest to any of your staff or users curious about quantum computing.
Random
Facebook has released pysa, a security-focused static analysis tool for Python.
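For a sense of what such a tool looks for, here's a hypothetical snippet of the kind of source-to-sink flow that taint analyses like Pysa are designed to flag (this isn't Pysa configuration, just the class of bug):

```python
import subprocess

def handle_request(params):
    filename = params.get("filename")  # "source": user-controlled input
    # Untrusted data flowing into a shell command - the kind of injection
    # risk a taint analysis can trace across function and module boundaries.
    subprocess.run(f"gzip {filename}", shell=True)  # "sink"
```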
Crush is the beginnings of a bash-like shell with real programming-language features: typed variables, and pipes that carry typed columns rather than just streams of bytes.
A recent article drew some attention to Intel's mOS project which is quietly building a very lightweight, low-noise kernel intended for compute nodes in large clusters; ANL's Aurora is an intended target.
That’s it…
And that’s it for another week. Let me know what you thought, if you have any answers about organizational tools, or anything else you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
Highlights below; full listing available on the job board.
Technical Program Manager, Brain Research - Google, Toronto ON CA or Montreal QC CA
Preferred qualifications: Master’s, PhD degree, or equivalent experience in Engineering, Computer Science, or other technical related field. Ability to speak and write in French fluently and idiomatically. Ability to exercise technical judgment in solving software engineering challenges. Exceptional verbal and written communication skills with the ability to interact with technical and non-technical global, cross-functional groups.
Data Management Manager - GSK, Brentford UK
The Data Management Manager will be responsible for governing processes to ensure master and reference data will be actively managed within GSK CH and will work with the Data Management Director to oversee implementation of the strategy, processes, technology and teams to manage master data, reference data, metadata and data quality within the GSK CH Data organisation, working closely with the wider organisation to ensure the compliance with the established company and line of business data policies and procedures.
Health Data Research Gateway Operations Manager - HDR UK, London UK
The role of the Gateway Operations Manager will be to: Work with the HDR UK community to ensure there is rich content on the Gateway that supports the needs of researchers and innovators; Ensure that the Gateway provides a reliable service to all its stakeholders; Manage and report on the operational performance of the Gateway.
Senior Data Scientist/Manager - Cerbri AI, Toronto ON CA or Washington DC USA or Austin TX USA
The ideal candidate is adept at leveraging large data sets to find patterns and using modelling techniques to test the effectiveness of different actions. S/he must have strong experience using various data mining/data analysis methods, using a variety of data tools, building and implementing models, using/creating algorithms, creating/running simulations, and testing its real-time implication. S/he must be comfortable working with a wide range of stakeholders and functional teams, trading off design to help others.
Senior Specialist, Research Software Developer - University of Calgary, Calgary AB CA
The Sr. Specialist, Research Software Development and Support position reports to the Director, Research Computing Services (RCS). RCS offers services to the research community at the University of Calgary. RCS is also part of a national support team that enables researchers to use a national High-Performance Computing (HPC) infrastructure. Daily work activities with research projects include responding to HPC infrastructure requests, deployment of research software, provide expertise to researchers to develop data pipelines to collect research data, curate and process data. This environment allows RCS to work with a diverse research ecosystem including Medicine, Science and Engineering research. This role requires a highly skilled software developer in research software with leadership qualities to lead a small group of software developers/Analysts supporting many research projects.