Research Computing Teams Link Roundup, 11 Sept 2020
Hi, all!
Just a short introduction this week - there's a full set of links to round up, with a lot on communicating with teams, watching out for burnout in ourselves, and academic communities.
The newsletter is nine months old now. As the list of resources covered here grows, and we see where the gaps are in topics that research computing team members need discussed, my plan over the coming weeks is to trim down the number of links covered each week and to focus on a bit of writing on topics that aren't widely covered. That would mean things like grant writing for research computing, research community building, and the like. That will probably also include interviews with research computing leaders.
What do you think; what would you like to see more of, and what do you think the newsletter could do with less of? Hit reply and let me know.
For now, on to the link roundup -
Managing Teams
Never Skip Retros - Tim Casasola, The Overlap
In his new newsletter, Casasola argues that one of the most fundamental team meetings you can have is the regular retrospective, because:
- They disrupt the habit of anticipating the future,
- They are low hanging fruit, and
- They put teams on the path to continuously improve.
He goes on to suggest tools like Parabol and Fun Retrospectives to help with the retrospective process.
This isn't exclusively a software development (or even computing) practice; it's widespread in project management generally, and I think any time there's a natural place to take stock and look back, doing some kind of a "what went well, what should we take a look at for next time" meeting is well worth the time for the potential improvements.
Informal Communication in an all-remote environment - GitLab
GitLab has long been an all-distributed company, and this section of their handbook on running distributed teams is dedicated to setting up channels for informal communications in those environments.
A couple of the suggestions in here are very simple because they involve taking advantage of meetings you likely already have. Encouraging informal conversations in retrospectives, for instance, where a bit of brainstorming and riffing off of things is just part of the meeting; or starting meetings early to give people who want to join early a chance to chat.
Other suggestions include taking advantage of communities outside of work to connect people - sending people to conferences (not super relevant right now) or having team members bring back ideas from relevant events, virtual or otherwise, that they attend.
Other examples are a wide library of social gatherings you may have read about elsewhere - talent shows, coffee chats, co-working calls, trivia nights, pizza calls.
Incident updates, interruptions and the 30 minute window - Dean Wilson
One management skill I wrestle with is the tension between giving my team members, who I trust, the freedom to solve problems as they see best (this is the easy part for me) while staying informed enough to make related decisions and make sure no one is falling down any rabbit holes (this is the tougher part). Wilson's article is just a nice story about a previous boss who would consistently, gently, but firmly interject "just enough" during an incident to make sure they knew what was going on so they could communicate upstream, and to keep people on track, while letting the team do their thing.
How to Call Out Racial Injustice at Work - James R. Detert and Laura Morgan Roberts, HBR
At the beginning of the summer there was a flurry of articles on addressing racial or other systemic injustices in the workplace. Unfortunately those have died down a little bit. This HBR article discusses how to call out racial injustice at work - it could just as easily be used to address issues of gender inequality, or to deal with any systemic issues.
The steps Detert and Roberts suggest are:
- Use allies and speak as a collective.
- Channel your emotions (but don’t suppress them!)
- Anticipate others’ negative reactions. ("If your request evokes a furrowed brow or a crossing of arms across the chest, start asking questions: 'These seem like appropriate next steps to me, but perhaps they feel problematic to you. Can you help me understand what you’re thinking, and why these may not seem right to you?'")
- Frame what you say so that it’s compelling to your counterpart. (“We are evolving together” rather than “I am revolting against you.”)
And finally, and maybe most crucially,
- Follow up. A single conversation isn't going to be enough.
As managers in research computing, most of us are white, and many of us are white men, and so don't really have to deal with steps one and two when we see issues - we can speak up and our voices will be heard and taken seriously without having to have safety in numbers or modulate our emotions. Indeed, we have an obligation to do so. Even so, where applicable it would be best to connect with those most directly affected and make sure we're advocating for the right things, and lending our voices to theirs.
As a bonus, this framework is a very useful one for raising any difficult topic with higher-ups in an organization.
Managing Your Own Career
A 4-Step Process For Avoiding Burnout - Madeleine Evans, The Path Forward
Last roundup we talked about the emotional resilience report, which covered a lot of really good background on burnout. This is a much more tactical article outlining specific steps:
- Do a reality check - how often do you find yourself agreeing with statements like "I feel burned out from work" or "I have become more callous towards people lately", or disagreeing with statements like "I have accomplished many worthwhile things lately"?
- Identify your biggest risks - things that cause burnout at work are high demands, unfairness, and lack of control; things that help fight burnout are enough time to rest/recharge, support, a good match between your values and the work you're doing, and reasonable recognition/reward for your effort. Which of those things are the biggest issues?
- Have templates and strong habits for the things which replenish you in the areas above.
- Plan ahead and review your progress each week on doing concrete things to help avoid burnout.
I wouldn't say that last week's article is a prerequisite for this one but I think it's very helpful for establishing the insidiousness of creeping burnout and gives context to the steps above.
Product Management and Working with Research Communities
Roadwork ahead: Evaluating the needs of FOSS communities working on digital infrastructure in the public interest - Elisa Lindinger, Julia Kloiber, Katherine Waters, Katharina Meyer, Thoka Maer
As mentioned last link roundup, research isn't the only area where essential digital infrastructure development is under- or un-funded. This report covers the situation in free and open source software generally, focussing on internet infrastructure, but some of the problems are the same; for instance:
- "Funders and infrastructure projects communicate differently."
- "A variety of factors prevent infrastructure projects from applying for funding."
There are also very cogent insights on diversity and inclusion in FOSS projects, which I think are very important, but I also believe that research computing has diversity issues which are more deeply rooted and harder to bypass than those in an open source software project.
Some of the recommendations are, I think, highly relevant:
- "Explicitly funding non-technical positions"
- "Establishing fellowships"
- "Providing examples of good practice for lightweight, result-oriented FOSS project structures"
The report is not overly long and is a very clear read. What I'd like to see as a follow-up are recommendations on how FOSS infrastructure projects could advocate to funders.
Academic jobs take major hit from Covid-19 - Mićo Tatalović, Research Professional News
A reminder that trainees we work with are facing an even worse job market this year than usual for those looking to continue on the academic track. We're pretty fortunate in that research computing jobs, particularly in anything connected to health sciences, continue to be offered in strong numbers.
Organic and Locally Sourced: Growing a Digital Humanities Lab with an Eye Towards Sustainability - Rebekah Cummings, David S. Roh, Elizabeth Callaway, Digital Humanities Quarterly
A useful article on setting up a Digital Humanities "pop up" lab in the University of Utah's Marriott Library, after an earlier attempt had failed. The story told here of learning from (and building on) previous attempts, and using the lab not simply as a thing in and of itself but as a concrete thing for a nascent cross-campus effort to nucleate around, is a nice example of planning and community building to make something as tricky as an interdisciplinary centre take off. This article is part of an issue which has several case studies of digital humanities labs. The group putting this together fended off (or at least de-prioritized) administration views on what was important (visualization wall!) and focused on:
- Real Academic Partnerships/Collaboration producing real outputs,
- People and trained staff,
- Figuring out how to let the lab identity emerge rather than be prescribed (the above partnerships helped with that),
- Allowing individual things to be tried and fail while ensuring the effort as a whole was sustainable, and
- A portfolio of efforts and outcomes.
It's a good overview of what's involved in putting together something that connects so many different moving parts.
Unpopular Opinion - Data Scientists Should Be More End-to-End - Eugene Yan
I continue to watch how data science/data engineering roles evolve, because I think there are a lot of analogies to research computing work specifically. The large amount of experimentation as different kinds of orgs take on and shape data science/data engineering teams can teach us a lot about how to usefully work with our own stakeholders.
A lot has been written on "full-stack" data scientists, and Eugene Yan feels that's the wrong direction to go in. The important thing isn't the depth of the tech stack, it's the beginning-to-end of the journey. The data scientists who can participate in the process from identifying a problem to its eventual solution and deployment into production are the ones who can most easily contribute to the company's needs.
I think this is especially true in research computing, and easy to forget when we're increasingly specialized and focussed on our technical tools. Someone who can work with the researchers throughout the entire journey is invaluable. That doesn't mean they have to do it alone, deeply understanding every technical piece of the problem, but having at the least a "concierge" or "navigator" who stays with a researcher team throughout the process is extremely valuable.
Research Software Development
Dev huddle as a tool to achieve alignment among developers - Mario Fernandez
Fernandez describes how to organize huddles for software developers. The huddles are somewhere that developers can raise ideas about new tools the team should consider, interesting techniques they read about, or make decisions about how they'll be handling needed development work. They can be lightweight and self-driven, and serve as a method for building alignment between the developers and sharing useful information and knowledge.
Research Computing Systems
Findings From the Field - Two Years of Following Incidents Closely - John Allspaw
Incident handling is an area where research computing falls well behind best practices in technology or IT, partly because the implicitly lower SLAs haven't pushed us to have the discipline around incidents that other sectors have had.
And that's a shame. There's nothing wrong with having lower (say) uptime requirements if that's the tradeoff appropriate for researcher use cases, but that doesn't mean having no incident response protocol, no playbooks, no procedures, and going through the stressful and error-prone approach of making it up as we go along every time something happens is a good way to do things. And I've seen many research computing centres where that is precisely what's done.
This is a short presentation slide deck on what Allspaw has learned from following incident handling closely at multiple organizations.
Some common failure modes he's seen in leadership are thinking that incidents are themselves a bad sign, wanting to get inappropriately involved, and insisting on largely irrelevant metrics. Common failure modes among front-line incident support are an exclusive focus on fixing over learning, and treating post-incident processes as bureaucracy and busywork.
In Allspaw's estimation, both groups need to build culture and process around learning from incidents, create meaningful actions to follow up on what was learned, and make the most of these unplanned investments in people's time by making the reviews useful, having them re-read, and having them inform future work.
Emerging Data & Infrastructure Tools
The HDF Group Announces Availability of HSDS Release v0.6 - HPCWire
HSDS ("Highly Scalable Data Service"), an object-store/S3-flavoured version of HDF5, is nearing v1.0. This takes the well-known scientific computing data format, with its efficient array slicing operations and support for multiple readers and writers, and moves it to distinctly non-POSIX systems. For some applications this may be a relatively straightforward way to migrate away from POSIX file systems, which are extremely expensive in the cloud and extremely challenging at scale. It will be interesting to see how this continues to mature.
Events: Conferences, Training
IEEE 2020 - 14-17 Sept, Virtual, Free
This year’s IEEE 2020 is virtual and free to attend, with workshop sessions on Intel persistent memory and ARM, and talks on efficient inter-node communications, performance monitoring and characterization, HPC workloads, and storage.
ParslFest 2020 - The Parsl Community Meeting - 6-7 Oct, Zoom, free
Parsl is a parallel dataflow/dynamic workflow library for Python supporting a large number of back ends, including common HPC batch queuing systems. The 2020 Parsl community meeting is on the 6th and 7th of October and includes science applications, cyberinfrastructure talks, and tutorials.
Random
Research librarians are putting together "curation primers" for various research data file formats.
Unikernels, which I thought had promise for research computing in production (especially HPC) before seemingly getting killed off by VMs and containers, might be having a day again due to microservices. Nanos looks pretty slick.
Videos from FortranCon 2020 are available online.
The case against dynamic linking.
Systems software always struck me as being more like research computing software (say for simulations or data analysis) than application software, in that the difficult part isn’t complexity so much as subtlety. Here is a blog post on writing comments for systems software.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
Highlights below; full listing available on the job board.
Manager / Azure Data Architect, Omnia AI - Deloitte, Toronto ON CA
You will be helping our clients to resolve their most complicated data & analytics problems to build, maintain, improve or re-architect solutions on Azure Cloud. You will be working and leading cross functional teams on architecting, optimizing data systems and building them from the ground up. On an average day you will help our clients understand advantages and disadvantages of specific Azure Data architecture choices and provide subject matter expertise and lessons learned for your previous projects. Most importantly you will develop highly efficient teams of internal resources and guide their development journey on Azure cloud.
Manager Clinical Data Management - Abbott Point of Care, Ottawa ON CA
This role is responsible for clinical data management (CDM) strategy, activities and operations for new and ongoing clinical research studies for the division. Has primary management responsibility directly or through subordinates for all data related to clinical studies. Responsible for the integrity of the processing and evaluation of clinical data. Identifies and implements the most effective, cost efficient and best business practices to execute processes and continually evaluates their effectiveness and appropriateness. Ensure that quality of services meets internal and external customer requirements and reports status to senior management. Responsible for identifying and implementing new, updated and/or enhanced systems for data collection for clinical studies
Manager, Research and Data Science - Bank of Montreal, Toronto ON CA
Applies knowledge of advanced analytic algorithms and technologies (e.g. machine learning, deep learning, artificial intelligence) to deliver better predictions and/or intelligent automation that enables smarter business decisions, improved customer experience, and drives productivity. Applies strong communication and story-telling skills to summarize statistical/algorithmic findings, draw business conclusions, and present actionable insight in a way that resonates with business/groups. Drives innovation through the development of Data & AI products that can be leveraged across the organization and establishes best practices in alignment with Data & AI governance frameworks of BMO.
High Performance Computing - Director - Modis (Recruiter), Sydney NSW AU
- Bachelor's degree in Computer Science, maths or related field
- 7+ years' experience minimum with large-scale HPC systems
- 2+ years of hands-on systems architecture, infrastructure engineering, software development, solution architecture or support of research software engineering
- 1+ years' experience with cloud computing as it relates to HPC
- Good understanding of management and performance optimization techniques associated with large-scale computing system
- Executive speaking and presentation skills - Formal presentations, white-boarding, large and small group presentations
Head of Research - European Molecular Biology Laboratory, Hinxton UK
As Head of Research you will report directly to EMBL-EBI’s Co-Directors and play a key role setting vision, strategy and scientific direction for the Institute. The Head of Research also provides oversight of EMBL-EBI's research portfolio, similar to a Head of Department.
The Head of Research will chair the Group Leader appointment panel for dedicated research group leaders at EMBL-EBI and provide decisions on EMBL-EBI research resources. As part of the leadership group you would represent the Institute within EMBL, to the EMBL Council, and externally to the scientific community within Europe and worldwide.
Programme Officer, Data Science - Office for National Statistics, UK Government, London UK
This is an exciting opportunity to work at the heart of data science in the public sector. As Programme Officer, you will help to build data science capability, working within the Knowledge Exchange team to deliver programmes of learning and development for our stakeholders across government and the public sector. The Data Science Campus is a purpose built centre of excellence for data science. We are the UK Government’s data science hub. We play a leading role in developing the UK’s data science capability and international reputation in data science.
Group Leader, HPC Infrastructure and Networking - Oak Ridge National Laboratory, Oak Ridge TN USA
We are seeking a Group Leader for HPC Infrastructure and Networking in the National Center for Computational Sciences (NCCS). Selection will be based on qualifications, relevant experience, skills, and education. NCCS provides state-of-the-art computational and data science infrastructure coupled with dedicated technical and scientific professionals tackling large-scale problems across a broad range of scientific domains for accelerating scientific discovery and engineering advances. NCCS hosts the Oak Ridge Leadership Computing Facility, one of DOE’s National User Facilities.
R&D Manager, Scalable Computer Architectures - Sandia National Laboratory, Albuquerque NM USA
We are seeking a R&D Technical Manager to lead the Scalable Computer Architectures Department in Sandia’s Center for Computing Research (CCR). The mission of this department is research, development, and analysis of innovative computing architectures for national security missions which rely on cutting-edge high-performance computing (HPC)! This organization is sited in the Computer Science Research Institute (CSRI), a world-class institution for foundational and applied research in computer science. CCR and CSRI are committed to nurturing a culture compatible with a broad group of people and perspectives. In support of this vision, the center actively recruits applicants from diverse backgrounds and fosters an inclusive community. We seek a technical manager to support this commitment!