Research Computing Teams Link Roundup, 3 Apr 2020
Hi!
Every single week I wonder if there’s going to be enough material to send out a meaningful link writeup, and each week I end up leaving a bunch of material out.
Still I still wish there were more good sources of specifically research computing materials I could include; if you know of any, please let me know!
One unintentional theme on the technical side of this week’s roundup is free materials for training. As we all are settled into the new normal, some of us have team members whose work has been disrupted. If there are people in your organization with some time to spare, this is a good time for some professional development, and this week there are a number of free opportunities to learn listed below. There’s the entire ACM Digital Library until end of June, Google’s technical writing courses, the HPC-AI Advisory Council’s Stanford conference in April, courses at Pluralsight, and Intel’s DevCloud for playing with the emerging OneAPI.
I hope this week has gone well for you and your team, and that you have some time this weekend to take it easy.
Now, onto the weeks link roundup…
Managing Teams
Why being a manager is terrible…and wonderful - Scott Williams, Zapier
All of the research computing managers I know are very very tired right now. It’s ok to recognize that it’s tough managing even at the best of times, but we should also take the time to recognize that our teams are doing pretty well now and to take some pleasure in that as we enter the weekend.
Are You Sugarcoating Your Feedback Without Realizing It? - Michael Schaerer, Roderick Swaab, HBR
We have to work harder to keep connected with our remote teams now, but that doesn’t mean we stop giving corrective feedback about things that aren’t going well. In fact there’s probably a lot of new expectations about “how we do things”, and there’ll have to be a lot of nudges to get people on the right path.
This article talks about one problem that people have giving positive but especially negative feedback; the “illusion of transparency”. Put simply, people can’t see into your head, and (a) the thing you’re giving feedback about isn’t as clear to your team member as it is to you, who have spent much longer observing and thinking about it, and (b) what you’re trying to communicate the team member may be very clear to you and much less clear to them.
I’m really bad at this, particularly when I’m trying to not be overly negative about something or trying to steer away from being very prescriptive: I often say things that are utterly clear to me but leave people on our team with a lot of questions.
The best way to make sure you’re communicating what you think you are communicating is just to ask! But even just remembering that what you’re seeing isn’t necessarily as obvious to your team members is enough to help us be clearer:
Before the review, we told one group of the managers that the evaluations would not be evident to the employees, and that the employees would be unlikely to see the evaluation the same way as the managers. We found that the managers who were given this warning delivered much more accurate feedback than the others — the gap between the perceptions of the evaluation disappeared.
Managers, Take Your 1:1s to the Next Level with These 6 Must Reads - First Round Review
I’m pleased so many people have read the webpage or ebook version of the One-on-one quickstart; hopefully it’s been useful. Once you’re familiar with them, or for people who have already been doing them for a while and are looking for things to improve, this blog post lists six very useful things to read to add to your one-on-one toolbox:
- Deflect common 1-on-1 “types” into more productive directions
- Have some very large scale themes that you probe into
- Dig into problems, not symptoms
- Stay specific, but be empathetic
- Keep notes and follow up
- Deliver on your actions and make sure the one-on-ones are useful
Most of the suggested reads have helpful questions to ask to help shape the conversation in the way you’re looking to have it go, and those question lists can be useful to keep handy.
5 Best Practices on Nailing Postmortems - Hannah Culver, Blameless
We’ve started doing incident reports - sort of baby postmortems - in our project, which has been an extremely useful practice in growing more discipline about how issues with availability or security are reported, distributed, and dealt with. It also gives us a library of materials that we can look through to identify any patterns that appear.
This article talks about some best practices for running postmortem processes – - Use visuals - Be a historian - use timelines, but sparingly - Publish promptly - the faster, the more accurate the recollections - Be blameless: - People are not points of failure. - Everyone on the team is working with good intentions. - Failure will happen. - Tell a story
I think we’re pretty good with the blameless/tell a story parts in our incident but the prompt publishing has been a problem - we still need to improve the process, and right now I’m a huge bottleneck - and we’re not using visuals just yet.
Research Software Development
The Pyramid of Unit Testing Benefits - Gergely Orosz
Unit testing is increasingly accepted in research computing that it doesn’t really need justification, but when people talk about it’s benefit, it’s usually about fairly low-level benefits - CI/CD and avoiding regressions. But there’s an entire pyramid of benefits:
- Validate your work.
- Separate concerns in your code.
- An always up-to-date documentation.
- Fewer regressions.
- A safety net for refactoring.
Advantages like documentation, and the need to separate concerns in the code to the point that unit testing makes sense, and future architectural benefits like being able to refactor more smoothly, are under-appreciated!
How to SSH properly - Gus Luxton, Gravitational
This won’t be news to many of you, but it’s still a great overview of what modern SSH security practice looks like, and a good reminder for those of us like me who rely too heavily solely on keys that best practice has moved to include certificates for some time.
The article covers:
- Host and user certificates to sign keys
- IP-tables enforced bastion hosts [LJD: I’m not sure a single bastion host is necessary, but certainly being able to identify and automatically check/audit every path in is; if limiting the number of such paths down to one helps you do that, then great]
- 2FA using Google-authenticator
and is a great source to keep handy to point users to.
A visual debugger for Jupyter - Project Jupyter
One of the (not unfair!) knocks against notebooks for research computing is that it at best does not support, and worse actively discourages, good software development process. Testing, version control, modularity, and debugging are all areas where notebooks have been lacking. But the community is aware of these problems and has been working on many of these gaps.
This introduces a first visual debugger for JupyterLab, which can be tested today on binder, with a protocol so that other debugging tools (or front ends) could be supported, and a jupyter kernel which supports the protocol.
It’s early days, and they’re actively seeking feedback, but this would certainly be a very useful tool for those who do complicated research computing and analyses in notebooks.
Our journey to type checking 4 million lines of Python - Jukka Lehtosalo, Dropbox
I love a good migration story! Even if you’re not using the technologies being discussed, we all have to do migrations of one sort or another and it’s great to be able to hear how people thought about and executed the migration, and the sorts of problems that come up along the way.
Dropbox uses a lot of Python, and this article describes their migration to using type annotations in their (substantial!) codebase, and using mypy to do type checking for static analysis, error case discovery, and improving performance. (I’ve argued before that the ability to incrementally move code from prototype to production by adding things like types or tooling for performance is particularly useful in research computing, and is one reason why Python gets used so heavily).
The incremental approach worked surprisingly well for type checking in their quite modular code, especially early on; they built a lot of tooling to gracefully degrade in the presence of un-type-annotated code. They hit a major performance snag in the type checking in the case of cyclic imports, and ended up having to implement a memoizing daemon so that they could truncate recursion in the type checking!
Once the infrastructure was in place they started gradually ramping up the strictness of testing for new code in Dropbox, until type annotations were enforced by process and technology. To support the change effort within the company, they gave talks, built affordances (like editor integrations for many common editors), and ran surveys of their developers to find out what still wasn’t going well.
It’s a great article and well worth reading in full if the topic is of interest. (They also have a nice post on their migration from Python 2 to Python 3.)
Product Management and Working with Research Communities
Google’s Technical Writing Courses - Google
Some of us, particularly those of us who were trained in engineering departments, got technical writing training — but most of us didn’t, and the training we did get was focussed more on reserach papers (which let’s face it is a terrible model for almost any other form of writing besides research papers).
Google has made available two of their internal courses on technical writing. The first course is sort of “Strunk and White for people who work with computers”, which to be clear is a good thing - it focusses on short clear sentences in the active voice, considering your audience, and good paragraphs. The second course focusses more on the organization, drafting, and rewriting of larger documents.
The pre-course materials are available at the link above; the course materials themselves and facilitation guides are available upon registration to their technical writer instruction community. This would be of interest to many of us individually, but also to arrange courses within our own teams or to our broader communities. The course is set up to be taught by people who are roughly peers — you don’t need extraordinary expertise in technical writing to facilitate the course well if you follow the materials. There are also links to external resources.
Design Docs, Markdown, and Git - Caitie McCaffrey
Azure Sphere Security Services used a Word/Sharepoint workflow for drafting, circulating, refining, and approving design documents wasn’t working, so they trialed a move to using markdown and git for their design documents. It was a success, and here they write up their approach.
Not every design document corresponds to just on repository’s worth of code, so they chose to have one single repo for design documents for their organization organization, to support discoverability and large/unconstrained multi-codebase architectural designs.
They used standard github comments for discussion, and pull requests with formal lists of approvers for making changes. Although not every PR requires significant discussion, when they do, review meeting invitations are sent around with a link to the specific pull request being discussed.
One of the biggest benefits I’ve observed is the clarity it has brought to how decisions are made, and durably recording when things are done. On a fast growing team like Azure Sphere having clarity and durable communication are key to successfully scaling the business and the team.
Getting in the room - Will Larson
One problem we sometimes have, when “managing up” within our organization, or trying to work with research communities, is the struggle to be in the right room when decisions are being made. Research computing staff, being a supporting role, can often be an afterthought (if thought of at all) when making plans, funding decisions, or setting strategy — with the end result that sometimes decisions are made which are unimplementable, or implementable only with great difficulty, when it comes time to build technical solutions.
This article describes what you need to get into those rooms:
- To bring something useful to bring to the room…
- …that the room doesn’t already have.
- A sponsor in the room.
- Stay aligned with your manager.
- Optimize for the group.
- Speak clearly and concisely.
- Be low friction.
- Come prepared.
And now to avoid being removed from the because of:
- Misunderstanding the room’s purpose.
- Being dogmatic.
- Withholding consent.
- Embarrassing your sponsor.
It’s also important to know when to leave the room — there are a lot of rooms, you can’t be anywhere, and you have to focus your energies where you can do the best good.
Don’t Just Throw Together a Webinar — The Virtual Events Crash Course You Need - First Round Review
Have a virtual event early, have a virtual event often - this article suggests that if you are thinking of starting a virtual event, you should aim high, put more thought into what you can enable with the technology than just a video of someone presenting slides, experiment with those formats, and keep learning from what you try.
Emerging Data & Infrastructure Tools
PluralSight, amongst other things, has well-regarded online courses on technical topics; these courses are free this month. Some topics of particular relevance to us: AWS, Azure, GCP ; scrum; front-end development if some of your researcher communities need good front ends for portals and tools; Network traffic analysis with Wireshark; Being a Technology Manager (hmm…); Scikit-learn; C++; Golang; training for various security credentials; and Docker.
This is an excellent opportunity to test out that learning platform and start learning some of these technologies, for your team members or your user community.
The Zero Trust Learning Curve: Deploying Zero Trust One Step at a Time - John Kindervag
With almost everyone in our community now working from home, we’re all having a very close look at our VPN infrastructure. While we’re unlikely to be making big changes to that infrastructure any time soon, this will give us a chance to think about how we’re doing things differently.
If you’re thinking of what rolling out Zero Trust would look like — avoiding the VPN approach of assuming a given network is “trusted” and just assuming everything has the potential to be compromised — here’s some advice for someone who has participated in many such rollouts.
The trick Kindervag’s estimation is neither to do a “big bang” rollout, nor even going straight for the most vital infrastructure. Instead, the idea is to have several trial incremental rollouts on less sensitive and vital infrastructure, working your way up to the most sensitive infrastructure early on but not first, and then work on the rest.
New Intel oneAPI DevCloud makes it easier for coders working from home - Rich Brueckner, Inside HPC
So having been burned more than once, I’m still somewhat wary of “one-size-fits-all” approaches to parallel computing like OneAPI, a low-level API that is meant to support CPU, GPU, and FPGA programming via programming languages like data parallel C++. But it has a lot of significant backers and it is worth playing with. Intel’s DevCloud is making these technologies available through the start of July, so if this has been on the radar of you or anyone in your team, this is a good time to start learning…
Conferences
HPC-AI Advisory Council: 2020 Stanford Virtual Conference Agenda - Apr 20, 21
The HPC-AI Advisory council’s 2020 Stanford meeting this month is, of course, all virtual now, and does not appear to have a registration fee.
Distributed Systems Reading Group - Murat Demirbas
Ok, not a conference, but Demirbas, who blogs about interesting distributed systems works, has started a Zoom paper club on distributed systems papers — the first one was on Wednesday. There’s not yet a schedule of upcoming papers but one should be posted to his blog shortly.
Random
The ACM Digital Library is now free and open until the end of June. This is a very useful set of resources.
The presentation Software Engineering Advice from Building Large-Scale Distributed Systems by Jeff Dean at google is circulating around again, and it’s always worth it to take the time to flip through it. The emphasis on Fermi calculations with realistic architectural numbers should be familiar to those with physical science research backgrounds, and it’s a great overview of how to think about these architectural questions from someone who’s thought about a lot of them.
On the more pure math side, I’m not quite sure how to describe the new free book A Singular Mathematical Promenade by Étienne Ghys. Martin Gardner’s old “Mathematical Games” column paired with Gödel, Escher, Bach and a survey graduate course in Mathematics?
Thinking of taking on a new side project? This roundup of Open source, experimental, and tiny tools roundup for game or interactive development contains lots of cute, weird, and interesting small tools to help you get started.
As we build web applications with richer capabilities for data analysis, it starts to make more sense to push computation to the client. Emscripten has long allowed the cross-compilation of compiled C code to Webassembly which allows high performance code execution in the browser, but LLVM’s support for natively compiling to WASM is growing. WebAssembly without Emscripten walks you through some cases.
So I heard you were looking to be able to edit in vim on a rotating, 3-d cube? Great news!
Oh, wrong 3d? I must have mis-understood — I see, you were looking for an immersive stereoscopic terminal experience like with an Oculus? Right, my mistake. Here you go.
That’s it…
And that’s it for another week.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
Senior Scientist & Team Leader - Active Sensing Team - European Centre for Medium-Range Weather Forecasts,
The Team Leader will divide their time between the role of providing scientific and technical leadership and in team management, as well as continuing to make their own personal R&D contributions, related to assimilation of actively sensed or in-situ observations.
Director of Bioinformatics - Grail, Menlo Park CA US
In this role, you will head up a team of bioinformatics scientists to analyze some of the largest, richest biological datasets in the world to find biological signals and patterns and guide our assays and develop new methods for future products. You will work with diverse teams including software, clinical, operations, research, product development, and external collaborators.
Director, Product Management (Test Development) - Grail, Menlo Park CA US
Perform trade-off analyses based on corporate priorities, make recommendations to management and work cross-functionally to understand all technical and non-technical roadblocks and resolve them. Stakeholders include R&D, bioinformatics/data science, biostats, operations, clinical development, software product/engineering, commercial, medical affairs, quality, regulatory, legal and senior leadership.
Senior Staff Bioinformatics Scientist - Grail, Menlo Park CA US
Modeling and analysis of large, complex genomics datasets using advanced machine learning and statistical methods to generate predictions and biological insights. Create and perform end-to-end analysis that include design, data gathering, processing, analysis, iteration with stakeholders, and presentation of results. Lead, develop, and mentor a small team of bioinformatics scientists.
HPC Operations Manager - Florida State University, Tallahassee FL USA
The operations manager will support the Research Computing Center (RCC) for a diverse set of research areas, including bioinformatics, economics, meteorology, materials science, molecular biophysics, computer visualization, and others. Mentor, train and supervise a team of staff members assigned to systems administration tasks and lead the management of the operation of all computer facilities within RCC.
Technical Director High Performance Computing Core Facility - University of California at Davis, Davis CA USA
The Technical Director is charged with leading HPCCF system administrators in the development and maintenance of cutting-edge research computing systems with an emphasis in high performance computing, distributed data storage systems, high-speed/low-latency networking, and scientific and research computing.