Research Computing Teams Link Roundup, 18 Sept 2020
Hi -
Sorry for sending the newsletter out somewhat late today. There's a lot of good material in the roundup - noticing change, understanding why your team is doing things you don’t think they should, Rust in science, transferring knowledge, telling stories with video, HDF5, a better tar, and underwater data centres.
As always, reply or email me (jonathan@researchcomputing.org) if there are things you’d like to hear more about, stories or topics you think our readers would like to read, or any other feedback.
Now on to the roundup!
Managing Teams
Noticing Change - Aviv Ben-Yosef
One of the recurring themes of this newsletter is that research computing is important enough to do with professionalism, and that professionalism is nothing more than being deliberate about what you’re doing, while continuously learning from what you and others have done.
Learning from what you and your team have done necessarily means noticing that there’s been a change. And there’s no way to systematically notice improvements - or regressions - without gathering data, taking notes, and otherwise keeping track of changes and their effects.
One suggestion Ben-Yosef makes that I’ve been meaning to try is to make “decision logs” as a manager - or even as a team. When you make a change, or a decision, log it and periodically review to see how that decision played out.
From Procedural to New Knowledge: Leveraging Your Team’s Know-How - RC Victorino, Slab
This article comes at an interesting time for our team, where we've been trying to figure out how to do exactly this.
The argument Victorino makes is that "procedural knowledge" - how to do X - is undervalued, and hard to usefully document. One of your team members learns how to do the thing - say how to (in our team's case) set up a particular test suite, or configure VS Code to develop in a remote container, or convert certain workflows to a new tool. Now, sure, they can probably write up a document that has all that information in one place, but many of those facts probably already existed in other documentation, and it still takes someone new real time to learn how to do it.
But this procedural knowledge is actually really useful, and is a stepping stone to more valuable knowledge. Now that people have learned skill X, they can use it in all sorts of ways -- "Hey I can use X to speed up that thing Y that's been holding it back". So transmitting it within (or beyond!) the team is valuable. And the best way to do that transmission is with shared experiences - pair programming, talks, etc. Anyway, it's a good article and if it interests you, you should read it.
From our team's point of view, this would also have the advantage of helping the team grow together, would help more junior team members get practice giving talks and mentoring, and could let us put up videos of their talks, helping grow their (and our project's) visibility. So we're going to try to figure out a way to do this - current best idea is to batch 2-3 short (10-15 min) talks/demos/workshops on related topics into mini "conferences" for mainly internal consumption but with relevant people from interacting groups invited.
Has your team tried similar things? How has it gone?
Your Values are the Rules You Break - Stephen Prater
When they don’t know what to do, they’ll do what they know. - Marcus Blankenship
These are two interesting articles on understanding two different reasons why your team may not be doing what you are telling them to do. They’re probably not intentionally thwarting you or ignoring you; they’re probably responding to different incentives.
Prater’s article focusses on larger picture concerns. Say your organization, or even you, routinely tell people you “value” X - high quality code, collegiality, inclusion and diversity - and want them to, too; and yet they repeatedly break those rules and behave differently. The issue here can very easily be the difference between those stated values and the actual values or behaviours of you or the organization. If you “value” high quality code but actually reward or reprimand based on meeting overly-tight deadlines, you aren’t going to get high-quality code. They’ll break that rule in favour of the real values, cranking out whatever to get stuff out the door. Similarly, if the organization “values” inclusion and diversity but doesn’t actually act in a way that brings in people from diverse backgrounds and truly includes them, they’ll break those rules too because they see that those “values” aren’t actually lived up to.
The second article is more focussed on your actions as a manager - if a new project, task, or behaviour isn't clearly and continuously communicated, with instructions and examples as to how to Do The Thing, people will inevitably drift back to what they know how to do. Because then they're doing something, instead of sitting around being uncertain. Part of being a manager is communicating about something in a zillion different ways until you're sick and tired of hearing it - because it's just about then that it's starting to get heard by people.
Managing Your Own Career
Bucketing your time - James Stanier, The Engineering Manager
We've talked about organizing tasks in buckets before - in Issue 37 I mentioned my experiments with Trello, and in Issue 39 I linked to an article about having a "dashboard" that covers current issues, things to keep an eye on, and future-looking work.
This is a nice article about why I find these approaches work well for me - it's a way of systematizing the discipline of not just getting lost in the day-to-day while also highlighting important-but-not-urgent tasks at a variety of timescales. If this is something that you wrestle with too, this is a nice article to read. I've certainly found being able to keep track of "today/this week/coming month/coming quarter" tasks useful to keep my eye on the ball. Stanier also distinguishes between recurring tasks and one-off tasks, as a way of showing what tasks would be more valuable to delegate.
OffBoarding as an Engineering Leader - Iccha Sethi
We’ve had a few articles here about your-first-90-days at a new job; this is an article about your last days as a manager as you move to a new position.
Sethi mentions several areas to focus on:
- Your team - documenting pending performance issues, salary, equity, or promotion status, and then informing them of the departure
- Documenting the status of the team as a whole and their projects
- Documenting the status of any projects or initiatives you were pushing for
- Stakeholders - peer teams and, for us, research groups - informing them and having a clear plan to hand them off to a specific individual
- Documenting a clear transition plan
There’s a lot to do! The good news is that if you’ve been following the practices we’ve been recommending in this newsletter - one-on-ones with notes, regular quarterly planning and reviews, project documentation, and the like - many of these tasks become much easier. And if not - well there’s no need to wait until you’re going to start a new job!
Modern Data Engineer Roadmap - Alexandra Abbas
One thing I’ve learned from maintaining the research computing manager job board is that there are a lot of “Manager, Data Science” or “Manager, Data Engineering” jobs out there, and that a lot of research computing managers could pivot into those kinds of roles. Here’s a proposed roadmap of technology expertise that would be needed to run a modern data engineering team; note that a lot of them will be very familiar to those working in modern research computing environments.
Product Management and Working with Research Communities
Storytelling for Nonprofits: Using video to tell your story - Chiara C, Tech Soup
One of the areas research support groups can learn from nonprofits is how to communicate important ideas, regularly, on a shoestring budget. Most research computing groups do very little on this; but with a very modest amount of effort, consistently applied, and an even more modest amount of money, one can reach and influence a large number of people, drawing them to your resources, courses, and services - or their equivalents at their home institutions.
This article gives a quick rundown on the importance of storytelling for nonprofits, and some resources for use. The list of useful resources would be a little different for educational institutions (which generally don't get the same price cuts as nonprofits), but it's still a good resource list. The other thing I'd add is that there are a number of services out there that will help you build short animated explainer videos or promotional videos for a very modest amount of money. I used Renderforest - which was fine but there are lots of others out there - to make a quick explainer video for our current project which has worked very well, for a total cost of one afternoon and $27 USD.
European Commission Declares €8 Billion Investment in Supercomputing - Oliver Peckham, HPC Wire
State of the Union: Commission sets out new ambitious mission to lead on supercomputing - European Commission
The EC has proposed a new, significantly larger, tranche of funding for supercomputing, expanding and extending the 2018 EuroHPC Joint Undertaking, as a way of underpinning other R&D goals. The funding, from 2021-2033, would cover hardware and software development, and also has a significant "digital sovereignty" component. From the HPC Wire article:
“I am pleased to announce an investment of 8 billion euros in the next generation of supercomputers – cutting-edge technology made in Europe,” von der Leyen said. “And we want the European industry to develop our own next-generation microprocessor that will allow us to use the increasing data volumes energy-efficient and securely. This is what Europe’s digital decade is all about!”
Research Software Development
C++20 Is Now Final, C++23 at Starting Blocks - Sergio de Simone, InfoQ
C++20 is now finalized, and you can expect to see increasing levels of support in the newest versions of various compilers. Big new features include:
- Modules - significantly improving C++ modularity and namespacing
- Coroutines
- Concepts - trait-like named constraints on the types a template will accept
Performance of the Versioned HDF5 Library - Melissa Weber Mendonça, QuanSight
HDF5-UDF - Lucas C. Villa Real, Gerd Heber
Lots of interesting work going on with HDF5 lately. Last issue we talked about HSDS, an HDF5 data service on S3-like object storage; two weeks earlier in Issue 39 we introduced versioned HDF5.
This week, two HDF5 articles - the first gives a performance summary of the Versioned HDF5 library in terms of both time and space. The performance is quite good! There are no obvious overheads on the file size, and diffs are handled quite efficiently if the chunk size is chosen correctly. The speed performance is also better than I would have expected.
The UDFs are something else - user defined functions compiled and stored in the HDF5 file, which allow for “views” of the data, or processed results, or even synthetic/procedural data - anything you’d like to implement. There are wrappers for Lua, Python, and C/C++.
Seven technology leadership lessons from TV show writing - Daniel Jarjoura, TLT21
The community has built a lot of analogies between software development and engineering, but engineering isn’t the only discipline where people have to work together to build complex and intricate stuff under tight deadlines and shifting requirements. Jarjoura tells us seven lessons from successful show runners that he believes carry over to computer systems or software development teams:
- They know their show and tell everyone what it is - there’s a common, shared, and continuously communicated vision.
- They create a safe space - show runners need to create an environment where everyone feels safe to share so the best ideas can surface, even in an environment where dozens and dozens of ideas will get knocked down before one or two are chosen to work with
- They make writers pitch - no episode gets written just because “it’s Joe’s turn” or “because Jessica said so”
- They give everyone a chance to talk - along with creating a safe space, show runners make sure everyone speaks and has their voice heard
- They combine creative thinking and passion
- They rotate writers and make them work collaboratively - well before “pair programming” was a thing
- They write and rewrite quickly
The same analogies over and over again can be limiting, and I like the idea of borrowing from other successful industries.
Rust in Science and ever-changing requirements - Amanjeev Sethi
Co-worker Sethi writes here about scientific programming, the need for prototyping and adapting to changing requirements. He argues that a statically typed language - and particularly one like Rust which is very particular about a variety of correctness checks at compile time - has advantages over languages like Python even for prototyping:
This is where I would conclude that if you are starting a journey and are sure that the things will change many times over, you may be better off with a language that: gives you a solid structure to build upon, with tools to warn you when parts of that structures are moved, by giving your advice on “why”, “where” and “how” to do that.
Research Computing Systems
Microsoft’s underwater server experiment resurfaces after two years - Chaim Gartenberg, The Verge
Microsoft sank a data center the size of a shipping container 2 years ago in a wild experiment and just brought it up to see how it went - Mary Meisenzahl, Business Insider
Microsoft proves practicality of renewables-powered underwater data centres - Renewables Now
That's not a water-cooled datacenter; this is a water-cooled datacenter...
Microsoft submerged a small containerized datacenter under a Scottish sea (the Loch NAS monster? Has anyone done that yet?) with over 800 servers and 27 PB of storage, for two years. The idea is that actually underwater, the conditions (temperature, humidity) are extremely constant, so there should be less thermal shock, etc. to components - plus of course the ease of radiating heat out to the entire sea. They claim that they see 1/8th the component failures of their more terrestrial regular data centres - which is a good thing, because they can't just send someone down in SCUBA gear to swap DIMMs.
Gartenberg’s article concentrates on the Scottish experiment; Meisenzahl’s goes into a little bit more background (and a lot more pictures) about the history of the experiment (including an earlier Pacific Ocean test) and shares that another reason for longer equipment life is that the container was filled with nitrogen gas, reducing the oxygen content of its atmosphere. Microsoft will look at how the atmosphere in the container changed over time.
The Renewables Now article adds the colour that the datacenter was powered solely by renewable energy.
Archivetar - A better tar for Big Data - Brock Palen
Brock Palen, long-time research computing podcaster and Director, Advanced Research Computing – Technology Services at UMich, introduces us to his Archivetar, which wraps tar to solve problems with tar'ing large project directories that include both big data files and small code files to archive. It uses mpiFileUtils to parallelize scans of large numbers of files, clumps small files together in a number of tar files to keep object counts low while making it easier to pull out that one file the researcher wants, and provides a manifest.
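To make the "clump small files together, plus a manifest" idea concrete, here's a minimal Python sketch. This is not Archivetar's actual implementation or interface, and it doesn't use mpiFileUtils; the function names (plan_batches, write_batches), the size thresholds, and the manifest layout are all made up for illustration, using only the standard library.

```python
import tarfile
from pathlib import Path


def plan_batches(root, small_file_limit, batch_size_limit):
    """Group files under `root` smaller than `small_file_limit` bytes into
    batches whose combined size stays within `batch_size_limit` bytes.

    Hypothetical helper for illustration; large files are simply skipped,
    on the assumption that they are archived individually elsewhere.
    """
    batches, current, current_size = [], [], 0
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        size = path.stat().st_size
        if size >= small_file_limit:
            continue  # big data file: leave it out of the tar batches
        if current and current_size + size > batch_size_limit:
            batches.append(current)  # current batch is full, start a new one
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        batches.append(current)
    return batches


def write_batches(root, batches, out_dir, prefix="archive"):
    """Write one tar file per batch, plus a manifest listing which tar
    holds each file, so a single file can be found without opening every tar."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for i, batch in enumerate(batches, start=1):
        tar_path = out_dir / f"{prefix}-{i}.tar"
        with tarfile.open(tar_path, "w") as tar:
            for path in batch:
                rel = path.relative_to(root)
                tar.add(path, arcname=str(rel))
                manifest[str(rel)] = tar_path.name
    # Plain-text manifest: "<tar file>\t<path inside the project>"
    with open(out_dir / f"{prefix}-manifest.txt", "w") as fh:
        for rel, tar_name in sorted(manifest.items()):
            fh.write(f"{tar_name}\t{rel}\n")
    return manifest
```

Calling plan_batches on a project directory and passing the result to write_batches leaves a handful of tar files and one manifest in the output directory, rather than one object per small file. The trade-off the article describes shows up in the batch size limit: bigger batches mean fewer objects, smaller ones make it cheaper to pull back a single file.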
Emerging Data & Infrastructure Tools
Data Cleaning IS Analysis, Not Grunt Work - Randy Au, Counting Things
Au's article can be summed up in one pull quote:
The act of cleaning data is the act of preferentially transforming data so that your chosen analysis algorithm produces interpretable results. That is also the act of data analysis.
Again, professionalism is doing things deliberately. I think we tend to get sloppy about things that are "just" data cleaning or "just" having decent uptime or "just" putting together a script - but these things matter.
Data cleaning in particular requires pretty deep expertise in both the data generation process that led to the data, and the data analysis steps that will come after it. It may not be glamorous or exciting, it may not use sexy recent algorithms or cool frameworks, but it very much is, in and of itself, data analysis.
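A tiny, made-up illustration of the point, not taken from Au's article: the readings, the -999 "missing value" sentinel, and the helper names below are all assumptions for the sake of the example. The same four good measurements give three different averages depending on the cleaning choice, and picking the right one requires knowing how the instrument records a missing value.

```python
from statistics import mean

# Hypothetical temperature readings in Celsius. We assume the (imaginary)
# instrument writes -999 when it fails to take a measurement.
readings = [21.5, 22.0, -999, 21.0, 22.5]


def drop_sentinel(values, sentinel=-999):
    """Cleaning choice A: treat the sentinel as missing and drop it."""
    return [v for v in values if v != sentinel]


def replace_sentinel_with_zero(values, sentinel=-999):
    """Cleaning choice B: replace the sentinel with zero."""
    return [0.0 if v == sentinel else v for v in values]


no_cleaning = mean(readings)                              # sentinel treated as real data
dropped = mean(drop_sentinel(readings))                   # missing value excluded
zero_filled = mean(replace_sentinel_with_zero(readings))  # missing value counted as 0

print(f"no cleaning: {no_cleaning:.2f}")
print(f"dropped:     {dropped:.2f}")
print(f"zero-filled: {zero_filled:.2f}")
```

Only the "dropped" average (21.75) is plausible for a room that sat between 21 and 22.5 degrees; zero-filling quietly drags it down to 17.4, and doing nothing gives a nonsensical negative value. Each of those is an analysis decision, even though it looks like housekeeping.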
Calls for Proposals
Sixth International Workshop on Serverless Computing (WoSC6) 2020 - Call for Papers due 28 Sept
Topics of interest include but are not limited to:
- Infrastructure and network optimizations for serverless applications
- Debugging serverless applications
- Programming models
- Use cases, experiences
- Benchmarks
- Cost models, pricing models, and economics of serverless
- DevOps
- Other topics related to serverless computing
HPC-Europa3 - Next call for applications - 12 Nov 2020
The next call to the HPC-Europa3 program for travel for HPC-based scientific collaboration is open, with applications due in early November. This is a really interesting program and one I wish there were analogues to in research computing more broadly (not just HPC) and in North America.
Events: Conferences, Training
US Food and Drug Administration: 8th Annual Scientific Computing Days (SCD) - 29-30 Sept
A recurring theme of the pandemic is that the rise of virtual events means that events that never would have even been on one's radar suddenly become interesting possibilities. The 2-day meeting, usually pretty internal to the FDA, has a broad range of food- and drug-related talks (and an extensive poster session that may be of interest to some readers).
Random
Now that C++20 is final, Microsoft Visual Studio now finally supports C11 - and C17.
Five ways to undo a commit in Git.
The incorporation of typing into Python continues to allow a lot of cool tools, such as Nagini, a static verifier for Python modules.
Robustness testing with fuzzing for scientific codes continues to get easier with tools like libFuzzer which allows you to fuzz-test libraries built with LLVM compilers.
Jailbreak your TI-CE calculators.
We sometimes underestimate the importance and value of research papers - and we often underestimate their influence. Even Wikipedia pages can be hugely influential - a randomized controlled trial showed that edits to chemistry Wikipedia pages can not just change citation rates but arguably research directions (via Twitter).
There's finally a NumPy paper - and it's in Nature. On the one hand I'd like other kinds of research contributions to be recognized, not just papers; but on the other, a high-profile journal like Nature publishing a software article is huge.
Now it's official, all of the major cloud companies are making strong plays to be able to serve all areas of research computing, including HPC - Google Hires Longtime Intel Exec Bill Magro to Lead HPC Strategy.
The GitHub CLI is now out in v1.0.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
Jobs Leading Research Computing Teams
Highlights below; full listing available on the job board.
Data and Systems Architect - Fate Therapeutics, San Diego CA USA
Fate Therapeutics is seeking an experienced Data and Systems Architect with biotechnology domain experience to develop and oversee implementation of an innovative and flexible data strategy / architecture to enable the storage, organization, analysis and reporting of data in alignment with the scientific objectives across various business units supporting Fate’s growing pipeline of clinical programs. The successful candidate will collaborate extensively with research scientists, clinical data managers, data scientists/analysts, and vendor representatives and take an active role in developing and aligning data standards across the Company. This role will provide the thought leadership, facilitation, analysis, and design necessary to develop a robust data and information architecture focusing on clinical and translational research data, with the flexibility and scale appropriate vision to support multiple cross-functional business units and solutions. The successful candidate will balance multiple projects, analyze business trends and strategy, as well as change requirements to guide solution decisions.
Lead Data Scientist - GSK, Brentford UK
With a stretching plan for company growth and as part of an ambitious Data and Analytics (D&A) transformation, ViiV has decided to make a significant investment in building advanced analytics capabilities.
Lead technical aspects of data science and analytics, including building, evaluating and deploying predictive analytics models and deep dives.
Champion a broad range of data products to ensure business impact (e.g. multi-channel marketing optimization, next best action, recommender systems and deep-dive visualizations)
Collaborate with team-members in code reviews and critical feedback sessions to continuously improve product delivery
Manager, Scientific Computing & HPC - Pfizer, Andover MA USA
The Scientific Computing Manager role is a very important role required to support Pfizer’s drug discovery strategy. This role will provide key scientific application user support and management of Pfizer High Performance Computing (HPC) & Scientific Computing applications. She/he will work with the rest of the SciComp engineering and development team to develop, deploy, and support tools that scientists use to run complex simulations, sequence analysis, statistics, and AI/ML models on our large supercomputing environment. This scientist facing role will provide expertise with managing bioinformatics applications, databases, automation, and tuning apps to run on HPC environments. This includes applications like R, Python, Rstudio, databases (Oracle, MongoDB, NoSQL), Matlab, Jupyter Hub, genomic sequencing, image analysis, compilers, and other scientific computing utilities
Director of Cloud Operations, Center for Translational Data Science - University of Chicago, Chicago IL USA
The job manages a team of professional staff responsible for designing automated, scalable, and rapidly deployable solutions to infrastructure development and server configuration. Manages the provision of hands-on maintenance for production servers as well as Windows and Linux servers.
1) Manages a single team's progress by maintaining accurate and up-to-date logs, ensures that all projects have the necessary management oversight and approvals for successful completion. 2) Ensures the implementation of approved best practices and information technology policies that result in the highest quality systems administration. 3) Manages the creation of standards and procedures to maintain production servers that run the operating system. Manages the installation, configuration, and maintenance of operating systems and utility software. 4) Performs other related work as needed.
Manager / Azure Data Architect, Omnia AI - Deloitte, Toronto ON CA
You will be helping our clients to resolve their most complicated data & analytics problems to build, maintain, improve or re-architect solutions on Azure Cloud. You will be working and leading cross functional teams on architecting, optimizing data systems and building them from the ground up. On an average day you will help our clients understand advantages and disadvantages of specific Azure Data architecture choices and provide subject matter expertise and lessons learned for your previous projects. Most importantly you will develop highly efficient teams of internal resources and guide their development journey on Azure cloud.
Manager Clinical Data Management - Abbott Point of Care, Ottawa ON CA
This role is responsible for clinical data management (CDM) strategy, activities and operations for new and ongoing clinical research studies for the division. Has primary management responsibility directly or through subordinates for all data related to clinical studies. Responsible for the integrity of the processing and evaluation of clinical data. Identifies and implements the most effective, cost efficient and best business practices to execute processes and continually evaluates their effectiveness and appropriateness. Ensure that quality of services meets internal and external customer requirements and reports status to senior management. Responsible for identifying and implementing new, updated and/or enhanced systems for data collection for clinical studies
Manager, Research and Data Science - Bank of Montreal, Toronto ON CA
Applies knowledge of advanced analytic algorithms and technologies (e.g. machine learning, deep learning, artificial intelligence) to deliver better predictions and/or intelligent automation that enables smarter business decisions, improved customer experience, and drives productivity. Applies strong communication and story-telling skills to summarize statistical/algorithmic findings, draw business conclusions, and present actionable insight in a way that resonates with business/groups. Drives innovation through the development of Data & AI products that can be leveraged across the organization and establishes best practices in alignment with Data & AI governance frameworks of BMO.
High Performance Computing - Director - Modis (Recruiter), Sydney NSW AU
Bachelor's degree in Computer Science, maths or related field
7+ years' experience minimum with large-scale HPC systems
2+ years of hands-on systems architecture, infrastructure engineering, software development, solution architecture or support of research software engineering
1+ years' experience with cloud computing as it relates to HPC.
Good understanding of management and performance optimization techniques associated with large-scale computing system
Executive speaking and presentation skills - Formal presentations, white-boarding, large and small group presentations
Senior/Principal Solutions Architect - HPC - Amazon, San Francisco CA USA
As a trusted customer advocate, you will help organizations understand best practices around advanced cloud-based solutions, and how to migrate existing workloads to the cloud. You will have the opportunity to help shape and execute a strategy to build mind-share and broad use of AWS within enterprise customers. The ideal candidate must be self-motivated with a proven track record of customer obsession and delivering results. The ability to connect technology with measurable business value is critical to a solutions architect. You should also have a demonstrated ability to think strategically about business, products, and technical challenges in HPC.