Research Computing Teams #127, 25 June 2022
Before we get started, please answer if you can a question from a reader: What do you use to get and keep track of requests for your work from researchers? Is there a ticket-tracker type piece of software that works for you, or do you use something else? Does it fit your workflow - what do you like, not like, and wish for?
I’ll summarize results next week - it’s an important topic!
One of the great things about this community is the different kinds of research computing and data teams I get to talk with, “here” or outside the inbox. Sometimes in these conversations it’s pretty clear that some teams, or the team and their funder, or a team and I, are talking a bit past each other. And that’s usually because they or we are (currently) operating with very different mental models.
Some research computing and data teams are operating as Utilities, and see the world through that lens; a growing number are operating as Professional Services Firms. Different groups are at different places along that very abrupt transition. Some kinds of groups (like bioinformatics cores) are much more likely to already be operating in service mode, while others (like research compute infrastructure teams) are more likely to still think of themselves as utilities. It varies from place to place, though, depending on local conditions. But they’re very different models!
Utilities, like power companies or garbage collection or municipal potable water, were really the only sensible role models for the first decades of research computing and data teams. Those teams were entirely about operating large equipment purchased from vendors. Costs were mostly a big capital expense. Everyone who needed the utility needed the same thing - undifferentiated flops and bytes, or 60Hz 120VAC. Because everyone needed the same thing, economies of scale led to natural monopolies; the most reasonable provision model was for the local jurisdiction/institution to own or control a single operator. Differentiation or strategy, or gaining new customers, weren’t meaningful discussion topics. Innovation happens slowly, top-down, at the industry-wide scale (“hey, did you hear about those new gas compressors Dyneco announced?”), and diffuses outwards. Employees take pride in and the organization values operational skill and things ticking along smoothly. Customers value reliability. The only thing that matters for any individual operator is to operate effectively and to provide the standard service with the right amount of cost: high enough to absorb the available subsidy, low enough to not go broke. If a customer needs something other than what the utility provides, rather than that being a market opportunity, it’s either an inconvenience or an irrelevance. The power company or the water utility or the old phone monopoly just doesn’t serve that need.
Professional Service Firms — say engineering firms, or architects, or consultancies — are very different beasts. They might very well have significant capital investment in specialized equipment, but their main selling point and their biggest cost is expertise. Competing for and retaining that expertise, and developing that expertise in house and amongst their clients, are principal concerns. As part of a “full-service” offering they likely have some fairly standard services they offer at the low end, where operating cost and efficiency are vital. But what the organization values, and the employees enjoy, is at the high-touch end — getting deeply involved with the client work, and being as much a collaborator or partner or “trusted advisor” as a service provider. Different clients want very different things, and that high-touch high-expertise work is specialized and labour intensive, so the firms themselves need a clear focus; they can’t meet all needs. Clients can go elsewhere, so there is redundancy and competition, but less than you’d think at a distance. In civil engineering a geotechnical firm is complementary, not competing, with one that specializes in water resource engineering.
As in the rest of our lives, in research computing we need to have some utilities. As research data management matures, institutional or regional data repositories are becoming mature and “enterprise” enough to operate as utilities, likely run by IT or the Library. Teaching or CI/CD or MLOps resources for data science or software development are likely best served by this model.
But as research computing becomes broader and faster changing, we need more professional services firms, too; nimble groups specialized to particular needs and ready to adapt as those needs change. As even infrastructure becomes less one-size-fits-all, and methods for making use of computing and data across diverse fields grow more complex and expertise intensive, the preconditions for the utility model are met in fewer situations than they used to be.
A lot of research computing teams are interested in providing something more like professional services, but were created in the Utility model, and are stuck there by their funders. The institutional or external funders still have this very specific (and to their mind time-tested and successful) operating model in their plans. And utilities are funded very differently than professional services firms. At utility scale, it doesn’t make sense to outsource things, or to develop non-standard services (who wants non-standard power coming into their house?). Funders’ requirements on eligible expenses may focus almost entirely on the capital spend, and not on the operating funding that’s needed to make effective use of the capital, or to be more agile in how services are delivered.
Even those teams who aren’t being held back by funders and who want to make the switch to professional services from their original utility model find it a hard transition. There’s no obvious, incremental path to go from providing a standard, stable commodity to changing, specialized, bundles of expertise. Their existing customers don’t want change, and new customers aren’t yet interested in getting appliance suggestions from what they perceive to still be the power company.
But research computing and data is changing, increasingly quickly, and the Utility approach only meets a piece of these growing needs. Navigating the transition isn’t going to be easy, for RCD teams, leaders, or funders; but expressing it clearly and talking about it more will maybe mean we’re not talking past each other so often.
With that, on to the roundup!
Managing Teams
RCT Job Definition Worksheet v1: Letter, A4 - Research Computing Teams
I’m going back through past issues and trying to group the most useful resources by topic (jeez, folks, I write a lot of stuff in these… is it too much?). A common topic, and a common problem people talk with me about, is hiring and onboarding. There are two common threads I see:
First: one huge difference between our work and that of (say) IT or large tech firms is that figuring out what the job is even supposed to be takes some doing. Our teams aren’t large or static enough to have pre-existing cookie cutter roles requiring previously documented skill sets. (Startups are more like us this way.) The work is fluid. Each hire, each job, is a bit of a one-off, a bit bespoke.
Second: Hiring and onboarding (and, for that matter, early evaluation of performance) are the same process. Once we’ve had an offer letter accepted, we don’t dust off our hands and declare our work done. The reason we hire someone is to have that person become successful at some work that needs doing. We need to bring them up to speed, and (for both their sake and ours) to have a clear progression of goals for them during that process. The best time to figure out what that would look like is during the process of defining the job, in parallel with writing up the job ad and the interview evaluation process.
So I’ve distilled what I could from resources that have shown up here, combined with things I’ve used in the past, into a worksheet. The goal is to have something the hiring team can hash out together. This is version one - feedback welcomed! (Feedback is a gift). The section headings aren’t rocket science, but the order matters:
- Figure out what the job is for
- Define what a successful hire at the end of onboarding looks like
- Back out from that an onboarding progression
- Then, from that, figure out the must-already-have job requirements
- Then figure out the must-be-able-to-learn requirements
- Then figure out what support will be needed during the onboarding process
Iteratively hashing this out with the team will help develop consensus about what the job is and how to interview. And at the end you’ll have the beginning of an onboarding plan. You’ll also be able to include clear 30/60/120 day goals in the job ad. Good candidates appreciate those goals: those alone offer more clarity about the job than most entire job ads do.
Product Management and Working with Research Communities
Ten simple rules for improving communication among scientists - Bautista et al., PLOS Comp. Bio.
This paper is intended to help scientists - especially but not solely junior ones - get more connected and integrated with a scientific community. They outline ten particular approaches, and (importantly) the idea that there should be some clear goal or goals in mind when applying them.
The same approach and methods are useful for teams operating in a more “professional services” style, where connecting to your current and potential clients and learning their needs, as well as making yourself known, is vital. Power companies or the old phone monopolies don’t need to learn about their customers.
The suggested approaches are:
- Know your audience
- Use social media
- Listen to how other scientists present their work
- Network with scientists and ask for feedback
- Get involved with scientific organizations
- Create opportunities to practice public speaking
- Organize scientific meetings
- Identify and enrol in scientific activities
- Collaborate
- Pace yourself; don’t overcommit
Women are Credited Less in Science than are Men - Ross, Glennon, Murciano-Goroff et al, Nature
Who gets credit for science? Often, it’s not women - John Timmer, Ars Technica
Timmer’s article in Ars Technica is a good summary of, and pointed me to, the Nature article, which looked at data covering nearly 10,000 individual research teams and over 125,000 people, cross-linked to databases of publications and patents. The results are pretty grim - women are systematically less likely than men to ever be listed as authors. Maybe worse, the disparity is worst for more junior people (including research staff, like many of us, where the disparity is close to a factor of two), all but ensuring that women, with fewer authorships on their CVs, find it harder to become more senior.
Lots of good shell tutorials out there for researchers or new-to-the-terminal staff, but most of them cover the same things in the same order. Effective Shell is one I haven’t seen before, and the order is quite different: it covers working with the clipboard right away (pretty handy for a shell tutorial so people can copy-and-paste lines - right?), then simple pipelines, then command-line shortcuts, so people can start to feel efficient and fast, and only then the rest of the usual stuff. Really interesting.
Cool Research Computing Projects
Here’s a case study of a very successful software project - Philip Guo’s Python Tutor - that’s both a teaching and learning tool for introductory software development (originally just in Python but now also in Java, C/C++, JavaScript, and Ruby) and used in researching how computer programming is learned and taught most effectively.
Guo credits Python Tutor’s success - both in number of users and longevity - to staying extremely simple and focussed, insisting on a clear vision of building a single useful thing. He added capabilities only where they were clearly worth the development effort and aligned with that vision; ignoring most users’ feature requests is one of his ten recommendations (listed below). This singular focus was made both necessary and possible by the fact that he is the only developer of the tool, which began as a side project in grad school. Besides the number of users, it has been used as a key part of 55 papers that he knows of.
Research Software Development
You don’t need a Platform, you need One Thing That Works: All carts, no horses - Paul Craig
Enterprise architecture is dead - Sean Boots
Companies Using RFCs or Design Docs and Examples of These - Gergely Orosz
The first two articles came out this week from the Canadian Public Sector blogosphere. While they are aimed particularly at govtech, they’re also relevant cautionary tales for large efforts in the broader public sector, including many of us.
There’s a huge tendency for technologists who have been around a while, when put in charge of a large effort, to start from the big picture down. That’s true of a national effort or of a large single project. This tendency is a version of the second-system effect. If we’re asked to build something that will eventually be big, the final state of a successful effort is likely a platform. So there needs to be standards! And well-defined interfaces! And a coherent overall architecture! Those will be necessary to accelerate the platform’s success!
Those are good things, but that’s the wrong order. Having any success at all is very much not a given. Standards and interfaces shouldn’t come first. There are many, many, many stories of large efforts that started with coherent overall visions and failed to ever materialize at all - far more than there are stories of successful small working things, each built by one organization, failing to be made to work together over time. Yes, after-the-fact refactoring and integration may be a lot of not-very-fun work. But when the small things are things we built, things we have under our control? That’s work we know how to do. And it’s work that we only know is worth doing once the small things are already useful.
Both articles are excellent. They’re hard to summarize, except Craig’s article has some pithy categories for failed platforms:
- Platforms for nothing
- The sorry-you-have-to-wait-for-it Platform
- Diagram-driven Platforms
and for how to avoid them
- Planning for change with “sacrificial architectures”
- The Rule of Three (wait until you have three actual useful successful things to get working together to start thinking about standards, and then only as much as necessary)
- Smartly scoped “vertical” services (vs “horizontal” platform infra).
- Services first, platforms later.
(I’ll just add that we hear about these failures in government because of transparency laws. These failed platform or “enterprise architecture” efforts happen all the time in the private sector. In the government, it happens under media spotlights. But the private sector has the luxury of being able to wait until nightfall, roll the failed effort up in a carpet, and sneak it out the back door to a nearby swamp for disposal. Failed academic initiatives sadly usually get the same discreet treatment, making it hard to learn from past mistakes.)
The third article points out that while a lot of large tech companies do in fact have very sophisticated and coherent architectures, they’re often bottom-up or at least collaborative efforts rather than being imposed top down. Orosz has a list of > 100 companies that he knows of that use some sort of request-for-comments process for architecture, with examples from companies like Google, HashiCorp, and others.
Research Data Management and Analysis
Many researchers were not compliant with their published data sharing statement: mixed-methods study - Mirko Gabelica, Ružica Bojčić, Livia Puljak, J. Clinical Epidemiology
A sobering reminder that we just can’t rely on individual researchers if we want a research ecosystem where data is shared and reused. We demonstrably need FAIR data repositories.
Of the authors of 1,792 manuscripts which stated that data was available upon request, only 7% actually responded and shared the data; 1,432 didn’t even bother responding. The reasons given by the 132 who did respond but declined were either clearly inadequate, or were reasons that would have been known at the time of publication - so the authors had no business saying the data would be available on request in the first place. Michael Hoffman has a good tl;dr summary on twitter.
Open source data-diff - Gleb Mezhanskiy, Datafold
Another use case for pushing operations to the database for efficiency (and to avoid writing code). Datafold’s open source tool data-diff will diff columns between databases (even comparing, e.g., a column in Postgres to a column in MySQL or Snowflake or BigQuery). In the usual case where the data is mostly the same and you’re looking for a small number of differing rows, it’s vastly faster than pulling the data out of the databases to compare. It divides the dataset into chunks, hashes an aggregate of each chunk, and then does an n-ary search within any mismatched chunks to find the differing rows. Great for, e.g., finding issues with automated data replication, or testing to see if a local copy of a dataset has any out-of-date records. Very cool!
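To make that concrete, here’s a minimal toy sketch in Python of the chunk-hash-and-recurse idea (my sketch, not Datafold’s implementation - it assumes integer keys, and real data-diff computes each chunk’s hash inside the database with a single aggregate query, so matching data never leaves the server):

```python
import hashlib

def chunk_hash(rows):
    """Hash the concatenation of a chunk's (key, value) pairs."""
    h = hashlib.sha256()
    for key, value in rows:
        h.update(f"{key}={value};".encode())
    return h.hexdigest()

def diff(table_a, table_b, lo, hi, fanout=8, leaf_size=100):
    """Return the (key, value) rows differing between two tables,
    restricted to integer keys in [lo, hi). Tables are plain dicts here;
    in data-diff each chunk hash is an aggregate query run in-database."""
    rows_a = sorted((k, v) for k, v in table_a.items() if lo <= k < hi)
    rows_b = sorted((k, v) for k, v in table_b.items() if lo <= k < hi)
    if chunk_hash(rows_a) == chunk_hash(rows_b):
        return []                               # hashes match: range agrees
    if hi - lo <= leaf_size:                    # small range: compare directly
        return sorted(set(rows_a) ^ set(rows_b))
    step = (hi - lo + fanout - 1) // fanout     # otherwise, n-ary recursion
    mismatches = []
    for start in range(lo, hi, step):
        mismatches += diff(table_a, table_b, start, min(start + step, hi),
                           fanout, leaf_size)
    return mismatches

# Two 100,000-row "tables" that differ in just two rows:
a = {i: i * i for i in range(100_000)}
b = dict(a)
b[12_345] = -1      # one changed row...
del b[99_999]       # ...and one missing row
print(diff(a, b, 0, 100_000))   # the differing rows, as seen from each side
```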
Research Computing Systems
Effective Cybersecurity for Research - William Drake, Anurag Shankar, Center for Applied Cybersecurity Research, Indiana University
The EDUCAUSE Library has a very nice and clear case study written up by Drake & Shankar about their work at Indiana University. They describe the work putting together SecureMyResearch, as part of a larger effort to offer a coherent approach to securing research computing and data at the University. (I’m very impressed by the clarity of the SecureMyResearch landing page. It makes it clear who it’s for, what they offer, questions they can help with, and how they can help). They developed ten principles for their work - starting with focus and success, with the other eight all about people and process, not particular technologies.
The whole effort required only two net new staff, along with a clear person in charge, accountability, and institutional support (the last admittedly helped along by a mandate issued last year by the US government). Leadership and staff had to coordinate across a number of departments and functions to get the work done. And of course communicating with the wide range of researchers took effort. I’ll highlight one point too often missed:
The lesson taught us to ask, “What is it that you are trying to achieve?” instead of bragging, “Look what great toys we have for you.”
Emerging Technologies and Practices
There’s now an organization under the Linux Foundation, joined by most of the big players, to help standardize the programmability of SmartNICs/DPUs/IPUs. That’s terrific news, and will speed adoption while making sure research computing teams don’t have to worry about lock-in. One of their first listed tasks is to “Define DPU and IPU”, which by itself would be welcome.
Competition in the AI code completion space. Along with the news that GitHub Copilot (which I continue to enjoy) has gone GA and will charge $10/mo, here comes another - Amazon CodeWhisperer, baked into AWS’s IDE products. It has clearly learned from some of the negative press about Copilot: CodeWhisperer will check to see if the code generated is a direct copy of code in its training set, and show the licence of that code if it is. It will also do an automated security check of the generated code.
(PS - the same internet-lawyer legal discussions about Copilot have popped up now as when it was first announced. Open source advocates, and especially those of us in research, should be very careful what we wish for. Asking copyright law to be so strong that data mining on large datasets is automatically a copyright violation has knock-on consequences. Do we really want Elsevier and Springer to be solely in charge of data mining the medical literature? Exactly how hard do we want the legal battle against sci-hub to be? It’s easy to say “that’s different” on the internet, but finding or enshrining those distinctions in the law may not be so simple.)
Random
Debugging warstory of a weird bash pipe problem.
What Canva learned from putting a half-million files in one git repository. It seems like it would have been easier to just not do that?
Dangit, Git!?! - a cheat sheet of un-fubaring yourself from common git mistakes.
If you’re going to stick with Copilot, here’s how you can use it in Vim, using a container.
There are more search engines out there than I realized - an overview of search engines with their own indices.
This is interesting - Tailscale now has its own ssh support, using tailscale authn without keys.
Fortran’s community-driven stdlib now has hashmaps. This is very much not the Fortran I grew up with.
I’ve mentioned some cool SQL-against-data-in-its-native-format tools in the newsletter, but of course you can query CSVs with sqlite and duckdb directly (and if you’re going to be doing analytics on the CSV, you really want duckdb - analytics is its job).
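As a minimal sketch with DuckDB’s Python package (the file and column names below are hypothetical - substitute any local CSV):

```python
import duckdb

# DuckDB treats the CSV file itself as a table -- no import step needed.
# 'penguins.csv' and its columns are made up; any local CSV works.
con = duckdb.connect()
rows = con.execute("""
    SELECT species, AVG(body_mass_g) AS mean_mass
    FROM 'penguins.csv'
    GROUP BY species
    ORDER BY mean_mass DESC
""").fetchall()
for species, mean_mass in rows:
    print(species, mean_mass)
```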
Fuzzy searching (and bookmarking) your shell history using fzf.
Old, but I hadn’t seen it before. An entire working computer - RAM, ROM, an ALU, and a simple assembly language - to play Tetris, implemented entirely in Conway’s Game Of Life.
Running Doom on an Ikea smart lamp.
Interesting bounty program for submitting PRs to address issues with urllib3 - much finer-grained funding than, e.g., hiring a dev.
A collection of text-format-visualization tools, including for JSON, regex, YAML, SQL, Git, Docker container ops, and Kubernetes.
How CVSS vulnerability severity scores work.
PXE-boot various operating system installers over the network using a single tool - netboot.xyz.
Fixing issues with docker builds of python apps on Apple silicon (and, I assume, other Arm platforms).
Distributed Postgres with the now fully open-source Citus extension.
A run-time type checker for bash.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 134 jobs is, as ever, available on the job board.
Director of Research Computing, Faculty of Arts & Sciences Computing - Harvard, Boston MA USA
Harvard University’s Faculty of Arts & Sciences (FAS) seeks a Director to lead its FAS Research Computing (FASRC) organization. The newly reconceived Director position reports jointly to the Assistant Dean of Research in the FAS Division of Science and to the Vice President for Information Technology & University Chief Information Officer in her capacity as FAS’ chief information officer. The Director will provide leadership in the ongoing development and management of research-computing resources for faculty across all of FAS’ divisions and thus will play an important role in advancing Harvard University’s research mission. The Director of FASRC is responsible for providing strategic leadership, fiscal stewardship, and operational oversight for FAS Research Computing. The Director is responsible for broadening the use of FAS computing resources by lowering barriers to entry to HPC and performing outreach to a broad range of faculty from multiple academic disciplines across the School.
Research Software Engineer Team Lead - Queen Mary University of London, London UK
IT Services at Queen Mary University of London is seeking to recruit a Team Leader for our Research Software Engineering team, who will provide high quality code development support for research applications. They are also responsible for the promotion and demonstration of good software engineering tools and practice to researchers.
Research Computing Architect - Sacramento State, Sacramento CA USA
Under the supervision of the Director of IT Infrastructure & Identity, the position’s primary role is to be responsible for the architecture, design and deployment of Scientific Computing’s computational and data science ecosystem. This ecosystem includes high-performance computing (HPC) systems, research databases, and research network connectivity.
Senior Project Manager, Advanced Research Computing - University of British Columbia, Vancouver BC CA
The Advanced Research Computing (ARC) Office at UBC is an institutional-dedicated service for researchers across all disciplines working on questions that have large data and computational needs. The Senior Project Manager, ARC position is a part of the ARC Office Administration Team and will work under the direction of the Director, ARC. The Senior Project Manager, ARC manages all aspects of the development and implementation of large, multifaceted projects for the Advanced Research Computing (ARC) department. The incumbent will, on behalf of the Director, ARC, direct the planning, implementation, execution, control, and completion of moderately complex to complex digital research infrastructure (DRI) projects, in alignment with UBC ARC strategy and goals.
Sr. Product Manager, Tech, Center for Quantum Computing - AWS, Pasadena CA USA
The Amazon Web Services (AWS) Center for Quantum Computing in Pasadena, CA, is looking to hire a Sr. Product Manager to lead our hardware product effort. Quantum computing is an exciting, nascent technology, with the potential to revolutionize computing in areas such as advanced materials development, chemical engineering, biotech, data security, finance and logistics. As head of product for quantum computing hardware, you will work closely with a cross-disciplinary team of scientists and hardware and software engineers. You will define and execute on our product roadmap, work closely with our quantum computing cloud service team (Braket) on product integration and customer engagement, and drive our strategic communications both internally and externally.
Program Manager (Senior Associate for Biological Research Infrastructure) - NSF, Alexandria VA USA
The Division of Biological Infrastructure empowers biological discovery by investing in the development and enhancement of biological research resources, human capital, and biology centers and other mid-to-large scale infrastructure. These investments support advances in all areas of biological research. Responsibilities of this Senior Associate for Biological Research Infrastructure will be within the Centers, Facilities and Additional Research Infrastructure (C-FARI) cluster within the Division of Biological Infrastructure (DBI), Directorate for Biological Sciences (BIO).
Sr. Engineer, R&D Software Engineering - Clario, Bridgewater NS CA
Clario generates the richest clinical evidence by fusing our deep scientific expertise and global scale into the broadest endpoint technology platform. By doing this, we empower our partners to transform lives. The Sr Engineer, R&D Software Engineering will provide software design, development, and application support for Clario products whilst adhering to departmental SOPs. In addition, will contribute to the architecture of the products and the development process and also contribute to mentoring junior members of the team.
Lead Principal Software Engineer II, Digital Pathology - Roche, Various and Remote USA
Work with our Imaging Scientists to help with the workflow requirements of their algorithms, answering the question of how they can interface with our systems. Evolve our existing imaging platforms from feature and design perspectives. Partner with existing teams and collaborate with them to leverage existing solutions or evolve them into better solutions. Lead the code-review process and mentor engineers.
Product Manager Lead, Fitbit Health Research Infrastructure - Google, Mountain View or San Francisco CA USA
Define, advocate, and drive alignment plus execution for Health Research infrastructure. Develop strategic partnerships and alignment with Fitbit App, Android Health Platform, Android, Takeout, and other cross-functional teams to enable scaled health research. Work closely with team leadership to understand the research portfolio and identify, evaluate, and prioritize study candidates to support the Health Sensing Roadmap. Own building a more diverse population on the platform through deep engagement with research partners and advocacy groups. Work closely with the business development team in partnership discussions to launch studies on Vico.
Chief Engineer - High-Performance Computing and Distributed System - Huawei Research Germany, Munich DE
The ideal candidate should have a strong interest in distributed systems and parallel processing to improve and contribute to a metropolis-scale distributed city simulator. Hands-on research and supervising research for optimizing the performance of a highly parallel distributed simulation engine for processing metropolis-scale scenarios. Lead and directly contribute to the efficiency of data management, communication, processing, and analytics. Define and run benchmarks and evaluations to analyze the impact of novel communication and processing strategies.
Robotics Simulation Architect for Cloud Robotics - Huawei Research Germany, Munich DE
Gain insight into the research and technology trends in the robotics simulation domain worldwide and plan research directions; identify technical challenges; set up a technical roadmap; propose innovative and disruptive products and solutions. Lead the research, design, and implementation of critical technologies for Huawei’s simulation platform for cloud robotics. Drive collaboration with industry and academia, build good long-term cooperation relationships, and carry out innovative cooperation projects with top universities and industrial partners.
Engineering Manager - Data - Benchsci, Remote CA or USA
BenchSci’s vision is to bring novel medicine to patients 50% faster by 2025. We’re achieving it by empowering scientists with the world’s most advanced biomedical artificial intelligence. We are currently seeking an Engineering Manager to join our rapidly growing Data Team. Reporting to the Director of Engineering, Data & DevOps, you will be responsible for planning, delivery, mentoring, and coaching data engineers. In this impactful role, you will work closely with key stakeholders across the organization and be instrumental in cross-team priorities and management. The most successful candidates for this role will be experienced software engineers who have transitioned to leading individual engineers and delivering complex data engineering solutions.