Research Computing Teams #113, 11 Mar 2022

        March 12, 2022

        Hi!
We now have 200 members of our community; people who care about research computing and data teams, their potential, and the importance of leading them professionally.  
So, welcome new readers!  Some resources that might be useful:

The Rands Leadership Slack (a community of 20,000 managers and leaders in or around tech) and our new and so far small (16 people) #research-computing-and-data channel there
My “Help, I’m a Research Software Manager!” talk & slides from 2020 which covers in 10 minutes the basic approach I take in the newsletter and elsewhere.

Speaking of resources and online communities: one thing that we did for a while in the first year of the newsletter was a recurring “AMA (Ask Managers Anything)” section. Readers would send in questions (anonymously, by default), I’d take a whack at answering in an issue, and then we’d solicit answers from readers, which we’d include in the next issue (also anonymous by default).
I think enough time has passed and we’ve got a critical mass of new readers, so let’s  try that again.  If you have questions you'd like to hear feedback on, give our community a shout!  Hit reply (emails only go to me) or email jonathan@researchcomputingteams.org, and I’ll post your questions in the next issue.   And of course if you have anything else you want to share with the community, or suggestions or feedback on the newsletter, please also send them along.
In the last issue, we were talking about the benefits of having worked in or around research in our line of work, and I talked about the basic mindset of research: “So while ‘growth mindset’ and ‘comfort with uncertainty’ seem like empty phrases to us, they are high-achieving traits in other parts of the world.”
A reader chimed in to share their experience of having seen the same thing:

Yes. It took me many, many years to realise that having a PhD marks me out as highly skilled in the majority of organisations.
We think it is normal to be surrounded by high achievers – my own background is from CERN. We do not realise that many organisations have people who do quite mundane work.

There are other examples of the strengths we bring from spending part of our career in and around research, too.  Here’s an article from earlier in the year saying that having multiple areas of tech expertise is a “superpower”.  It’s pretty common in our profession to have a research area of expertise and a tech area of expertise, and I think that’s a superpower too, for the same reasons given in that article.  It improves communication, strengthens links between groups, helps us understand how the work connects to research, and helps find better solutions.
With that, on to this week’s roundup!
Managing Teams
Everyone's a great manager until they start managing - Jonathan and Melissa Nightingale, Raw Signal Group

Four mistakes I made as a new manager - AbdulFattah Popoolah, LeadDev

What you give up when moving into engineering management - Karl Hughes, Stack Overflow Blog
When we see people do a different job than we do — especially when they do it very well or very poorly — it’s easy to think “that doesn’t look so hard”.  Plumbing, graphic design, customer-facing roles; we watch for a while and figure “I could do that”.   And you know, we probably could, eventually.  Because it is way harder than it looks, for almost any value of “it”.  Competence is hard-won.  The people performing those roles are avoiding (or ignoring) a lot of potential problems we’re not aware of, with background knowledge we don’t have, deftly performing practiced tasks that we’d botch the first dozen times through.
It almost always takes us much longer than we imagine to claw our way up to a basic level of competence in a new field.  In areas where the feedback loop is slow,  it can take still longer.  Absent prompt signals to the contrary, you can quickly get into Dunning-Kruger territory where you fool yourself into believing you’re doing much better than you are.
And, well, welcome to leadership and management.  When I and others say “management isn’t a promotion, it’s a career change”, this is what we mean.  It’s a whole ‘nuther job, where your experiences from your previous job help but also hinder you.
The Nightingales have a nice article that’s hard to summarize, but I like their section headers as an overview - “In theory it’s easy.”  “In practice it’s hard.” And, crucially, “The management is in the doing”.  Good management - assuming you’re in a reasonable situation where success is possible - is a learnable set of skills and behaviours that have to be done and redone until they’re practiced.  And then have to continue being done.  You will keep developing mastery of those skills for as long as you keep working at it.
(In our line of work, management isn’t the only “looks easy; after all, I’ve been on the other side of it and know what not to do” profession we get thrust into.  Teaching and training are another set of extremely important activities that people spend their entire lives studying and learning.  And we get tossed into them with zero education or support.)
Popoolah talks about the big categories of mistakes he made as a new manager (which overlap quite a bit with the “Help, I’m a Manager” talk above).  Fundamentally, he didn’t know what the job was, so he had trouble doing it.  He didn’t have a clear vision for the team; he had trouble giving constructive feedback; the first departure on his team threw him for a loop; and he kept trying to be a team lead instead of a manager.  It’s a good short summary of the problems a lot of new managers face.
Hughes talks more concretely about the things that are given up on the management path: focus time; short feedback cycles; conflict avoidance;  making technical decisions; and staying up to date technically.  It doesn’t have to be given up forever — I’m back on my third tour of duty as an individual contributor! — but those are just not part of the job of a manager.

Management Development As Skincare Regimen (Twitter Thread) - Angela Riggs
So how should you start learning the new skills you need to be a manager?  Riggs has one way to think about it.
I’m always on the lookout for new analogies for management, leadership, and strategy.  For management I personally like sports metaphors, but they’re so overused that every ounce of insight that can be extracted from those comparisons has long since been extracted.  I’ve always found war and combat metaphors distasteful and aggrandizing, and now especially.  Our jobs are tough, but no one’s getting killed.
In #42 there was a pretty helpful comparison to TV show writing, which is another example of collaboratively creating something new.  Here Riggs uses the metaphor of adopting a skincare routine.  It’s not something you do all at once; it’s something you start with an eye towards the problems you’re trying to solve.  You add and change products as needed. You test each change to see if it advances your goals, even though it can take some time to see the result.  You have to unlearn old things (exfoliation!).  And of course you get input from others trying to solve similar problems.

The Thirty Minute Rule - Daniel Roy Greenfield
We’ve talked a lot about having explicit expectations in your team, especially around communications.  It’s been on my mind having changed teams very recently.
Your team does have expectations about how people work together.  (You’ll find this out very quickly if a new team member starts behaving very differently from team norms!)  The only question: do you have those expectations written down somewhere?  Having expectations explicit makes it easier for new team members to spin up, and for experienced members to mentor juniors and trainees.
If you don’t have such expectations explicit, one good target to start with is: how long should someone wrestle with something on their own before bringing other team members (or stakeholders) in?  
You do have expectations about this.  If someone was spinning their wheels for two weeks making no progress because they were stuck on something someone else could have told them, you’d be annoyed.  You’d also be annoyed if someone constantly asked on the #general channel on Slack the second something came up that they didn’t know.
Greenfield suggests a 30-minute rule: don’t let people stay stuck on something they don’t know for longer than 30 minutes.  Maybe in your team, with the kind of work you do, it’s an hour, or a workday, or 15 minutes.  It almost doesn’t matter.  Pick something that feels right, bring it up with your team at your next team meeting, and see if they agree.  Make changes as necessary.  Then write it down somewhere and put it in your onboarding documents; from there you can build up to other shared team expectations.

Managing Your Own Career
Run your day, don’t let the day run you - Kahlil Lechelt
A manager’s or lead’s day in research computing is much busier and filled with a wider variety of demands than we’re used to as an IC.  It’s vital to maintain some sort of control over the tasks you’re working on.   Lechelt gives good advice:

Everything goes in a task list - email is not the place to store to-dos.
Have a small list of things you will get done today, leaving slack in the schedule for things that come up.
Have your calendar be the single source of truth.
You’ll slip sometimes.  It happens.  You’ll abandon your task list, get stuck in fire-fighting, stop putting work activities on your calendar.  Accept it and start back where you left off.

The Painfully Shy Developer's Guide to Networking for a Better Job - Sam Julien
Conferences are starting to happen in person again.  A lot of us at the intersection of computing and research are pretty committed introverts.  Being in a group of mostly strangers in person, or even online, can be challenging.
But meeting others in your work community and developing professional relationships is important.  It’s not just about finding new jobs!  It helps us find and share relevant ideas; helps us hone our craft; and helps build our community of practice.
Julien gives time-honoured advice that works for him; maybe it’ll help you, or someone on your team:

Make Other People Feel Welcome and Accepted - I find this approach really useful; I sometimes will pretend I’m the host, and then my sense of duty to making my “guests” feel welcome can override my sense of awkwardness
Give first, then give some more - Don’t make it about you; keep an eye out for things you can help people with, whether it’s making introductions (even to someone else you just met), problems you know something about, etc.
Don’t overthink - be genuine and have fun

Product Management and Working with Research Communities
Good mentors boost integrity, survey finds - Erik te Roller in Haarlem, Research Professional News
An important part of our jobs is mentoring juniors and trainees in our team, but also from research groups.  Indeed, we’re often mentoring even the PIs we work with in some kind of technical area.
It turns out that matters in a number of surprising ways.  Juniors who are well-mentored go on to be less likely to commit research misconduct.  It’s not hard to imagine how that might be; they’re less lost, less at-sea, and feel more connection to the research community.  If we have a research trainee working with us, and we show them how to get their work done and help them when they need it, they’re going to struggle less, and may be less likely to cut corners in a moment of desperation.  It’s really hard to come back once you’ve started cutting corners!

What Does and Doesn’t Happen After You Specialize? - David Baker
Research computing and data has a large consulting component, and for that part of the job we can learn a lot from other consultants.   The basic job - understand some aspect of a client’s work, uncover their problems, connect their problem to our specific expertise, and help them construct a solution - is the same in any field.
Consultants in other fields are much more successful when they specialize.  As long-term readers will know, I strongly recommend research computing and data teams, especially (but not exclusively) consulting and software development teams, devote themselves to a very small number of sub-specialties, unless institutional imperatives flat-out forbid it.  
Again, your team already has things that it’s better at and things that it’s less good at; it’s just a matter of making that specialization explicit.   By doing so, you can further develop your team’s strengths, stop spinning your wheels (and wasting your stakeholders’ time) on projects and products that are less likely to succeed, and make it easier for researchers to know that you’re the team they should contact.
Baker helps other consultants specialize, and knows that making the change to narrowing the focus is scary.  Here he tries to make it less scary by pointing out that the day after you start your specialization… not a lot changes.

Specialization makes your team smarter, faster by focussing your energy.  That doesn’t happen right away.
You start feeling a bit impostor-syndrome-y, claiming to be an expert in… web apps for GIS data (or whatever).  That’s cool and normal, and should be less impostor-syndrome-y than “we write any research software for anyone”.
There’s no law that says you have to immediately start turning down work that doesn’t fit the new focus.
Typically, you start realizing that the focus should be even tighter.  That’s fine; you can do that later.
You start being able to share more with the community, because the audience is better defined.

Cool Research Computing Projects
When a seismic network failed, citizen science stepped in - Alka Tripathy-Lang, Ars Technica

Citizen seismology helps decipher the 2021 Haiti earthquake - Calais et al., Science (2022)
Despite all the buzziness of it, I’m really optimistic about smart devices for research - to be able to build networks of sensors to collect data without requiring researchers to build and maintain large amounts of infrastructure.
Here’s an example of such a project.  Haiti, for a variety of historical reasons (including having to pay reparations to French slaveholders for daring to declare independence), has problems maintaining complex infrastructure.  That includes seismograph networks, which is a problem in a seismically active part of the world.  In steps Sismo@Ayiti, a citizen-science effort connecting volunteers with small Raspberry Pi devices that can be placed in their houses, sending vibration data to a central server (you can see the live data here).  Tripathy-Lang gives a great writeup; these devices are noisy (they’re in people’s homes), but the noise is uncorrelated, and there are many devices, so signal processing and machine learning can uncover a good overall signal.  In particular, it was enough to disentangle the fairly complex event that was the 2021 earthquake.
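To make the "noisy but uncorrelated" point concrete, here is a minimal Python sketch. It is not the processing used by Sismo@Ayiti or in the Calais et al. paper; it only illustrates the general statistical fact that averaging N readings with independent noise shrinks the noise by roughly a factor of the square root of N. The sine-wave "signal", the Gaussian noise model, and the function name `stacked_noise_std` are all assumptions made up for the illustration.

```python
import math
import random
import statistics


def stacked_noise_std(n_devices, n_samples=2000, noise_std=1.0, seed=0):
    """Return the noise left over after averaging n_devices noisy readings.

    Every hypothetical device sees the same underlying signal plus its own
    independent Gaussian noise (an assumption made for this illustration).
    """
    rng = random.Random(seed)
    residuals = []
    for t in range(n_samples):
        # A toy common signal shared by all devices.
        signal = math.sin(2 * math.pi * t / 200)
        readings = [signal + rng.gauss(0.0, noise_std) for _ in range(n_devices)]
        stacked = sum(readings) / n_devices
        # What remains after subtracting the true signal is pure noise.
        residuals.append(stacked - signal)
    return statistics.pstdev(residuals)


if __name__ == "__main__":
    for n in (1, 4, 25, 100):
        print(n, round(stacked_noise_std(n), 3))
```

With a noise level of 1.0 per device, the leftover noise for 100 devices should come out near 0.1. If the noise were correlated across devices (say, every house shaking with the same passing truck), averaging would not help in this way.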

Research Software Development
How we ship GitHub Mobile every week - Taehun Kim, GitHub Engineering Blog
An interesting behind-the-scenes look at what’s involved in shipping releases regularly.
Ignoring the mobile app parts for a moment, which is something most of us don’t deal with, I was surprised by how vanilla and routine the tools used were.   The main work has gone into developing processes and forms to automate what can be automated and add checklists.   The tools and processes they use could all be readily adopted by a research software team.  The exceptions are trivial - e.g. they keep track of whose turn it is to do which task using an on-call-management tool, because they have it, but one could just as easily use a Google Sheet.
Their process is:

They use GitHub Actions to create a tracking issue with a long checklist
The action creates a PR to create the release branch
They use GitHub Actions to do their CI testing and create a binary
They use bash and ruby scripts to create a PR to update the version number

Essential Open Source Software for Science (Cycle 5) - Chan Zuckerberg Initiative, Letters of Intent due 19 Apr
I’d love to not post the CZI software calls every time they’re out, mixing it up a little bit instead.  But no one else is doing the work of routinely funding maintenance of important open-source scientific software, so here we are.
Focus areas are foundational tools and infrastructure, and domain-specific tools for infectious diseases, imaging, and single-cell biology.  Applications can request $50-200k/yr for two years.  If invited to submit a full application, it’s a tight turnaround - notices go out May 5, applications are due 2 June.

How Google, Twitter, and Spotify built a culture of documentation - Nik Begley
Documentation can too easily fall behind.  It’s kind of everyone’s job, so it’s no one’s job; and people are focussed on shipping new code, not updated documents.
Begley summarizes talks from teams at Google, Twitter, and Spotify about how those teams improved their documentation and then kept it better.   What I found interesting here is how similar the cases were to each other, but also to a case study on research computing software, the Tasmanian library, that we covered back in #43.
Common features of the approaches were:

Choosing a common way to do the documentation - meaning there was a standard “we do things this way” approach
Building on that, create templates to make things easy
Having the nicely-formed version of the docs automatically generated, usually on a web page, so you can see the results right away
Have a “documentation day” to get the docs up to date
Then make documentation part of tickets to keep them up to date.

It’s fascinating to me that the problems are the same even at huge corporate teams, and that the solutions are so similar and, well, do-able.

I’ve been talking about NVIDIA’s GTC as a worthwhile event back since #15; ignoring it now just because I work there seems a bit much.  So check out the session catalog, and register if you see something that interests you.  Registration for the sessions is free.  Those trainings that do cost money are not very expensive and are pretty uniformly recognized as being quite good.

Research Data Management and Analysis
Breaking Data Silos Open with an Apache Arrow Platform - Daniel Robinson, The Next Platform
This is an interesting example of what I think is a positive development for research computing and data.  An open source standard and tool (Apache Arrow here, an efficient in-memory representation of columns of data) will be supported by new company Voltron Data (which just announced a successful seed round) with development and enterprise support and software tiers for those who want it.  
Expertise comes from people with experience in data analytics and people with HPC expertise such as well known HPCer Fernanda Foertter, which makes me hopeful that this is something that could be useful across research computing.  And enterprise support subsidizing open-source tools - especially one which serves as something of an interoperability layer for data on disk and in memory - is a powerful approach for keeping those tools maintained and usable.

Some very nice detailed articles on working with Postgres - Postgres Auditing in 150 lines of SQL, How we optimized PostgreSQL queries 100x, and using views for zero-downtime schema migration.

Getting started with ML training using Intel’s oneAPI AI Analytics Toolkit.

Research Computing Systems
The Dirty Pipe Vulnerability - Max Kellermann

Linux has been bitten by its most high-severity vulnerability in years - Dan Goodin, Ars Technica

New Linux bug gives root on all major distros, exploit released - Lawrence Abrams, Bleeping Computer
Hopefully this isn’t the first you’re reading of this (why do these things always break on Mondays?).  I’ve collected the best write-ups I saw; there are local privilege-escalation exploits published, and it wouldn’t shock me to hear that there’s worse circulating around.  Kellermann identified the bug, and his article tells the story of how it was found.  It’s a kernel error involving the interactions of pipes and page caches in memory.
The “good” news is that a lot of research computing systems are still on RHEL7 or the moral equivalent, which has kernels earlier than the roughly 5.8 where this started being exploitable.  Still, check your versions of everything and update as soon as you can if needed.
It’s amazing anything works at all.  Of course we live in an era when business phone systems are being actively exploited to take part in DDoS attacks, so, you know.
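For the "check your versions" step above, a first-pass triage can be scripted. The sketch below, in Python, compares a kernel release string against the roughly-5.8 threshold mentioned above. The helper names are hypothetical, and the check is only a heuristic: distributions backport both bugs and fixes, so a kernel at or above 5.8 may already be patched, and your vendor's advisory is the real authority.

```python
import platform
import re
from typing import Tuple

# Threshold taken from the text above: kernels from roughly 5.8 onwards.
DIRTY_PIPE_FIRST_AFFECTED = (5, 8)


def kernel_major_minor(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string like '5.15.0-25-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release string: {release!r}")
    return int(match.group(1)), int(match.group(2))


def needs_a_closer_look(release: str) -> bool:
    """True if the kernel is new enough that it might be affected.

    This says nothing about whether the fix has been applied or backported.
    """
    return kernel_major_minor(release) >= DIRTY_PIPE_FIRST_AFFECTED


if __name__ == "__main__":
    try:
        release = platform.release()
        verdict = "check vendor advisory" if needs_a_closer_look(release) else "predates 5.8"
        print(f"{release}: {verdict}")
    except ValueError as err:
        # Non-Linux hosts may report a release string in another format.
        print(err)
```

Running the same function over an inventory of `uname -r` outputs collected from your nodes gives a quick list of which systems to look at first.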

Understanding wait time versus utilization - from reading Phoenix Project - Zhiqiang Qiao
Every so often I see technologists rediscover a very widely known result in operations research - introductory textbook stuff, really.  Wait times (or other bad behaviour) start rocketing upwards once we get to high (somewhere between 80% - 90%) utilization.   You see this in equipment, and teams, of course, too.  Teams, whether they’re cash registers or software developers, start getting into trouble at sustained high “utilization rates”, e.g. overwork.
And yet, a typical metric for the systems we run for researchers includes utilization, with an understanding that higher is better.  After all, if we have one system at 75% utilization and another at 90%, haven't we wasted money on the 75% one by over-building?  
Of course, Qiao points out that even if you discard utilization as a metric, wait time isn’t the only metric we might care about either.  Getting the most important tasks through the queue, whether that’s software features or compute jobs, is what’s important.
Metrics matter!   We bake in all kinds of pathological incentives when we choose KPIs based on what’s easily measured (typically technology or input based) instead of what actually matters (supporting new and high-impact research outputs).
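The textbook result mentioned above can be sketched in a few lines of Python. This assumes the simplest queueing model, a single server with random (Poisson) arrivals and exponentially distributed service times, usually called M/M/1. That model is my assumption for illustration, not something taken from Qiao's article. Real clusters and real teams are messier, so treat the numbers as qualitative.

```python
def mean_wait_in_service_times(utilization: float) -> float:
    """Mean time spent waiting in the queue for an M/M/1 system.

    The result is expressed in units of the average service time, using the
    standard formula  wait = utilization / (1 - utilization).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return utilization / (1.0 - utilization)


if __name__ == "__main__":
    for rho in (0.50, 0.75, 0.90, 0.95, 0.99):
        wait = mean_wait_in_service_times(rho)
        print(f"utilization {rho:.0%}: mean wait of about {wait:.1f} service times")
```

Under this model the 75% system has an average wait of about three job-lengths, while the 90% system has about nine. Pushing utilization up by 15 points triples the wait, which is the cost that the "higher is better" reading of utilization hides.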

SSH Key Rotation with the POSIX Shell - Sunset Nears for Elderly Keys - Charles Fisher
Keys that never change are dangerous if something happens; Fisher shows that it’s straightforward to write a simple script for rotating your ssh keys.  The primitives - ssh-keygen and ssh-copy-id - are most of what you need; this wrapper script handles the details.  The only thing that’s tricky (and dangerous!) to do automatically is removing the old key.  Fisher walks you through the process to build up the tooling in a way you’re comfortable with.
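As a rough illustration of how the two primitives named above fit together, here is a small Python sketch. It is not Fisher's script, which is written in POSIX shell; the function names, the dated key-file naming scheme, and the choice of ed25519 are all assumptions of mine. It defaults to a dry run that only prints the commands, and it deliberately leaves out removing the old key, the step described above as tricky and dangerous.

```python
import shlex
import subprocess
from datetime import date
from pathlib import Path
from typing import List


def keygen_command(key_path: Path, comment: str) -> List[str]:
    """ssh-keygen invocation for a new key pair (it prompts for a passphrase)."""
    return ["ssh-keygen", "-t", "ed25519", "-f", str(key_path), "-C", comment]


def copy_id_command(key_path: Path, target: str) -> List[str]:
    """ssh-copy-id invocation that installs the new public key on the target."""
    return ["ssh-copy-id", "-i", f"{key_path}.pub", target]


def rotate(target: str, key_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Generate a dated key and install it on target, e.g. 'user@host'.

    The old key must still work at this point, because ssh-copy-id logs in
    with it. Removing the old key afterwards is left as a manual step.
    """
    stamp = date.today().strftime("%Y%m%d")
    key_path = key_dir / f"id_ed25519_{stamp}"  # hypothetical naming scheme
    commands = [
        keygen_command(key_path, f"rotated-{stamp}"),
        copy_id_command(key_path, target),
    ]
    for command in commands:
        if dry_run:
            print(shlex.join(command))
        else:
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    # user@example.org is a placeholder, not a real host.
    rotate("user@example.org", Path.home() / ".ssh", dry_run=True)
```

Starting with the dry run and reading the printed commands before letting anything execute is in the same spirit as building up the tooling gradually.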

A quick walkthrough of setting up docker in rootless mode.

Emerging Technologies and Practices
Reinventing High Performance Computing: Challenges and Opportunities - Daniel Reed, Dennis Gannon, Jack Dongarra, arXiv

Will HPC be eaten by Hyperscalers and Clouds? - Timothy Prickett Morgan, The Next Platform

Looking for a Singularity Event for Scientific Computing - Jeffrey Burt, The Next Platform

Bad News for Cloud Computing: OpenStack Use Plummets and Discounts Dry Up - Lawrence E Hecht, The New Stack
Reed published a  blogpost in early February, which has been focussed, fleshed out, and joined by Gannon and Dongarra into a preprint on arXiv.  Morgan summarizes the paper well in his article, starting with this sum-up:

…we get a fascinating historical view of HPC systems and then some straight talk about how the HPC industry needs to collaborate more tightly with the hyperscalers and cloud builders for a lot of technical and economic reasons. 

In a time of Moore’s law ending, where the cloud companies and hyperscalers comically dwarf the old-school computer vendors, and when research computing is ever-broadening to a wider and wider range of important problems to solve, the old ways of building systems just aren’t going to work.  I won’t try to summarize Morgan’s summary; if this interests you at all, I’d highly recommend starting with Morgan’s article, going to the paper as needed.
But I would like to tie it into two other articles.  
First is Burt reporting on BP’s John Etgen’s keynote at the recent EnergyHPC (née Rice Oil & Gas) conference.  Etgen laments that HPC lacks the kind of “explosion in innovation” that is happening in AI, clouds, and even once-staid fields like databases.  He points to a lack of increasing specialization in HPC, of composable components, and of new software:

That said, if the field was in singularity, “you’d see an exploding diversity of applications. Do you feel like we’re seeing an exploding diversity of applications? In computational fluid dynamics, I think the answer is no. In computational seismology, the answer is definitely no. In reservoir engineering, I think the answer is no. Maybe there’s some other scientific fields that actually have an exploding diversity of applications in scientific computing. Maybe, but it’s kind of iffy. I’m not sure I see that.”

Finally is Hecht’s surprising-to-me report of four years of survey data about on-prem private clouds.   The first fact is discouraging but maybe not surprising: OpenStack use is plummeting, dropping by more than half over the past three years.   This is really bad news for people hoping that OpenStack will continue to be supported and extended.
More surprising to me is that use of enterprise stalwarts like VMware (often used for “enterprise” vs “research computing” workloads) is also plummeting.  The only options that are clearly growing are AWS Outposts (which will ship a rack of, or individual, AWS-managed servers to your machine room) and Azure Stack (which will let you buy your own Azure-certified gear and run the Azure managed stack atop it, charging per core).

I think we need to start getting used to the idea that trillion-dollar cloud providers are better at running computing systems than siloed teams of 5 or 12 or 50 can manage, and much better at writing the tools to support management of the system.  If that’s true then even on-prem systems 10 years from now might mostly have commercially-supported control planes from the cloud providers running atop.  That’s different, in the sense of fewer choices, than paying for commercial support for a scheduler and Lustre, but is it so different that we can’t accept it?  I guess we’ll find out.

Here’s a nice blog post explaining when to use AWS ParallelCluster vs AWS Batch.

The N8 Centre of Excellence in Computationally Intensive Research (N8CIR) is hosting a digital research infrastructure retreat the last week in March in Manchester.  It covers strategy, grant writing, new technologies, developing a profession - it honestly looks like a great event.  Registration is still open.

Random
So I have a Windows computer for work for the first time since Windows 3.11.  For those computers I actually type on (as opposed to those I ssh into) I’ve been using Mac since… 2006?  On the new machine I can use Ubuntu via WSL2, a browser is a browser, and VSCode & Teams & Slack are all cross platform, so it’s mostly ok… except for the frickin’ keyboard shortcuts.
Oh yeah, and frickin’ paths.  Here’s the history of how Windows came to use the wrong slash.  It’s all DEC’s fault, stemming from how options were passed to commands for the TOPS-10.
An open-source microscope built using Lego bricks, 3D-printing, Arduino, and Raspberry Pi.  By IBM(?).
Sure, word processors are great and everything, but sometimes I prefer to use an older, text-based markup language to write documents that get beautifully typeset.  And if you do too, great news - that language, groff, now has an IDE.  Written in Pascal, so you know it’s good.  (Heirloom troff seems to be the current best implementation, by the way.)
The history of the inverse-T arrow key arrangement.
JavaScript might be getting mypy-style type hinting.  One can only hope that, like mypy, it starts from there and grows outwards.
Make oddly soothing art with vector fields.
Using Google Sheets as a database to back a website using ROAPI and Replit for API hosting.
Literate programming/notebook style documents for the mechanized proofs that come out of theorem provers.
The people creating a small community around tiny computing.
A deep dive into ED25519 signatures.
Small thermal models in the browser.
Here’s a C cross-compiler for the 6809 for those of us who had a TRS-80 Color Computer.  To heck with those fancy kids with their Commodores.
Oh, fine.  For you Commodore kids, here’s creating a Commodore C-64 cartridge with a Raspberry Pi Pico.
Quantum Computing from a statistical point of view (authors’ PDF).
A debugging story of a super hard-to-track-down bug involving infiniband, GPUs, forking a new process, and malloc.
Implementing the lambda calculus in 392 bytes.

That’s it…
And that’s it for another week.  Let me know what you thought, or if you have anything you’d like to share about the newsletter or management.  Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology.  It’s teams, it’s communities, it’s product management - it’s people.  It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly.  But no one teaches us how to be effective managers and leaders in academia.  We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.

Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 153 jobs is, as ever, available on the job board.
Data Science Leader - Anaconda, remote US 

Anaconda is seeking people who want to play a role in shaping the future of enterprise machine learning, and data science. Candidates should be knowledgeable and capable, but always eager to learn more and to teach others. Overall, we strive to create a culture of ability and humility and an environment that is both relaxed and focused. We stress empathy and collaboration with our customers, open-source users, and each other.  Anaconda is seeking a talented Data Science Leader  to  join our rapidly-growing company. This is an excellent opportunity for you to leverage your experience and skills and apply it to the world of data science and machine learning.
Scientific Computing Senior Manager - Fred Hutchinson Cancer Center, Seattle WA USA 

The Scientific Computing Senior Manager will direct and oversee all facets of scientific computing including managing a highly capable engineer team, engineering functions such as design, development, installation, and maintenance of hardware and software, and customer service and support, for the organization. The senior manager will create and oversee partnerships with faculty and stakeholders in the scientific community to test and integrate new analysis pipelines, determine innovative technologies and software to support their research, evangelize new and current solutions, and foster technology and data storage best practices. The senior manager will participate in strategic planning and align projects and resources for implementation in partnership with the Project Management Office and Business Operations, and other departments.
Director of Research Computing Services - Iowa State University, Ames IA USA 

Iowa State University is seeking candidates for a newly created position of Director of Research Computing Services within Information Technology Services (ITS).  Reporting to the CIO, the Director of Research Computer Services is responsible for providing campus-wide leadership and vision for enterprise technology services and solutions in support of Iowa State’s research community across all research disciplines.  The Director of Research Computing Services is responsible for advancing the University’s investment in cyberinfrastructure through the tactical planning, integration, coordination, and deployment of information technology and human resources to support research, scholarship, and creative work.
Senior Director, Academic & Research Computing - Rush University System for Health, Chicago IL USA 

The Senior Director, Academic & Research Computing is accountable for the strategy, leadership, direction, and delivery of ongoing operations of the systems and technologies used by the University and Research communities. Manages large complex projects and programs. Coordinates both team member and manager resources to meet IS and implementation team's strategic goals. Drives strategic planning for new IS facilities to optimize organizational design and effectiveness.
Research Project Manager, Data Science for Science & Humanities - Alan Turing Institute, London UK 

The Alan Turing Institute has ten Programmes of scientific research in key areas of AI and data science, each led by a Programme Director. There are also a small number of significant programmes of activity which sit across several research Programmes. This role sits within the Institute’s Programme Management directorate, which is responsible for the management and delivery of these programmes in support of the senior academic Programme Directors and Principal Investigators. The team oversees millions of pounds of data science and AI research, training and knowledge exchange programme initiatives in these programmes, ensuring they are managed to business requirements, specification, time and budget.
Senior Software Engineer, Cloud-Native Health AI - NVIDIA, various remote USA 

Do you want to improve healthcare for the world? We are seeking a Senior Systems Software Engineer who is technical, creative and driven to change the world of healthcare. They will build and deploy high-performance, scalable cloud native solutions for AI deployment in healthcare. Our efforts focus on enabling healthcare workloads to use deep-learning and the latest machine learning advancements accelerated by GPUs. We provide easy-to-use micro-services and SDKs that scale both locally and into the cloud. You will use your deep understanding of service oriented architecture, data processing, programming languages, distributed systems, multi-threading, operating systems, machine learning, and cloud services.
Agile Delivery Manager - University College London, London UK 

These roles will sit within ISD or ARC, where they are part of the Portfolio and Product Delivery department. Our purpose is to manage the delivery of technology led change, partnering with colleagues across UCL. As Agile Delivery Manager you will support, coach and enable one or more cross functional agile teams to deliver their product or platform roadmap. You will lead and facilitate collaborative agile ceremonies, striving to identify and eliminate risks, dependencies and impediments, while ensuring quality is built in. You will work closely with the Product Owner and key stakeholders in ISD and ARC and more widely across the university, proactively communicating and collaborating with them.
System Administrator - Rutgers University Office of Advanced Research Computing, Rutgers University - New Jersey, USA 

Rutgers, The State University of New Jersey is seeking a System Administrator V in the Office of Advanced Research Computing (OARC).
Reporting to the Director, Advanced Computing Infrastructure (ACI), the highly skilled and experienced System Administrator V will support the university’s Advanced Research Computing (ARC) infrastructure, including High Performance Computing (HPC), High-Throughput Computing (HTC), and Data-Intensive Computing environments.
The System Administrator V will perform the following duties: conceive, design, develop, optimize, integrate, and maintain HPC systems and on-site cloud infrastructure, lead technical operation and continued development of HPC, on-site cloud infrastructure and storage services, and provide hardware, software, and end-user administration and support to a diverse group of end users that need access to ARC resources. Operates as a member of the ARC team with focus on a University campus.
Senior HPC Solutions Architect, EMEA - AWS, various UK or EU 

Are you passionate about cloud computing and its potential to overcome some of the biggest challenges in High Performance Computing (HPC)? Do you have a combination of deep technical knowledge, business acumen and strong interpersonal skills? Do you enjoy tackling large analytical problems at massive scale? Amazon Web Services (AWS) is seeking an HPC-focused Solutions Architect to work with our customers, including the largest CAE and Autonomous Computing customers, financial institutions, and healthcare and life-science customers, ... to craft cloud native or hybrid HPC solutions.
Embedded Software Team Manager - Quantum Brilliance, Canberra ACT AU 

Quantum Brilliance is the world leader in room-temperature quantum computing using synthetic diamonds. Our unique vision is to make quantum computing available as an everyday technology, from data centres to remote and mobile systems like autonomous robots and satellites. The Embedded Software Team Manager is a critical member of the company’s control systems research and development team. You will be the direct line manager for a small team of software and systems engineers in our lab in Canberra. This team are responsible for developing the control systems for our quantum computer hardware and typically have expertise in embedded software, FPGA design, electronics, and work closely with our Lab Team to develop the required hardware control and data processing capabilities.
Software Development Manager (HPC) - AWS, Bellevue WA or Denver CO or Boston MA or New York NY USA 

We are open to hiring across a few different locations: Bellevue, Denver, Boston, and New York City. The AWS High Performance Computing (HPC) group is looking for a Software Development Manager (SDM) to lead a team focused on the next generation of HPC on the AWS Cloud. We build NICE EnginFrame, AWS ParallelCluster, and the overall experience for customers building some of the largest HPC and distributed ML clusters in the world, while at the same time empowering research scientists and front-line engineers to dynamically scale their HPC workflows. We enable a broad set of applications for computational fluid dynamics, weather modeling, molecular dynamics, seismic modeling, and machine learning.
Principal Product Manager - HPC, Batch - AWS, Seattle WA USA 

The High Performance Computing (HPC) team is looking for a talented PM who will act as the lead PM for the Batch PM team, driving our efforts to develop the Batch portfolio of products within AWS. We are looking for a broad and deep leader who will own and drive product management across a range of Batch offerings, spanning services like ECS, Kubernetes, and Lambda. You will work closely with teams in EC2 as well as on new technology from Annapurna Labs. As a Principal Product Manager (Technical), you will own product development from strategy/product vision through feature definition, prioritization, positioning, naming, and GTM/adoption. We are looking for experienced product managers who are passionate about solving customer problems, and have demonstrated success working backwards from ambiguous customer needs, translating them into disruptive, successful products that delight customers.
