Research Computing Teams #106, 22 Jan 2022

isn’t a strategy

        January 23, 2022

Research Computing Teams #106, 22 Jan 2022

        Hi!
As you can see from this issue again being late (but less late!), here at RCT HQ I’m still getting into the rhythm of the new year.  I hope you and your team are already firing on all cylinders.
There’s several articles this issue on decision making, priorities, and strategy.  I keep coming back to this, because it’s important.
For research computing teams to succeed, we need priorities and a strategy.  Without those, we can’t even tell if we’re succeeding, or on our way towards success.
Everyone agrees with that, but the next part gets uncomfortable.  A strategy that doesn’t clearly show which perfectly good opportunities the team would say “no” to isn’t a strategy, no matter how many slide decks it appears on; it’s a motivational poster.  A “priority list” which doesn’t explicitly deprioritize some opportunities, projects, clients, and activities is — by definition! — something other than a priority list.
Unfortunately, I see this quite a bit in academic research computing teams, especially around software and systems.  New research software development teams, often one of several across campus, typically scramble to get funds in the short term, taking a “catch as catch can” approach to projects.  A team led by a new leader will often have a “strategy” of something like “enabling research with high-quality software development” and “connecting domain experts in software development with domain experts in research”, which distinguishes them from literally no research software development team anywhere in the world, much less on campus. (Where are the stand-alone “enabling research with low-to-moderate quality software development” teams?) This makes it impossible for them to clearly articulate strengths compared to other teams across campus, attract clients that need those strengths, and build on those strengths.  But doing that is uncomfortable, because then they have to at least implicitly come clean that the team has weaknesses, and thus clients they probably can’t (and shouldn’t) help effectively.  That’s hard for a new leader, and even for some pretty seasoned and senior leaders.
When I see research systems teams fumble on clear articulation of priorities and strategy, it’s usually a little different — a stubborn refusal to admit aloud that they are implicitly prioritizing certain kinds of workflows (and thus clients) over others. Typically it starts with queuing and allocation policies put in place at the beginning of the systems team, aimed to support their first big user groups.  Then as time goes on, those policies somehow become “decisions made for technical reasons” and “the way things are done here” rather than what they actually were, an explicit prioritization choice.  At that point, saying aloud “we specialize in support for users [X]” is scary even though it’s manifestly true and would be the first step in deciding whether continuing to specialize in support for users [X] is in fact the right approach.
Prioritization and strategy, when done in a way that isn’t meaningless, is scary.  Unfortunately, prioritization, strategy, and decision making is the core responsibility of managers and leaders.  Every decision for something is a decision against something else.  Every high priority area explicitly deprioritizes all other possible areas.  A strategy necessarily implies a long list of paths not taken.  If your priorities and strategies aren’t choices that could be wrong, they aren’t priorities or strategies at all.
With that, on to the roundup!
Managing Teams
How to Write a Strategy Statement Your Team Will Actually Remember - Michael Porter, Nobl Academy

Saying "no" - Mike McQuaid
Porter’s article highlights an idea that’s come up a few times in the roundup - a very clear way to write out priorities or strategy is to contrast two things, both positive, and explicitly say that your strategy values one over the other.  It’s too easy to write out “motherhood and apple pie” strategies: “we value moving fast and solid code”.  But those statements are fluff that don’t mean anything; you might as well be announcing that water is wet.  No one disagrees that those are good things with value.  Hard decisions come when good things are in conflict.  When writing a new chunk of solid code will require moving slowly, which one wins?  Those choices are your priorities, the outlines of your strategy.
And once those choices are made, you have to start saying no to things that go against those choices.  Last January (#56, #57, #58) we had a series on stopping doing things; the first step is saying “no”.
McQuaid talks about saying “no” as “frontloading disappointment”, and maybe surprisingly as a way of building and maintaining trust.  He also talks about Brené Brown’s BRAVING framework for trust (Boundaries, Reliability, Accountability, Vault, Integrity, Nonjudgement, Generosity).  He writes that having a clear and explicit “API”, where it’s clear that if you say yes to something it’s going to get done, and that you say no early so they can move on, builds trust long term in exchange for a small bit of friction in the short term.

Make Yourself Obsolete: Your Team Will Thank You - Nathan Broslawsky
Porter’s article above had a simple measure of success for strategy statements - in your absence, would the team be able to make the decisions you would have made based on the stated strategy?
Broslawsky’s article continues on that theme and expands it.  A good leader and manager is continually growing their team members, giving them broader responsibilities, and that means giving up some of your tasks.  This means coaching rather than directly problem-solving, it means giving team members opportunities to take on high-visibility efforts, and making your old job description obsolete.  That frees you, too, to take on other responsibilities.
If you like this article, a related article that roundup #79 included, Always Be Quitting, generated several positive comments from RCT readers.

The Decision-Making Pendulum -  Candost Dagdeviren

Our Role in Effective Decision Making - Kathy Keating

Everyday Decisions, Done Dirt Cheap - Matt Schellhas
Dagdeviren talks about the decision-making pendulum, the spectrum of decision-making processes ranging from authority (“Because I said so”), through advice and consent, to consensus decision making (“So say we all”).  The processes at the extremes aren’t great for either the teams or the decisions that likely result (I don’t agree with everything Manager-Tools says, but I think they’re 100% right about consensus - consensus is a desirable outcome of a decision-making process, but it is a bad choice for a decision-making process).  Within the middle, the important thing is that the decision-making process is explicit, input is gathered and seen to be taken seriously, and decisions are communicated clearly.
That input gathering and discussion-facilitation phase before and after a decision made takes some practice.  Keating gives us (from the point of view of an inveterate introvert!) an overview of facilitating discussions, which can be especially tricky in the cross-disciplinary world we work in, so that the full range of ideas and information becomes available and shared.
And Shellhas gives us a bit of breathing room when it comes to decision making.  The “best” decision is unknowable beforehand, when you actually need the decision; even in retrospect, determining whether it was best available is usually still impossible.  It’s not a power given to we mere mortals to watch all possible timelines unfold.
But there’s often one or more nearly-tied, “good enough” decisions available.  Unless a given decision has huge ramifications — which does happen, but isn’t the common case — quickly making a good-enough decision in some consistent and transparent way (maybe, say, given some underlying strategy and priorities!) is probably better for you and the team than agonizing over trying to optimize over the unknowable.

Managing Your Own Career
Becoming a Better Writer - Gergely Orosz
Work for others is getting more and more remote and asynchronous.  It’s always been a bit that way for us, with large multi-institutional collaborations a routine part of work.  Whether it’s a new or ongoing challenge, we do better with clearer writing.  It’s a fundamental tool for clarifying our thoughts, and for communicating those thoughts to a community of work.  That’s true whether it’s for a pull request, a changelog, an email to users, or an institutional strategy document.  And as we grow in our responsibilities, our community of work becomes broader with more diverse expertise.  Writing clearly gets more and more important.
There’s only one way to get better at something - and that’s intentional practice with real feedback.  Orsoz recommends writing a lot, critical editing - by ourselves and others - and actively soliciting feedback.   Orosz also recommends a couple of tools, including Hemingway Editor, which I keep coming back to.  In academia we write long, rambling sentences.  Twitter and Hemingway Editor help me tighten my text.  As long time readers will know, it’s a continuing struggle.

Middle Managers: The Forgotten Heroes of Innovation - Ben M. Bensaou, INSEAD Knowledge
Something to remember as we rise in our careers: middle management has an undeserved poor reputation.  It’s glue work, which is largely unsung and hard to even understand the value of from the outside.  But middle management is where much of the cross-discipline collaboration across a large organization happens. ICs and front-line managers are typically too focussed on their individual functions, and the view from the executive suite is typically too high-level to see fine-grained opportunities to work together.  Large companies like Bayer rely heavily on middle-managers to find novel ways to collaborate across the organization, as Bensaou describes.

Product Management and Working with Research Communities
Former Google CEO invests in computing help for university scientists - Adrian Cho, Science
Schmidt Futures, a philanthropic foundation founded by Eric and Wendy Schmidt, will be investing $40m over 5 years to establish a Virtual Institute for Scientific Software across Georgia Tech, Johns Hopkins, Cambridge, and U Wash.  Research software developers will be hired at each site.
So this is unambiguously good news, all great stuff, and it’s getting really frustrating that we need to have wealth tech patrons like Google’s ex-CEO or Mark frickin’ Zuckerberg (through the Chan Zuckerburg Initiative) to fund in dribs and drabs the absolutely fundamental work of research software development and maintenance.  And at least for software there are those programs - other areas of research computing and data don’t even have that.
The article ends:

[UWash Dean of Engineering] Allbritton hopes federal funders might take note of the initiative. “One might imagine,” she says, “that NIH and NSF could pick up on something like this and really amplify it.”

Might one?  At this point it’s not clear.

Research Software Development
Code Review is Feedback - Linnea Huxford
A reminder that code review isn’t just about the code in question, it’s feedback.  So that means it’s an opportunity to give nudges to inform future behaviours (code submissions), it’s an opportunity to give positive as well as negative feedback, and it’s important that all team member are providing consistent feedback.

Git Organized: A Better Git Flow - Annie Sexton, Render

Make debugging suck less. Keep a logbook - Conor Lamb
A couple short articles nudging us to do a little better for ourselves when debugging, and for others when committing changes.
Sexton issues a plea for more coherent commits, suggesting following the “commit (locally) as you go along” approach with a git reset (which doesn’t change the working tree unless requested to) and then bundling changed files into coherent commits.  Even parts of a file can be added to a commit with git add —patch or via editor support (in VSCode you can “stage select ranges”).
Even though a lot of us were trained in science, we can lose our scientific approach in the heat of a debugging session.  Lamb suggests keeping a logbook; this is a good idea, coupled with the approach of having hypotheses you test during debugging rather than my usual approach of just flailing.

Stewardship of Software for Scientific and High-Performance Computing - US DOE

Responses to DOE RFI on Stewardship of Software for Scientific and High-Performance Computing - Various, organized by Todd Gamblin
The DOE put out a RFI on Stewardship of Software for Scientific and High-Performance Computing describing the problem and current state of things.  There were 39 responses, most of which are online, but the official government website is a pain to review.  Gamblin organized the responses in a GitHub repo - there’s a lot of good stuff there, worth reading.  I’ll try to distill some of the common (and uncommon!) responses in a future issue.

Research Data Management and Analysis
Getting Started with R and InfluxDB -  Gourav Singh Bais, The Next Stack
There’s a number of time series databases available these days, which are analytic optimized for collecting an extremely large number of (timestamp, value) pairs for a potentially large number of variables.  Queries are aimed at identifying windows of time that meet some criteria within a variable or across several time streams, and potentially doing some analysis within that window.   Sometimes there’s tools for removing old data or at least keeping it at only larger and larger granularity.
If you’re curious about knowing if something like that is appropriate for your usecase, this is an InfluxDB-sponsored post on The Next Stack for quickly playing with some example data (COVID-19 cases) in InfluxDB using an R client.

Research Computing Systems
SUSE announces new distro for those who miss the old CentOS: Liberty Linux - Liam Proven, The Register

CentOS 9 Stream Is Now Available but Should You Use It? - Jack Wallen, New Stack
Well that's interesting - it looks like SUSE (drags on cigarette - It’s been a long time since I’ve heard that name) will be releasing a “classic CentOS” type distro based on RHEL, but with much its much newer kernel.
As you know, I think slow moving, long-term-stable OS images are a trap for research computing systems teams, albeit one that’s hard to dig out of until vendors stop using them as an excuse to update their device drivers as seldom as humanly possible.   But if it’s a trap you find yourself in, there’s a new option from an organization a little less fly-by-night than Rocky Linux.
Alternatively, CentOS 9 is the first Stream version of CentOS, which is kind of the staging area between the faster-moving Fedora and the carved-in-stone RHEL.  After a new version or package has been kicked around in Fedora for a while, it goes into RHEL nightly and CentOS stream.  Honestly, I think something like that could be  close to a sweet spot for some research computing clusters; the issue, as always, is cheap and lazy vendors updating their drivers only when absolutely forced to.  As a community, we should force them to more often.

SSH Bastion Host Best Practices - Sakshyam Shah, Teleport

Restricting SSH agent keys - Jake Edge, LWN.net
A pair of SSH articles this week.
In the first, Shah describes a good set of best practices for setting up a bastion host.  None of the individual steps here are ground breaking - that’s what best practices means - but the collection of them all in one place is useful.  Strip down the packages and services on the host OS, lock down networking capabilities, limit user accounts, add OS-level logging and send the logs elsewhere, add host intrusion detection systems like OSSEC or Wazuh, harden OpenSSH by limiting ciphers, regenerating host keys, etc., add 2FA/MFA, use certificates where possible, and deploy.  There are recommended tools, steps, or parameters for each step, which is nice.
In the second article, Edge describes an experimental feature in the upcoming OpenSSH 8.9, where keys can be ssh-add’ed to ssh-agent for limited uses, e.g. only connecting to certain hosts or even to certain users at certain hosts.

Lets Encrypt for internal hostnames - Julien Savoie
Savoie describes using dynamic DNS (RFC 2136) capabilities in certbot to temporarily create TXT records for relevant internal-visible-only hostnames so letsencrypt can generate certificates.

Emerging Technologies and Practices
AWS Free Courses - AWS
AWS has moved a lot (most? all?) of its self-paced online courses to being free, possibly following Azure’s lead - e.g. here’s an FSx for Lustre Primer.  It’s a good move, but I don’t know why it took so long.  I genuinely can’t comprehend why some vendors charged or even continue to charge money for this kind of training; if I were in charge of such a company I’d be chasing technologists down in the streets trying to get them to get certified in my various technologies.  Yes, good courses cost real money to develop, maintain, and even run if there’s a lot of (e.g.) cloud resources that get spun up for hands-on components.  But do you want them using your stuff or not?

Kubernetes at Home With K3s - Bruno Antunes
We’ve spoken before about edge distributions of Kubernetes like k3s (#40, #56).   Here Antunes walks us through the next step of playing with Kubernetes after poking at Minikube - running a real (if lightweight) Kubernetes with real pods on your system using k3s and virtual box, and auxiliary tools like k9s (think top), lens (a GUI), and Portainer, then setting up a simple web service with nginx ingress, let’s encrypt, and storage, and then more complicated stacks (like using Kompose to translate a docker-compose setup of a wiki, BookStack, into a helm file).
Refreshingly, Antunes is less Kubernetes fan and in more of a  “well, I guess this is what we need to use given the alternatives” sort of situation, so it doesn’t come off as a sales pitch.  And by the time you’ve worked through this post, you will have seen a fair chunk of Kubernetes functionality.

NVIDIA Research Plots A Course to Multiple Multichip GPU Engines - Timothy Prickett Morgan, The Next Platform
There’s some interesting material here riffing on a paper by NVIDIA Research that went online in Dec 2021, describing how NVIDIA might make multichip GPUs - probably not shocking, since AMD is already going there (#100).  Maybe more interesting, and certainly catching the attention of HPC twitter, is that the same paper suggests that deep learning and HPC are going to de-converge a bit, with proposed different GPU SKUs specialized for HPC and for AI/DL.  The deep learning specialized GPUs will have even larger caches.   Under NVIDIA’s simulations, HPC workloads benefited less from high-bandwidth access to large caches, while the AI workloads needed a very large “resident set” of high-bandwidth memory.

Calls for Submissions
Still a flurry of new-year CFPs and summer REU opportunities:
Call for May 2022 mentoring communities - Outreachy, Community deadline 25 Feb
Humanitarian open source communities or Open Science communities can apply for $6,500 to hire an intern from non-overrepresnted groups; like Google Summer of Code, communities apply for a place, then mentors submit projects, then candidates apply and are matched.

4th Workshop on Benchmarking in the Data Center: Expanding to the Cloud Workshop (BID 2022) - 2 or 3 Apr, Virtual, Papers due 31 Jan
This and GPGPU are held in conjunction with PPoPP22.  From the call:

High performance computing (HPC) is no longer confined to universities and national research laboratories, it is increasingly used in industry and in the cloud. Users need to be able to evaluate what benefits HPC can bring to their companies, what type of computational resources would be best for their workloads and how they can evaluate what they should pay for these resources. Recent general adoption of machine learning has motivated migration of HPC workloads to cloud data centers, and there is a growing interest by the community in performance evaluation in this area, especially for end-to-end workflows. Benchmarking has typically involved running specific workloads that are reflective of typical HPC workloads, yet with the growing diversity of workloads, theoretical performance modeling is also of interest to allow for performance prediction given a minimal set of measurements.
We invite novel, unpublished research paper submission within the scope of this workshop. Paper submission topics include, but are not limited to, the following areas: Multi-, many-core CPUs, GPUs, hybrid system evaluation; Performance, power, efficiency, and cost analysis; HPC, data center, and cloud workloads and benchmarks; System, workload, and workflow configuration and optimization

The 17th International Workshop on Automatic Performance Tuning - 3 June, Lyon, Papers due 31 Jan

iWAPT (International Workshop on Automatic Performance Tuning) is a series of workshops that focus on research and techniques that address performance sustainability issues.  It provides an opportunity for researchers and users of automatic performance tuning (AT) technologies to exchange ideas and experiences while applying such technologies to improve the performance of algorithms, libraries, and applications; in particular,
on cutting edge computing platforms.

The 14th Workshop on General Purpose Processing using GPU (GPGPU 2022) - 2 or 3 Apr, Virtual or Seoul, Papers due 8 Feb
From the call:

Massively parallel (GPUs and other data-parallel accelerators) devices are delivering more and more computing powers required by modern society. With the growing popularity of massively parallel devices, users demand better performance, programmability, reliability, and security. The goal of this workshop is to provide a forum to discuss massively parallel applications, environments, platforms, and architectures, as well as infrastructures that facilitate related research. This year, we are no longer limited to GPU applications and architectures. We welcome research related to any highly parallel computing accelerators and devices. Authors are invited to submit original research papers in the general area of massively parallel computing and architectures.

The 5th International Workshop on Edge Systems, Analytics and Networking (EdgeSys'22) -  5 April, Rennes, France (hybrid), Papers due 11 Feb
Co-located with EuroSys 2022, this workshop aims to “discuss the latest research ideas and results on edge systems, analytics and networking, especially those related to novel and emerging technologies and use cases.”

25th Workshop on Job Scheduling Strategies for Parallel Processing (JSSPP 2022) - 3 June, Lyon, Papers due 13 Feb

JSSPP welcomes both regular papers as well as descriptions of Open Scheduling Problems (OSP) in large scale scheduling (see below). Lack of real-world data often substantially hampers the ability of the research community to engage with scheduling problems in a way that has real world impact. Our goal in the OSP venue is to build a bridge between the production and research worlds, in order to facilitate direct collaborations and impact

HPC-IODC: HPC I/O in the Data Center Workshop - 29 May - 2 June, papers due 24 Feb
This workshop held with ISC is looking for papers covering “all aspects of data centre I/O”, including:

Application workflows
User productivity and costs
Performance monitoring
Dealing with heterogeneous storage
Data management aspects
Archiving and long term data management
State-of-the-practice (e.g., using or optimising a storage system for data centre workloads)
Research that tackles data centre I/O challenges
Cloud/Edge storage aspects

Random
The argument in favour of using unsigned ints by default.  Relatedly, gotchas from the “usual arithmetic conversions” in C/C++.
The simplest web server in bash, with while + nc.  Relatedly, carbon.now.sh for making pretty images of short snippets of source code.
GitHub actions will now suggest possible workflows when you start to create a new workflow.
Five options for creating backups of git repositories.
iOS shortcuts for disabling twitter (or other time-wasting apps) before a certain time or unless you’ve done a certain amount of something else.
A key to being a good decision maker is having a good sense of the likely accuracy of your assumptions.  How confident are you in your answers to true/false questions?
An immutable, tamperproof, cryptographically secure, centralized database.
I needed this a few weeks ago, when I had to pass a variable to an awk script (I just did, ok?) - passing runtime data to awk.
Another fast and nerdy shell prompt - starship.
Embedded and HPC development have in common the resource-limited need for parsimony - in embedded because the resources are small, in HPC because the need for resources is large.  Trice is a small tracing library for embedded software.
A SQL playground in the browser based on SQLite - SQLime.
Another random one-on-one question generator.
Tuning vector addition and GEMM matrix-matrix multiplication in webassembly.
Fast hash tables for (single-node) parallel programming.
Simultaneously learn DMARC and test your own SPF/DKIM/DMARC setups with the “Learn and Test DMARC” console.  If none of those acronyms mean anything to you, you have made better decisions in life and email hosting than I have; count yourself lucky and move on.
Whether you support work in the digital humanities or microscopy or fluid simulations, being able to rapidly and interactively shows user huge images is valuable.  Here’s how the Rijkmuseum has optimized large full-screen images.
CryptoLyzer - a cryptographic settings analyzer for TLS, SSH, and HTTP.
Edge computing, including support in tools like fly.io or cloudflare workers, is going to be interesting for large-scale data collection for some kinds of research projects.  If you’re interested in playing with fly.io (they’re doing interesting work with networking, firecracker…), they now have 3GB of free storage and even Postgres DBs.
Like cryptography or linear algebra libraries, licenses and other legal documents are not a roll-your-own kind of deal.  However one might feel about it, there are real, legally sound “open source for non-commercial purpose” licenses out there - don’t try to cobble together a dual-license MIT, but not for commercial use situation, it doesn’t work.
If you’re going to implement a work queue in a database, first, you shouldn’t.  Second, you should know about SKIP LOCKED.

That’s it…
And that’s it for another week.  Let me know what you thought, or if you have anything you’d like to share about the newsletter or management.  Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology.  It’s teams, it’s communities, it’s product management - it’s people.  It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly.  But no one teaches us how to be effective managers and leaders in academia.  We have an advantage, though - working in research collaborations have taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.

Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 144 jobs is, as ever, available on the job board.
University of Oxford - Senior Data Scientist/Research Data Manager, Oxford UK 

Accountabilities will include working jointly with researchers to define projects, capture data requirements, manage, transform and curate the electronic health record in readiness for large scale data analytics  and modelling. In addition, the post-holder will generate written documentation such as reports to funders and internal policies/guidance documents. The postholder will work closely with a wider team of programmers, clinicians, clinical researchers and administrators in multiple departments within the Medical Sciences Division. You will have an undergraduate qualification or equivalent in a scientific or health-related field; experience in data management e.g. data processing, data cleaning, data manipulation and a high level of attention to detail in order to deal with and process large complex datasets. As the team expands, the post holder will provide guidance to other members of the research group including research assistants, PhD students, and/or project volunteers and users of the QResearch database.
Senior Software Engineer, Graph Sampler - RAPIDS - NVIDIA, Austin TX or various remote USA 

NVIDIA is looking for a self-motivated software engineer with strong Graph and Graph Neural Network (GNN) application and development experience to join NVIDIA's RAPIDS cuGraph team. RAPIDS is the open-source suite of libraries for GPU-accelerated data science, with cuGraph being the library for accelerated large scale graph analysis. RAPIDS cuGraph is growing support for GNN training through better support for substantial graph and faster sampling. As graph sizes grow, sampling can consume a significant portion of the GNN training time. We are looking for an engineer to help us both expand the list of sampling techniques and improve performance across all size graphs. RAPIDS supports customers and partners ranging from individual data scientists to the world's largest supercomputers and Fortune 500 companies. As a member of RAPIDS cuGraph team, you will help develop new algorithms, heuristics, and techniques for solving complex graph problems faster. Join our diverse team to support NVIDIA's vision by developing Python code for addressing the toughest problems in graph analysis and graph neural networks
Data Services Manager - Queen's University, Kingston ON CA 

The Data Services Manager is responsible for the overall planning, delivery, monitoring and performance management of the Centre for Advanced Computing (CAC)’s data services strategies and solutions. This position provides strategic guidance to the CAC leadership team and University research community on data management policy and plan design, development and implementation, and advises the Director, Research Enablement team, and Systems team in making strategic decisions for research data (public, PHI, PII, IP) and non-research data (IP, PII, and KPI metrics). This includes the ingestion, creation, visualization, analysis, storage, backup, transfer, and sharing of data in compliance with the University’s research data policy. The Data Services Manager plans, coordinates, analyzes and adapts business processes and procedures to advance Data Management practices in partnership with key business areas, and oversees execution for data management related projects, special initiatives and activities managed within the CAC.
Software Engineer - Experienced - Sandia National Laboratory, Albuquerque NM USA 

Do you enjoy working with others to design solutions to challenging technical problems, and then making those solutions a reality? We are seeking an experienced software project leader to lead software engineering teams and work closely with other partners to develop software engineering solutions that support a wide array of computational science and engineering (CSE) efforts. This position offers the opportunity to tackle important problems that meet the requirements of a wide variety of customers across CSE and provide general and robust software solutions.  Lead various planning, standup team meetings, demos, and retrospectives, per multiple projects’ software development lifecycle processes, then coordinate design, development, and deployment of software. Lead teams to develop efficient and robust software solutions to technical problems that can be easily maintained by future engineers
Software Development Manager - Anyon Systems, Vancouver BC or Montreal QC or Toronto ON CA 

Anyon Systems is a quantum computing company . The company's headquarter is in Montreal, QC and we have satellite locations in Toronto and Waterloo, ON as well as Vancouver, BC. We work at the cutting edge of technology to develop and commercialize superconducting quantum computers. We deliver near term superconducting quantum computing platforms including the quantum computing hardware as well as software stack to early adopters such as High Performance Computing (HPC) centers, government agencies and enterprise research labs.  You will be leading our Infrastructure Software Team, a team of engineers and scientists dedicated to the develop the software infrastructure that allows users to use our quantum computer.  Your primary focus is building a world-class team (hiring and coaching), and putting them in the best position to succeed. You must also efficiently coordinate across departments to accomplish collaborative goals.
Machine learning / High Performance Compute Program Manager - AMD, Markham ON CA 

As program manager in AMD’s machine learning software engineering team, you will drive end-to-end delivery of leading-edge technology in high performance GPU-accelerated compute and machine learning for the Radeon Open Compute software and Radeon Adrenalin Software stacks. You will learn about how the power of open-source software can be applied to tackle real-world problems. You will interact with product management, customers, software and hardware engineering teams, quality assurance and operations in a new and growing team.
Project Manager, Data, AI and DevOps - AstraZeneca, Cambridge UK 

The Scientific Computing Platform (SCP) is AstraZeneca's state-of-the-art computing environment to pursue todays and tomorrows in-silico challenges. It strongly focuses on a platform concept, building capabilities and services around central building blocks. At its heart it uses 3 compute environments, a classical InfiniBand/Slurm HPC cluster, an OpenStack private cloud as well as various public cloud for elasticity and scale. To exploit these resource pools most optimally, the SCP is deploying strong DevOps tooling and cloud native technologies. It seeks to adjust and adapt according to changing requirements and follow the science.
Program Manager Data Strategy - Sanofi, Toronto ON CA or Cambridge MA USA 

The Program Manager Data Strategy is an essential role in delivering on the various programs required to manage the enterprise data program and enable the data strategy. The Program Manager Data Strategy role includes components of strategy (understanding and translation of the Data Organization vision and priorities into results), operating model design (development and implementation of an operating model to deliver on product development, programs and service delivery), program management (leading highly important projects in white spaces as well as providing oversight for selected program areas), support delivery of quality work and leading flexible data pods (consultative groups responsible for the temporary support for major programs). Facilitate, manage and implement the design of a new operating model for data. Leads pods acting as in-house consulting capability, responsible for realignment and support of major program. Developing strategic documents and articulating the value associated with data products developed within the AI factory.
CAMEO Project Manager, UCD School of CS - University College Dublin, Dublin IE 

University College Dublin has secured funding under the Department of Enterprise Trade and Employment's (DETE) Disruptive Technology Innovation Fund (DTIF) to establish a new Earth Observation platform designed for non-specialist users. The successful candidate will report to the Lead PI (Prof Michela Bertolotto) at UCD and the Programme Steering Committee, providing leadership in all aspects of financial planning, administration, and management of the project. The Project Manager will be a key member of the management team and support the PIs in adhering to the work-plan, its associated work packages, tasks, deliverables, and milestones over the three years of the programme. They will oversee R&D activities, identify new collaborative research and future business development opportunities with partners, coordinate research personnel and stakeholders, and review/update the research work-plan and risk register as appropriate.
Project Manager – Machine Learning Department – School of Computer Science - Carnegie Mellon University, Pittsburgh PA USA 

Carnegie Mellon University’s School of Computer Science is searching for a full-time Program Manager! In this role, you will be responsible for the full range of Project Management activities for multiple projects including handling Project Work Statements, Work Plans, and other customer contract documents; preparing special reports with information gathered from a variety of internal and external sources; identifying and recommending solutions to variances; and working with other departments and external partners to ensure a coordinated effort. Headquartered at Carnegie Mellon University, the Delphi research group was founded in 2012 to advance the theory and practice of epidemic forecasting. Our long-term mission, simply put, is to make epidemic forecasting as widely accepted and useful as weather forecasting is today. Delphi is a team of some 40 data scientists and software engineers, with projects spanning various collaborations in academia, government, tech, and healthcare. We are looking for a dedicated project manager to coordinate between and lead the group's continually diverse set of ongoing projects and assist the Delphi leadership team in translating strategy into concrete deliverables.
Computational Biology Lead - Esya Ltd, London UK 

This role is an exciting opportunity for a high performing candidate with extensive multi-omics experience to join a ground-breaking start-up in London and make a significant impact on the company’s development as they go into a rapid growth phase. You will lead the development of Esya’s neuromultiomics platform underlying our diagnostics products, developing cutting edge solutions at the intersection of machine learning, genetic sequencing technology, biological data, and distributed systems. These contributions will drive our mission to diagnose neurocognitive impairments at its most actionable and early stages. As part of an interdisciplinary R&D effort, you will help us recruit and build a team of computational biologists, machine learning scientists and software engineers.
Principal Scientist Manager - Microsoft, Richmond WA USA 

We are the Gray Systems Lab (GSL), the applied research group for Microsoft Azure Data. We tackle research problems in the areas of databases, big data, cloud, systems-for-ML, and ML-for-systems. Our mission is to advance the Microsoft Azure Data organization, and we do so in three ways: (i) innovating within Microsoft products and services (ii) publishing in top-tier conferences and journals, and (iii) actively contributing to open-source projects. As a manager in GSL, you will have the opportunity to lead a team of researchers, research engineers, and data scientists to advance state-of-the-art in the areas of databases, systems, and machine learning. By design, GSL looks ahead of where the current product roadmap is, exploring how new workloads, hardware, industry trends, or research breakthroughs could affect our products. This creates a broad charter around innovation that provides lots of freedom in problem selection. As a manager in GSL, you have an important role in guiding the problem selection and coordinate the relationship between GSL and our partner product teams. Prior experience in performing research work or technical innovation in industrial settings will serve you well in this role.

                                Don't miss what's next. Subscribe to Research Computing Teams:

            Email address (required)

                Share this email:

                                Share on LinkedIn

                                Share via email

                                Share on Bluesky