Research Computing Teams #123, 28 May 2022
Hi!
My new job, working for a large company that explicitly sells stuff, has indeed been eye-opening — but not in the ways I expected. Mainly I’m confronted with clarity about some of my old jobs, and peer groups I’ve worked with.
I’ve always been sort of puzzled by the decisions made and priorities chosen by some research computing and data teams. Those teams seemed more… insular, somehow, than others. I saw it less often in contract research software development teams, or library research data management teams, or bioinformatics core facilities. I saw it more often in research systems teams — not all, of course, and not only.
Meeting with some of these old peer groups, wearing a vendor hat, and watching them interact with us, it becomes a lot clearer.
The thing is, in this field, we’re all vendors. But not all groups know it.
Thinking otherwise is a completely understandable trap to fall into. It’s especially seductive for people trained as academics who have stayed in the University. I fell for this early in my research computing and data career: “I’m part of the university/research institute, collaborating with my peers for free, same as when I was a postdoc”. Conveniently, this avoided me having to focus uncomfortably on my shift from a researcher to a research support role. It also meant that I could avoid making hard decisions about which research to support how, and why. “Collaborations emerge organically, after all!”
But in Research Computing and Data we are all very much vendors, offering support and services (yes, including expert collaboration) to research groups who have choices about where and how to do their work. We may well have an extensive working relationship with a group and deep knowledge of their needs. They will still take their work elsewhere if they feel it would better advance their research. And they’d absolutely be right to do so.
Research group or RCT, we have the same mission — to advance research and scholarship as best we can. But our roles are different. The researchers know their work best and how best to advocate for it. If they choose to take that work elsewhere, that’s what’s best for the project. It’s our role to make it clear how we can (and can’t!) support the research with our services, and make available the resources necessary for that project to succeed with those offerings.
Sometimes the match won’t work. The researcher will decide their project or programme will best succeed using another team’s offerings. As long as that decision is an informed one, that’s success! A better match was found to advance the research a bit better. That’s our mission. We showed what we could do, and they found another option. Our teams don’t own, and aren’t entitled to, research groups or projects.
We, too, can and should “opt out” of a match. Maybe the programme isn’t a good match to our offerings. (Perhaps we even recommend another team!) Maybe the research group isn’t ready to collaborate with us yet. Or perhaps it could be made to work, but it isn’t the right choice for us. It would take too many resources to support the effort well, and our mission, advancing science and our organization’s priorities, would be best served by allocating those resources elsewhere. Recognizing that is also a success. Particular research groups or projects don’t own us, either.
A match and a mismatch can both be failures. And that has consequences for how we should lead our teams. Failures include:
- The match didn’t happen because the two sides (‘vendor’ and research group) didn’t know about each other, or understand the other’s needs or offerings
- The match happened and the project failed/sputtered out because not enough effort, or not the right effort, was put in
- The match happened and the project was successful, but it required far too much effort that could have been better spent elsewhere
- The match happened and the project wasn’t as successful as it could have been because the services didn’t match the project as well as was thought
So yes, we’re vendors. We’re “selling” open-source software, custom software development, data management, or systems services… or commercial equipment or software support. To best advance science and our institutional priorities, we should be working hard to make it clear what we can offer and what we can’t, and directing researchers and scholars elsewhere when that’s what’s best for them or us. That means listening and “marketing” and accepting that we’ll often hear “no” (even when they could make it work) and that we will say “no” (even when we could make it work).
In a way, this realization that we’re vendors makes us more like successful research groups rather than less. The most successful research groups know that there’s a zillion projects they could work on, questions they could ask given infinite time. But resources and time are finite. So they laser-focus on the areas with funding available, skills available, and high impact, where they can best advance research. They communicate their capabilities widely to attract collaborators, advocate for those projects, get used to hearing “no”, and turn down projects they could do, but won’t. They’re specialized, focussed, and relentless advocates and communicators. They also run highly effective teams.
With that, on to the roundup!
Managing Teams
The Manager’s Handbook - Alex MacCaw, Clearbit
This is a really solid, free handbook for new managers or people thinking of becoming managers. Some things I particularly like about it:
- It covers likely failure modes right from the beginning
- Lots of emphasis on hiring
- It covers managing yourself early on - your behaviour and your mindset are the only things you really have any control over, and I think this is under-addressed in other resources
- There’s a distinction implicitly made between the behaviours you need for managing individuals (one-on-ones, coaching, feedback) and for managing teams (working as a team, conflict resolution).
I don’t love everything about it - it includes things I’d not, and doesn’t include things I would - but it was made for a particular company’s culture, not mine. It’s a thoughtful and solid resource to have to hand for pointing people to, building on for your own organization, or to read (it’s always good to revisit the basics).
How to Respond When an Employee Quits - Rebecca Zucker, HBR
This is one of the situations where you see how far a new manager has come on the “managing yourself” skills. It feels like a disaster, even a betrayal, the first time a team member quits. It isn’t either, of course; it’s good and healthy for people to move on, and it’s an opportunity for the team as well.
The only correct initial response when a team member tells you they’re quitting is something along the lines of “I’m sorry to hear that, but congratulations!”. As Zucker points out, that isn’t easy, but it’s necessary. First because you may work with that person in the future, or have opportunities to have their friends and colleagues join your team. Second because it allows you to productively move to other important parts of the conversation, like asking for what you and the team need before they go, and possibly learning about things you could do to improve retention.
As we’ve said before, a team is a group of people that hold each other accountable. But for that to be possible and effective, there have to be shared expectations. In What New Teammates Owe To One Another, the team from Nobl has a suggested onboarding document of team expectations that new team members are walked through. (Obviously this has to be hashed out with existing team members first!)
Technical Leadership
When Everything is Important But Nothing is Getting Done - Roman Kudryashov
Kudryashov walks us through a case study of getting a team unstuck:
The last company I worked for was a mid-stage startup with growing pains. What had started out as a nimble organization able to create impressive software now felt stuck. Everything was high priority, nothing ever seemed to get completed, morale was low, and it was starting to coalesce into a learned helplessness where the only solution seemed to be resignation…
You, gentle reader, and other long-time RCTers won’t be surprised at the core elements of the solution: ruthless prioritization and reduction of work in progress, activities dropped entirely, one project being worked on at a time, and a clear definition of done. But knowing the solution is the easy part. Kudryashov’s article spends a lot of time on the (hard! time-consuming!) other part: getting to the point where the solution is possible.
Like any big change management effort, the key factors that lead to success include driving a consensus that there is a problem, and that to address it some very big things are going to have to change. (Unfortunately, it’s too easy for people who should know better to fall into the sentiment of “I want things to get better, but I don’t want to change anything.”) Some parts of the problem of too-big projects or everything’s-top-priority can come from elsewhere in the organization, and then those people have to be part of the consensus too.
And of course there has to be continual follow-up. The natural state of work is not clear, focussed work on widely-agreed-upon priorities. Instead, if organizations are left to themselves, entropy will build up and teams will find themselves in the same situation again. But as Kudryashov describes, it’s worth all this hard, deliberate, on-going work:
It took roughly six months to make this transition and another three months to continue refining the process, but we were in a good place. Projects were unblocked. We delivered two major time-sensitive contracts… on time, and with historically low defect rates. Morale was up across multiple teams, which reflected in better satisfaction scores on employee surveys and more importantly on a radically reduced turnover rate (we went from a 50% turnover rate per quarter to something like 4% quarterly turnover, including zero turnover one month).
Product Management and Working with Research Communities
Uncurled - everything I know and learned about running and maintaining Open Source projects for three decades - Daniel Stenberg
Stenberg, best known for curl, has a great book on the project and product management of a successful open source software product. There’s very useful stuff in here for those hoping for their open source product to take off. Some of the points I find particularly valuable:
- Just do it
- The project is “we”
- If it’s not alive, it’s dead
- Newcomers can be awesome
- Contributors will not stick around
- Over time, maintenance grows
- Volunteers make things different
- Only releases get tested for real
How to Build an Open Source Community - benny Vasquez, The New Stack
Overlapping with but distinct from Stenberg’s book, Vasquez talks more about the governance of setting up the community. Vasquez describes personas for possible levels of engagement you’ll see, engaging the right people early on, creating the culture and processes you want to see, and how to empower community members.
HTCondor Week 2022 was this week, and I don’t think it’s commented on often enough what an excellent job has been done managing it as a product over the 34 years (!!) since it started as a cycle scavenger. As was outlined in Miron Livny’s talk,
What began as a hunter of idle workstations is a now a manager of HTC workloads. “It’s not about the capacity anymore, it’s about the management of the workflows”.
That shift - from cycle scavenger to high-throughput computing workload manager - is a remarkable one, and too many research infrastructure efforts would have clung on to the original mission, eventually fading away into oblivion.
As a sign of the success of this approach, there were stories highlighted this week of people bringing their own resources - i.e. they didn’t need to scavenge the cycles, they had pre-existing dedicated resources - wanting help setting up HTCondor. They wanted to move to using HTCondor because it was a really nice workflow management tool for high throughput computing with good researcher experience. That’s a remarkable product management success.
Research Software Development
A practical guide to research software project estimation - Chase Million
We know that waterfall-style, “design everything at the start” project management doesn’t work for research software. Unfortunately, the way most research software development efforts are funded, we kind of need to do that anyway.
Funders who are going to shell out $500k+ for an effort want, understandably, to see a plausible plan that makes them confident of a reasonable likelihood of success. Also understandably, they’re not overly concerned if the plan doesn’t play out as predicted; this is research, after all. And putting together such a plan, as Million points out, is a great opportunity to bring the relevant stakeholders together to hash out a consensus on what the right thing to build even is, what its scope should be, and whether you already have the people you need. Even modern agile practices almost always start large efforts with a big kickoff meeting where similar topics are discussed.
Million gives a good, practical overview of how to plan out a large, multi-stakeholder research software project. The document is worth reading and/or circulating to novice stakeholders in advance of a grant proposal development meeting. The process the document describes naturally produces, as outputs, a consensus on what’s to be done and the kinds of rough-and-ready project planning documents that a funder will want to see. It’s quite good, and I haven’t seen anything as comprehensive.
Stripe is widely known within tech for having excellent API documentation. They’ve open-sourced Markdoc, an internal tool for generating rich and nice-looking documentation pages using augmented Markdown syntax. Markdoc is new but it’s already attracting users. As someone who always found reStructuredText powerful but confusing, I find this really interesting.
Julia is a really exciting language with a lot of advantages for research computing - people make amazing DSLs for things like differential equations using it. But this article by Yuri Vishnevsky describes some of the downsides that I’ve seen - inconstant product management leading to a culture where serious correctness bugs or other brokenness can persist.
Interesting - Intel has a CUDA-to-SYCL conversion tool.
Research Data Management and Analysis
Desirable Characteristics of Data Repositories for Federally Funded Research - White House Office of Science and Technology Policy (OSTP)
So the OSTP’s Subcommittee on Open Science and the National Science and Technology Council have put together guidelines for data repositories for federally funded research (does anyone who understands US science policy know why this was done at this level and not by the granting councils as elsewhere?). US funders are expected to make use of this document when deciding whether a repository is adequate as part of a data management plan, or presumably when funding such repositories.
The body of the document is only seven pages, and lays out the desired characteristics with commendable clarity. Nothing is shocking in here, but there are some characteristics I’m particularly pleased to see included and that naive repositories will have some trouble with:
- Retention policy
- Risk Management, and for sensitive data, Breach response plans
- Organizational and Technical sustainability
- Unique persistent identifiers
- Curation and Quality Assurance
- Provenance
The Turing Way - The Alan Turing Institute
If you’re setting up a data science/ML/AI group, or teaching students about those topics, this is a nice resource to have to hand. There are guides on:
- Reproducible Research
- Project Design
- Communication/Dissemination
- Collaboration
- Ethical Research, and
- Maintaining a community
Research Computing Systems
Really cool Arm stories coming out this week - Timothy Prickett Morgan over at the Next Platform sketches out a possible Ampere roadmap for the next few years, Amazon’s first Graviton3 instances look amazing according to Michael Larabel at Phoronix, and Microsoft is announcing a cute Arm-powered developer box and native Arm developer tools. Given the growing number of remote-development offerings coming out (e.g. VS Code has a new dev container CLI), that “Volterra” box will be useful for developing even for non-Arm systems.
Obviously I have a conflict here, as NVIDIA will be selling its own Arm CPU soon, but I don’t think it’s partisan to be excited about the explosion of credible CPU options for research computing. It’s fantastic news for the diverse range of needs we have in our profession.
Google’s joining the Open Source Security Foundation, working (with Snyk) on some tools for ensuring software supply chain management with “assured packages” that undergo significant testing and quality control.
Emerging Technologies and Practices
Interesting update on Google’s TPUs that we talked about in #121 - the new TPUv4s will be in pods of 4,096 chips, with “dozens” of such pods available soon, most or all running on low-carbon energy.
Good interview between Tobias Mann at the Reg and Jim Pappas of Intel and the CXL Consortium about what the upcoming Compute Express Link (CXL) is and its likely role in composable systems. They’re walking a fine line between expectation-setting and, I think, genuine excitement about what will be possible:
“Over this next year, the first round of systems are going to be used primarily for proof of concepts,” he said. “Let’s be honest, nobody’s going to take a new technology that’s never been tried.”
What I keep hearing is that CXL 1.0 and even 2.0 will be more like proving grounds and prototypes, and that things will start getting interesting once CXL 3.0 systems start landing.
Random
This is pretty cool - at shell.duckdb.org you can run analytic queries on any supported Parquet or CSV file on GitHub or elsewhere on the web, entirely in your browser. WebAssembly + embedded DBs for the win.
Relatedly, an interview about Datasette, the SQLite-based query-a-dataset tool, and its architecture.
You likely all know this by now, but GitHub’s Markdown now has math support with MathJax. (It’s not perfect! GitLab made some different choices which arguably work a little better.)
Imagining an alternate history based on SAGE, the (military) Semi-Automatic Ground Environment, where team collaborative computing advanced further before the personal computer was born. It really is remarkable how some very sophisticated early approaches to using computers for collaborating on projects from the 50s-early 70s just vanished from memory.
IBM’s 1957 Fortran compiler implemented order-of-operations with only parentheses and basically a sed script and wow that doesn’t look like it should work at all.
Convert JSON to CSV with jq.
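If you’d rather stay in Python than reach for jq, the same kind of flattening can be sketched with just the standard library. To be clear, this is a swapped-in approach and not what the linked post does; the sample records and field names are made up for illustration, and it assumes the input is a JSON array of flat objects.

```python
import csv
import io
import json


def json_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    records = json.loads(json_text)

    # Collect column names in first-seen order so the header is stable
    # even when some records are missing keys.
    fieldnames = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)  # missing keys are filled with restval
    return buffer.getvalue()


# Hypothetical sample data, purely for illustration.
sample = '[{"name": "alpha", "cores": 64}, {"name": "beta", "cores": 128, "gpu": true}]'
csv_text = json_to_csv(sample)
print(csv_text)
```

Nested objects would need flattening first; that is where jq’s path expressions tend to be more convenient.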
Continue to love all these query-data-files-in-place-with-SQL tools - here’s sneller, for fast(!) SQL over JSON.
Making JSON more useful in SQLite with virtual columns.
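A minimal sketch of the general idea, using Python’s built-in sqlite3 module: store the raw JSON in a text column, expose one field as a virtual generated column, and index it. The table, column, and field names here are hypothetical, and this assumes your SQLite build is 3.31 or newer with the JSON functions available; it is not taken from the linked post.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# 'body' holds the raw JSON; 'queue' is computed from it on read
# (VIRTUAL means the value is not stored in the table itself).
conn.execute(
    """
    CREATE TABLE jobs (
        body  TEXT NOT NULL,
        queue TEXT GENERATED ALWAYS AS (json_extract(body, '$.queue')) VIRTUAL
    )
    """
)
# An index on the generated column lets SQLite avoid re-parsing
# every JSON document when filtering on that field.
conn.execute("CREATE INDEX jobs_queue ON jobs (queue)")

# Hypothetical sample rows, purely for illustration.
rows = [
    {"queue": "gpu", "cores": 8},
    {"queue": "cpu", "cores": 64},
    {"queue": "gpu", "cores": 16},
]
conn.executemany(
    "INSERT INTO jobs (body) VALUES (?)",
    [(json.dumps(row),) for row in rows],
)

gpu_count = conn.execute(
    "SELECT COUNT(*) FROM jobs WHERE queue = 'gpu'"
).fetchone()[0]
print(gpu_count)
```

The appeal is that queries read like ordinary SQL over ordinary columns while the JSON document stays the single source of truth.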
Can’t find grid paper you quite like? Make your own with gridzzly.
3D graphics in the browser with WebGL… or CSS.
Good overview of colour schemes for scientific figures from the team at Northwestern.
Implementing a lock-free bounded concurrent-reader queue with only 32 bits of additional state.
Build a proof of concept distributed Postgres.
Love spreadsheets? Love software from the 90s? Love Linux? You can now run Lotus 1-2-3 on Linux.
Too much quiet work time? Wish you could get a macOS or iOS notification for every comment, issue, and PR on one of your GitHub repos? Trailer.app is here for you.
In a take that will infuriate most, the case for using tabs in some places and spaces in others.
A login-free and ephemeral docker image registry for, e.g., CI/CD so you don’t need to store credentials.
Hmm - log C function calls with Cosmopolitan Libc, which I hadn’t heard of before.
The case against Shapefile for geospatial vector data.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
If you’ve just had or are having a long weekend, I hope you enjoy(ed) it! Either way, good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below; the full listing of 151 jobs is, as ever, available on the job board.
Research Computing, Data Security and Compliance Manager, Center for HPC - University of Utah, Salt Lake City UT USA
This position will be the lead in planning, directing, and managing the information security posture for CHPC at the University of Utah. The CHPC resources comprise HPC Clusters, Virtual machine deployments, storage, and other systems. The applicant will work with the CHPC team to ensure that services operate in a manner consistent with U of U policy 4-004 and comply within the targeted scope (security zones). Depending on the scope, compliance must meet one or more of the following: HIPAA, FISMA Moderate, ITAR, CMMC 2.0, NIST 800-171 rev2, CUI. The applicant will work in day-to-day operations to improve security posture, analyze threats, develop counter measures, and advise department security policies and procedures. Additional daily operations may include installing new software releases and system upgrades, evaluating, and installing patches, and resolving system related problems. The applicant will also monitor system configuration and data files to ensure data integrity, system integrity and compliance.
Research Computing Program Manager, Foundations for Research Computing - Columbia University, New York NY USA
The Research Computing Program Manager will lead the activities of the Foundations for Research Computing (FORC) program. The aim of the program is to train Columbia researchers in computational skills and overall computational literacy. As part of the Columbia University Libraries’ Digital Scholarship unit, the Program Manager will advance the program and special events for researchers in close cooperation with other colleagues in the Libraries, Columbia University Information Technology, and the Office of the Executive Vice President for Research.
Senior Software Dev Engineer, FSx for Lustre - AWS, Atlanta GA USA
As an engineer on Amazon FSx for Lustre, you will design, build and operate petabyte-scale distributed file systems. Partnering with your team, you will be solving challenging distributed systems, systems programming, and networking problems using the latest hardware technologies AWS has to offer. You will be a key contributor to the future direction and growth of the service. You will be part of a team of highly productive and action-oriented professionals as we architect and implement the next set of features and functionality. Come join the Amazon Web Services engineering team as we revolutionize the world of high performance computing (HPC) and cloud storage!
HPC Systems Manager - Frederick National Laboratory, Frederick MD USA
Within the Enterprise Information Technology (EIT) group our mission is to develop an enterprise-level, consolidated information technology infrastructure that provides exceptional IT capabilities to the Frederick National Laboratory for Cancer Research (NCI-Frederick/FNLCR) in support of basic, translational, and clinical cancer and AIDS research. The Frederick National Laboratory’s EIT group is seeking an experienced High-Performance Computing (HPC) Manager/Engineer to lead our talented team, enhancing our HPC cluster, optimizing community workflows and customer outreach.
Project Manager, Optical Computing - University of British Columbia, Vancouver BC CA
The Silicon Photonics research group from the Department of Electrical and Computer Engineering at UBC (ECE) is collaborating with multiple Canadian industry partners to advance research of systems and phenomena that explicitly involve quantum mechanics and optical computing. Our research group has identified the need for a qualified and experienced Project Manager with a technical background to support and manage research projects relating to neuromorphic photonics, optical computing, and integrated photonics.
Principal Technical PM Manager - AI/ML - Microsoft, Redmond WA USA
This is a great opportunity to have direct impact on Microsoft’s Big Computing and AI cloud offerings as well as our broader platform strategy. This is an exciting time for HPC and AI, as they are undergoing a massive shift. AI technologies are being merged with existing HPC approaches, and both are moving to the cloud. At this critical juncture, the team is expanding and is looking for a leader for team of architects and hands-on engineers leading engagements, and projects focused on our AI/ML scenarios running and optimizing AI training and inferencing workloads at large scale [think Supercomputer Scale].
Senior Software Engineer - Scientific Platform, Computing & Analytics - Amgen, Remote CA or USA
In this vital role you will help build innovative scientific applications with modern technology stacks that will be used by our research partners to drive Amgen’s innovation in drug discovery. You will join our passionate Research Informatics team to deliver solutions by using Agile and DevOps methodologies. You will also support our Research team in crafting and implementing IS solutions for Discovery, Non-Clinical & Early Development.
Compiler Team Leader - SiPearl, Paris FR or Barcelona ES
SiPearl is looking to hire Compiler Engineering Team Leader to join the team. You will work on cutting-edge technologies to design, develop, debug, test compiler software and programming languages. You will be working on advanced compiler optimizations and features specific for Arm Architectures, parallelization and vectorization through compilers, new programming languages support.
Lead Data Manager - Royal Society for the Protection of Birds, Flexible UK
Our Lead Data Manager role involves managing the RSPB’s own conservation data and addressing any challenges in facilitating the flow of data from collection and curation through to analysis and use both internally and externally. With the help of a small team of data managers, you will oversee the RSPB’s conservation data provision services, ensuring that our data are available either through the NBN Atlas (for open data) or through our own in-house data supply (for sensitive data). You will liaise with other organisations to facilitate data sharing with other NGOs, Local Environmental Records Centres, national schemes and societies, statutory bodies and the commercial sector and academia. Working with a business analyst and our in-house development team, you will help RSPB to improve its own data management systems, policies and practices to ensure that our species and habitat data are easily available to everyone who might use them. In particular, you will help drive the continuous improvement of our in-house GIS database (Merlin) and interactive dashboards and reporting tools to support the management and use of conservation data by our staff and volunteers
HPC Senior Software Development Engineer, EFA - AWS, Munich DE
The AWS HPC EFA team is building the software stack that enables low-latency, high-bandwidth networking for HPC and ML workloads. This is an opportunity to engineer systems that enable HPC workloads to scale, interacting with numerous AWS teams and Open Source Communities. As a developer in the AWS HPC EFA team, you’ll partner with research and business teams to build new capabilities that surprise and delight our customers. You’ll be surrounded by people who are passionate, and believe that truly innovative service is critical to customer success.
iCRAG Data Manager - Dublin Institute for Advanced Studies, Dublin IE
The iCRAG research centre is currently seeking a Data Manager to manage implementation, maintenance and refinement of data management practices across the Centre. iCRAG is seeking an expert in data management to work with research leaders and Centre management in order to ensure that the significant volume and diverse range of data (geological, geophysical, geochemical, environmental, statistical) produced by the Centre is stored in line with best practices and facilitates open access to both internal (centre researchers) and external (the publics) stakeholders.
Research Computing Product Manager - University of Glasgow, Glasgow UK
It is an exciting time in the University as we look to introduce a new ‘Research Computing as a Service’ (RCaaS) function to provide the right research computing capability and services to support our talented research community. The Research Computing Product Manager will be instrumental in their leadership of a high performing service, that will be a key enabler for world-changing research. They will develop a brand-new service and be accountable for the strategy, strategic engagement, vision, development and delivery of IT services in support of research.
Research Data Lead, University Library and Archives - University of New England, Armidale AU
Located within the Student Experience Division, our priorities are aligned with student success, engagement and collaboration programmes. We are committed to the maximisation of digital access to content and services which bolster our contributions to the University’s overall digital transformation. We are future fitting our vision, services and workforce to nurture a culture of innovation and bold imagination in a digital age. We aim to foster an open and collaborative environment that not only brings world-class information resources to the table, but also showcases the University’s exceptional research outputs to a global audience.
Manager, Machine Learning Applied Scientist - AWS, Melbourne AU
We are seeking to add a Manager, ML Applied Science to an already awesome team. The Manager, ML Applied Science role at Amazon will be a technical team leader working to develop new challenging machine learning applications, services and platforms that optimize Amazon’s systems using cutting edge quantitative techniques.
IT Manager, High Performance Compute - Schrödinger, New York NY USA
We’re hiring an experienced IT Manager to join us in our mission to improve human health and quality of life through the development, distribution, and application of advanced computational methods! As the manager of our High Performance Compute team, you’ll play a crucial role in shaping the future of Schrödinger’s IT strategy. In partnership with the Director and Chief Information Officer, you’ll play a hands-on role in developing company practices that support our on-prem and cloud HPC clusters, creating a world-class experience for our employees who depend on them for their daily activities. This work is key in our effort to help scientists accelerate research and development activities, reduce costs, and make novel discoveries that wouldn’t otherwise be possible. We’re looking to hire someone with a managerial background, a solid competency in the high performance computing field, and who understands how to lead a team.
Software Development Manager, High Performance Computing - AWS, Boston MA USA
We build NICE EnginFrame, AWS ParallelCluster, and the overall experience for customers building some of the largest HPC and distributed ML clusters in the world, while at the same time empowering research scientists and engineers to dynamically scale their HPC workloads to enable scientific and engineering breakthroughs. We enable a broad set of applications for computational fluid dynamics, weather modeling, molecular dynamics, seismic modeling, and machine learning. You’ll be leading a new team in Boston and will be part of a distributed engineering team across US and Europe. The ideal candidate will have strong distributed systems design and software engineering experience, Linux/Unix and networking fundamentals, and a passion for AWS technology.
Sr. Research Computing Systems Software Engineer - Harvard University, Boston MA USA
Harvard is seeking a Sr. Systems Software Engineer that will continue to improve operational visibility of the vast FAS Research Computing (FASRC) infrastructure through strong site-reliability practices. The FASRC infrastructure is core to Science & Engineering, and Public Health research missions supporting over 5,000 researchers. This position will work within a team of RC Systems Engineers to design, implement, deploy, and maintain advanced monitoring, logging, and alerting systems for mission-critical services. The Systems Software Engineering group helps maintain core production infrastructure, provisioning, central version control, central logging, and other systems. This group offers many opportunities to build tools and patterns that help all of Research Computing work better. This is an individual contributor position that will report to the Associate Director of Systems Software Engineering in FAS Research Computing (FASRC).