Success as a manager is defined by a lot of hard work and tough conversations that will pay off over very long timescales. It’s way less immediately gratifying than deploying a new feature or making the CI/CD dashboard lights all green again.
Success or possible success for me this week: I think I managed to convince some stakeholders not to make a bad, limiting data-related decision that would have constrained the science of a four-year effort; and I’m coaching some team members to take on increasing planning and coordination responsibilities, with an eye towards gauging their interest in, and current ability for, becoming leads themselves, a process which will likely take months. Small steps, tough conversations (there’ll be a lot of feedback conversations about effectiveness in the new responsibilities - many giving positive feedback, but not all), long-term payoff.
If you find yourself longing for the more immediate feedback of a “maker” rather than a “multiplier” role, be aware that moving back is a career path that is absolutely possible and badly under-appreciated, as comes up in the “Managing Your Own Career” section. It’s tougher in universities than it needs to be, but there’s nothing wrong with, and a lot to be said for, going back and forth between the immediacy and hands-on nature of individual contributor work and the big-picture coordination of people, project, or product management.
But I’m getting ahead of myself - on to the roundup!
Guiding critical projects without micromanaging - Camille Fournier
However, as a senior manager, at some point you can make it harder for your managers to succeed when you give them very little structure to work with. It’s tempting to say “I don’t care how you do any of it as long as it gets done.” But that doesn’t help people figure out what is important to you, so they have to guess at what they share, when, and how.
It’s tough to strike a balance between being involved enough and not being too involved in any major effort. The fact is, if a project is important enough to be on your radar as a manager (at any level), it probably involves multiple people reporting to you. At that point it is part of your job to make sure the necessary coordination is happening, and that the objectives of the project are being met.
Fournier started having monthly status meetings on a major project:
First of all, this was a chance for discussion. I got to ask hard questions, and the team leadership got to show off. The team was forced to reach some agreement on the status before showing it to me, and my questions could reveal disagreements that they may not have resolved fully [….] And my presence was good for all of us, because it forced a group that didn’t all share reporting lines below me to get on the same page, and gave each the opportunity to highlight disagreements with the others in real-time when they didn’t feel aligned.
While this comes up all the time in projects, she points out that it can be useful in other areas too: where there are opportunities for misalignment, or where changes are happening that require guidance from someone who sees more context and can share it. In those cases, where there’s no project (and thus no clear end), she counsels that it’s important to remember to end the meetings when there’s no longer a need for them.
Use a candidate packet to improve your interview process - Jade Rubick
Last week we looked at Rubick’s advice on hiring and recruiting, and one part was to provide candidates, after initial screening, with a packet of information about the job, the work, and the organization. Here Rubick goes into more detail on what to consider including:
It’s a decent amount of work, but it will need updating only occasionally, and will make your organization look considerably more professional, more thoughtful about team members, and thus more trustworthy, than most of the organizations they’re applying to.
Mitchell’s New Role at HashiCorp - Mitchell Hashimoto
From individual contributor to manager, and back again - Gemma Barlow
The founder, once CEO and now CTO, of HashiCorp is taking on a new role - a regular old individual contributor job.
I think we normally picture our next career steps as managers as leading increasingly large efforts, and then becoming a manager of managers. That’s a good and rewarding career path, but you shouldn’t pursue it just because it’s the default. The engineer/manager pendulum, going back and forth between manager and individual contributor, is also a rewarding path, and one with its own advantages - you get to develop skills both in looking at the big picture and in maintaining technical depth and currency. You also get to stay strong in multiple, very different ways of influencing the work of others, rather than leaning on one set of tools or the other.
Personally, I’ve gone from researcher, to research computing staff, to research computing planning as an interim CTO, back to an individual contributor learning new skills in bioinformatics, into managing a genomics platform effort, and then into a sort-of-director role (in that I coordinate leads). I don’t know what my next role will be, but I’m leaning strongly towards individual contributor again. Being an IC who has seen the big picture can make you a more valuable IC - and it can be more fun work, too, understanding more clearly how your effort plugs into a bigger picture. Being a manager who’s recently gotten their hands dirty doing the real work can make you a more valuable manager, too, understanding more viscerally what’s involved in the work you’re coordinating and being better able to foresee issues.
Trying to be a manager and an IC at the same time is a mistake, but doing them sequentially - or cyclically? It can be a lot of fun, and it’s a possibility you should genuinely consider; especially in research, where there is always so much fun stuff to learn.
(Having said all this, I’m not sure I love the fact that Hashimoto is moving to an individual contributor role in the same company. It’s the company that bears his name, where he’s still very close with the CEO and board, after all. How much pushback do you think he will get from “peers” or his manager when he has a bad idea?)
The output of this Hackathon, which we mentioned in earlier newsletters, is a bunch of performance comparisons and Spack build recipes for HPC codes on ARM including ReFrame scripts for CI/CD so that there can be automated tests to validate that they continue to work and don’t have performance regressions. I suspect in coming weeks we’ll see lots of analysis of the work, especially the x86 to ARM performance comparisons.
But apart from the new-hardware stuff, it’s worth looking at this as a research community event, in particular as what appears to be an extremely successful distributed hackathon. Teams were allocated “points” and bid on particular codes to work on with those points; there was a single repo with dozens of packages (full applications and mini-apps often used for benchmarking), a fully worked example, and a set of guides. AWS and the ARM Users Group provided mentorship through the week, and AWS donated a bunch of cycles; some combination of the two contributed some M1 MacBooks and iPad Pros. In exchange, after a week, 31 HPC codes that previously only built on x86 now work on ARM (AWS’s Graviton2 in particular), with reproducible builds, CI/CD testing, and performance characterizations.
I’m sure none of the steps taken were novel to this hackathon, but it’s nice to see how successful it was; I’d love to hear about how communication and coordination worked (there’s already been some discussion of the compute infrastructure they used, spinning up 61 ParallelClusters).
Migration stories are always interesting, and it’s striking to see that major migrations are hard even for companies with what seem, to our teams, like essentially unlimited resources:
The 8.0 migration has taken a few years so far. [!!!: LJD]
Facebook had stayed with MySQL 5.6, which first reached GA in 2013 - that sounds old, but the next major release was 5.7 in 2015, and then it jumped to 8.0 in 2018 (6.0 was scrapped, and 7.0 had been used for something else; naming things is hard, and it turns out even numbering things is hard).
The trick was that in the intervening years, Facebook had built 2,300 patches into their customized MySQL 5.6 and related tooling. Even going from 5.5 to 5.6 was a year-long process. Not all of the patches needed to be ported - some were obviated by new functionality in 8.0 - which left “just” 1,500.
Their plan is, aided by integration testing and monitoring:
My understanding from the article is that there are multiple MySQL deployments with different use cases (including different uses of the patches), so some replica sets have been migrated while others are still in progress. The long-running migration is causing a lot of maintenance pain - supporting two major versions at a time within a replica set - and they’re uncovering (and contributing fixes to) bugs in 8.0 that other customers haven’t seen.
Anyway, migrations are hard and even at Facebook scale there’s no way through but through.
Rendering 1M+ Particles - Farazh Shaikh
Maps with Django (part 2): GeoDjango, PostGIS and Leaflet - Paolo Melchiorre
No Cost Data Scraping With GitHub Actions And Neo4j Aura - William Lyon
If you’ve been looking for a little project to start playing with graph databases, here Lyon walks us through a simple web-scraping project that can be done using the free tiers of both Neo4j’s hosted Aura service and GitHub Actions. It uses the Flat Data action we talked about in #75, which supports data-scraping cron jobs in your GitHub repo, and then another action, with a secret, to push the data to Aura. The result is a graph in Aura that you can visualize, and use to practice writing Cypher queries against.
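The interesting step is turning whatever JSON the scraper commits into graph writes. Here’s a minimal Python sketch of that step, under assumptions of my own rather than from Lyon’s post: the records have an `id`, an optional `title` and an optional list of `tags`, and the `Article`/`Tag` labels and `TAGGED` relationship are hypothetical. It builds one parameterized Cypher query plus its parameters, so the whole file loads in a single round trip, and `MERGE` keeps re-runs of the cron job from creating duplicates.

```python
import json

# Hypothetical Cypher: one Article node per scraped record, linked to Tag nodes.
# Labels, properties, and the relationship type are illustrative assumptions.
LOAD_QUERY = """
UNWIND $items AS item
MERGE (a:Article {id: item.id})
SET a.title = item.title
WITH a, item
UNWIND item.tags AS tag
MERGE (t:Tag {name: tag})
MERGE (a)-[:TAGGED]->(t)
"""


def to_params(raw_json: str) -> dict:
    """Convert a scraped JSON array of records into parameters for LOAD_QUERY.

    Assumes each record has an "id", and optionally a "title" and a list of "tags".
    """
    items = []
    for record in json.loads(raw_json):
        items.append({
            "id": str(record["id"]),
            "title": record.get("title", ""),
            # De-duplicate tags while keeping their original order.
            "tags": list(dict.fromkeys(record.get("tags", []))),
        })
    return {"items": items}


if __name__ == "__main__":
    sample = '[{"id": 1, "title": "Graphs!", "tags": ["neo4j", "graphs", "neo4j"]}]'
    print(to_params(sample))
    # {'items': [{'id': '1', 'title': 'Graphs!', 'tags': ['neo4j', 'graphs']}]}
```

In a workflow, the query and parameters would then be sent to Aura with the official Neo4j driver (something like `session.run(LOAD_QUERY, params)`), with the connection URI and password read from repository secrets.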
By now you’ve probably heard about this surprising, 7-year-old, and nasty local privilege-escalation attack, in which, by creating, mounting, and deleting a very deep directory structure (with a total path length over 1GB), an exploit can overflow a 32-bit int holding the size of a buffer. Using that and some eBPF magic, it’s possible to write to arbitrary kernel memory and become root.
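Why does “over 1GB” matter? Here’s a toy Python model of the arithmetic, not the kernel code: a buffer that keeps doubling until the path fits reaches 2 GiB once the path passes 1 GiB, and 2 GiB doesn’t fit in a signed 32-bit int. The 4096-byte starting size and the plain doubling rule are simplifying assumptions; the point is just the size_t-to-int wraparound (out-of-range conversion to a signed type is implementation-defined in C, and wraps like this on the usual two’s-complement targets).

```python
PAGE_SIZE = 4096  # assumed starting buffer size (a typical x86-64 page)


def grown_buffer_size(needed: int, start: int = PAGE_SIZE) -> int:
    """Simplified model of a buffer that doubles until `needed` bytes fit."""
    size = start
    while size < needed:
        size *= 2
    return size


def to_c_int(n: int) -> int:
    """Value a 32-bit two's-complement C `int` ends up holding after an implicit
    conversion from a wider unsigned size (wraps modulo 2**32)."""
    return (n + 2**31) % 2**32 - 2**31


if __name__ == "__main__":
    path_len = 2**30 + 1                  # a path just over 1 GiB long
    size = grown_buffer_size(path_len)    # doubles all the way up to 2 GiB
    print(size, to_c_int(size))           # 2147483648 -2147483648
```

A negative "size" is what lets later length arithmetic point outside the buffer.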
On a lot of the HPC systems I’ve worked on, creating one million directories on their filesystems would take some time, but I’m not sure that means it would necessarily be detected.
Most Linux distros have fixes released, so it’s a good idea to update.
VzLinux Is the RHEL-based Linux Operating System You’ve Never Heard of - Jack Wallen, The New Stack
You’re probably tired of hearing this, but in my opinion long-term stable OS releases are a trap, and one the community would do best to escape; but with CentOS’s future uncertain, a lot of people are still stuck in the trap and looking for immediate alternatives.
Wallen points us to VzLinux, a five-year-old, ongoing RHEL clone from virtualization company Virtuozzo. VzLinux is intended as a guest OS, and there are plans for VM- and container-optimized versions; there are guidelines and tools for managing containers, tools for doing dry runs of conversion from CentOS (and then for unattended mass conversion), and tools for snapshot creation and roll-back. The community version of VzLinux 8 is available for download.
Don’t Wanna Pay Ransom Gangs? Test Your Backups - Brian Krebs, Krebs on Security
As the old saying goes: backups are useless; restores are what’s valuable. Krebs interviews Fabian Wosar, CTO at cybersecurity firm Emsisoft, who reports that ransomware targets whose backup strategies looked perfectly fine on paper often end up paying the ransom anyway because:
It’s really hard to know how safe your backups make you without routinely testing restores, ideally restores from different times in the past.
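The core of a restore test is small enough to sketch. This is a minimal Python sketch assuming plain file-level tarball backups (a hypothetical setup; real backup tools have their own verify and restore commands): record a SHA-256 manifest when the backup is taken, then actually restore into a scratch directory and compare, reporting anything missing or changed.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path
from typing import Dict, List


def manifest(root: Path) -> Dict[str, str]:
    """Map each file under `root` (as a relative path) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def make_backup(src: Path, archive: Path) -> Dict[str, str]:
    """Write a gzipped tarball of `src`; return the manifest taken at backup time."""
    with tarfile.open(str(archive), "w:gz") as tar:
        for p in sorted(src.rglob("*")):
            if p.is_file():
                tar.add(str(p), arcname=p.relative_to(src).as_posix())
    return manifest(src)


def restore_problems(archive: Path, expected: Dict[str, str]) -> List[str]:
    """Restore `archive` into a scratch directory and list files that are missing
    or whose contents differ from the manifest recorded at backup time."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(str(archive), "r:*") as tar:
            if hasattr(tarfile, "data_filter"):  # safer extraction where available
                tar.extractall(scratch, filter="data")
            else:
                tar.extractall(scratch)
        restored = manifest(Path(scratch))
    return sorted(p for p, digest in expected.items() if restored.get(p) != digest)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = Path(work, "data")
        (src / "sub").mkdir(parents=True)
        (src / "a.txt").write_text("results\n")
        (src / "sub" / "b.csv").write_text("1,2,3\n")
        archive = Path(work, "backup.tar.gz")
        expected = make_backup(src, archive)
        print(restore_problems(archive, expected))  # []
```

Running the restore check on a schedule, against archives of different ages, is what turns "we have backups" into "we know we can restore".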
Pragmatic Incident Response: 3 Lessons Learned from Failures - Robert Ross
Observe Services; not Servers - Piyush Verma, Last9
Like restores for backups, the best time to test and develop the practice of running incident responses and retros is before you urgently need them, says Ross.
Further, running proper retros for small incidents helps keep small incidents small and can prevent larger incidents down the road. He suggests:
Related to the last point, Verma urges us to focus our monitoring (and then alerting) on services, not servers. Many of us came up in the HPC world, where the distinction between “the node f3c09” and “the service of being able to run a user job on the node f3c09” is pretty fine. But with virtualization increasingly important even in HPC, the distinction exists and matters, and making it is the first step in being able to think about, build and focus on robust user-facing services rather than focussing on the technical inputs that the team uses to provide those services.
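To make the distinction concrete, here’s a hypothetical Python sketch; the node names, the check fields, and the "hung filesystem" scenario are illustrative assumptions, not from Verma’s post. A server-level check asks whether each node is up; a service-level check asks whether a user-shaped canary job actually ran there, and the two can disagree badly.

```python
from dataclasses import dataclass


@dataclass
class NodeCheck:
    node: str
    host_up: bool   # hypothetical server-level check, e.g. the node answers a heartbeat
    job_ran: bool   # hypothetical service-level check: a tiny canary job, submitted
                    # through the scheduler like a user's, ran and finished on this node


def server_availability(checks: list) -> float:
    """Fraction of nodes that are 'up': what a server-centric dashboard shows."""
    return sum(c.host_up for c in checks) / len(checks)


def service_availability(checks: list) -> float:
    """Fraction of nodes where a user job actually ran: what users experience."""
    return sum(c.job_ran for c in checks) / len(checks)


if __name__ == "__main__":
    # Illustrative scenario: every node is up, but a hung filesystem mount
    # means canary jobs only complete on one node out of four.
    checks = [
        NodeCheck("f3c09", True, False),
        NodeCheck("f3c10", True, False),
        NodeCheck("f3c11", True, False),
        NodeCheck("f3c12", True, True),
    ]
    print(server_availability(checks), service_availability(checks))  # 1.0 0.25
```

Alerting on the second number, rather than the first, is the "services, not servers" shift.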
Using WebAssembly threads from C, C++ and Rust - Ingvar Stepanyan
WebAssembly supports threads via Web Workers, and shared memory via SharedArrayBuffers - and on top of that Emscripten provides a POSIX threads API, so it’s possible to use threads from C/C++ and Rust in WebAssembly!
But Workers behave a little differently than you might expect, and expectations are different in the browser too - you expect a terminal’s command line to hang while a long-running job is doing its thing, but having the browser’s UI freeze seems weird. So there are a few things you have to do a little differently, and Stepanyan walks us through some examples.
Introducing the Scalable Matrix Extension for the Armv9-A Architecture - Martin Weidmann
Research computing can count itself lucky that AI is such a commercially important workload, as we continue to benefit from the increased demand for many of the same primitives around vector and matrix mathematics.
Armv9-A is building on the existing Scalable Vector Extensions (SVE, SVE2) to add matrix extensions, for
If I understand this early announcement right, SVE2 in Armv8.6-A already had support for “4 dot products at a time”, extending the 8-wide vector dot product to handle (2,8)x(8,2) matrix multiplication, which can help tile matrix operations; SME builds on this with more matrix tile memory operations, and a vector outer product which can then be used as a primitive (such as for rank-one updates in a number of linear algebra operations).
Admittedly, I’ve never been a numerical linear algebra expert, I just always called high-level libraries - but outer products wouldn’t have struck me as the obvious next primitive to add; any readers in that space, do you have any thoughts?
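For what it’s worth, here is one standard linear-algebra reason outer products make a natural matrix primitive (general background, not something from the Arm announcement): C = AB can be computed as a sum, over p, of the outer product of column p of A with row p of B, i.e. a sequence of rank-one updates into an accumulator tile. Each update reads only n + m inputs but does n × m multiply-adds, so arithmetic per byte moved grows with tile size. A plain-Python sketch comparing it with the usual dot-product formulation:

```python
def outer(u, v):
    """Outer product of vectors u and v: the matrix with entries u[i] * v[j]."""
    return [[ui * vj for vj in v] for ui in u]


def matmul_dot(A, B):
    """C = A @ B, one dot product per output element."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matmul_outer(A, B):
    """C = A @ B as a sum of rank-one updates: C += outer(A[:, p], B[p, :])."""
    n, k, m = len(A), len(B), len(B[0])
    C = [[0] * m for _ in range(n)]
    for p in range(k):
        update = outer([A[i][p] for i in range(n)], B[p])
        for i in range(n):
            for j in range(m):
                C[i][j] += update[i][j]  # accumulate into the whole output tile
    return C


if __name__ == "__main__":
    A = [[1, 2], [3, 4]]
    B = [[5, 6], [7, 8]]
    print(matmul_outer(A, B))  # [[19, 22], [43, 50]]
    assert matmul_outer(A, B) == matmul_dot(A, B)
```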
Research Running on Cloud Compute & Emerging Technologies (RRoCCET 21) - 10-12 Aug, Virtual, $25
Organized by the NSF’s CloudBank, this is three mornings (Pacific time) of talks on use cases and case studies of research computing in the cloud, covering CloudBank itself, and AWS, Azure, IBM, and GCP.
HTCondor Workshop Autumn 2021 - 20-24 Sept, afternoons European times, Virtual, Free
HTCondor is pretty venerable, which means that as an engine for executing high throughput computing jobs it doesn’t necessarily get as much attention as shinier, newer tech; but it’s well tested, well documented, and widely used. Worth registering for if you’re interested in learning from the developers, or from those giving talks on their use cases.
Ever wanted to say “I’m in: I have full access to the mainframe now.”? Getting z/OS running on an Ubuntu laptop.
LLVM is the JVM of this generation - a platform that’s enabling a lot of new language development and experimentation. Here’s the first of a series of deep dives into LLVM’s representations, starting with the bitcode format.
Another deep dive, this one into how Firecracker VMs work - what “paravirtualization” means, why bootup time is so fast, and why the VMs are so lightweight.
In Python, when to use and not use namedtuples now that there are dataclasses.
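A quick sketch of the practical differences, based on general Python behaviour rather than as a summary of the linked article’s recommendations: a namedtuple *is* a tuple (unpackable, indexable, equal to plain tuples), while a dataclass is a regular class that can be frozen or mutable and can have safe per-instance defaults.

```python
from collections import namedtuple
from dataclasses import FrozenInstanceError, astuple, dataclass, field

# A namedtuple is a tuple: immutable, unpackable, indexable, and equal to plain tuples.
PointNT = namedtuple("PointNT", ["x", "y"])


# A frozen dataclass gives a similar immutable record without tuple behaviour.
@dataclass(frozen=True)
class PointDC:
    x: float
    y: float


# A regular dataclass can be mutable and have safe per-instance defaults.
@dataclass
class Run:
    name: str
    tags: list = field(default_factory=list)  # a fresh list for each instance


if __name__ == "__main__":
    p, q = PointNT(1, 2), PointDC(1, 2)
    x, y = p                      # tuple unpacking works on the namedtuple
    print(p == (1, 2))            # True: equal to any tuple with the same values
    print(q == (1, 2))            # False: dataclasses only compare equal to their own class
    print(astuple(q))             # (1, 2): explicit conversion when you need a tuple
    print(p._replace(x=5))        # PointNT(x=5, y=2): "modify" by building a new one
    try:
        q.x = 5
    except FrozenInstanceError:
        print("frozen dataclass fields can't be reassigned")
    r1, r2 = Run("a"), Run("b")
    r1.tags.append("gpu")
    print(r2.tags)                # []: each Run got its own list
```

Accidental equality with plain tuples is often the deciding surprise against namedtuples in new code.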
It continues to boggle my mind that for between $3 and $22 USD per minute, you can schedule antenna time to download data from a satellite. Here’s how to use AWS Ground Station from the command line.
A five-year-old MPI tool I hadn’t known about: WI4MPI lets you choose at runtime whether to run using OpenMPI or IntelMPI (say).
Creating 100M rows in SQLite in 33 seconds.
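A minimal Python sketch of the usual levers for fast bulk loads into SQLite: relaxed-durability PRAGMAs, one big transaction, and batched `executemany` calls. The `users` schema, batch size, and PRAGMA values are illustrative assumptions, and this pure-Python version is for illustration, not tuned to match the post’s numbers. Turning off the journal and syncs means a crash can corrupt the file, so this is only for throwaway databases you can regenerate.

```python
import random
import sqlite3


def fast_connection(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # Trade durability and safety for speed: fine for a throwaway bulk load you can
    # regenerate, NOT for data you care about (a crash can corrupt the database).
    conn.execute("PRAGMA journal_mode = OFF")
    conn.execute("PRAGMA synchronous = 0")
    conn.execute("PRAGMA cache_size = 1000000")
    conn.execute("PRAGMA locking_mode = EXCLUSIVE")
    conn.execute("PRAGMA temp_store = MEMORY")
    return conn


def bulk_insert(conn: sqlite3.Connection, n_rows: int, batch: int = 100_000, seed: int = 0) -> None:
    """Insert n_rows synthetic rows in one transaction, batch by batch, via executemany."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, area TEXT, age INTEGER NOT NULL, active INTEGER NOT NULL)"
    )
    rng = random.Random(seed)
    with conn:  # commits once at the end instead of once per row
        for start in range(0, n_rows, batch):
            rows = [
                (str(rng.randint(100_000, 999_999)), rng.choice((5, 10, 15)), rng.randint(0, 1))
                for _ in range(min(batch, n_rows - start))
            ]
            conn.executemany("INSERT INTO users (area, age, active) VALUES (?, ?, ?)", rows)


if __name__ == "__main__":
    conn = fast_connection()
    bulk_insert(conn, 200_000)
    print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 200000
```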
An argument that for distributed data systems, convergence (a concept about individual objects and how they are merged - replicas that have been delivered the same update eventually get the equivalent state) and confluence (a component gives the same output regardless of the orderings of its inputs) are more meaningful definitions than consistency.
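A toy illustration of the two definitions, with examples assumed for illustration rather than taken from the piece: a grow-only set whose merge is set union (commutative, associative, idempotent) reaches the same state for every delivery order, even with a duplicate delivery, so it is both convergent and confluent. A "keep whichever update arrived last" register is not: its output depends on the ordering.

```python
from itertools import permutations


def gset_merge(state: frozenset, update: str) -> frozenset:
    """Grow-only set: merging is set union, which is commutative, associative,
    and idempotent, so replicas that see the same updates converge."""
    return state | {update}


def replica_state(updates) -> frozenset:
    state = frozenset()
    for u in updates:
        state = gset_merge(state, u)
    return state


def last_arrival_wins(updates) -> str:
    """Keeps whichever update arrived last: its output depends on delivery order."""
    value = None
    for u in updates:
        value = u
    return value


if __name__ == "__main__":
    updates = ["a", "b", "c", "a"]  # includes a duplicate delivery
    orders = list(permutations(updates))
    print(len({replica_state(o) for o in orders}))      # 1: same result for every ordering
    print(len({last_arrival_wins(o) for o in orders}))  # 3: depends on the ordering
```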
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
This week’s new-listing highlights are below; the full listing of 192 jobs is, as ever, available on the job board.
Principal Data Scientist, Chemometrics - Danaher, Courtland NY USA
You are a Senior Data Scientist whose focus is multivariate chemometric analyses and advanced process control concepts and approaches. You understand how bioreactors and bioprocessing technology work and are highly interested in emerging technologies that will greatly improve biologics manufacturing now and in the future. The Principal Scientist will exploit and extend expertise to translate business requirements into research strategies to explore process concepts and develop new technologies, delivering from feasibility to implementation. Build insight, know-how and skills in multiple relevant fields of expertise. We are looking for this hire to be based in the Boston area to engage and form relationships with KOLs.
RNA Resources Project Leader - EMBL, Hinxton UK
We are recruiting a Project Leader to spearhead the development of the RNAcentral and Rfam databases. Currently funded by the BBSRC and Wellcome, Rfam and RNAcentral are key resources for RNA Biology that serve tens of thousands of users every year and are highly cited in the scientific literature. The RNA Resources team is part of the Sequence Families group led by Alex Bateman who will oversee progress and provide scientific input.
Group Leader – Cell Biology and Biophysics (2 positions) - EMBL Heidelberg, Heidelberg DE
We are seeking to recruit highly motivated group leaders who wish to carry out cutting-edge molecular cell biology research or imaging technology development. In the area of cell biology, we would for example welcome applications that plan to work in unusual, including marine, model systems, or that take a theoretical approach to model dynamic cell biological processes. In the area of imaging technology, we would for example welcome applications that develop novel microscopy technologies to probe the molecular or physical structure and function of cells.
Technology Manager, Data and Health (two positions) - Wellcome, London UK
The Technology Manager will be responsible for: Taking an entrepreneurial approach to evaluating and executing projects related to digital tools which advance and cut across the objectives of Wellcome’s Health Challenges (HC) and Discovery Research (DR) strategies. Drawing on appropriate expertise and data from across physical, biological and social sciences, humanities, industry and other funders to develop a strategic view of the digital technology landscape relevant to Wellcome. Working with the Senior Manager, Digital Technology and Technology Leads to develop productive relationships with other teams in Research Programs. Supporting digital tools funding across Research Programs by helping to set and support common approaches, providing expert technical advice to other teams and working with the DSH team to develop a portfolio view of digital technologies across Wellcome’s current funding and pipeline.
Technology Lead, Data for Science and Health (two positions) - Wellcome, London UK
The Technology Lead will be responsible for: Taking an entrepreneurial approach to developing, evaluating and executing projects related to digital tools which advance and cut across the objectives of Wellcome’s Health Challenges (HC) and Discovery Research (DR) strategies. Drawing on appropriate expertise and data from across physical, biological and social sciences, humanities, industry and other funders to develop a strategic view of the digital technology landscape relevant to Wellcome. Working with the Senior Manager, Digital Technology to develop productive relationships with other teams in Research Programmes by generously sharing expertise and developing a mutual understanding of priorities and shared objectives. Supporting digital tools funding across Research Programmes by helping to set and support common approaches, providing expert technical advice to other teams and working with the DSH team to develop a portfolio view of digital technologies across Wellcome’s current funding and pipeline.
Manager, Data and Analytics Infrastructure - Janssen Pharmaceuticals, Toronto ON CA
The Manager, Data and Analytics Infrastructure is an integral position in the Data & Analytics Centre (DAC) with accountability for optimizing our data foundation through managing the full data lifecycle needs, including curation, integration, and maintenance of the organization’s syndicated and proprietary data assets. This individual will also collaborate with other DAC team members and cross functional partners in applying advanced analytical techniques in generating actionable and integrated insights from our data assets.
Director, Scientific Computing - Janssen Inc, Spring House PA USA
Statistics & Decisions Sciences (SDS) focuses on statistical and data evaluation needs for discovery, clinical trials, manufacturing, and safety sciences data. Primary responsibilities of the position include identifying, establishing collaboration with, and supervision of external partners and their services at multiple locations. Strategic and technical leadership is required both internally and externally. This includes collaborations with statisticians, researchers, and information technology professionals. There is a large diversity of needs to serve, such as end-to-end management of software applications for statistical evaluation or for business processes, education in-classroom and e-learning, knowledge sharing, user interface navigation, software/application acquisition and training, and high performance computing for intensive data evaluation, simulations, and statistical research.
Technical Manager, RD&E Scientific Computing - Corning, Corning NY USA
The Scientific Computing function is core to delivering a modeling first research strategy, through tight partnerships with Research, Development, Engineering and Business groups, and consistent focus on anticipating and delivering HPC environment needs. Overview: The Scientific Computing Manager is responsible for High-Performance Computing (HPC) infrastructure, software and user services to support the global modeling, machine learning and technology communities at Corning. The successful candidate will develop and execute strategies to align with business and technical partner objectives; drive adoption of advanced cutting-edge technologies while ensuring robust and cost-effective service and support of existing compute clusters, storage and software portfolios; identify organizational gaps and hire appropriate resources to meet demand. You will be joining a well-established group of strong technical resources, with existing on-prem HPC platforms, and significant growth planned over the next several years. The user community includes talented and curious technical partners from the research, engineering and business functions.
Biostatistician and Research Methods Stream Lead - Cancer Council NSW, Sydney AU
CCNSW and The University of Sydney (USYD), have formed a Joint Venture, The Daffodil Centre (DC). This role sits within the DC’s Research Methods Stream that provides analytical, methodological and systematic literature review expertise across the DC’s research hubs and conducts research on risk factors for cancer; patterns of cancer diagnosis, treatments, costs and outcomes; projections of cancer incidence and mortality; systematic and scoping reviews. Based primarily in our Woolloomooloo office, with flexibility to work from home, the role will be responsible for leading and managing the Research Methods Stream and contributing to a program of research and analysis on the occurrences and causes of cancer and the performance and outcomes of cancer control strategies.
Engineering Manager II - Data Science - Ball Aerospace, Boulder CO USA
The Engineering Strategic Support Unit comprises the organizational talent and technical leadership that enables the successful delivery of high-impact discriminating technologies for our customers’ missions. Our collaborative, cross-functional teams are committed to innovation, integrity, continual learning and strong execution. Engineering Manager II – Data Science oversees and provides guidance to the data science department responsible for developing and delivering data science products and services for internal stakeholders with a primary focus on creating and maintaining processes, managing resources, and developing and deploying staff who assure the technical services meets all of the requirements necessary to achieve the mission of MPA.
Programme Manager: Human Genomics & Translational Data - ELIXIR, Hinxton UK
ELIXIR is seeking an experienced data scientist to coordinate and organise technical architecture and complex projects within the dynamic and international Human Genomics and Translational Data (HGTD) team at the ELIXIR Hub in Hinxton, UK. This position is an exciting opportunity to become involved in many large, multi-million Euro research projects that are transforming the way that human genomic health data is managed across Europe and to drive scientific collaborations that improve the health of citizens, such as B1MG and EJP-RD.
Scientific Manager (Three Jobs) - Wellcome Sanger Institute, Hinxton UK
Due to an acceleration in exciting, scientific growth, we have three new opportunities for a Scientific Manager to join our team in DNA Pipelines. All three positions sit within the Technical Administration team and all involve line management responsibilities; ensuring the teams provide first-rate customer service to end users and effective communication around quality and timeliness of delivery. Two of these roles will have Scientific Service Representatives and Scientific Service Assistants as direct reports. The third vacancy has responsibility for managing the Data QC group within the technical Administration team.
Principal Clinical Data Manager - Labcorp, remote US or CA
Serve as the technical leader on all data management aspects for project(s) including start-up, maintenance, and completion activities. Develop [Global] Data Management Plans and Quality Management (QM) Plans that will deliver accurate, timely, consistent, and quality clinical data. Identify and implement solutions to project data management issues and concerns, including proactive prevention strategies based on metrics and forecasts. Serve as the project and client liaison including management and provision of project specific data management status, cycle time, and productivity metrics. Coordinate and participate in the development of the clinical data model and/or database design and annotate the CRF (eCRF) according to these specifications. Review data acquisition conventions and data review guidelines / diagnostic specifications consistent with the clinical data model, [statistical] analysis plans, and CRF (eCRF) completion / monitoring conventions.
Director, Scientific Computing - Johnson & Johnson, High Wycombe UK
Primary responsibilities of the position include identifying, establishing collaboration with, and supervision of external partners and their services at multiple locations. Strategic and technical leadership is required both internally and externally. This includes collaborations with statisticians, researchers, and information technology professionals. There is a large diversity of needs to serve, such as end-to-end management of software applications for statistical evaluation or for business processes, education in-classroom and e-learning, knowledge sharing, user interface navigation, software/application acquisition and training, and high performance computing for intensive data evaluation, simulations, and statistical research. Just as important a responsibility and accountability is formal management of a global team of direct reports and contract partners.
Staff Big Data Engineer - Infoblox, Burnaby BC CA
We are looking for a Staff Big Data Engineer to join our SaaS Next Generation Platform team in Burnaby (BC), reporting to Manager, Software Engineering. In this role, you will be responsible for developing, maintaining, evaluating, and testing big data technologies. Our organization is extremely data-driven, where technical innovations happen, and you will have an opportunity to use cutting edge technology across all stages of the development lifecycle and be part of our exciting and innovative initiatives.