RCT #162 - Measure what matters, Part I: The Kirkpatrick Model. Plus: Role of a Tech Lead; User Interviews 101; Shared Services Organizations; Maintainer Guides; Refactoring; ROI
I’ve had some great conversations following up from the surveys article (#159) - thanks! You’ll likely remember that my argument was that while surveys, used carefully, can be a useful instrument, there are some common failure modes in our line of work:
- Surveys are too often a way to avoid talking to people
- It’s too easy to ask questions in a survey without putting in the effort to decide what really matters, and what decisions will be made based on the results. So the results don’t mean anything and nothing changes, but it still feels like something’s been done.
Training came up a lot as an example in the conversations. That’s a great application area, because training has a well-developed practice around evaluation. I’d like to use that as a baseline for talking, in the next issue, about measuring what matters, and about outcomes and impact versus inputs.
So let’s talk about the Kirkpatrick Evaluation Model for training interventions:
This model describes four levels at which one could evaluate the effectiveness of training. They all have value, but at each stage it’s important to understand what’s actually being measured, and why.
Reaction - What Is the Immediate Response to the Training?
When some kind of evaluation is done for short or medium-length trainings in our business, it’s almost always the “how satisfied were you with the training today?” reaction-level evaluation.
These evaluations can surface real issues in the inputs to the training. The in-person venue is too hot, the Zoom audio was too quiet, the scheduling was bad, the speaker wasn’t engaging, the materials were unclear. These problems can interfere with learning, and if the evaluations are done after every session, it gives you clear signal about something that might be fixed before the next session. What’s more, they’re easily done, there’s lots of templates one can use, and they typically get good response rates because attendees expect them. Terrific!
Further, if you do these consistently for a while, you can test to see if a major change has made the inputs to the learning experience better or worse. For instance, in #64, we looked at a report by Bristol’s ACRC on the move from in-person to virtual training delivery; because they had been doing these “would you recommend this workshop” questions for a long time, they could monitor how well the transition went. That’s great! It’s good and useful information.
One needs to be aware of the limitations of what’s being measured here, however. A low score gives some signal about problems which could hurt learning. But learning isn’t being measured at all. A session with a very engaging, charming instructor can easily result in very high evaluations without any actual learning. A boring, droning session might end up being filled with real, actionable nuggets of knowledge which get used enthusiastically. We all know the issues with teaching evaluations and how they measure a lot of stuff that has nothing at all to do with how much education actually happened.
Even people in our business, who learn professionally and constantly as part of their jobs, are absolutely rubbish at knowing in the moment whether or not they’ve successfully learned something, much less whether or not it will be useful. When was the last time you read or watched something, were sure you understood it, and at some point later realized you hadn’t understood it at all? It probably wasn’t decades ago, right?
Our purpose in offering these trainings isn’t to be engaging. It’s not to entertain. It’s not to offer amazing venues. If those things help us achieve our purpose, fantastic, but they’re not the goal. So while “Reaction” evaluation is a useful tool, it’s only one step in measuring what matters.
Learning - Did Education Actually Happen?
We put a lot of effort into these events. If we want to know how good we are at achieving our purpose with them, it’s not enough that the inputs are good, and that people have positive reactions to them; they have to be effective. And we can’t know how to get better at achieving our purpose without measuring how well we’re doing.
The first level on which to measure effectiveness, the first step towards the purpose, is the clearest and easiest to measure - did the attendees actually learn what we set out to teach?
For longer trainings, we’re normally pretty good at this. We have one or more assignments which get evaluated. Paired with a pre-test (which is important; our success at teaching the material is measured not by the final grades but by the difference between what the incoming students knew and what they leave knowing), this gives us actual data on what is being learned and what we still need to work on. If the tests are lightweight enough (e.g. clickers and the like) we can actually know in near-real-time what we need to spend more time on or revisit in a different way, and when we can move on. We’re normally less good at doing this for shorter trainings, but it’s just as important. It doesn’t have to be a 20-part test with essay questions; but some kind of pre- and post-assessment is the only way to know if our intervention had any educational value.
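To make that concrete, here’s a minimal sketch (mine, not from any particular curriculum) of one common way to summarize pre- and post-assessment results - the normalized gain, (post - pre) / (max - pre), averaged per topic. The topics and scores are made up for illustration:

```python
# Illustrative only: summarizing pre/post assessment scores per topic
# using the normalized gain, (post - pre) / (MAX_SCORE - pre).
# Topics and scores are invented for the example.

MAX_SCORE = 10

scores = {
    # topic: list of (pre, post) scores out of MAX_SCORE, one pair per attendee
    "version control basics": [(3, 8), (5, 9), (2, 7)],
    "writing tests":          [(4, 5), (6, 6), (3, 4)],
}

for topic, pairs in scores.items():
    gains = [(post - pre) / (MAX_SCORE - pre)
             for pre, post in pairs if pre < MAX_SCORE]
    print(f"{topic}: average normalized gain {sum(gains) / len(gains):.2f}")
```

A topic with a low average gain is the one to revisit or teach differently next time; a high gain suggests you can safely move on.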
Once done, this is now something we can compare across sessions, instructors, and experiments, and know if something is “better” than before in a real, educational sense. Note that unlike the reaction-level evaluation, this won’t necessarily give us clear guidance on what needs to change; but it will tell us which sub-topic needs more attention. That, plus experimentation, is enough to guide improvements.
But even that is just a first step.
Behaviour Change - Did The Learning “Stick” and Seem Useful Enough To Apply?
If all we cared about was successfully teaching some knowledge, we could just choose easier knowledge to teach!
Conveying facts and knowledge is well and good, but that isn’t our purpose. In our roles, we teach knowledge and skills with the intent that they get used to advance research and scholarship. People attend our sessions, presumably, so they can apply what we’re offering to teach them. If we taught engagingly, and attendees successfully learned the material, but then never used it - what was the point?
Thinking about how the learning will be used, right from the beginning of preparation, can greatly and usefully shape what and how we choose to teach. It almost always results in a greater hands-on focus, with deeper exposition on theory and facts deprioritized in favour of teaching some of the basics with immediate, relevant application, and providing deeper theory and knowledge either as resources to be looked up as needed, or in followup sessions.
Measuring behaviour change is tricky, partly because it’s the first level where we’re unlikely to be able to make meaningful quantitative measurements, and partly because by definition it involves contacting people once they’ve already finished the training (and so response rates go down). Surveys can work here, but probably aren’t as good as emails and quick followup chats. These will be much more successful if you let attendees know throughout the course that a followup is coming.
Cohort-based teaching can help here with both enabling and evaluating behaviour change. In this model the cohort has a way of staying in contact for some fixed period of time (3 months is a common period) after the material has been delivered, and there are follow-up discussions, and maybe coaching and Q&A; the idea is to provide a mini “community of practice” where people can share how they’re using the material and learn from each other.
Longer-term and less-intense training (say, an afternoon a week for 8 weeks) can also help, because it can provide more opportunity to use the learned material between sessions - and class discussion can be used to follow up on how that’s going.
Finding out how the material was used, or what barriers there are to applying it, can help improve future sessions: making the material more usable, and better targeting it at audiences that will successfully use it. It will also start providing useful testimonials that can be used in communicating with future attendees and decision makers: “Six months later I’m still using what was taught in this course, and have learned more skills based on the original material! I’m so glad I took this course!”
Results - Did It Matter? Did We Teach The Right Thing?
Great, so the material was taught well, and people are using it. But to what end?
Our goal is to advance research and scholarship as much as we can, given our constraints and priorities.
This final level of evaluation attempts to answer the questions: Why teach a course? Why teach this course? Does any of it actually matter?
Implicitly or explicitly, we’re teaching these things so that there’s some impact on research and scholarship. We can’t possibly be consistently successful in that end goal - the whole purpose of the exercise - unless we teach with that end result clearly in mind, and repeatedly, constantly, evaluate the impact we’re having.
Maybe our goals with this particular training were:
- Help grad students be more effective by helping them analyze their data faster and in a more repeatable way, so they can publish faster and better with the same or less effort.
- Help labs write the software they need to do specialized work and receive more grants.
- Help promote the economic impact of our institution by helping trainees successfully land post-ac jobs that make use of their skills.
- Help research groups run larger-scale, more complex multi-physics simulations so their publications can have more impact.
So were we successful or not?
There’s absolutely no chance we can find that information out using surveys. This needs to come from on-going discussions, and long-term followup. And folks, it's messy.
Attribution will be challenging, causality will be unclear. We certainly won’t end up with any quick fixes we can do for the next session.
But these impacts are the point of our interventions. We’re putting our (and our attendees’) time into these courses to have some impact on research and scholarship in our institutions. We can’t know we’re doing the right things, or doing things well enough, without some attempt to find out the impact of what we’re doing.
The results will be uncomfortably qualitative. Some researcher will be willing to say “We never would have gotten this grant without our students being trained in this material”. Some researcher will seemingly have the same outcome but not feel that way. The self-reported results are unfortunately the best you’re likely to be able to get. You can also look at overall grant success rate or publication rates or…
This unclear data may be unsatisfying to you, but will be hugely influential to decision makers. That “never would have gotten this grant” quote will be something you can use in conversations and slides again and again. Consistently placing grad students in spinouts or other businesses will really matter for your jurisdiction’s staff and politicians. Prioritizing the work that has the most clear impact will mean your team is spending its time on the right things. And aiming for that impact in how the training is chosen, designed, and delivered will mean that work is as valuable as possible.
Application More Broadly
I’ve spent a lot of time on training evaluation here, but these ideas have much wider application.
We choose a training intervention — or other services or products for our community — because, consciously or unconsciously, we have some theory of change. We think our training will have some impact on research and scholarship in our institution, typically through pretty simple mechanisms; mechanisms which are observable, measurable, and improvable.
Crucially, the impact is the entire point. There’s no reason to do the work except for the impact. This isn’t a hobby; we’re professionals, doing badly needed work, with far more things we could usefully do than time or resources to do them.
And yet, we tend to focus more on measuring the inputs than the impact, because the inputs are the easy, quantitative things to measure.
That’s a bit longer than I intended. I’ll write more about this next week - now on to a slightly shorter roundup!
Managing Individuals and Teams
Across the way this week over at Manager, Ph.D., I covered:
- Feedback as a worked example of team expectations, goals, and standards, and how we should seek out these worked examples to learn faster
- One-on-one questions for underperforming team members
- Team expectations should be tradeoffs
- Problems with “the team” may be caused by us
- Better LinkedIn profiles
Technical Leadership
The Role of the Tech Lead - Rachel Spurrier
For a small enough team, the manager may be the tech lead, but things get a lot easier once the team is large enough to split out the responsibilities.
A technical lead can be a permanent job title, or it can be a per-project role. The key thing, as Spurrier writes, is that there’s clarity about who is responsible for what.
The tech lead generally focusses on the how - how best to implement something, how to manage the project - and the manager focusses on the who (and the what and why, if there’s no separate person doing that like a product manager). The tech lead, Spurrier tells us, is responsible for technical mentorship while the manager focusses on broader people management and career development.
You can have different divisions of responsibilities and have things work, as long as there’s clarity. Spurrier talks about what skills the tech lead needs, and how to introduce someone into the role. That’s particularly useful in teams like ours where the tech leadership may be a bit fluid depending on what’s being done for whom, and so different people may get the opportunity to practice those responsibilities at different times.
Product Management and Working with Research Communities
User interview 101 - Sophie Aguado
Whether we’re following up on a course to see if material is being used, or getting input on current or possible new services, features, or products for our clients, we need to make sure we can talk to our clients without asking leading questions, and that we get the most out of these time-consuming conversations.
Aguado walks us through what user interviews are good for (qualitative insight about needs) and what they aren't good for (quantitative results or "do you think this is a good idea" questions), who and when to interview, and a good series of steps.
Aguado's scripts are more for UX design interviews, but the steps, suggestions, and listed gotchas apply very much to our teams.
Innovating to Improve and Mature Your Shared Services Organization - Karen Hilton, Betsy Curry, Scott Madden Consulting
We might not love thinking about it this way, but our teams are shared services, centralizing certain kinds of expertise and/or equipment across departments, divisions, institutions, or even regions.
There's actually a lot written about shared services, and even when the shared services are very much non-research functions (finance, HR, IT), the basic story of how we best serve our institutions very much applies - as does the path to maturity.
I see a lot of teams get stuck after that first phase. The consolidation happens; there are centralized and more efficient services. But the drive to keep getting better, to optimize and excel, to hold ourselves accountable to high and publicly measured standards, and to use that excellence to find opportunities to tackle larger and more complicated tasks - that doesn't happen by default. We're not trained in how to do it.
(Relatedly, a different article by the same company describes different “service delivery channels” - as advisor to leadership or divisions, as a transactional service center, or as a center of expertise. I think too often our teams fall into the trap of being largely or entirely transactional service centers, which is valuable but only part of what we can be.)
Research Software Development
Maintainer guides: spending, succession, & more - Sumana Harihareswara
Harihareswara has a great set of guides here from training she’s developed for maintainers of open source projects, including guides on growing the contributor base and handling conflict.
Refactoring and Program Comprehension - Greg Wilson
Quickly improve code readability with Proximity Refactorings - Nicolas Carlo
Easily identify technical debt in any repository - Daniel Bartholomae
A trifecta of refactoring articles for this issue:
Wilson summarizes a recent paper with quantitative results on what kinds of refactoring genuinely result in improved readability:
- Refactorings can help readability directly or indirectly (like by making comments easier to apply to the code being read)
- Reorganizing source code to make more cohesive components positively affects readability
- Moving features between objects doesn't clearly have a net positive or negative impact
- Reorganizing data within classes doesn't have a clear impact on readability
- Renaming code elements doesn't always help
- Moving logic along the hierarchy structure can help, but superclasses inevitably have a negative impact.
- Multiple refactoring operations involving renaming, applied in sequence, can noticeably improve readability.
The paper itself really looks interesting.
Meanwhile, Carlo suggests that refactorings of the second type above - reorganizing code so relevant pieces are in closer proximity - are a simple and effective way to refactor code as you go along making other changes.
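As a tiny, made-up illustration of the kind of change Carlo is describing (my sketch, not his example), the refactoring just slides related statements next to each other so each piece of the result is computed right where it's used:

```python
# Before: the pieces needed to build `report` are scattered, so you have
# to hold more of the function in your head to follow any one of them.
def summarize_before(records):
    report = []
    total = 0
    names = [r["name"] for r in records]
    for r in records:
        total += r["count"]
    report.append(f"names: {', '.join(names)}")
    report.append(f"total: {total}")
    return report


# After a proximity refactoring: each piece of the report is computed
# immediately next to where it's used, so related lines read as one chunk.
def summarize_after(records):
    report = []

    names = [r["name"] for r in records]
    report.append(f"names: {', '.join(names)}")

    total = sum(r["count"] for r in records)
    report.append(f"total: {total}")

    return report
```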
Finally, Bartholomae makes two arguments:
- Technical debt and accidental complexity are basically the same thing, differing only in the history of how it happened
- Accidental complexity is surprisingly well correlated with typical, seemingly-simplistic code complexity measures
And with that he offers a tool based on the new-to-me GitHub Blocks, which lets you build UIs on top of code bases. His block looks for regions of the code with high code complexity measures. Very cool!
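If you want to play with the underlying idea without GitHub Blocks, here's a rough sketch (my own, assuming the Python radon package, and not Bartholomae's tool) that flags high cyclomatic-complexity functions in a source tree:

```python
# A rough sketch of flagging high-complexity regions in a Python codebase,
# using the radon package (pip install radon). The threshold and the
# "src" directory are arbitrary choices for the example.
from pathlib import Path

from radon.complexity import cc_visit

THRESHOLD = 10  # cyclomatic complexity above this gets flagged

for path in Path("src").rglob("*.py"):
    for block in cc_visit(path.read_text()):
        if block.complexity >= THRESHOLD:
            print(f"{path}:{block.lineno} {block.name} "
                  f"(complexity {block.complexity})")
```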
Research Computing Systems
Use of accounting concepts to study research: return on investment in XSEDE, a US cyberinfrastructure service - Stewart et al, Scientometrics
In #132 we covered a similar article, “Metrics of financial effectiveness: Return On Investment in XSEDE, a national cyberinfrastructure coordination and support organization”. The latest article is a greatly expanded, 31-page treatment of that work, discussing what the costs would have been if the systems had been provided piecemeal, without coordination and common stacks, and if services like extended consulting, software optimization, and training had not been offered.
Using the largest dataset assembled for analysis of ROI for a cyberinfrastructure project, we found a Conservative Estimate of ROI of 1.87, and a Best Available Estimate of ROI of 3.24.
Random
In praise of awk (and some interesting technical notes about awk - e.g. it was designed, like bash, to not need any garbage collection - I did not know that about either sh or awk).
A ChatGPT client for MS-DOS, because we could.
An international conference on Pascal and Pascal-like languages, because they could.
Forwarding ssh-agent through websockets. They were so preoccupied with whether or not they could, they didn’t stop and think if they should.
Going deep into fsync.
Darklang (a hosted programming language for web services) is going all-in on LLM code generation. Whether you like Darklang and its direction or not, this is the first argument I’ve seen that a future with more AI code generation will mean that what matters for ergonomics in programming languages will change significantly.
Better defaults for GitHub actions.
That’s it…
And that’s it for another week. Let me know what you thought, or if you have anything you’d like to share about the newsletter or management. Just email me or reply to this newsletter if you get it in your inbox.
Have a great weekend, and good luck in the coming week with your research computing team,
Jonathan
About This Newsletter
Research computing - the intertwined streams of software development, systems, data management and analysis - is much more than technology. It’s teams, it’s communities, it’s product management - it’s people. It’s also one of the most important ways we can be supporting science, scholarship, and R&D today.
So research computing teams are too important to research to be managed poorly. But no one teaches us how to be effective managers and leaders in academia. We have an advantage, though - working in research collaborations has taught us the advanced management skills, but not the basics.
This newsletter focusses on providing new and experienced research computing and data managers the tools they need to be good managers without the stress, and to help their teams achieve great results and grow their careers.
Jobs Leading Research Computing Teams
This week’s new-listing highlights are below in the email edition; the full listing of 187 jobs is, as ever, available on the job board.
Director of High Performance Computing (HPC) - Drexel, Philadelphia PA USA
The University Research Computing Facility (URCF) is a centrally reporting core facility that provides access to high-performance computing hardware and software resources for research computing. The URCF occupies a 1,600 sq. ft., 0.6 MW, climate-controlled server room that hosts the NSF-Drexel funded Picotte shared HPC cluster, which consists of 4,224 compute cores and 48 Nvidia Tesla V100 GPUs along with a high-capacity storage system. The Director of High Performance Computing (HPC) is responsible for the design, installation, monitoring and maintenance of hardware, software and networking equipment for HPC systems in the URCF. The position reports to the Operations Director of Research Core Facilities (part of the Office of Research & Innovation). The Director of High-Performance Computing will also work closely with the Faculty Director of the URCF, who leads a Faculty Advisory Committee charged with helping the facility develop and meet its strategic, financial and operational goals. URCF financial administration is provided by the Office of Research & Innovation.
Director, Research Computing - Simon Fraser University, Burnaby BC CA
The Director, Research Computing is responsible for providing strategic, operational, and administrative leadership to the delivery of researcher focused services to meet the diverse needs of the university community. The Director oversees a large portfolio including large research computing facilities including storage facilities, country-wide collaborations and services, high-performance network design, and operations. The director is responsible for defining and implementing strategies focused on delivering researcher-focused services, while leading a team dedicated to providing outstanding researcher support across SFU and in partnership with other IT and Academic units. As a key member of the IT Services (ITS) senior leadership team, the role participates and contributes to the development of the ITS strategic plan and leads continuous improvement initiatives within the Research Computing portfolio. The Director also reports, in a dotted line relationship, to the Associate Vice-President Research and International to assure close alignment with the University research priorities.
Director of Platforms, Infrastructure and Data Services - MIT, Cambridge MA USA
DIRECTOR OF PLATFORMS, INFRASTRUCTURE AND DATA SERVICES, Office of Research Computing and Data (ORCD), to serve as the lead technical architect and administrator of the newly created ORCD Platforms group. Leadership duties include overseeing and implementing technical and architectural aspects of research computing by creating reliable and sustainable systems to meet the Institute’s wide-ranging research computing and data goals; architecting, deploying, monitoring, and supporting reliable state-of-the-art computer, network and storage systems, and technologies for delivering solutions for research computing and data services; and managing the underlying infrastructure, staff, and computing assets to deliver high levels of availability and reliability.
Research Computing Manager - Sunnybrook Health Sciences Center, Toronto ON CA
The Research Computing Manager will maintain core systems and software along with supporting the specialized needs of campus researchers and instructors via individual consultations and group presentations. This position will collaborate closely with the hospital IT workforce as well as SRI scientists, staff, students, and partners at other institutions to provide broad technology support for operational issues, research, and teaching. The position will be responsible for managing our deskside support team, system administration team and REDCap team.
Data and Analytics Manager - University of the West of Scotland, Paisley UK
The Data and Analytics Manager will lead a team of analytics and insight-focused staff, delivering business-critical analytics and reporting across all aspects of the University including, but not limited to, utilising internal and external data to forecast student demand, delivering persuasive analytics on student performance, informing research performance strategy and evidencing resource allocation models. Working closely with academic schools, the post holder will strive to understand customer data-needs, and work collaboratively with leaders of professional services across the University.
Oxford AI Research Group Project Manager - University of Oxford, Oxford UK
We are looking for an ambitious and entrepreneurial Project Manager to oversee the day-to-day running of the research group. The Project Manager will be responsible for the operations of the research group, will act as a force multiplier on the group’s impact, and will promote its global reputation. This post is an exciting opportunity for someone who is keen to support and collaboratively work with a fast paced and world-leading research group. You will be responsible for translating the vision of the group into an operational plan and executing the same. You will coordinate with multiple stakeholders across industry and academia to organise and manage the group's project portfolio which is composed of many projects (> £3m).
HPC Systems Engineer Lead - Northwestern University, Evanston IL USA
The Research Computing Infrastructure (RCI) team supports Northwestern’s High-Performance Computing (HPC) infrastructure, a suite of computational resources used by Northwestern researchers to make cutting-edge discoveries through computational research and data science. Northwestern’s HPC infrastructure includes Quest, an HPC system with more than 50,000 cores, and consists of physical servers, back-end storage, operating system, integrated networking, parallel filesystem, scheduler, cloud-based infrastructure, and other associated systems all supported by RCI. As Lead HPC Systems Engineer, you will actively support Northwestern’s HPC systems while leading the RCI team in best practices for developing, implementing, maintaining, and securing HPC cluster systems and solutions for computational research requirements, including AI/ML and data science. As a Lead member of the RCI team, you will work closely with your teammates and the Research Computing Services team to develop a long-term strategy for the evolving Northwestern research enterprise.
Principal Scientist (Discovery Bioscience) - Cancer Research UK, Cambridge UK
Biological exploration of novel potential oncology/immuno-oncology drug targets and biomarkers. Development and execution of cell-based assays to enable drug discovery projects. Expansion of the Bioscience group capabilities. Characterisation of small molecule and/or antibody for target validation
Manager, Clinical Data Scientist - Pfizer, Kirkland QC CA
As part of the Data Monitoring and Management group, an integral delivery unit within the Global Product Development (GPD) organization, the Clinical Data Scientist is responsible for timely and high quality data management deliverables supporting the Pfizer portfolio. The Clinical Data Scientist designs, develops, and maintains key data management deliverables used to collect, review, monitor, and ensure the integrity of clinical data, oversees application of standards, data review and query management, and is accountable for quality study data set release and consistency in asset/submission data.
Director of Project Management - R&D Data and Computational Science - Sanofi, Cambridge MA USA
You are an experienced program and portfolio leader who has worked in the pharma / biotech industry, managed software development projects, and is passionate about creating machine learning enabled, data driven R&D solutions. You have a strong understanding of biology and an interest in using advanced computational methods to drive new insights from biological data. You have a passion for working on the development of drugs to improve people's lives. You are an excellent communicator, and can work with subject matter experts from our R&D business and connect them to our data scientists and software engineers so that we can deliver professional grade solutions.
Engineering Manager, Data - BenchSci, Toronto ON or Remote CA
We are currently seeking an Engineering Manager to join our rapidly growing Data Team. Reporting to the Director of Engineering, Data, you will be responsible for planning, delivery, mentoring, and coaching of data engineers. In this impactful role, you will work closely with key stakeholders across the organization and be instrumental in cross-team priorities and management. The most successful candidates for this role will be experienced software engineers who have transitioned to leading individual engineers and delivering complex data engineering solutions.