License: CC BY
Version
- First internally used Dec 2023
- Published 17 Feb 2024 with minor revisions
Evaluating software engineers
Since I first formally managed other software engineers in 2017, I’ve struggled with the best way to perform their annual reviews. It’s one thing to develop the experience required to understand what makes a good engineer and a good team; it’s another matter entirely to articulate it within a cohesive framework that can withstand the scrutiny of a great many intelligent folks when their livelihood depends on its clarity and correctness. I believe it’s even more challenging in a startup or small company where role boundaries are squishier and more broadly scoped than in more mature organizations.
After many years of consideration and revisions, I’m publishing this snapshot of how I do evaluations today (see version history above). I’m doing this in the spirit of collaboration, in the belief that there is no one true way to evaluate software engineers and that I should contribute back my thoughts on the matter after having so richly benefited from the thoughts of others (which are enumerated below under the Attribution section).
Note that I am only covering the manager’s review in this document. I believe self- and peer-evaluations are important parts of a complete review cycle, but I’m not ready to share those yet. Further, my rubric is derived from my experience with engineers building web-based products. While I attempted to make the criteria as broadly useful as possible (e.g. I think they’re applicable to product engineers, test automation engineers, and data engineers), you may need to rethink the Technical -> Proficiency section in particular in a very different context.
I believe no scoring rubric should be presented to a team without important background information to contextualize the results: Levels, purpose, methodology, score calibration, review process, and who can access the review. Simply “throwing scores over the fence” at your team will lead to more issues than doing no evaluations at all. Evaluations need to be a thoughtful collaboration to be useful. The rest of this page is the necessary documentation to provide that.
Nothing about this introduction is meant to suggest I have presented all of my rationale or the meta structure of the scoring rubric. That’s an essay for another day.
Levels
At a small company, you don’t need a ton of engineering levels. I went with 4 for simplicity; I may add more as the need arises. I only have one level of management today.
- L1 — Associate Engineer
- An L1 or “junior” engineer is entry-level, typically with only a couple of years of experience. Their scope of work is usually kept very narrow to promote early expertise and build confidence. They frequently work with L2+ engineers to learn additional context and skills.
- L2 — Software Engineer
- An L2 or “mid-level” engineer is typically defined by a primary domain, e.g. “Front-End”, and that context should frame the evaluation. They maintain a focus in their scope of work and prioritize personal growth.
- L3 — Senior Engineer
- The most important deliverable of a senior engineer is more senior engineers — they should assist and elevate their teammates. They are defined by their ability to work independently in multiple domains while valuing collaboration.
- L4 — Staff or Principal Engineer
- Staff Engineers typically fall into one of four categories:
- Technical Lead (guiding technical decisions embedded on a particular team)
- Architect (overseeing the success of a critical technical domain)
- Solver (sent to work on knotty problems until they’re solved)
- Right Hand (acts as technical representative/liaison for a specific executive)
- At a small company, a Principal role is often a combination of many or all of these Staff Engineer archetypes, and it often applies to the individual contributor who has been working on the product longest, provided they have the necessary skill level.
- Manager / Director
- Any manager should have had at least two successful years at the L3 level and should have completed dedicated leadership training as a prerequisite. I believe this track warrants more work to define a full progression, but as I am focused on small teams today, that work is for another time. I wrote this draft of the criteria for myself at the director level.
Each role at each level has its own scoring rubric; however, many items will overlap with other roles at the same level.
Purpose
Evaluations are a useful tool for providing more detailed performance feedback on a reasonable cadence. They ideally spark more insightful conversations and a better understanding of roles and how teams operate.
If you are surprised by anything in your evaluation, use 1:1s to focus on it and ask your manager for help to close that gap. Evaluations should formally dig deeper into what is mutually understood, not provide critical feedback for the first time or provide feedback incongruous with previous discussions.
Evaluations are not for:
- Directly calculating raises or bonuses.
- Correcting extremely deficient performance (lots of 1s or 2s).
- Quizzing your manager on how to achieve 5s.
- Attempting to improve every area at once.
Pick a few areas to focus on for the coming year. They may include weaknesses you feel you can remediate and/or strengths you can lean into.
Methodology
Evaluations are rubric-based and set on a 5-point scale. There are 3 areas:
- Technical
- Teamwork
- Alignment
Each area will have a number of sections, each with enumerated criteria based on the specific role. For example, the Teamwork area includes these sections, each with multiple criteria:
- Communication
- Collaboration
- Process
- Feedback
- Culture
- Mentorship
Not all sections apply to every level. For instance, the Culture & Mentorship sections do not apply until L3 or higher.
The overall score is the average of all relevant sections. Because the average is taken across sections rather than areas, areas with more sections effectively carry more weight, and higher levels have more sections, which changes the weighting again. Scores should be accompanied by comments that help explain the score given.
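To make the weighting concrete, here is a minimal sketch in Python. The Teamwork section names come from the list above; the other section names and all score values are hypothetical, since the real rubric contents vary by role and level.

```python
from statistics import mean

# Hypothetical section scores for a single review. The Teamwork section names
# follow the document's examples; "Architecture", "Mission", "Planning", and
# all score values are made up purely for illustration.
section_scores = {
    "Technical": {"Proficiency": 4, "Architecture": 3},
    "Teamwork": {
        "Communication": 4,
        "Collaboration": 3,
        "Process": 3,
        "Feedback": 4,
        "Culture": 3,
        "Mentorship": 4,
    },
    "Alignment": {"Mission": 3, "Planning": 4},
}

# The overall score is a plain average over every section, regardless of area.
all_scores = [score for area in section_scores.values() for score in area.values()]
overall = mean(all_scores)

# Each area's effective weight is its share of the total section count, so in
# this example Teamwork (6 sections) counts three times as much as Technical (2).
total = len(all_scores)
for area, sections in section_scores.items():
    print(f"{area}: weight {len(sections)}/{total}")
print(f"Overall: {overall:.2f}")
```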
Calibration
Each section in the job description is scored on a scale of 1-5 or N/A. Your overall evaluation is your average score, omitting any N/A items (they do not count as zero; they are dropped from the average entirely).
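As a small sketch of that arithmetic (made-up scores, for illustration only), N/A items are excluded before averaging rather than being treated as zero:

```python
from statistics import mean

# Hypothetical scores; "N/A" items are dropped from the average, not counted as zero.
scores = [4, 3, "N/A", 5, 3, "N/A", 4]

numeric = [s for s in scores if s != "N/A"]
overall = mean(numeric)  # 3.8, averaged over 5 scored items rather than all 7

print(f"Overall: {overall:.2f} (from {len(numeric)} of {len(scores)} items)")
```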
How to read a score
- 1 — Deficient. This needs fixing. You are always expected to remedy this with priority.
- 2 — Weakness. This is a good item to focus effort on, but it’s OK if some of these take a while to improve.
- 3 — Success. It is not code for “mediocre”. It’s a big, green checkmark. Well done!
- 4 — Strength. Recognizes high achievement. This is broadly a solid goal that may require stretching.
- 5 — Exceptional. Used to highlight extraordinary effort. Only use as a goal in a few targeted areas.
- N/A — Manager didn’t have enough visibility or context to evaluate, or a situation to demonstrate this skill didn’t arise during the time period. Omitted from average.
An excellent annual review will have:
- No scores of 1
- Very few scores of 2
- Many scores of 3 and 4
- A few scores of 5
If you disagree with any score (including N/A) please note it directly on the evaluation with your rationale. You may attach additional documentation as needed.
Review
Everyone should receive their evaluation at least a day before the discussion with their manager, but preferably not more than 48 hours prior. This strikes a balance between allowing time to reflect before the discussion and recognizing that additional conversation is required to understand the full context.
Employee feedback should be noted directly on the form during the discussion. Employees may request a follow-up discussion if they need more time to consider the feedback.
Access
Evaluations are viewable by:
- The employee evaluated
- Employee’s manager and their reporting chain (up to the CEO)
- Head of HR (or contracted equivalent representative)
Employees may keep their own copy of their evaluations and share them at their discretion.
Attribution
I consulted many similar works over many years while creating this guide. Among these were:
- Rent The Runway’s Engineering Ladder, published by Camille Fournier
- Khan Academy Engineering Career Development, published by Ben Eater
- Patreon’s Engineering Levels, published by Valerie Lanard
- SkillMap.io Skill Tree, shared by David Morgantini
- Octopus People, published by Michael Noonan
I also benefited most significantly from these longer works:
- The Manager’s Path, by Camille Fournier
- Managing Humans and Being Geek, by Michael Lopp
- Staff Engineer by Will Larson
And from many essays, posts, and tweets, especially from Lara Hogan and Katie Womersley, whom I discovered via The Lead Dev conference chaired and hosted by Meri Williams. My revision and editing process was aided by collaboration with Eli Robbins.
For an exhaustive list of works I found influential, see my public bookmarks. I was also strongly influenced by leveling systems in both BSA Scouting and Zen Martial Arts.