
Bonus AntiPatterns – How to set the wrong incentives

Entering my second decade in software development and moving up through various leadership roles, I’ve been exposed to many different incentive plans meant to “boost morale and increase performance”. In my experience, the people behind them were trying to do the right thing: more carrots, fewer sticks. Most of the time, however, you wind up establishing the wrong incentives, and they can backfire so badly that they change the cultural fabric of your company. Below I outline what I’ve found to be anti-patterns in bonus structures for engineering teams and what a better approach could look like.

To make sure we level set on what I’m talking about: this article refers to any compensation package with a bonus structure in which meeting certain criteria gets you additional money paid out.

These views come from my experiences, feedback from teams I’ve run and my own personal beliefs on the subject. They rest on the observation that any devised system will ultimately be gamed. They also apply mainly to engineering teams, since sales team goals are a different beast.

Anti-Pattern #1

“Your bonus is based on hitting goals you commit to”

This one is pretty common in “agile” type shops where you commit to hitting targets or OKRs (objectives and key results) every 2-4 weeks. While good in theory, it goes against everything it means to be agile, unless your bonus targets for that period also change when the landscape changes. For example: “I promised Project A but if I spend a couple days helping our sales person get around a bug, they could land a huge deal for the company. If I help our sales person I’ll get a pat on the back, but my team will miss our bonus. I’ll help sales later.”

Perceived Incentive: “Pad as much as possible so there is no chance of missing the date, commit to as little as possible and make your boss challenge everything you say. Avoid helping across teams.”

Result: A culture of mediocrity. No one pushes hard to meet deadlines because they’re padded so much, and everything takes 50-80% longer to get delivered. If I think I might miss my dates, I ship anyway, even if I know it will fail in production. Hey, I shipped! This anti-pattern promotes silos and limits cross-team communication.

Anti-Pattern #2

“Your bonus is partly based on hitting cost cutting or COGS metrics” 

This one is based on keeping costs down, otherwise known as cost control measures (COGS = Cost of Goods Sold). In a fast-growing tech company, this might be the most dangerous one. There are so many factors at play that getting it right would require a highly detailed model of the world. For example, if you said costs can’t grow more than 20% this quarter, but we get a sudden influx of customers, or we have to scale for a big sales event or promotion, how do you tie that back to the original commitments? It’s highly tedious and error prone. Another example would be not developing a feature unless it has a certain expected dollar return. The analysis of that alone will cripple you.

Incentive: “I will take as little risk as possible. I will overload as many servers as I can, not run any extra capacity, and accept much higher latencies to protect our bonuses. I will not try out new ideas until next year, because I don’t want my teammates to be punished if I have to spend money temporarily to do that.”

Result: You have a culture that incentivizes complacency and dissuades innovation. New features will never get approved because COGS have to get adjusted every quarter to account for new projects. Those adjustments will be wildly inaccurate and you now need a COGS review board for new initiatives. Don’t let finance dictate growth in a growth phase. COGS come into play most effectively when you have a solid business in the exploit phase.

Anti-Pattern #3

“Your bonus is partly based on production status”

This one is based on the idea that service uptime, the number of production bugs found or other service level metrics tie into your bonus structure. While this makes sense at a company level, at the employee level it falls apart pretty fast. Many things are out of your control, such as someone cutting your data center’s primary and redundant fiber pipes while doing routine maintenance with a shovel (it happens!). Did I do a bad job, or were other incentives in place, like COGS, that kept me from considering a more fault-tolerant architecture?

Incentive: “Never push code unless mandated by your boss, do not take risks, pad your estimates another 200% to factor in longer times in testing and QA cycles, push off acceptance to some other group to get blamed for uptime if possible”

Result: You definitely do not have a ship-it culture, you have a culture of fear of change. Deployments slow down, new projects stagnate, no one wants the responsibility to touch production.

Let’s tie all those anti-patterns together with the data center pipes being cut example from above and you can see how these start to de-incentivize employees.

“Yes, we had a production outage so my bonus will get dinged because I didn’t hit our uptime criteria. I was hitting your cost-cutting goals by not running multiple data centers and I didn’t push out the fallback code because I was too scared to touch production and potentially cause issues and lose that part of my bonus. I did notice I could have moved a backup server to the other datacenter but I would have missed our deadline for Project A and we would have missed our OKR bonus.”

So now you can’t win because you set up a system where we will fail at some tier.

Whenever a bonus structure is proposed you have to look at what the actual incentives are. As many have learned from the book “Freakonomics”, decisions can have odd repercussions that their original authors did not intend: “the cobra effect”.

Some of the better approaches I’ve seen that didn’t have the incentive downsides:

Profit sharing / sales-based bonuses

When the company does well everyone does well. It can get everyone moving in the same direction to help attract and retain customers by getting everyone caring about success. Some have mixed feelings on that one but I’ve seen it work more effectively. It’s not about creating one-off features just to land a deal,  it’s about helping the sales team feel like engineering is a partner and everyone is rowing in the same direction.

Put it into salary

When recruiting is all said and done and it’s time to make an offer, the #1 factor is usually base salary. A bonus is seen as just that: something that may or may not be paid out based on some criteria. If you want to increase your chances of winning against competing offers, put that bonus into base salary.

Thought-based bonuses

If this is all too complex for you, perhaps get rid of bonuses altogether, raise salaries and offer more spontaneous rewards. The greatest bonus I ever received came after the successful launch of a project I had put a good deal of hours into. My boss came to me and said: “Take 3 nights at the hotel of your choice with the family and we’ll cover everything.” Now this wasn’t the biggest monetary bonus I had ever gotten but it stuck with me the most. There was actual thought put into how that launch must have affected my family life and how I needed some face-to-face time with them again. He could have just given me a check and been done with it, but he put some thought into it. Even if it was a little thought, it was something. (thanks, JB!)

Another nice bonus is autonomy. If someone has been a consistent performer give them a chunk of time where they get to work on whatever project they feel interests them most. Do they want to improve a query? Refactor something? Hackathon up a new product? Go for it, time is your bonus.

I’ve also seen gestures like letting employees expense nights out with their spouses, babysitting included if they have kids, blocks of days off, gift cards for going above and beyond, and various other perks. Did I do a great job at something? Send a note to the CTO for a high five. At the end of the day it’s about recognition.

Bottom Line

Your top performers would be your top performers regardless of bonus structure. Top performers are internally incentivized. Increase their pay so they don’t have to worry about money issues and focus on retaining them.

You need to incentivize risk and stretch goals. No one will stretch if they are hit with a stick whenever they don’t make a full 100% of their commits, 100% of the time. When you look at those incentives there are no upsides for stretching and taking a risk. Why make stretch goals when missing that 90-100% goal only gets you a stick?

You’re telling people to set mediocre goals because there’s only downside to taking risks. What about those who don’t strive to be bold and innovate, no matter what system you set up? Don’t fear letting people go. If you have a consistent low performer it’s cheaper for the company to let them go rather than keep shifting them around to other projects. The larger the company, the more time you can invest in improving someone, but small companies don’t have that luxury and each headcount needs to have a significant impact.

As John Doerr, who brought the OKR model to Google, says:

“Don’t tie the OKR goals to bonus payments, except for sales quotas. We want to build a bold, risk-taking culture.”

Some references that also cover this topic:

Joel On Software – Fog Creek Compensation

http://www.joelonsoftware.com/articles/fog0000000038.html

John Doerr on OKRs

http://blog.betterworks.com/keys-okr-success-qa-john-doerr/

The Surprising Truth on What Motivates Us (only 10 mins)

 

TED Talk – Dan Pink: The puzzle of motivation

Your development environment is your culture

As Director of Engineering my job is to help build and facilitate the company culture for the engineering team. Culture is where you spend your days and your efforts. Culture cannot be confused with perks. Perks are free drinks and stand up desks. Culture is ultimately what makes top performers leave or stay at companies.

You can usually assess a team’s culture starting right at the development environment. How much time have they invested in automation? How long do builds stay broken? Does anyone care when a build is broken? Do they have test automation running? Have they centralized logging? How do they know a new code push doesn’t introduce a regression? Can someone check out a project, follow the README and be up and running, or do they have to chase down someone to help troubleshoot? How quickly can we safely get from laptop commit to production? What are all the steps involved there, and are they automated?

These are all things that add friction to the one thing engineers really want to do… ship product. The more you separate an engineer from production, the worse your engineering culture tends to be, until it becomes a culture of bare-minimum, throw-it-over-the-wall mentalities. That statement is based on my experiences over nearly 20 years in software, across startups and enterprises of various size and scale (so your mileage will vary).

Engineers want to be productive and not be held back by issues that pop up day to day. Ultimately, it’s management’s job to hold the team to a higher standard, remove roadblocks and enforce behaviors until they become part of the culture where engineers can self-regulate. You want new hires to come in and just say “ok so this is how this place works and everyone is on the same page, good”. You don’t want mixed messages for people coming in, as they will settle into their patterns within the first month.

To me it’s analogous to a chef vs a line cook. A chef keeps a clean kitchen, with food properly stored and prepared. The chef knows the food cost, knows inventory, knows how the kitchen runs and what everyone does. The chef treats the kitchen (their dev environment) as a pristine place where the end product begins its journey. You wouldn’t expect a 3 star Michelin restaurant to serve meals from a kitchen with dirty floors and grease all over the walls. Line cooks on the other hand are usually implementers. During work hours they take tickets, make some food hopefully to spec and go home. In a startup you need chefs. You want a team who wants to remove roadblocks that slow down getting ideas to production. You want a team of chefs who want to know when things are broken and know where to go to fix them. You want a team of chefs who treat their development environment with respect and their teammates’ time with respect by keeping it operational. An engineer with a chef mentality will work a Saturday to automate something that will save their teammates 10 minutes of friction.

So what are some things you can do on your team to start down the path of excellence? As an overall goal start with the statement:
“Aim to minimize as much friction as possible that prevents developers from coding as efficiently as possible”

Concrete actions:

Mission Statement

Create a mission statement that documents the expectations of the team and the standards and hold the team accountable for them. What are the 5-10 things you value most as a team? Give it to new hires to read so they understand what your expectations are as it relates to the development life cycle and hold them to it.

Sustaining Engineer role that rotates

It is important in large, complex systems that as many people as possible know how all the cogs fit together. Where do I find error logs? Who owns what component? How do I monitor all the servers from a central place? A test just failed, so how do I know who recently pushed code? Why does that thing talk to this thing?
These are very time-consuming items if you’re spreading them over the entire team’s capacity every time an issue comes up. Create a rotating sustaining engineer role to run point on broken builds, failed tests, daily log/error reports, running releases, troubleshooting and the first pass on production issues. I’ve seen this become extremely valuable for building the team’s knowledge of how the overall system works, the number of components we have, reading logs, troubleshooting, etc. We have built internal tools to provide this information from a central place for easy diagnostics.

Source Control Standards

As your team grows you’ll want to make sure you have some standard for commits. If you use a ticketing system, would it help to put the ticket number as the first item in the commit so you can write a tool to auto-generate release notes? Do you accept commits like “had to change this thing”? Source control is your window into the past, so you’ll want some thought there to make sure people understand WHY something changed. It’s easy to see the change, but the intent is usually the critical part. It also becomes more important once you need to start sharing release notes with customers, as well as for auditing.
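
As a rough illustration of the release-note idea, here is a minimal Python sketch. It assumes a convention that is not stated above: each commit subject starts with a ticket key shaped like "PROJ-123". The key format, the function name and the sample messages are all hypothetical, and the subjects would come from your own source control history.

```python
import re
from collections import defaultdict

# Assumed convention: the subject line starts with a ticket key such as "PROJ-123",
# followed by a colon or whitespace and then the summary.
TICKET_PREFIX = re.compile(r"^([A-Z][A-Z0-9]*-\d+)[:\s]+(.*\S)")


def build_release_notes(subjects):
    """Group commit subjects by their leading ticket key and render plain-text notes."""
    by_ticket = defaultdict(list)
    untagged = []
    for subject in subjects:
        subject = subject.strip()
        match = TICKET_PREFIX.match(subject)
        if match:
            by_ticket[match.group(1)].append(match.group(2))
        else:
            untagged.append(subject)

    lines = []
    for ticket in sorted(by_ticket):
        lines.append(f"{ticket}:")
        lines.extend(f"  - {summary}" for summary in by_ticket[ticket])
    if untagged:
        # Surfacing these makes commits that skipped the standard easy to spot.
        lines.append("Commits without a ticket number:")
        lines.extend(f"  - {subject}" for subject in untagged)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_release_notes([
        "PROJ-7 Add health endpoint",
        "had to change this thing",
        "PROJ-12: Fix retry backoff",
    ]))
```

Listing the untagged commits separately is a deliberate choice: the tool then doubles as a cheap check on whether the commit standard is being followed.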

Repeatable Builds

Are you able to go back in time and recreate your cluster based on a known deployment? In large scale systems weird bugs come up: what works with one combination of software versions fails with a minor revision of the next. Are you able to take component A at version 1.2.3, component B at version 3.1.0 and component C at 1.0.1, load them up on a test cluster and troubleshoot outside of the production environment? You should aim to be able to recreate an environment that you deployed a month ago.
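
One way to make that possible is to record a version manifest with every deployment and compare it against what a cluster is actually running. The Python sketch below assumes such a manifest is stored as JSON; the component names and the function name are hypothetical, and the versions simply mirror the example above.

```python
import json


def manifest_drift(recorded, running):
    """Return {component: (recorded_version, running_version)} for every mismatch.

    A component missing on one side shows up with None for that side.
    """
    drift = {}
    for name in sorted(set(recorded) | set(running)):
        want = recorded.get(name)
        have = running.get(name)
        if want != have:
            drift[name] = (want, have)
    return drift


# Hypothetical manifest saved at deploy time (versions taken from the example in the text).
RELEASE_MANIFEST = json.loads(
    '{"component-a": "1.2.3", "component-b": "3.1.0", "component-c": "1.0.1"}'
)

if __name__ == "__main__":
    test_cluster = {"component-a": "1.2.3", "component-b": "3.1.1", "component-c": "1.0.1"}
    for name, (want, have) in manifest_drift(RELEASE_MANIFEST, test_cluster).items():
        print(f"{name}: recorded {want}, running {have}")
```

An empty result means the test cluster matches the recorded deployment, so a bug reproduced there is a fair stand-in for the one seen in production.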

Error Logging

What good are logs if no one looks at them? Early in your cycle define standards for error logging, get central logging into place for your various environments and create reports and alerts to find abnormalities or error conditions. Everyone looks at their logs as they develop and release a feature but what about 2 months later? Who’s looking at the logs? Probably no one so make sure to set up alerting early on. PagerDuty is an ideal application for this to make sure there is a point person assigned to triage when issues come up.
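
To make "alerts on abnormalities" a little more concrete, here is a minimal Python sketch of an error-ratio check over a batch of log lines. It assumes a log format of "<timestamp> <LEVEL> <message>" and a 5% threshold; both are illustrative assumptions, as are the function names. The boolean it returns would be handed to whatever paging or alerting tool you use.

```python
from collections import Counter


def count_levels(lines):
    """Count log lines per level, assuming the format '<timestamp> <LEVEL> <message>'."""
    counts = Counter()
    for line in lines:
        parts = line.split(maxsplit=2)
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


def should_alert(lines, max_error_ratio=0.05):
    """Return True when the share of ERROR lines exceeds the (assumed) threshold."""
    counts = count_levels(lines)
    total = sum(counts.values())
    if total == 0:
        return False
    return counts["ERROR"] / total > max_error_ratio


if __name__ == "__main__":
    sample = [
        "2017-01-01T00:00:00Z INFO service started",
        "2017-01-01T00:00:01Z ERROR database timeout",
    ]
    print(should_alert(sample))
```

A ratio rather than a raw count keeps the check meaningful as traffic grows, though a real setup would likely also want a per-time-window view.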

 

Code Reviews

This will definitely be one of the better mandates for the team. Code reviews spread shared knowledge, let experts in a language shape developers who may be new to it, and make sure features and concepts are known across the team. Many bugs and issues have been caught at this phase. It also gives you confidence your ideas have been vetted.

 

Integration/Smoke tests

Unit tests cover specific use cases and use mocks/stubs for most interactions. Most systems have complex interactions outside of themselves, so it’s critical to write full stack integration tests that exercise even the simplest cases: send raw data in and get nicely processed data back from the APIs. If everything passes and no errors are scraped from the logs, assume processing succeeded. Start with broad strokes and work your way in. With limited resources, try to get the most test coverage for your time invested. Invest early on in API monitoring checks that run in production repeatedly if your APIs are private.
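
A broad-strokes smoke test of that shape might look like the Python sketch below. The submit, fetch and read_logs callables are hypothetical hooks into your own stack, and FakeStack is only an in-memory stand-in so the example runs on its own; a real test would talk to a deployed environment.

```python
def run_smoke_test(submit, fetch, read_logs, raw_record, expected):
    """Send one raw record in, read the processed result back, then scan the logs.

    Returns a list of failure descriptions; an empty list means the check passed.
    """
    record_id = submit(raw_record)
    processed = fetch(record_id)
    failures = []
    if processed != expected:
        failures.append(f"expected {expected!r}, got {processed!r}")
    error_lines = [line for line in read_logs() if " ERROR " in line]
    if error_lines:
        failures.append(f"{len(error_lines)} error line(s) found in logs")
    return failures


class FakeStack:
    """In-memory stand-in for a real system (illustrative only)."""

    def __init__(self):
        self.store = {}
        self.logs = []

    def submit(self, raw):
        record_id = len(self.store) + 1
        self.store[record_id] = raw.strip().lower()  # pretend "processing"
        self.logs.append(f"ts INFO stored record {record_id}")
        return record_id

    def fetch(self, record_id):
        return self.store[record_id]

    def read_logs(self):
        return list(self.logs)


if __name__ == "__main__":
    stack = FakeStack()
    print(run_smoke_test(stack.submit, stack.fetch, stack.read_logs, "  Hello ", "hello"))
```

Passing the hooks in as arguments keeps the check itself independent of any particular API client or log store.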

Continuous deployment and testing

One of your goals should be to find issues as soon as possible in your development pipeline. The sooner you find a bug, the cheaper it is to fix it. To that effect I’ll walk through what happens when you check in a line of code in our team’s repository.

  • Code is checked in with an issue number at the beginning of the commit message to support automated release note generation.
  • The check-in kicks off a build server plan that builds the project and runs the unit tests.
  • Assuming local tests pass, a process is kicked off that builds the code into deployable packages and uploads them to a package repo.
  • The role those changes affected is determined and the running servers in dev with that role are identified. This kicks off a rolling restart in dev that installs the new packages.
  • Once the servers have been restarted, automated smoke tests run through the environment and test a number of known scenarios to protect against regressions.
  • If tests fail during any part of that process, alerts are emailed out and the sustaining engineer (SE) runs point on getting it fixed.
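
The fail-fast shape of the steps above can be sketched in a few lines of Python. This is not our actual build tooling; the stage names, the run_pipeline function and the alert callback are hypothetical, and each stage is just a callable that raises when it fails.

```python
def run_pipeline(stages, alert):
    """Run (name, step) pairs in order and stop at the first failure.

    Returns (completed_stage_names, failed_stage_name_or_None). The alert
    callback stands in for the email that goes out to the team.
    """
    completed = []
    for name, step in stages:
        try:
            step()
        except Exception as exc:
            alert(f"stage '{name}' failed: {exc}")
            return completed, name
        completed.append(name)
    return completed, None


if __name__ == "__main__":
    def noop():
        """Placeholder for a real build, package, deploy or test step."""

    stages = [
        ("build-and-unit-test", noop),
        ("package-and-upload", noop),
        ("rolling-restart-dev", noop),
        ("smoke-tests", noop),
    ]
    print(run_pipeline(stages, alert=print))
```

Stopping at the first failed stage is the point: a broken build never reaches the dev servers, and the alert names exactly where to start looking.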

Metrics On Everything

This is pretty standard, but make sure your application has proper instrumentation that’s logged. For example, Coda Hale’s Metrics library can stream to statsd for historical trending. Make sure you have metrics on server health. Provide alerts and dashboards that summarize the important information. Try to put counters on everything that runs and expose JSON API endpoints from which you can extract the counters for alerting and graphical UI analysis.
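
As a language-neutral illustration of "counters on everything", here is a minimal Python sketch of a thread-safe counter registry that can be dumped as JSON. It is not the Metrics library mentioned above; the class and counter names are hypothetical, and wiring to_json() to an HTTP endpoint is left to whatever web framework you already use.

```python
import json
import threading


class Counters:
    """A tiny in-process counter registry (illustrative, not a real metrics library)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._values = {}

    def incr(self, name, amount=1):
        """Add amount to the named counter, creating it at zero if needed."""
        with self._lock:
            self._values[name] = self._values.get(name, 0) + amount

    def to_json(self):
        """Snapshot all counters as a JSON object, e.g. to serve from an endpoint."""
        with self._lock:
            return json.dumps(self._values, sort_keys=True)


if __name__ == "__main__":
    counters = Counters()
    counters.incr("requests")
    counters.incr("requests", 2)
    counters.incr("errors")
    print(counters.to_json())
```

Monotonic counters like these are easy for an alerting job to poll and diff, which is why they make a reasonable first layer of instrumentation.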

Internal tooling

Automate all the things! If you find yourself doing the same things daily or weekly or you have to do 10 steps to get code to your dev environment, spend the time to automate it. Heavy investments in automation and repeatability continue to pay off over time.

Conclusion

As people on my team can attest… failing tests, broken builds and unstable dev environments are my hot button topics. That is because I see them as a window into how you think about production. If you can’t keep your own house in order then I lose faith that you have the ability to make the best decisions for production environments. Be a chef.

want more? follow me on Twitter http://twitter.com/jimplush

© 2017 Jim Plush: Blog
