Month: December 2015

Moving a team from Scala to Golang

Scala has long been part of the CrowdStrike stack, the primary language in fact. I helped lead the adoption of Scala as we first started to develop our applications back in 2012. In fact, it was one of the pros in my decision to come to CrowdStrike. Several of the early developers were interested in adopting it as well, so it seemed to be a nice fit.

I had come from a company called Gravity, who were also heavy Scala users. It was the primary language there. I was used to it, enjoyed it, saw the power of it and thought I could prevent some of the issues I saw with Scala as CrowdStrike grew. We were doing high scale analytics, batch jobs over Hadoop and our Chief Architect (hi Bissel!) was doing lambda architecture before it was what the cool kids were doing.

A recent quote from one of our senior engineers prompted me to finally write this post describing why we’re transitioning most of our stack to Go and why new services default to Go by developer choice.

Instead of waiting until the end of this post, I should clarify that Scala will not be leaving our stack completely. In fact, it will complement Go in the places where Go does not shine. Scala is a big part of our machine learning / analytics stack. Its interop with the Java projects we use, and its ability to provide a nice DSL that our analysts can use, still make Scala a solid choice. It’s becoming more of a specialized tool vs the core development language.

I’m going to take you through this from the lens of a Technical Director. A lens where you need to scale a company from the early days of 5 engineers to 200+ engineers as the business grows. It’s about having a maintainable code base where you can have people cross projects easily and get new hires up to speed rapidly.

I remember when I first saw the potential issues of scaling Scala at Gravity back in 2009/10ish. It was close to the end of the day when we had a major issue reported in production that was affecting our large customers. Several of us started investigating and were able to track down the source of the issue. The only problem was that, at first, we had no idea what the code was doing. We came across a strange symbol we hadn’t seen in our projects before: the spaceship operator <|*|>. Someone said out loud “what the hell is that?”. There was some other implicit magic going on that wasn’t immediately apparent. A CMD+B to traverse into the method yielded nothing in our IDE, as it couldn’t find the symbol (IDEs have improved here since). A quick googling of “<|*|>” yielded nothing as well. We were stumped and didn’t have sources pulled down. [1]


The developer who wrote the code was unreachable on vacation, so we had to figure it out ourselves. We noticed a new library called scalaz was being included. Hours later we had tracked down the mystery symbol, grok’d what was going on and made the fix, and life was good again. That blip turned a fix that should have taken minutes into a fix that took hours. That was the point where I started seeing the split in our engineering team.

Scala is a powerful language. It comes from academic roots and gives you enough flexibility that you can easily start writing “write-once” type code. Scala developers typically travel down one of two paths: you have the “it’s a better Java” camp and you have the “I (heart) Applicative Functors” camp.


The “it’s a better Java” camp likes the terseness of Scala and the standard features that make Scala generally more enjoyable than Java. There’s functional programming in their new Scala code, but it’s not the main focus. The “I (heart) Applicative Functors” camp really takes to the new functional world and expands its knowledge deeper down that path, or brings an already functional background from places like Haskell.

Based on my experiences you start to split down these camps and have excellent programmers in each so you can’t say one side is superior to the other. On the semi-functional side you potentially have more generalists working across languages, or those who may not want to learn lambda calculus theories to work with an API server.

As an example, this is a code sample from a project we had one of our more advanced Scala developers start:

Some of you may look at that and say awesome, but some will say WTF is that? There were thousands more lines like above. This was to be something the whole team could work on but half the team didn’t want anything to do with it. The developer who wrote it is a brilliant person but the fact it divided half the team was a problem. Luckily this was caught in code review and rejected based on our internal guidelines. It never made it out to production.

As you’re scaling an engineering team, this split becomes more apparent when trying to get new hires up to speed. Scala already has a lot of rough edges: getting a build environment set up, SBT pain, IDE environment pain, release upgrade pain and slow build times. Add on top of that the heavy dose of functional concepts required for proficiency, and the ramp-up time grows and dev output slows down. Now more upfront training is required from existing developers, which slows things down as well. SBT is also a real sore spot. There’s always that one person who actually knows what the heck SBT is doing and can debug everyone’s issues. I’m aware that Scala is not scalaz and that you can use one without the other, but I’ve seen issues with or without it.

It’s also not a question of Scala being “too hard”. I’ve never had someone not be able to learn the language. It’s about the investment. You make an investment in something with the hope of it paying off somehow down the line, whether that’s faster time to market, higher performance/lower cost, or increased stability. There are specialized places where I’ve seen that happen with Scala, but not in the general case. As we were about to scale up, we discussed whether we wanted to invest in more docs, guidelines, example projects, etc… but the reality was we didn’t think that investment would pay back compared with Go.

This isn’t unique to companies I’ve worked for. Twitter has gone through the same growing pains, as have other companies I’ve talked to at conferences and people I know working with Scala. It’s a common theme, in fact. While you can have very high performing small teams going with Scala, trying to grow an engineering organization beyond 50 people is an uphill battle. If you’re already invested in the JVM, Java 8 is a solid choice that borrows some of the concepts of Scala to make Java easier to work with.

Other references from larger teams seeing issues:

Yammer   http://codahale.com/downloads/email-to-donald.txt

Is LinkedIn Moving off of Scala? https://www.quora.com/Is-LinkedIn-getting-rid-of-Scala

Former Twitter Platform VP heard saying Java8 may have been a better choice if available when they made the choice of Scala for similar reasons https://www.quora.com/Is-Twitter-getting-rid-of-Scala

You can also see a trend at Twitter where the latest OSS projects released are in Java (Heron, DistributedLog, etc.). In fact, a telling line in the Heron release reads: “It is written in industry-standard languages (Java/C++/Python) for efficiency, maintainability, and easier community adoption.”

Is Scala on its way out? (article):

https://www.linkedin.com/pulse/scala-way-out-owen-rubel

 

and a funny Tweet about the subject:

[Image: Screenshot 2016-02-17 07.52.50]

That’s where Go enters the picture. One of Go’s reasons for existence is to make developers more productive: it limits the number of ways you can do something and has a very opinionated view of the world at the compiler level. I pushed back on Go adoption internally for a while. I worried about splintering even more by having another language for people to learn, as we already had quite a few technologies in play. We have an internal policy of only looking at adopting a new technology when at least 3 people are willing to support it at 3am if there’s a production issue. After much prodding (thanks Sean Berry) and getting past that number of 3, I dug into the Go world and saw it solved a lot of the issues I had with Scala at the organizational scaling level.

Fast build times, small binaries, one file, built-in formatting, great tooling, a built-in test framework, a race detector, visual profilers, a nice concurrency model? Wow, sold! We did a sample project in Go that was successful, then another, then another. We expanded the number of developers we had on Go and it started to become the language people wanted to write in. You can jump into any Go project and know immediately what it’s doing. Do I miss immutable types and some of the great features of Scala? Sure do, but I think the maintainability side of the story is too great to overlook with Go. We’ve seen faster ship times, better stability and better test coverage being written.
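To give a feel for that concurrency model, here’s a minimal sketch of a worker pool built from goroutines and channels. This is not code from our services; sumSquares and its parameters are made up purely for illustration, and it assumes workers is at least 1.

```go
package main

import (
	"fmt"
	"sync"
)

// sumSquares is a hypothetical example function: it fans the inputs out
// to a fixed number of worker goroutines and adds up the squares they
// send back. It assumes workers >= 1.
func sumSquares(nums []int, workers int) int {
	jobs := make(chan int)
	results := make(chan int)

	// Start the workers. Each one squares numbers until jobs is closed.
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for n := range jobs {
				results <- n * n
			}
		}()
	}

	// Feed the jobs, then close the channel so the workers exit.
	go func() {
		for _, n := range nums {
			jobs <- n
		}
		close(jobs)
	}()

	// Close results once every worker has finished.
	go func() {
		wg.Wait()
		close(results)
	}()

	// Collect results on the calling goroutine; no locks needed.
	total := 0
	for sq := range results {
		total += sq
	}
	return total
}

func main() {
	fmt.Println(sumSquares([]int{1, 2, 3, 4}, 2)) // prints 30
}
```

Running it with `go run -race` turns on the race detector mentioned above, and a `_test.go` file next to it is all the built-in test framework needs.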

One of the other benefits of Go was widening the pool of backgrounds we can hire from. We can take someone from any language background and have them ramped up on Go in weeks. On the Scala side there’s the JVM learning curve, the Java world of containers, the black magic of JVM tuning, profiling tools, etc…

New developers we’ve hired are ramped up in weeks vs months (we have lots of services that operate at extremely high scale, across several divisions). We now have the majority of our services written in Go, and one of the last holdouts to move to Go just wrote his first project and said to me afterward: “Wow, I read through that library once and I knew exactly what it was doing. I’ve read the Scala version of that library four times and I still have no idea what it does. I can see why you guys like it so much”. That was one of our senior engineers, who previously worked for one of the largest web properties in the world. This process was a completely bottom-up initiative from our development team, who pushed for the move to Go.

We now process hundreds of thousands of messages per second and terabytes of data per day with our Go services. Some try to equate Go’s simplicity with weakness; I’ve seen the opposite. You can do some pretty powerful stuff in Go. There is power in simplicity. The error handling, while seemingly annoying at first, has actually led to more robust error handling and stability in application code. You can’t just throw something and hope it gets caught somewhere.
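Here’s a small sketch of what that explicit error handling looks like in practice. The names (parseCount, ErrEmpty) are hypothetical and only there for illustration; the point is that every failure comes back as a value the caller has to look at.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
)

// ErrEmpty is a hypothetical sentinel error for this sketch.
var ErrEmpty = errors.New("empty message")

// parseCount is a made-up example: it turns a raw message into a count.
// Failures are returned as ordinary values rather than thrown.
func parseCount(raw string) (int, error) {
	if raw == "" {
		return 0, ErrEmpty
	}
	n, err := strconv.Atoi(raw)
	if err != nil {
		// Wrap the underlying error with context for whoever reads the log.
		return 0, fmt.Errorf("parse count %q: %w", raw, err)
	}
	return n, nil
}

func main() {
	for _, raw := range []string{"42", "", "abc"} {
		n, err := parseCount(raw)
		if err != nil {
			// The failure is dealt with right where it happens.
			fmt.Println("skipping message:", err)
			continue
		}
		fmt.Println("count:", n)
	}
}
```

It is more typing than a try/catch around the whole loop, but each failure path is visible at the call site, which is where the extra robustness tends to come from.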

Google is also a major investor and having that technical alignment opens up access to additional resources that we can leverage.

I’m not here to bash Scala or scalaz (I love ValidationNel!) but more to give some real world context on Scala in a production environment over 7 years at two companies. I still use Scala and love hacking in Scalding. Some of our more ambitious upcoming projects will most likely be Scala based, but I’m just not as sold on it anymore as the core language when trying to scale a fast growing engineering team. There are always exceptions, and if your team really loves Scala you can make it work, as some companies do.

Go sits in a place that comes up often… when you need smallish, high performance services where you’re doing light transformations, shuffling data around and putting APIs in front of data or supporting systems.

Go just makes it too easy.

 

If you’re interested in hearing more or chatting, follow me on Twitter https://twitter.com/jimplush

…and we’re hiring 🙂

 

Updates:

** Marius, one of the Twitter Scala gurus, also seems to be getting keen on Go.

[Images: screenshot-2016-09-24-08-58-56, screenshot-2016-09-24-08-58-42]

 

 


——

[1] Code reviews would have solved this issue, as someone else would have stumbled on that symbol and asked for more clarity and understanding. That would have helped introduce the library to the team more gently than a surprise in production. Unfortunately, at Gravity we didn’t have required code reviews in place. We do have code reviews in place at CrowdStrike. However, given that not every person is on every code review, there are no guarantees.

 

 

 

Path to Productivity: No Meeting Thursday

At small startups there are usually very few meetings initially, and the number of people you have to communicate with on a daily basis is typically quite manageable. You feel productive and you’re cranking out code, designs or other tangible deliverables. Life is good! Your social graph at work looks similar to the graph below.

[Image: small-social-graph]

Hopefully, your company is successful and you start hiring. As you hire, your social graph at work also grows. Perhaps not with every hire, but as the company expands the number of people you need to interact with will go up as well. As you get even bigger, your social graph looks like the new graph below. Now things aren’t as fast. You need more coordination, and more people have to understand what’s going on. You start to have more meetings, and if you’re distributed, potentially even more.

[Image: large-social-graph]

 

You start to become sad that you’re not as productive as you once were but understand it’s part of growing. Then you get to a point where you feel like you’re in too many meetings, they’re scattered throughout the day, you’re working in bits and pieces but can’t really tackle anything of substance. You need a long stretch of time for understanding a problem, building solutions and algorithms in your head. You find yourself having to work after-hours or early in the day just to be able to chew on meaty projects and feel somewhat productive again. If you’re in a big company this process can happen even faster.

More than half the engineers I interview cite “too many meetings” and “not feeling productive” as key reasons why they’re looking to leave their current company.

After hearing this during interviews at my previous company, I wanted to instill a guard against it early on at CrowdStrike. Very early on we put in place the concept of “No Meeting Thursday”. Meetings were not a huge problem early in our life, since our social graph was small. However, I knew that as we grew and expanded there would be more interrupts throughout the day for engineers. I also knew that if you bake things like that into the culture early on, it’s easier to fight to keep them than to fight to instill them.

So with that, I put an invite on everyone’s calendar that blocked off the whole day for No Meeting Thursday. The goal of No Meeting Thursday is not to have a “don’t talk to me” day. It’s about not having anything scheduled or recurring, to give engineers (and other groups) the chance to own their own schedule. You’re an adult, do the right thing. If you need to be alone for 8 or more hours to really dig into a problem, great, that’s what the day is for. If a team wants a whole day of uninterrupted time to focus on design, testing or anything else related to their project, and everyone is on board, then great, that’s what the day is for. Obviously, production issues still come first.

As new executives come in there is a training period of “What’s this No Meeting Thursday thing about and why did you deny my meeting invite?” I wrote this post to reduce the number of times I have to explain the concept, so here it is…

Managers and engineers are usually on different schedules. Paul Graham famously calls it “Manager vs Maker schedule”. Managers are used to being interrupted and work in broken chunks of the day. They work in blocks of time and usually have calendars that will make you cry. Managers can go from meeting to meeting with minimal context switching. Engineers, on the other hand, work in long stretches and need hours to get into the zone, where they’ve built up a problem in their head to the point that they can work directly out of mental RAM. That’s where the productivity sweet spot is.

I like to think of meetings this way for engineers… You’re an engineer, and you’re building a mental chandelier in a big, tall entryway. You’ve climbed to the top of a high ladder and you’re starting to put together your big, beautiful chandelier in your head. You’ve added the base, you start adding crystals, things are going well. You’ve got a great mental model of your problem and some corner cases.

[Image: chandelier-ladder]

 

Then a calendar reminder pops up, reminding you that you have a meeting in the middle of the day. You head to the meeting to talk about something completely off topic and Boom!! Crash!! Your mental chandelier starts to wobble and fall, crashing on the floor. Not all the crystals are busted, but you’re definitely starting to swap out that mental RAM for the new topic in the meeting. Meanwhile you’re actually still thinking about where you left off in your other task, so you’re only somewhat paying attention to this new topic. You’re losing context on your previous project, and the new subject is getting half your attention.

[Image: chandelier-ladder-broken]

The meeting ends and you have to clean up to start rebuilding your chandelier. Hrm, where was I? You may get back to your desk and think, you’re already interrupted, so why not just check email real quick. You read an email and start to think about that, and now you’re two levels removed from your actual project. Ok, time to refocus… This could take anywhere from minutes to hours. If it’s towards the end of the day you might just say forget it, you’ll start again tomorrow, and work on little tasks for the remainder of the day. I tried to find the source where I first read that chandelier analogy but couldn’t find it. If someone comes across the source, feel free to drop a comment so I can attribute it properly.

As someone who’s been on both the management and the engineering side, I can validate for myself, and for many others I’ve talked to, that this is the way it goes. When I’m on a manager’s schedule I know I’ll be interrupted all day, so things involving deep thought are left for early in the morning or after office hours. In my opinion a good manager needs to understand the effect random meetings have on engineers and the time wasted going in and out of context.

The phrase of the day nowadays is “more productivity”. To put it as bluntly as possible: would you want to work at a company that couldn’t even give you 8 hours of uninterrupted work time? How can someone be productive if you never give them long stretches of time? It sounds silly when you say it out loud, but that’s what a lot of companies do by default. With so many connections, people could be interrupting each other left and right.

Scheduling random meetings in the afternoons? The worst. If you are going to schedule meetings for engineering teams, keep them in blocks. If I’m already interrupted at least keep it going so I can get a long stretch of time in the afternoon or in the morning. Good managers keep tabs on meeting workload and fight to reduce it. They take meetings for the team and distribute information asynchronously or ask that meetings be changed to discussions in a chat tool, or via a ticket as done in the open source community.

I’ve heard of other companies doing no meeting afternoons, however, for distributed teams in many time zones that makes it hard to establish when that time will be. Having a full day evens it out more across the globe.

Meetings can be fruitful but can also be highly disruptive. For engineering teams please be conscious of the effect meetings are having. Because if you’re not, you may find yourself having to hire replacements for a situation that could have been avoided.

We’re 3 years into No Meeting Thursday. I recently sent an email to the team asking whether we should preserve NMT and whether people thought it was still effective. The responses were an overwhelming “please keep NMT, fight for it and let me work”.

 

If a culture like that interests you, ping me… we’re hiring 🙂

or follow me on Twitter: https://twitter.com/jimplush

 

thanks for the reviews from:

John Kurkowski

Roger Clermont

© 2017 Jim Plush: Blog
