Saturday, March 05, 2016

Programming and the art of shoe polishing

One of the least documented occupational hazards of being a software engineer is that the rest of the world is not privy to our thought process, nor does it have a deeper understanding of what we do. Yes, coding is a part of it, and a very big part of it, but it is the understanding of the system that separates good engineers from bad ones. Coding is easy; coding a system is hard. There is a whole lot of difference between learning English and writing a novel. Just as an exercise, let's imagine that the task at hand is polishing shoes. Our protagonist is the engineer (SE). The product manager (PM), the quality assurance (QA) engineer and the shoe polisher (SP) have the supporting roles in this play.

Bad Engineer's Life
PM: I want our company to polish shoes. We will provide supplies to our shoe polisher team who will polish the shoes.
SE: I'll get on it.

[CODE]
- Distribute shoes to each SP(shoe polisher).
- Distribute polish to each SP.
- Ask each SP to use the given polish on each shoe.
- Take shoes from them.


QA: Hey! Your team used brown polish on black shoes. I'm filing a bug.
SE: Oops. I didn't know that. Let me fix it.

[CODE]
- Distribute shoes to each SP(shoe polisher).
- Each shoe has a COLOR.
- Distribute polish to each SP. Each polish has a COLOR.
- Ask each SP to use the given polish(COLOR) on each shoe(COLOR).
- Take shoes from them.


QA: Hey! Your team put black leather polish on my black canvas shoes. I'm filing a bug.
SE: Oops. Hey, why would the customer even want to polish canvas shoes?! Please test my system properly.
QA: I don't care. You should not ruin your customer's shoes.
SE: Let me talk to PM.

PM: You should not ruin your customer's shoes.
SE: Ok.

[CODE]
- Distribute shoes to each SP(shoe polisher).
- If shoe is CANVAS then return to the customer.
- Each shoe has a COLOR.
- Distribute polish to each SP.
- Each polish has a COLOR.
- Ask each SP to use the given polish(COLOR) on each shoe(COLOR).
- Take shoes from them.


QA: Hey! SP still polishes my hip hemp shoes from SF. And why should the customer wait for shoe distribution to get his canvas shoe back? I'm filing a bug. Hey PM, the code is unstable!!
PM: Hmmm :(
SE: Let me fix it.

[CODE]
- Get shoes from Customer.
- Return shoe if NOT leather shoes.
- Distribute shoes to each SP(shoe polisher).
- Each shoe has a COLOR.
- Distribute polish to each SP.
- Each polish has a COLOR.
- Ask each SP to use the given polish(COLOR) on each shoe(COLOR).
- Take shoes from them.


QA: After 1 work day, the shoes don't look polished. Something is not working.
SE: I don't know, it works for me. Can you try it again and let me know?
-- After one day --
QA: See.. I told you so. I'm filing a bug.
SE: Oops.. the polish ran out after a day. Let me add more polish each day.

[CODE]
- Get shoes from Customer.
- Return shoe if NOT leather shoes.
- Distribute shoes to each SP(shoe polisher).
- Each shoe has a COLOR.
- Distribute polish to each SP.
- Each polish has a COLOR.
- Ask each SP to use the given polish(COLOR) on each shoe(COLOR).
- If SP is working more than a day, then distribute more supplies to each SP.
- Take shoes from them.


QA: The product passes the test cases.
PM: We can sell this code to SUPER-MEGA-STORE then. Let us do some scale testing. They order around 100 shoe polishes a day! This could be a big order for us. I feel like Steve Jobs.
QA: Sure, whatever, I'll test out the cases.

QA: PM, we polish 100 shoes in a week! We need to handle that many in a day!
PM: SE.. make it so!
SE: I'll try. We can handle around 15 shoes in a day, we need to get 7 times as many SPs, maybe 6 times if we get stronger and faster SPs.
PM: Hmmm... We'll have to pass the cost to SUPER-MEGA-STORE

SUPER-MEGA-STORE: We like the fonts on the sign board outside your establishment, take our money and give us the product please.
--- After Deployment ---

SUPER-MEGA-STORE: Hey, nobody wears black leather shoes in our Sunnyvale store. We have too many unused black polish cans in our stores right now.
SE: I'll fix it.
...
SUPER-MEGA-STORE: Hey, one of the SP died while working. The shoes are now piling on!
SE: I'll fix it.

[CODE]
- Get shoes from Customer.
- Return shoe if NOT leather shoes.
- Distribute shoes to each SP(shoe polisher).
- Each shoe has a COLOR.
- Distribute polish to each SP.
- Each polish has a COLOR.
- Ask each SP to use the given polish(COLOR) on each shoe(COLOR).
- If SP is working more than a day.
- If polish(COLOR) is over, replace it.
- If SP is dead, get new SP.
- Take shoes from them.


SUPER-MEGA-STORE: Hey, we want canvas shoes to be washed too.
SE quits.

This happens over the period of a year. SE's own unscalable spaghetti code has tormented him. The fluidity of requirements from the customer adds fuel to the fire. When one SE quits, another SE takes his place, mutating this code even more into an unrecognizable mess.

Now let's see what happens with a more experienced engineer:

Good Engineer's Life
PM: I want our company to polish shoes. We will provide supplies to our shoe polisher team who will polish the shoes.
SE: I'll get on it. What kind of shoes though?
PM: Leather shoes.
SE: Any specific set of colors? Google lists some common colors as BLACK, BROWN, NEUTRAL, WHITE, CHERRY RED & CORDOVAN.
PM: Hmmm..
SE: Also, what about speciality leather shoes like those made with crocodile leather?
PM: Market research shows that people mostly wear BLACK and BROWN shoes, so let's stick with that.
SE: Any particular brand of polish you want me to use?
PM: Use the CODEYMAN brand. We can get bulk discount from them.
-- A week later --
SE: Hey I found out that one box of polish can polish about 250 shoes. I also spoke to my SP friend. He can shine 10 shoes in a day. Do you know about how many shoes we'll be shining a day?
PM: You ask too many questions, I somewhat hate you. SUPER-MEGA-STORE might be interested, but that is just a rumor I heard. But you haven't yet started writing code.
SE: So how many shoe shines a day?
PM: About 100 maybe?
SE: So we need about 10 SPs working each day. We can have 20 SPs who work on two rotating shifts. This would reduce the downtime if something were to happen to them.
PM: Stop doing a PhD in this! I want the code.
SE: Also 100 shoes a day is 700 shoes a week, which means about 3 cans of polish would be used in a week. We need to keep more cans in inventory, to keep SPs from waiting on a polish can if someone else is using it. You can use this to set a price point for the product.
PM: I hate you! Give me an ETA.
SE: I'll sketch some informal design plans and revert in 2-3 days.
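
SE's back-of-the-envelope numbers above can be reproduced in a few lines. This is only a sketch in Python; the function names are made up for illustration, and the inputs are the figures quoted in the conversation (100 shoes a day, 10 shoes per SP per day, 250 shoes per can).

```python
import math

SHOES_PER_DAY = 100        # PM's rough estimate
SHOES_PER_SP_PER_DAY = 10  # what the SP friend reported
SHOES_PER_CAN = 250        # what one box of polish covers


def polishers_needed(shoes_per_day, shoes_per_sp):
    """Smallest number of SPs that can keep up with the daily load."""
    return math.ceil(shoes_per_day / shoes_per_sp)


def cans_per_week(shoes_per_day, shoes_per_can, days=7):
    """Polish cans consumed in a week, rounded up to whole cans."""
    return math.ceil(shoes_per_day * days / shoes_per_can)


print(polishers_needed(SHOES_PER_DAY, SHOES_PER_SP_PER_DAY))  # 10
print(cans_per_week(SHOES_PER_DAY, SHOES_PER_CAN))            # 3
```

Doubling the SP count to 20 for two rotating shifts, as SE suggests, is a staffing decision layered on top of these numbers rather than something the arithmetic produces.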

.. Time goes on ..
SE: PM, I'm going to be code complete in some time. I want the QA to add test cases.
PM: So the product is done?
SE: No, I've just written the code and done some unit testing. System testing etc. will take at least a month.
PM: So the product is done? I'll tell the Sales team to sell the product.
QA: I'll start testing it as soon as I'm done testing some other thing the SE doesn't care about.

SE: Here's the code

[CODE]
- Don't allow customers without BLACK or BROWN leather shoes.
- Let there be X BLACK polish cans and Y BROWN polish cans.
- Every 12 hours:
    a. New set of 10 SPs come in.
       - If fewer than 10 SPs come in, pay the exiting employees overtime to pick up the slack.
    b. SPs distribute the shoes amongst themselves, trying to maximize the number of shoes of a particular COLOR.
    c. Each SP takes the cans of polish corresponding to the shoe colors he has. He decrements the number of shoe polish cans available by 1.
       - If the number of shoe polish cans hits the threshold T, he puts an order into the system to get more polish. This way they don't have to wait for polish if it runs out.
    d. If SP runs out of shoes to polish:
       - He'll put the polished shoes on the rack.
       - He'll repeat steps from b. again.
    e. When SPs are ready to leave:
       - If fewer than 10 SPs leave then:
         - Find if SP is dead and put a replacement SP for the next rotated shift.
         - If SP is hiding, then ask him to leave.
- Take shoes from them.


PM: (During review): Please use spaces instead of tabs and use the function names according to coding standards.

QA: Hey, edge condition issues. If I'm getting 100 shoes/day and one SP dies, the backlog never goes away.
SE: Oh yeah. Let's add a fudge factor of 11 SPs. Will take care in release.
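
QA's edge case is easy to see with a tiny simulation. The sketch below is in Python and assumes each SP shines 10 shoes a day (the figure SE's friend gave); the function name and the day-by-day staffing lists are hypothetical.

```python
def simulate_backlog(daily_arrivals, sps_per_day, shoes_per_sp=10):
    """Return the unpolished backlog at the end of each day.

    sps_per_day lists how many SPs actually worked on each day.
    """
    backlog = 0
    history = []
    for working in sps_per_day:
        capacity = working * shoes_per_sp
        backlog = max(0, backlog + daily_arrivals - capacity)
        history.append(backlog)
    return history


# 10 SPs, one missing on day 2: the 10-shoe backlog never clears.
print(simulate_backlog(100, [10, 9, 10, 10, 10]))   # [0, 10, 10, 10, 10]
# 11 SPs, one missing on day 2: the spare capacity absorbs the loss.
print(simulate_backlog(100, [11, 10, 11, 11, 11]))  # [0, 0, 0, 0, 0]
```

With exactly 10 SPs the system runs at 100% utilization, so any lost capacity stays lost; the eleventh SP is what gives the backlog a chance to drain.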

PM: How is the quality?
QA: Seems to mostly work. Not many open issues I think.
PM: Great, let's deploy it.

SUPER-MEGA-STORE: Hey, your system doesn't handle bursts of more than 110 shoe-shines. We need a burst of 200 during Christmas holidays.
PM: Sure. We can include redundancy and burstiness of shoe-shine traffic in your tariff. This would translate to hiring seasonal workers.

SUPER-MEGA-STORE: I need to get canvas shoes washed too.
PM: That feature will be supported in the next release.

SE: Shouldn't be a big deal. I'll design a similar product for canvas shoes and encapsulate both products behind an API that is multiplexed by shoe type.
PM: Whatever.. just do it.
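
SE's plan of putting both products behind one API that is multiplexed by shoe type can be sketched as a small dispatch table. This is only an illustration in Python; the function names, the dict-based shoe record and the "return to customer" fallback are assumptions made for the sketch, not part of the story's code.

```python
from typing import Callable, Dict


def polish_leather(shoe: dict) -> str:
    """Existing product: polish a leather shoe with matching polish."""
    return f"polished {shoe['color']} leather shoe"


def wash_canvas(shoe: dict) -> str:
    """New product: wash a canvas shoe."""
    return f"washed {shoe['color']} canvas shoe"


# Registry mapping a shoe type to the service that handles it.
SERVICES: Dict[str, Callable[[dict], str]] = {
    "leather": polish_leather,
    "canvas": wash_canvas,
}


def service_shoe(shoe: dict) -> str:
    """Single entry point: pick the service by shoe type."""
    handler = SERVICES.get(shoe["type"])
    if handler is None:
        # Unknown shoe types are handed back untouched.
        return f"returned {shoe['type']} shoe to customer"
    return handler(shoe)


print(service_shoe({"type": "leather", "color": "BLACK"}))  # polished BLACK leather shoe
print(service_shoe({"type": "canvas", "color": "WHITE"}))   # washed WHITE canvas shoe
print(service_shoe({"type": "hemp", "color": "GREEN"}))     # returned hemp shoe to customer
```

Supporting a new kind of shoe then means registering one more handler, without touching the polishing code that already works.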

SE goes home.
SE: Honey.. I'm home.
Wife: Whatever! you spend all your time at work. I hate you.

.. SE is sad.

Friday, September 11, 2015

Mosh: The ssh client replacement you probably didn't think you needed.

I had to take a breather after writing the heading to avoid giving an over the top introduction to Mosh (mobile shell), but it deserves nothing less. It is relatively new software (2012) from MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). Mosh is based on a new protocol, SSP (State Synchronization Protocol), that runs over UDP instead of TCP. It does all the session management at a higher level, which makes the connection incredibly robust. A few places where Mosh really shines on my setup (Mosh+Tmux):

1. When moving between networks (Wireless to Wired to VPN at home), the connection automatically migrates to the new IP. This wouldn't have been possible if TCP had been used. Consequently, my shell connectivity to servers stays up... No more banging the keyboard when an ssh session abruptly freezes, or trying to clear up an ssh terminal after the computer resumes from sleep. The terminal comes back up at the same place no matter what the network situation is. It's magic!! I'm running tmux within mosh and the difference between local and remote machines has completely blurred.

2. Mosh has predictive typing! When typing on an ssh client, your keystrokes are relayed to the server and displayed to you only when they appear on the pseudo terminal at the remote end. This works great on a low-latency network, but try working over a slower connection from home or, god forbid, from Caltrain, a coffee shop or an airplane using the Gogo inflight connection. There is a very significant delay between when you type and when the letters appear on your screen, which makes any sort of remote work next to impossible. Mosh gets around this by showing you what you type right away, batching the updates to your screen and syncing them with the remote state in the background. E.g. if you type an 'i' in vi command mode, it should not show up; on a very, very laggy network it will show up, but it will be erased once the state is synced from the remote server.

3. Mosh uses SSH to log in and start the session, so it automatically picks up all your SSH configuration and the login shows up in the ssh logs. All one needs to do is call "mosh ", just like ssh. One doesn't need to muck with _any_ configuration files.
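
The predictive typing in point 2 can be pictured with a toy model: echo keystrokes locally right away, then throw the guesses away whenever the authoritative screen state arrives from the server. The Python sketch below is emphatically not Mosh's actual algorithm (which is more careful about when it shows predictions); the class and method names are invented for illustration.

```python
class PredictiveDisplay:
    """Toy model of speculative local echo with server reconciliation."""

    def __init__(self):
        self.confirmed = ""  # last screen state synced from the server
        self.predicted = ""  # keystrokes echoed locally, not yet confirmed

    def type_key(self, ch):
        # Show the keystroke immediately instead of waiting a round trip.
        self.predicted += ch

    def visible(self):
        return self.confirmed + self.predicted

    def sync(self, server_state):
        # The server's view always wins; local guesses are discarded.
        self.confirmed = server_state
        self.predicted = ""


# vi command mode: 'i' switches to insert mode and prints nothing.
screen = PredictiveDisplay()
screen.type_key("i")
print(repr(screen.visible()))  # 'i'  (wrong guess, shown while the link is laggy)
screen.sync("")                # server state arrives: nothing was printed
print(repr(screen.visible()))  # ''   (the stray 'i' is erased)
```

The simplification to note is that `sync` drops every pending guess, including keystrokes still in flight; a real implementation has to track which predictions a given server update actually covers.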

One important distinction between ssh and mosh is that ssh is a set of protocols whose utility goes well beyond providing a secure shell. It allows tunnelling, file transfer, X forwarding etc. None of those things are supported in mosh. Mosh was only meant to be a more robust shell.

That being said, Mosh hasn't been battle tested the way ssh has (specifically OpenSSH), but the design looks sound and simple with a relatively small attack surface, which gives me a lot of confidence in using it.

Go try it out.. and let me know what you guys think.

Thursday, February 05, 2015

i++ : performance centric solution to a common programming paradigm

NFV (Network Function Virtualization) is the new buzzword on everyone’s lips in the networking world. NFV, when done right, promises a big disruption to the well-established network appliance businesses by virtualizing most of their core functionality, which in layman's terms means that you don't need to spend mucho moolah on custom built Layer3-Layer7 hardware appliances. At a fraction of the cost, it can all be done in software.

This is all great for the businesses and network vendors, but for us, it forces us to solve problems, the solutions for which you can't Google. These problems, although critical for manageability, serviceability and billing, tend to get ignored during the initial stages of product development at other companies. One such problem which I worked on was incrementing a counter... just i++, as simple as that. E.g. if one packet comes in, you increment the packet_counter by 1.

Let me start with an overview of how it is handled at BOHVs (big old hardware vendors). Most BOHV boxes consist of line cards; each line card has one or more ASICs, and each ASIC deals with a fixed number of interfaces. These have set memory regions carved out for the upkeep of each interface. So if you have to increment the packet_counter for interface G0/1/1, you just go to 0xblahblahblah and add 1 there. The same goes for other counters. Most of the popular network OSes out there are based on custom built *nixes/Linux kernels. Having a directly addressable memory region that stays in memory, is not paged out and will not cause a cache miss when you are incrementing counters while more than a million packets are coming in on just one of your interfaces, is a HUUUUGE help. Since most of the hardware is modularized, if you add extra interfaces, you are, as a side effect, adding extra memory to support them. However, you can't add a new counter to existing hardware (unless, of course, a separate reserved memory space is kept… which adds to the cost).

This simple cookie cutter design goes out of the window, when the number of interfaces or the type of services are not fixed or capped. Heck… we don't even know whether the services are going to be deployed on bare metal or on a virtual machine!!

So how do you update around 150K counters in memory when packets are coming in on multiple 10Gbps ports, multiple threads are accessing the same counter, and you need to be able to query the value of that counter at any time with minimal or no disruption to the traffic? To keep matters simple, let’s take a variable X, which needs to be incremented simultaneously from a variable number of threads, around a million times a second.
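
To get a feel for the time budget, here is the worst-case packet rate of a single 10Gbps Ethernet port, worked out in Python. The constants are standard Ethernet framing figures rather than numbers from this project: a 64-byte minimum frame plus 20 bytes of per-frame wire overhead (preamble, start-of-frame delimiter and inter-frame gap). The function name is made up for the sketch.

```python
LINK_BPS = 10_000_000_000  # one 10Gbps port
MIN_FRAME_BYTES = 64       # smallest Ethernet frame
WIRE_OVERHEAD_BYTES = 20   # 7 preamble + 1 SFD + 12 inter-frame gap


def max_packets_per_second(link_bps, frame_bytes, overhead=WIRE_OVERHEAD_BYTES):
    """Upper bound on frames per second for a given frame size."""
    return link_bps // ((frame_bytes + overhead) * 8)


pps = max_packets_per_second(LINK_BPS, MIN_FRAME_BYTES)
print(pps)                   # 14880952
print(round(1e9 / pps, 1))   # 67.2 (nanoseconds available per packet)
```

At roughly 67 nanoseconds per minimum-size packet, a handful of cache misses is enough to blow the whole budget for that packet, which is why the counter update path matters so much.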

If you are right out of college, and you have read anything about multithreaded programming, you'd suggest locks. However, having threads take and release write locks a million times a second is a horrible, horrible idea.

But what about atomic adds, the cute little primitives that gcc provides for incrementing counters? Sounds like a great idea until you look at the exact use case. Most of the counters to be updated are close to each other (so if you are incrementing packets_received, you probably also want to increment bytes_received). If you don't know how an atomic add works, you'll inadvertently end up making the performance worse. On some platforms, it invalidates the cache line, so if you need to increment the next counter, you'll end up causing a cache miss. And with around 10 threads contending for the same counter, your code's performance would still be terrible.

Let's say we had a way to update these counters, how do we read them? Some control process will have to go through each module, each thread and query them one by one. Again a somewhat bad idea, if you need to maintain a running state of your system in general.

We require some crazy out of the box thinking to solve this problem... or maybe not. We'll use the "other box" thinking here, use the solutions to completely different problems and use it ourselves. I looked at various database implementations, system implementation at Twitter, different “new age” technologies that were trying to solve the same problem with a different set of parameters in a different domain.

One stack that did catch my eye was Statsd in conjunction with a no-sql db. Many people had implemented exactly what we wanted, at least partly, and at a much smaller scale. All the users who need to increment X open a connection to Statsd, which just aggregates the value of X over a certain interval of time and then sends it to the DB. There are tons of fast, replica-backed, performance centric DBs to choose from. This was a great design, however it would never have scaled to our use case of a million counter updates per second (for 1 counter X).

We could definitely have the users cache the entries locally and then send the update at some interval, and we can club the updates together. But this essentially meant that we are just providing some Statsd like functionality to each thread... which we did. Adding our own daemon, which uses our own custom built IPC stack, further improved the performance a bit.
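
The "cache locally, send in batches" idea can be sketched as follows. This is a minimal Python illustration, not our production code: the class name, the `sink` callback standing in for the daemon/IPC hop, and the count-based flush (instead of a timer, to keep the example deterministic) are all assumptions.

```python
from collections import Counter


class LocalAggregator:
    """Accumulates increments locally and hands them over in batches."""

    def __init__(self, sink, flush_every=1000):
        self._pending = Counter()
        self._sink = sink                # callable receiving {name: delta}
        self._flush_every = flush_every  # flush after this many increments
        self._since_flush = 0

    def incr(self, name, delta=1):
        # Cheap local update; nothing leaves the thread here.
        self._pending[name] += delta
        self._since_flush += 1
        if self._since_flush >= self._flush_every:
            self.flush()

    def flush(self):
        # Send one clubbed update instead of many tiny ones.
        if self._pending:
            self._sink(dict(self._pending))
            self._pending.clear()
        self._since_flush = 0


batches = []
agg = LocalAggregator(batches.append, flush_every=1000)
for _ in range(2500):
    agg.incr("packets_received")
agg.flush()
print(batches)
# [{'packets_received': 1000}, {'packets_received': 1000}, {'packets_received': 500}]
```

Here 2500 increments turn into three messages to the sink, which is the whole point: the expensive hop is paid per batch, not per packet.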

We then benchmarked the DB to find the upper limit of counter updates it could sustain and then designed the system around it.

So far, we had just narrowed down where to store the counters, but we didn't know how to get the values to the daemon talking to the DB. The inspiration for this part came from the tried and tested "malloc". Counters of any type were allocated from the same page, doled out in blocks, and maintained in the same way. Since all the counters are in the same page, reading them and updating the DB became incredibly fast. Each thread maintains its own block and the increments, at worst, take the hit of a cache miss.
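
The ownership model described above (every thread writes only to its own block, a collector reads all blocks) looks roughly like this. It is a Python sketch with hypothetical names; Python cannot show the page layout or cache behaviour that made the real thing fast, so it only illustrates who is allowed to write where.

```python
import threading

NUM_COUNTERS = 4       # hypothetical block size
PACKETS, BYTES = 0, 1  # hypothetical counter indices


class CounterArena:
    """Hands each thread its own block of counters; a collector sums them."""

    def __init__(self, num_counters):
        self._num = num_counters
        self._blocks = []              # one block per registered thread
        self._lock = threading.Lock()  # guards the block list, not the counters
        self._local = threading.local()

    def _block(self):
        block = getattr(self._local, "block", None)
        if block is None:
            block = [0] * self._num
            with self._lock:
                self._blocks.append(block)
            self._local.block = block
        return block

    def incr(self, index, delta=1):
        # Only the owning thread writes to its block, so no lock is taken here.
        self._block()[index] += delta

    def collect(self):
        with self._lock:
            blocks = list(self._blocks)
        return [sum(b[i] for b in blocks) for i in range(self._num)]


arena = CounterArena(NUM_COUNTERS)


def worker():
    for _ in range(10_000):
        arena.incr(PACKETS)
        arena.incr(BYTES, 64)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(arena.collect())  # [40000, 2560000, 0, 0]
```

The increment path never contends with other threads; the only shared structure is the list of blocks, which is touched when a thread registers and when the collector runs.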

You could have one thread do the collection, but with NUMA, the performance will again take a hit.
The result of the above exercise was an xml file and two simple APIs exposed to the developers. All the other code was auto-generated using python. The amount of research done, and the range of technologies used (python, no-sql, xml ... all the way to Intel assembly, vtune, hyper-threading, prefetching and architecture) was baffling, but it needed to be done in order to squeeze the most work out of a clock tick.

So next time you see a counter, number of emails in your inbox, number of tickets remaining, data used by your cellphone, think of all the sweat that went into giving you that number :)


Sunday, May 18, 2014

You are not special..

Are you from a country that has some sort of currency? 
Now that you are intrigued by my idiotic question, please allow me the liberty of asking another one. If you are filthy rich in one country, let's say that you are a millionaire in the US, living in a house that was featured on "Cribs", and you move to a country that doesn't recognize your wealth, do you instantly become poor? The answer is a resounding yes. All this proves is that you were rich only because you and other people around you attached some value to the pieces of paper with numbers scribbled on them. The whole economy functions only because everyone sees some value in those pieces of paper. If I'm able to convince people that my torn sock has some value, then other people will think I'm rich. I'll be actually rich when I believe that my torn sock is valuable, or if I exchange it for some belief system that I actually believe in (e.g. hard cash) before reality catches up.

Let's look at social circles. There isn't much disparity between people within a circle, i.e. you wouldn't find people who are a lot poorer or a lot richer than you in your active circle of friends. Money acts as a differentiator here because it forces a change in priority, e.g. you wouldn't take a trip to Hawaii with friends if you are short on cash, but would definitely go on local hikes because it hardly costs anything. Gradually friends turn into acquaintances and are sometimes forgotten. Slowly but steadily, you lose touch with people (note that there are other reasons why people lose touch, but I'm just trying to point out this particular one). If you have lived a relatively full life, you'd accept the fact that people move in and out of our lives, just as we move through theirs. Without our knowledge, forces of economics drive people closer to each other and rip us apart. All this, because we believe in a common currency!!

Forget money for a second, and think about culture. Think about it as money. What happens when a white Catholic from America meets an upper caste Hindu Brahmin from India? How does that interaction go? Let's assume for one second that the American has never heard of or known anything about India, and the Indian doesn't know anything about America.. what then? Their interaction would completely depend on the place and environment in which they meet. If they meet at the Indian's house, the Indian would probably find the American less modestly dressed, lacking any knowledge of his religion and his traditions, bordering on blasphemy. The American would see the Indian in the same light if the meeting happened in America; he would find the clothes, food, traditions and deities with multiple hands completely weird. Both of them would find each other less cultured. Assuming both of them to be decent humans, the Indian would encourage the American to go to temples and introduce him to scriptures. The American would do the same if he is religious; if not, he'd encourage the Indian to watch some opera or Broadway to get some "culture". If either of them disagrees, there would be hostilities thrown around.

Sadly that is how most wars start: both parties fail to understand the other's point of view because it is so radically different. We got around the money problem by having currency exchanges, which arbitrarily pit currencies against each other based on supply and demand and not on the purchasing power each one has in the region where it is prevalent. I.e. if 1 USD = 60 INR, and a couch costs 1 USD in the USA but 30 INR in India, who is richer, the guy with 30 INR or the guy with 1 USD? Still, these currency exchanges allow retention of economic status all over the globe... because everyone believes in some currency and they are more or less interchangeable.
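
The couch example can be worked out explicitly. The Python snippet below uses only the two numbers from the paragraph above (the exchange rate and the two couch prices); the function names are made up for illustration.

```python
USD_TO_INR = 60                       # exchange rate from the example
COUCH_PRICE = {"USD": 1, "INR": 30}   # local price of the same couch


def couches_affordable(amount, currency):
    """Purchasing power: how many couches the money buys locally."""
    return amount / COUCH_PRICE[currency]


def in_usd(amount, currency):
    """Exchange-rate view: what the money is worth in USD."""
    return amount if currency == "USD" else amount / USD_TO_INR


print(couches_affordable(1, "USD"), couches_affordable(30, "INR"))  # 1.0 1.0
print(in_usd(1, "USD"), in_usd(30, "INR"))                          # 1 0.5
```

Measured in couches, the two people are equally rich; measured at the exchange counter, one has twice as much as the other. Which answer you get depends entirely on which belief system you do the accounting in.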

Should we set up culture exchanges like currency exchanges, then? If you can distinguish between Chardonnay and Sauvignon Blanc, should you be considered a high priest in Navajo? Would you consider Dr BalmuraliKrishna and Puccini similar? Would you see Baptism and Upnayanam in the same light regardless of your religious inclinations?
Of course that would be absurd. Cultural, economic and sociological inequalities are all absurd because they depend on a belief in an arbitrarily defined concept perpetuated by people. These concepts might be a bit alien to people who have not interacted a lot with other cultures and have spent their whole life in a cocoon.

Attacks on our rigidly held belief system tend to anger us. If the belief system is big enough, it might even start a war. A lack of a belief system, however, evokes nihilism, depression and general sociopathic tendencies. This seems to be a catch-22 situation. A belief system has given us a distinct evolutionary advantage, even if it has been something which defies logic. However, evolution's only concern is the survival of the species, not its well being. (A recent study shows that chickens taste good to humans because chickens wouldn't have survived in the wild without human help, so humans started rearing tastier breeds of chickens while ignoring the other breeds). It may very well be the case that a fundamentalist faction exterminates all other races on earth (as has been tried multiple times in the past). We might end up becoming a paragraph in a grade school book. We might all be doomed to the same fate as the Neanderthals when homo sapiens roamed the earth. Of course, the process won't end there. There'll be fractures in the victor's factions, a power struggle between the sub-factions, and eventual destruction of culture and advancement.

Humanity is stuck in a feedback loop and is shaped by our own intellect, the limits of which, we cannot comprehend. We need a better approach, and we need to shape our own destiny as a race. What matters to you, doesn't matter to anyone else and vice versa. Why shouldn't we accept the disparity in the belief systems without being combative?

Sunday, October 13, 2013

Stop yearning for yester years..

Let me articulate a problem that people don't realize they have, and two words that cause this: Happiness & Sadness. We people like to be happy (whatever be the cost), and do not like to be sad. However, almost all of us, mistakenly assume happiness to be the exact opposite of sadness. This assumption is ingrained in our psyche, thrust onto us by our education system, our family, the society, the media, almost universally. But why is this wrong?

Take something banal from your own life, from say 10 years ago, say sharing a cup of road side chai with one of your friends/roommate, or taking a bus ride home on a bus that had barely any place to stand, or maybe just looking outside the window of your room (10 years ago). How do they make you feel now? Nostalgic, yes.. but also happy. How did you feel when you were actually living those moments? Just meh!

Now, take some actual defining moments in your life, when you were actually happy in that particular instant: these include graduation, marriage, winning some contest that only 50 people in the world care about. How do they make you feel now? Those were just blips in your existence when you were almost ecstatic. You were definitely happy during those few hours or few days of festivities, but your journey to those defining events, your college years, your courtship period, is what made you happier than the events themselves.

So are you happy now? Most of us, too jaded by the monotony, would respond to this ambiguously, and even if someone replied that they were happy, they wouldn't be able to articulate why. Let me articulate it now for every one of you reading this post: you were happy in the past.. you will always be happy in the past.. 10 years from now, you will think you were happy at this very moment.

Let's switch to sadness. What makes you sad? Traffic jams, chores at home, office work on the weekend, existential crisis, when things don't work according to you.. in short, pretty much everything.

What about some sad defining moments in your life? Loss of a loved one, an accident, getting fired from a job? These generally tend to generate two different kinds of reaction: either you will look at them fondly (yes, you heard me right) because they made you what you are today (the hardship you faced gave you the skills to progress in life), or they will make you shed a tear whenever you think about them. The first kind generally tends to invoke happiness indirectly when you look back on your struggle; the second might put you in a downward spiral of sadness, but you do eventually recover from it.

On average, when are/were you sad? Now.. and when are/were you happy? In the past. This explains the predisposition of people to yearn for the past, sometimes elevating it to utopian standards. This is a sad universal truth; in America, people want to live the way their parents lived, when the jobs were plentiful and everyone had a car and a house, but they discount and turn a blind eye to slavery. In India, the movies and soaps put everyone on a trip down memory lane, with all the bahus (daughters-in-law) taking care of the house and guys bringing the moolah, but discount sati and dowry practices.

Everyone is chasing the proverbial happiness carrot, without realizing that it is something that cannot be achieved, only looked at. If you are not sad now, then you'll be happy in the future thinking about this very instant, so don't be sad because you are not happy now, because that'll rob you of your happiness in the future.

Learn from your experiences, and make new ones, but don't try to yearn for old ones.

PS: Inspired by the Republican rhetoric and Jon Stewart :)