Rewriting a live system

Krste Šižgorić
8 min read · Jun 9, 2021


Indiana Bell was the central Bell telephone company in the state of Indiana. It was founded in 1920 and provided a wide variety of advanced telecommunications services.

Nine years later, as it grew, Indiana Bell bought Central Union. It planned to move into Central Union’s headquarters, but the building was too small for Bell’s needs. The company decided to build a new one, but since the old building was providing telephone service to the whole city of Indianapolis, Bell couldn’t just tear it down. And they really wanted the new building in the same place as the old one. So they decided to move the existing building to make room for it.

From October 12 until November 14, 1930, the building was moved 16 meters south, rotated 90 degrees, and then moved an additional 30 meters to the west. The impressive thing was that the building stayed operational the whole time. Telephones, electricity, gas pipes, water pipes, sewers… everything was working. Although the building was moving, business went on as usual and people kept working inside, still providing service to the city.

In my opinion, rewriting a live system can easily be compared to this event. The scale probably won’t be the same (even though sometimes it could be), but the process is. You have something that needs to stay active the whole time while you prepare changes and implement new functionalities.

Decision making

Sometimes, as a developer, you get a clean slate and can rewrite the whole system from scratch. But in my experience most rewrites become a hybrid between a rewrite and a refactoring. Companies are not willing to wait for the whole system to be rewritten. How do you explain to a client that they need to wait a year (or more) for a new product when the current one works? That it will be better, faster and more secure? From their point of view this is a big investment with low business value.

They want to start using new or improved functionalities right away. This can only be achieved by running the old and new systems at the same time and combining their functionalities. So we prepare pipes and cables, and start moving the building.

I had the opportunity to work on a seven-year-old system built on an MVC framework. Originally it was a prototype, but it ended up in production. Over time it grew to more than 130 tables. The code was anything but clean: business logic was everywhere, there was no separation of concerns, and code was duplicated on a massive scale (since a lot of it was generated)…

This was an ideal candidate for a rewrite, but a full rewrite was unacceptable to the client. There were performance problems in parts of the system, and some functionalities needed to be changed right away. The client was simply not willing to wait for a fully rewritten product. Continuing to work with the old framework was equally unacceptable because it did not meet the needs of the new functionalities. Someone in the past had evidently modified the original framework to implement some features, so updating it was no longer possible.

The conclusion was that the system would be rewritten gradually, progressively separating it into two projects, front-end and back-end. Since React can be adopted on individual pages, it was a logical choice for the front-end: new components could be planned and implemented right away. The back-end would be a web API written in C#, with the system separated into modules. Functionalities would be moved from the existing solution to the new API, and the old system would call that API through repositories/services. Since there would be a layer of abstraction between the old and new systems, we had the opportunity to create a new database for the new system instead of reusing the existing one. This way the database could be cleaned of unused columns and tables.
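To make the layer of abstraction concrete, here is a minimal sketch of the idea. It is written in Python only for brevity (the project itself used C#), and every name in it (Notice, NoticeRepository, the /notices/ path, the injected fetch_json function) is hypothetical, not taken from the real system. The old code talks to an interface, and behind it sits either the legacy database or the new API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass
class Notice:
    id: int
    title: str


class NoticeRepository(Protocol):
    """Everything the old system is allowed to know about notice storage."""

    def get(self, notice_id: int) -> Notice: ...


class LegacyDbNoticeRepository:
    """Reads from the old database (a dict stands in for it in this sketch)."""

    def __init__(self, rows: dict[int, dict]) -> None:
        self._rows = rows

    def get(self, notice_id: int) -> Notice:
        row = self._rows[notice_id]
        return Notice(id=notice_id, title=row["Title"])


class ApiNoticeRepository:
    """Calls the new API. fetch_json is injected so the sketch runs offline."""

    def __init__(self, fetch_json: Callable[[str], dict]) -> None:
        self._fetch_json = fetch_json

    def get(self, notice_id: int) -> Notice:
        payload = self._fetch_json(f"/notices/{notice_id}")  # assumed route
        return Notice(id=payload["id"], title=payload["title"])


def render_notice(repo: NoticeRepository, notice_id: int) -> str:
    """Old-system code: identical whichever repository it is given."""
    notice = repo.get(notice_id)
    return f"#{notice.id}: {notice.title}"


legacy = LegacyDbNoticeRepository({1: {"Title": "Server maintenance"}})
api = ApiNoticeRepository(lambda path: {"id": 1, "title": "Server maintenance"})

print(render_notice(legacy, 1))
print(render_notice(api, 1))
```

Because render_notice only depends on the interface, a feature can be switched from the old database to the new API by changing which repository is constructed, without touching the calling code.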

Divide and conquer

This is easier said than done. As a developer, I often find myself wanting to fix a bunch of things at the same time. But the smarter approach is to fix small things one by one, evaluate each change, and ensure that nothing is broken.

If a system or feature that needs to be refactored is small, you can easily analyze it and determine what needs to be done. With time this comes naturally to developers. But if the code base is big and complicated, analyzing the whole thing is not possible. Or, better put, it is not a good approach. You should “divide” the solution into smaller features and then isolate one of them. Now you have something to work with and can focus only on that. The rest of the system should stay the same.

Once the functionality is isolated, you should go through it to see how it works. Don’t think, know. This is something that really bothers me in the industry. Often I run into situations where developers tell me that they think some feature works a certain way. If you don’t know for sure how a part of the system works, you need to analyze it before you do anything. One feature can be closely related to another, and you (or anyone on the project) may not be aware of that.

The system mentioned earlier had a notice board. Users in a specific role could publish a notification on the board and optionally inform selected users via email. This was chosen as the first functionality to be rewritten. By analyzing the code we noticed that a notification could be connected to a specific entity, and that the content of the mail depended on the entity type. This was something we originally didn’t know, and the client forgot to mention it. Additionally, there was functionality to notify users via SMS that didn’t work (since the service provider no longer offered this service), but it was still called in code.

Since you now know what you want to change and how it works, the best thing to do is to make it testable. You achieve testability by putting an abstraction between different parts of the business logic. If you receive a dependency as an interface (instead of a specific implementation) you can mock it and mimic different scenarios. That way you can test part of the logic to see how it behaves under different circumstances.
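As a rough illustration of mocking a dependency received as an interface, the sketch below uses Python and its standard unittest.mock module. The class and method names (NoticeBoardService, save, send) and the error messages are assumptions made for this example, not the original code. The same service is run under two scenarios: the repository works, and the repository fails.

```python
from __future__ import annotations

from typing import Protocol
from unittest.mock import Mock


class NoticeRepository(Protocol):
    def save(self, text: str) -> int: ...


class MailSender(Protocol):
    def send(self, address: str, body: str) -> None: ...


class NoticeBoardService:
    """Depends only on interfaces, so both collaborators can be mocked."""

    def __init__(self, repository: NoticeRepository, mail_sender: MailSender) -> None:
        self._repository = repository
        self._mail_sender = mail_sender

    def publish(self, text: str, recipients: list[str]) -> str:
        try:
            notice_id = self._repository.save(text)
        except OSError:
            # Nothing was stored, so nobody should be notified.
            return "error: notice could not be saved"
        for address in recipients:
            self._mail_sender.send(address, text)
        return f"published notice {notice_id}"


def run_scenario(save_behaviour) -> tuple[str, int]:
    """Run publish() with mocked collaborators; return (result, mails sent)."""
    repository, mail_sender = Mock(), Mock()
    repository.save.side_effect = save_behaviour
    service = NoticeBoardService(repository, mail_sender)
    result = service.publish("Hello", ["a@example.com", "b@example.com"])
    return result, mail_sender.send.call_count


happy = run_scenario([42])  # save() returns 42
failing = run_scenario(OSError("database unavailable"))  # save() raises

print(happy)    # ('published notice 42', 2)
print(failing)  # ('error: notice could not be saved', 0)
```

No database or mail server is involved, yet the test can still check that no mail goes out when saving fails.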

Unit tests are not only useful for functionalities that you plan to refactor. The real purpose of unit tests is to ensure that a feature behaves exactly as you intended. If there is a bug, unit tests will (should) reveal it to you. This is very useful if you decide to change part of the implementation, and it is the main reason why you shouldn’t start refactoring things before you can test them.

I started to refactor the old notice board. Everything related to it was copy/pasted into a NoticeBoardService. At that point I thought it was not wise to change anything, so mail sending stayed the same. I hid the data access part behind a repository and put an interface on it. Now the repository handled saving and retrieving data from the database, and the service processed responses from the repository, sent mail and/or returned errors.

I decided to write unit tests for the repository (I was planning to write a new implementation with an HTTP client in it) and for the service. While writing unit tests for the service I stumbled upon one use case that I was not sure I needed to cover: is it possible to have a user without an email address? Long story short, there was a bug. Email was a mandatory field, but the regular expression used for validating email addresses allowed the space character. Emails were sent in a foreach loop (personalized mails with user names), and since an email address containing a space is not valid, sending to it threw an exception and no notifications were sent after that point. The system swallowed this exception (since mail was sent in the background), so the user was never aware of the problem.
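The failure mode can be reproduced in a few lines. This is only a sketch in Python: the lenient pattern below is a hypothetical stand-in (the original regular expression is not shown here), and StrictMailSender stands in for a mail library that rejects addresses containing spaces.

```python
from __future__ import annotations

import re

# Hypothetical stand-in for the original pattern: it never excludes whitespace.
LENIENT_EMAIL = re.compile(r"^[^@]+@[^@]+\.[^@]+$")


class StrictMailSender:
    """Stand-in for a mail library that rejects addresses containing spaces."""

    def __init__(self) -> None:
        self.delivered: list[str] = []

    def send(self, address: str, body: str) -> None:
        if " " in address:
            raise ValueError(f"invalid address: {address!r}")
        self.delivered.append(address)


def send_all(sender: StrictMailSender, users: dict[str, str]) -> None:
    """Mirrors the old loop: one personalized mail per user, no error handling."""
    for name, address in users.items():
        sender.send(address, f"Hello {name}, there is a new notice.")


def send_in_background(sender: StrictMailSender, users: dict[str, str]) -> None:
    """Mirrors the background job that swallowed the exception."""
    try:
        send_all(sender, users)
    except ValueError:
        pass  # the publisher never learns that something went wrong


users = {
    "Ana": "ana@example.com",
    "Ivo": "ivo @example.com",  # passes the lenient regex despite the space
    "Mia": "mia@example.com",
}
sender = StrictMailSender()
send_in_background(sender, users)

print(sender.delivered)  # ['ana@example.com']
```

Mia never gets her mail, even though her address is fine, and nothing tells the publisher about it.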

Even though I fixed the email regular expression and removed all spaces from email addresses in the database, I kept a test for this use case. I changed the mail sending logic so that, if an email address is not in a valid format, the user is made aware of the error.
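One possible shape for such sending logic is sketched below in Python. The stricter pattern and the function name send_notifications are assumptions made for illustration, not the actual fix: each address is checked before sending, a bad one no longer stops the loop, and the failures are returned so the caller can show them to the user.

```python
from __future__ import annotations

import re
from typing import Callable

# Assumed stricter pattern: no whitespace anywhere and exactly one "@".
STRICT_EMAIL = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def send_notifications(
    send: Callable[[str, str], None], users: dict[str, str]
) -> list[tuple[str, str]]:
    """Send to every valid address; return (name, reason) for each failure."""
    failures = []
    for name, address in users.items():
        if not STRICT_EMAIL.match(address):
            failures.append((name, "invalid email address"))
            continue  # one bad address no longer blocks the rest
        try:
            send(address, f"Hello {name}, there is a new notice.")
        except Exception as error:
            failures.append((name, str(error)))
    return failures


users = {
    "Ana": "ana@example.com",
    "Ivo": "ivo @example.com",
    "Mia": "mia@example.com",
}
delivered: list[str] = []
failures = send_notifications(lambda address, body: delivered.append(address), users)

print(delivered)  # ['ana@example.com', 'mia@example.com']
print(failures)   # [('Ivo', 'invalid email address')]
```

A unit test that pins this behavior down keeps the bug from coming back quietly if the validation or the sending code changes later.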

Doing this gave me the possibility to change the system safely. Now I know how the functionality behaves, and I have unit tests to ensure that this behavior persists when I start changing things.

Friendly advice

Preparations are now over and rewriting can finally begin. It is crucial to split the work into smaller pieces. If you start refactoring a huge chunk of the code, you will have too many moving parts. It is a bad decision to try to plan everything at once. That will put you in a situation where a lot of things can (and some certainly will) go wrong. Multitasking in refactoring is a bad thing. It carries a big risk of mistakes and therefore a bigger chance that the rewrite/refactoring fails.

One example is starting the rewrite by creating the database first. Systems have business logic that should dictate the data models, so the database shouldn’t be the first step. Presuming that you know the business logic so well that you can predict how the whole system will work is reckless at best. Changing the database and/or data model later is harder than it seems at first. You will eventually reach the “let’s keep it like this for now” situation, where you don’t want to change anything because too many other things would need to change as a consequence. And you started rewriting the system precisely because you had found yourself in that situation.

Conclusion

Altering the behavior of a live system is hard work. It is even harder if you are rewriting the system and integrating it with its predecessor at the same time. Often you end up refactoring a lot of the old code just to be able to rewrite one part. It would be easier to rewrite the whole system, but in most cases you won’t have that privilege. So you are forced to maintain the old system while replacing old with new under the hood.

Each project is different and so is the process of rewriting it. Most of that process is improvisation and trial and error. Sometimes you start with the back-end, sometimes with the front-end, and sometimes you go feature by feature. But if you force yourself to practice TDD (test-driven development), this process becomes simpler and safer. You can maintain the stability of the system while implementing something that used to look impossible. Just like moving the building.


Written by Krste Šižgorić

Full stack Software Engineer focused on system architecture and creating reusable software.
