Geo-distributed data sync solution for a large volume of complex data

Tags: API integration, data replication, enterprise, geo-distributed, MVP, PoC, SaaS platform integration


Client Background

The solution was developed for the largest South Korean business conglomerate and electronics manufacturer, which operates worldwide through a global network of distributors and partners.

Business Challenge

The product catalog contains almost 1 million SKUs, each accompanied by its own marketing and supplementary materials, all of which exist in multiple language versions. The challenge was to automate the synchronization of product details between the headquarters database and the global marketing platform's database so that the solution:

  • requires no changes to the existing systems
  • is cost-effective
  • is ready for peak loads

It is important to note that the systems to be integrated are geo-distributed and are located on almost opposite sides of the globe.

Challenge in numbers:

  • Total number of SKUs to operate on – approx. 1 000 000
  • Total volume of materials – more than 40 TB
  • Physical distance between key API gateways – approx. 9 000 km
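A rough back-of-envelope reading of these numbers follows. It assumes decimal terabytes, a straight-line fiber path between the gateways, and a refractive index of about 1.47 for optical fiber; none of these assumptions come from the case study itself.

```latex
\bar{V}_{\text{SKU}} \approx \frac{40\ \text{TB}}{10^{6}\ \text{SKUs}}
  = \frac{4\times10^{13}\ \text{B}}{10^{6}} = 4\times10^{7}\ \text{B} \approx 40\ \text{MB per SKU (average)}

v_{\text{fiber}} \approx \frac{c}{n} \approx \frac{3\times10^{8}\ \text{m/s}}{1.47} \approx 2\times10^{8}\ \text{m/s}

t_{\text{one-way}} \ge \frac{d}{v_{\text{fiber}}}
  = \frac{9\times10^{6}\ \text{m}}{2\times10^{8}\ \text{m/s}} = 45\ \text{ms},
  \qquad \text{RTT} \ge 90\ \text{ms}
```

Real network routes are longer than the straight-line distance and add routing and queuing delays, so actual round-trip times will be higher. Even so, every sequential request between the two hubs costs tens of milliseconds at a minimum.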

Value Delivered

Project status

  • Proof-of-Concept (PoC) phase successfully delivered to demonstrate the proposed approach
  • Minimum Viable Product (MVP) successfully delivered and put into operation

The overall project timeline:

  • PoC phase – 3 weeks
  • MVP phase – 3-6 months

Value delivered

  • High-Level Architecture (HLA) design and research of existing APIs
  • Requirements for the API changes needed to support the bulk operations required under peak load
  • PoC solution to show the potential of API-based approach
  • MVP solution developed and put into operation

SUCCESS STORY IN DETAIL

Background

E-mailing information about a couple of product updates or new products to several partners is a relatively simple operation. Things become much more complex when you have hundreds of thousands of products (new and existing) and materials to distribute in dozens of languages.

Another dimension of complexity comes from flagship products, whose release is a global event of significant importance to the overall business. Large volumes of materials must be prepared up front and then distributed across the globe in the shortest possible time during the product's launch event.

That raises the challenge of finding a cost-effective way of handling peak loads.

A geographic distance of thousands of kilometers between the two major system hubs adds latency that can cause significant problems under peak load, and simple retry logic won't help. This matters all the more because peak-load moments are the most demanding moments for the integration.
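One way to see why latency dominates is to estimate how long a full catalog sync would spend only waiting on round trips. The sketch below assumes sequential requests and a round-trip time of roughly 90 ms, a hypothetical figure consistent with a distance of about 9 000 km. `latency_bound_sync_seconds` and `chunks` are illustrative helpers and are not part of the delivered solution.

```python
import math


def latency_bound_sync_seconds(num_items: int, batch_size: int, rtt_s: float) -> float:
    """Lower bound on time spent waiting on round trips when items are sent
    sequentially, `batch_size` items per request.

    Ignores payload transfer time, server-side processing and retries,
    so real-world numbers will be higher.
    """
    requests = math.ceil(num_items / batch_size)
    return requests * rtt_s


def chunks(items: list, size: int):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


if __name__ == "__main__":
    SKUS = 1_000_000  # catalog size from the case study
    RTT_S = 0.09      # assumed ~90 ms round trip (hypothetical)
    for batch in (1, 100, 1000):
        seconds = latency_bound_sync_seconds(SKUS, batch, RTT_S)
        print(f"batch={batch:>5}: ~{seconds / 3600:.2f} h of round-trip wait")
```

With one SKU per request, 1 000 000 SKUs at 90 ms each add up to 90 000 s (25 hours) of pure waiting. Sending 1 000 SKUs per request reduces this to 1 000 round trips, or about 90 s. Retrying individual failed calls does not change this arithmetic, which is one reason bulk API operations matter more at peak load than retry logic.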

Security is essential for any enterprise-grade solution and had to be implemented to a high standard as well.

Solution

Given the requirements above, the decision was made to implement a serverless solution built on AWS geo-distributed infrastructure.

The serverless approach absorbed peak loads while keeping costs under control during both low- and high-load periods.

AWS provided excellent geo-distributed infrastructure and secure, enterprise-compliant services, allowing the team to focus on business tasks rather than low-level infrastructure routines.
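The case study does not describe the internals of the serverless pipeline. As an illustration only, one common AWS pattern queues change events in Amazon SQS and processes them with an AWS Lambda function that forwards updates in bulk and reports per-message failures. The sketch below assumes that pattern. `make_handler`, `push_bulk`, the JSON message body and the batch size are hypothetical. The `Records`/`messageId`/`body` event fields and the `batchItemFailures` response follow Lambda's SQS integration.

```python
import json
from typing import Any, Callable

# Hypothetical bulk-update call to the target platform. It is injected so the
# sketch stays self-contained and testable without AWS credentials.
BulkPush = Callable[[list], None]


def make_handler(push_bulk: BulkPush, batch_size: int = 100):
    """Build a Lambda-style handler for SQS events carrying SKU updates."""

    def handler(event: dict, context: Any = None) -> dict:
        failures = []
        parsed = []  # (messageId, payload) pairs that decoded successfully
        for record in event.get("Records", []):
            try:
                parsed.append((record["messageId"], json.loads(record["body"])))
            except json.JSONDecodeError:
                # Report malformed messages as failed; if a dead-letter queue is
                # configured, SQS moves them there after maxReceiveCount attempts.
                failures.append({"itemIdentifier": record["messageId"]})

        # Forward updates in bulk rather than one request per SKU.
        for start in range(0, len(parsed), batch_size):
            chunk = parsed[start:start + batch_size]
            try:
                push_bulk([payload for _, payload in chunk])
            except Exception:
                # Mark only this chunk as failed so SQS redelivers just these messages.
                failures.extend({"itemIdentifier": mid} for mid, _ in chunk)

        # Partial batch response, honored when ReportBatchItemFailures is
        # enabled on the SQS event source mapping.
        return {"batchItemFailures": failures}

    return handler
```

Returning partial failures instead of raising keeps one bad message or one failed bulk call from forcing the whole SQS batch to be reprocessed.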

What has been done:

  • Careful analysis of the existing APIs on both sides of the integration
  • High-Level Architecture (HLA) design to enable planning and effort estimates
  • PoC and MVP implementation
  • Automated performance testing
  • Automated deployment of the solution to the production environment

Lessons learned

  • Cloud-native solutions have phenomenal scalability potential with affordable Total Cost of Ownership (TCO)
  • The right cloud-oriented design patterns deliver excellent results