Leading digital advertising agency advances data analytics using Vertica on AWS cloud

THE CHALLENGE

The client wanted to use their business intelligence tool to query 90 days of historical data in under two seconds with more than 80 concurrent users. Their existing system, which held data in Hadoop clusters and an on-premise Infobright DB cluster, was unable to handle their data analytics requirements.

THE SOLUTION

Beyondsoft’s Big Data consulting team proposed a solution using Vertica, a columnar database, on Amazon Web Services (AWS). The AWS cloud solution would provide the added scalability, elasticity, and performance that the customer wanted. The project covered creating Vertica clusters in a repeatable manner, adopting a pipeline-based approach for Vertica DDL, and moving large amounts of data to the Vertica cluster daily.

The project consisted of three phases:

  • Phase 1 involved creating a repeatable deployment process for the Vertica cluster through infrastructure as code (Terraform). The Vertica AMI was procured from AWS Marketplace. Beyondsoft engineers added the ability to launch different Vertica nodes as part of a cluster through tags to AWS Elastic Load Balancer (ELB). The Vertica cluster consists of 16 nodes in total, running active/active across two Availability Zones (AZs), and holds 90 days of data, approximately 40TB. AWS Systems Manager (SSM) and CloudWatch Logs are used to administer the cluster. This Vertica infrastructure as code also integrates with the customer’s self-service tool for their developers.
  • Phase 2 included a pipeline-based approach for Vertica DDL. Vertica DDL is pushed through the pipeline using Liquibase, a Java framework for database change management and deployment. This ensures that production Vertica clusters are never touched manually for schema changes.
  • Phase 3 involved setting up ETL from the Hadoop cluster to Vertica using a producer/consumer pattern. The ETL code is written in Python and runs in on-demand Fargate containers, which extract data from Hadoop and store it as zipped files in S3. From there, jobs are created to load the data from S3 into Vertica. The volume is around 120GB/day, with around 570M rows loaded at peak. The customer-facing Java application has several dashboards, which retrieve data from Vertica with query times under two seconds under concurrent usage.
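The producer/consumer pattern described in Phase 3 can be sketched as follows. This is a minimal illustration, not the project's actual ETL code: the table name `analytics.events`, the bucket `example-bucket`, the CSV layout, and every function name are hypothetical. A local temporary directory stands in for the S3 upload, and the consumer only builds the load statement instead of executing it. The `COPY ... GZIP DELIMITER` form is an assumption about how the gzip-compressed files would be loaded.

```python
import gzip
import queue
import threading
from pathlib import Path
from tempfile import TemporaryDirectory

SENTINEL = None  # tells the consumer that no more files are coming


def produce(partitions, out_dir, work_queue):
    """Producer: write each extracted partition to a gzip file and enqueue its path."""
    for name, rows in partitions.items():
        path = Path(out_dir) / f"{name}.csv.gz"
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            for row in rows:
                fh.write(",".join(str(value) for value in row) + "\n")
        work_queue.put(path)  # blocks when the queue is full (back-pressure)
    work_queue.put(SENTINEL)


def copy_statement(table, uri):
    """Build the load statement for one compressed file (assumed syntax)."""
    return f"COPY {table} FROM '{uri}' GZIP DELIMITER ',';"


def consume(work_queue, table, bucket, statements):
    """Consumer: turn each staged file into a load statement until the sentinel arrives."""
    while True:
        path = work_queue.get()
        if path is SENTINEL:
            break
        # A real job would upload the file to S3 and run the statement here.
        statements.append(copy_statement(table, f"s3://{bucket}/{path.name}"))


def run_pipeline(partitions, table="analytics.events", bucket="example-bucket"):
    """Run one producer and one consumer, returning the generated statements."""
    statements = []
    work_queue = queue.Queue(maxsize=4)
    with TemporaryDirectory() as out_dir:
        consumer = threading.Thread(
            target=consume, args=(work_queue, table, bucket, statements)
        )
        consumer.start()
        produce(partitions, out_dir, work_queue)
        consumer.join()
    return statements


if __name__ == "__main__":
    sample = {"2024-01-01": [(1, "click")], "2024-01-02": [(2, "view")]}
    for statement in run_pipeline(sample):
        print(statement)
```

The bounded queue keeps extraction from running far ahead of loading, which is the main reason to separate the two roles. In a containerized setup the in-process queue would typically be replaced by a durable one, but that choice is outside what the case study states.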

TECHNOLOGIES USED

Vertica on AWS, AWS SSM, AWS CloudWatch Logs, AWS S3, AWS ELB, AWS Fargate, AWS Parameter Store, AWS ECR, Python, Jenkins, etc.

KNOWLEDGE TRANSFER

Beyondsoft trained the client’s data analytics team on the newly created solution and on Terraform, and provided a runbook so that the team can both manage and extend the solution in the future. Beyondsoft also provided education on the various AWS services and customized training sessions on various topics.

BENEFITS

Moving from an on-premise cluster to the cloud increased the scalability, agility, and performance of the whole solution. Taking a DevOps approach through data pipelines decreased go-to-market time for code changes. Infrastructure as code provided a repeatable way to create infrastructure, increasing operational consistency and reducing bugs.


WHY BEYONDSOFT

Our onshore, nearshore, and offshore delivery services support our customers' businesses 24/7, 365 days a year. We have been providing services to major Japanese SI companies for decades. We have operated in Japan since 1999, giving us 25 years of experience there, with a staff of 500-600 people. Our long-standing success stories show how we deliver a return on investment for our clients. Singapore is our global headquarters, and we have 14 regional offices around the world.

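More than 30 years of strong experience in IT consulting services

Support for a wide range of development languages and environments, including COBOL, C, Java, and Python

Broad SAP coverage, including ABAP, BTP, and Fiori

A global delivery network of more than 40 sites covering four continents

Certified to CMMI 5, ISO 9001, ISO 14001, ISO 20000, ISO 27001, ISO 22301, ISO 45001, and TMMi5

Azure MSP with Microsoft expertise

Unlock the full potential of your business with Beyondsoft. Contact us to discuss how we drive innovation, improve efficiency, and deliver business growth.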
