Skip to main content

Vertica Analytic Database Architecture

The below mentioned diagram illustrates the basic system architecture of Vertica on a single node.
Pic: Vertica.com
  • Queries are issued in SQL to a front end that parses and optimizes queries.
  • Vertica is internally organized into a hybrid store consisting of two storage structures: WOS and ROS
  • Write-Optimized Store (WOS) is a data structure that generally fits into main memory and is designed to efficiently support insert and update operations.
  • The data within the WOS is unsorted and uncompressed.
  • Read-Opytimized Store (ROS) contains the bulk of the data in the database, and is both sorted and compressed, making it efficient to read and query.
  • A background process called the Tuple Mover, moves data out of the WOS into ROS.
  • As it operates on the entire WOS, the tuple mover can be very efficient, sorting many records at a time and writing them to disk as a batch.
  • Both WOS and ROS are organized into columns, with each columns representing one attributes of a table. 
  • Each column may be stored in one or more projections that represent partially redundant copies of the data in the database. 

How actual data is stored in Vertica?

The below mentioned diagram illustrates how logical data in an example sales table is physically stored as columns.

Pic: Vertica.com
  • As stated earlier, each column may be stored in on or more Projections that represent partially redundant copies of the data in the database.
  • For example, sales table might be stored as two projections, one called sales-prices with the columns (oid,pid, date,price) and other called-salescustomers with the columns (oid,pid,cust).
  • Each of these projections has a sort order that specifies how the data in the projection is arranged on disk. e.g. The sales-customers projection mighted be sorted on customer-id. This makes it efficient for totaling all of the products that a customer brought. By storing several overlapping projctions of a table in different sort orders, Vertica can be efficient at answersing many different types of queries.
  • Vertica's Database Designer automatically selects a good set of overlapping projections for a particular table based on set of queries issued to that tableover time.
  • It may seem that redundantly storing data in multiple projection is wastage of disk space. However, Vertica includes aggressive column-oriented compression schemes that allows it to reduce the amount of space a particular projection takes up in the ROS as much as 90%. 
  •  

Comments

Popular posts from this blog

Projections in HP Vertica - 1

What  are Projections in HP Vertica?  Lets try to understand by comparing with traditional databases like - Oracle, MySQL, SQL Server etc..  In traditional database architecture, data is physically stored in table. Additionally, secondary tunning structure such as index and materialized view structure are created to improve query performance.  In contrast, table donot occupy any physical storage atallin vertica.  Physical storage c onsists of collection of table columns called projections. Projections store data in a format that optimize query execution. They are simmilar to MVs in that they store result set on disk rather then compute them each time they areused in a query. The result set are automatically refreshed whenever data values are inserted, appended or changed.  Projections are not aggregated but rather store row in a table e.g. full atomic detail Definition:  Optimized collection of table columns that provide physical storage for data. A proj

Introduction to Vertica Database

What is Vertica Analytic Database? Vertica Analytic Database is designed to manage large, fast growing volume of data. Vertica was developed by Vertica Systems. It was founded in 2005 by database researcher Michael Stonebraker and Andrew Palmer. Vertica was acquired by Hewlett Packard in March 2011 Vertica Analytic Database is an innovative, ground-up implementation of a relational database management systems optimized for read-intensive workloads. Vertica provides extremly fast ad hoc SQL query performance, even for very large database, making it well suited for: Data warehousing Data marts Fraud detection Call detail analysis Business intelligence Other query intensive applications.   Why to use Vertica? By this time i guess you all would be able to say- Why should we user Vertica? This answer is pretty simple- Performance The key reasons for Vertica's performance are mentioned below:   Vertica organizes data on disk as columns of values from the same att