
Projections in HP Vertica - 1

What are Projections in HP Vertica?

Let's try to understand them by comparing Vertica with traditional databases such as Oracle, MySQL, and SQL Server.
  • In a traditional database architecture, data is physically stored in tables. Additionally, secondary tuning structures such as indexes and materialized views are created to improve query performance.
  • In contrast, tables do not occupy any physical storage at all in Vertica.
  • Physical storage consists of collections of table columns called projections.
  • Projections store data in a format that optimizes query execution. They are similar to materialized views (MVs) in that they store result sets on disk rather than computing them each time they are used in a query. The result sets are automatically refreshed whenever data values are inserted, appended, or changed.
  • Projections are not aggregated; rather, they store every row of a table, i.e. the full atomic detail.

Definition: 
An optimized collection of table columns that provides physical storage for data. A projection can contain some or all of the columns of one or more tables. A projection that contains all of the columns of a table is called a super-projection. A projection that combines columns from more than one table is called a pre-join projection.
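
To make the definition concrete, here is a minimal Python sketch of the idea. It is a toy model only, not Vertica's actual storage engine: the table is just a list of column names, and each projection holds its own sorted, column-by-column copy of the data. The names `SALES_COLUMNS`, `loaded_rows`, and `build_projection` are hypothetical and exist only for this illustration.

```python
# Toy model: the logical table is only a schema and holds no data itself.
SALES_COLUMNS = ("sale_id", "region", "amount")

# Hypothetical sample rows as they might arrive from a load.
loaded_rows = [
    (1, "west", 40.0),
    (2, "east", 15.5),
    (3, "west", 22.0),
    (4, "east", 9.0),
]


def build_projection(rows, table_columns, projection_columns, sort_by):
    """Store the chosen columns one list per column, sorted by `sort_by`."""
    col_idx = [table_columns.index(c) for c in projection_columns]
    sort_idx = [table_columns.index(c) for c in sort_by]
    ordered = sorted(rows, key=lambda r: tuple(r[i] for i in sort_idx))
    return {c: [r[i] for r in ordered] for c, i in zip(projection_columns, col_idx)}


# A super-projection contains every column of the table.
super_projection = build_projection(
    loaded_rows, SALES_COLUMNS, SALES_COLUMNS, sort_by=("sale_id",)
)

# A second projection keeps only some columns, in a different sort order.
by_region = build_projection(
    loaded_rows, SALES_COLUMNS, ("region", "amount"), sort_by=("region", "amount")
)

print(by_region["region"])  # ['east', 'east', 'west', 'west']
print(by_region["amount"])  # [9.0, 15.5, 22.0, 40.0]
```

Both projections hold the same atomic rows; they differ only in which columns they keep and how those columns are ordered.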

What are the benefits of Projections?

  • Projections allow data to be sorted in any order (even if different from the source tables). This enhances query performance and compression.
  • Projections deliver high availability optimized for performance, since the redundant copies of data are always actively used in analytics. Vertica can automatically store the redundant copy using a different sort order. This provides the same benefits as a secondary index in a more efficient manner.
  • Projections do not require a batch update window. Data is automatically available upon loads.
  • Projections are transparent to end-users and SQL. The Vertica query optimizer automatically picks the best projections to use for any query.
  • Projections are dynamic and can be added or changed at any time without stopping the database.
Note: Vertica's projections represent collections of columns (so you could think of one as a table), but they are optimized for analytics at the physical storage structure level and are not constrained by the logical schema.
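
The claim that sorting helps compression can be illustrated with a small Python sketch. It uses run-length encoding as one example of an encoding that benefits from sorted data; the sample column and the helper `run_length_encode` are made up for this illustration and are not part of Vertica.

```python
from itertools import groupby


def run_length_encode(values):
    """Return [(value, run_length), ...] for runs of consecutive equal values."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Hypothetical column values in load order and in sorted order.
region_unsorted = ["west", "east", "west", "east", "west", "east"]
region_sorted = sorted(region_unsorted)

runs_unsorted = run_length_encode(region_unsorted)
runs_sorted = run_length_encode(region_sorted)

print(len(runs_unsorted))  # 6 runs: nothing is saved
print(runs_sorted)         # [('east', 3), ('west', 3)]
```

With the column sorted, six values collapse into two runs, which is why a projection's sort order matters for how well its columns compress.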

We have now covered the simple concept of Vertica's projections and have a basic understanding of what projections are. Let's go into more detail in the next post: Vertica Projections -2

Please share it with your friends. Let's learn together!!






