PACIFIC: New Data Warehouse Technology, Comprehensive Data Coverage


The data ecosystem as a whole is a very large industry: big data is a hot field, and the global market is worth roughly $100-200 billion. The ecosystem includes data sources, underlying systems, and a range of higher-level big data analysis applications. Data is first generated at a source, for example a transactional system such as Oracle or MySQL. Data is also generated in many other places, such as mobile phones, iPads, and web servers. Once generated, the data is loaded into the data warehouse through ETL or other collection processes. With the development of big data, artificial intelligence, and the Internet of Things, and the emergence of technologies such as blockchain, data volumes keep growing, and so do the demands placed on data warehouses. As a result, the data warehouse is where technological innovation is most active.


The evolution of data warehouse

From the 1970s and 1980s to the present, the evolution of data warehouses can be roughly divided into three generations. The earliest data warehouses were built on traditional transactional database technology, such as Oracle, using shared storage, typically high-end storage arrays from EMC or IBM. Their drawback is that they can scale only to about a dozen nodes; beyond that, storage becomes a bottleneck. They are also relatively expensive.

MPP (massively parallel processing) systems appeared in the 1980s and make up the second generation of data warehouses. The first productized MPP system was Teradata, which was built on mainframe, minicomputer, and other proprietary hardware. Later, start-ups appeared, most notably Greenplum and Vertica in the 2000s, which built MPP systems on commodity x86 hardware. These start-ups were eventually acquired by larger companies: Greenplum by EMC and Vertica by HP.

Second-generation systems solve some of the scalability problems and can reach a scale of roughly 100 nodes, but scaling further is difficult.

In recent years, third-generation systems have emerged, such as SQL-on-Hadoop systems and cloud-based SQL systems. We call these the new generation of data warehouses.


PACIFIC - New Data Warehouse Technology

As database technology has been applied and developed, people have tried to reprocess the data held in databases into a comprehensive, analysis-oriented environment that better supports decision analysis. This gave rise to data warehouse technology (Data Warehousing, DW). As a Decision Support System (DSS), a data warehouse system includes:

Data warehouse technology;

Online Analytical Processing (OLAP);

Data mining (DM).

The data warehouse makes up for the shortcomings of the original database, turning a data environment centered on a single database into a new, systematic environment.


PACIFIC - Comprehensive Data Coverage

PACIFIC is a new data warehouse technology based on big data technology. It relies on the distributed storage and computing of big data platforms and adds SQL support on top, so its architecture is completely different from that of traditional data warehouses. A data warehouse of this kind is a storage and computing service for massive data: it uses a distributed architecture to solve the data storage problem and scales well. For data processing, it avoids the huge overhead of moving massive amounts of data by moving the computation to the data instead. Data is stored across multiple nodes, computing tasks are dispatched to the nodes that hold the data, each node executes its task concurrently as soon as it receives it, and the partial results are finally aggregated into the final result.
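
The "move computation to the data" pattern described above can be sketched as a scatter-gather aggregation. This is a minimal Python sketch under stated assumptions, not PACIFIC's actual implementation: the data nodes are hypothetical in-memory partitions, threads stand in for separate machines, and the query is assumed to be a SUM grouped by key. Each node aggregates only its local rows and returns a small partial result, which a coordinator then merges.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data nodes: each holds its own partition of (region, amount) rows.
# In a real distributed warehouse these rows would live on separate machines.
NODE_PARTITIONS = [
    [("east", 10), ("west", 5), ("east", 7)],
    [("west", 3), ("north", 8)],
    [("east", 1), ("north", 2), ("west", 4)],
]


def local_sum_by_key(partition):
    """Runs "on" a data node: aggregate only the rows stored locally.

    Only this small per-key partial result is returned, not the raw rows.
    """
    partial = Counter()
    for key, amount in partition:
        partial[key] += amount
    return partial


def distributed_sum_by_key(partitions):
    """Coordinator: ship the task to every node, run concurrently, merge partials."""
    # Threads simulate nodes executing their tasks concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(local_sum_by_key, partitions))
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts instead of replacing them
    return dict(total)


if __name__ == "__main__":
    print(distributed_sum_by_key(NODE_PARTITIONS))
```

Running the script prints {'east': 18, 'west': 12, 'north': 10}. Only the per-key partial sums cross the (simulated) network, never the raw rows; this works for aggregates such as SUM and COUNT, whose partial results can be merged.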


Epilogue

The PACIFIC global ecosystem community is committed to building a diversified ecosystem through "decentralized" autonomy and governance, creating a fair and open environment and experience for all participants.
