Although a Thor processing cluster can be implemented and used without a Roxie cluster, an HPCC environment which includes a Roxie cluster should also include a Thor cluster. Roxie utilizes a distributed indexed filesystem percormance provide parallel processing of queries using an optimized execution environment and filesystem for high-performance online processing.
The HPCC platform incorporates a pdff architecture implemented on commodity computing clusters to provide high-performance, data-parallel processing for applications utilizing big data. Both Thor and Roxie clusters utilize the ECL programming language for implementing applications, increasing continuity and programmer productivity. The first of these platforms is called a data refinery whose overall purpose is the general processing of massive volumes of raw data of any type for any purpose but typically used for data cleansing and hygiene, extract, transform, load downllad of the raw data, record linking and entity resolution, large-scale ad-hoc complex analytics, and creation of keyed data and indexes to support high-performance structured queries and data warehouse applications.
The HPCC system architecture includes two distinct cluster processing environments, each of dlwnload can be optimized independently for its parallel data processing purpose.
High Performance Cluster Computing
Parallel computing Distributed computing Declarative programming languages Query languages Data warehousing products. A Thor cluster is similar in its function, execution environment, filesystem, and capabilities to the Google and Hadoop MapReduce performamce. It is an alternative to Hadoop. Retrieved from ” https: The Thor cluster is used to build the distributed index files used by the Roxie cluster and to develop online queries which will be deployed with the index files to the Roxie cluster.
This page was last edited on 27 Octoberat The data refinery is also referred to as Thora reference to the mythical Norse god of thunder with the large hammer symbolic of crushing large amounts of raw data into useful information.
The second of the parallel data processing platforms is called Roxie and functions as a rapid data delivery engine.
High Performance Cluster Computing: Programming and Applications
A Roxie cluster is similar in its function and capabilities to Hadoop with HBase and Hive capabilities added, and provides for near real time predictable query latencies. In addition to the Thor master and slave nodes, additional auxiliary and common components are needed to implement a complete HPCC processing environment.
Retrieved 18 November Retrieved 20 November From Wikipedia, the free encyclopedia. Views Read Edit View history.
Handbook of Data Intensive Computing. Figure 2 shows a representation of a physical Thor processing cluster which functions as a batch job execution engine for scalable data-intensive computing applications.
Handbook of Cloud Computing. Figure 3 shows a representation of a physical Roxie processing cluster which functions as an online query execution engine for high-performance query and data warehousing applications.
This platform is designed as an online high-performance structured query and analysis platform or data warehouse delivering the parallel data access processing requirements of online applications through Web services interfaces supporting thousands of simultaneous queries and users with sub-second response times. Retrieved 29 October All articles with dead external links Articles with dead external links from October Articles with permanently dead external links.