Friday, 20 September 2013

Pentaho - A Powerful Business Intelligence Tool

The Pentaho BI Project is open source application software for enterprise reporting, analysis, dashboards, data mining, workflow and ETL. There are many business analytics vendors, but what sets Pentaho apart is that it brings together IT and business users to access, integrate, blend, visualize and analyze all the data that impacts business results. Pentaho follows a commercial open source model, and its open source heritage drives continued innovation in a modern, unified, embeddable analytics platform that is purpose-built to save time and money. Moreover, Pentaho is an integrated platform. Most business analytics vendors cover some range of reporting and analytics; in Pentaho the data integration process is combined with the analytical toolset, which saves customers time and money and shortens time to value. Pentaho is developed entirely in Java, exposes open web-based APIs and has a pluggable architecture.

Pentaho is an open source BI suite offering reporting, ETL, dashboards and data mining capabilities. It comes as a free Community Edition, with online community support through forums and wikis, and as an Enterprise suite encompassing all BI modules. The Pentaho platform can be used for business analytics, data mining, data integration, big data and analysis services. The sections below briefly cover each of these capabilities and the Pentaho component that implements it.

Pentaho Big Data Analytics –
Within a single platform, Pentaho provides visual tools to extract and prepare data, plus the visualizations and analytics needed to act on it. Regardless of the data source, analytic requirement or deployment environment, Pentaho aims to turn big data into big insights. A tightly coupled data integration and business analytics platform accelerates the realization of value from big data.


Pentaho Business Analytics-
Pentaho's modern, simplified and interactive approach empowers business users to access, discover and blend all types and sizes of data. With a spectrum of increasingly advanced analytics, from basic reports to predictive modeling, users can analyze and visualize data across multiple dimensions, all while minimizing dependence on IT. At the same time, a designed-for-mobile experience keeps users productive no matter where they are. The Pentaho Business Analytics suite provides a full spectrum of data integration and business intelligence (BI) capabilities, including ETL, OLAP, query and reporting, interactive analysis, dashboards, data mining and a BI platform, which has made it one of the most popular open source BI suites. Pentaho's platform also provides broad enterprise data services, including integration with Hadoop for big data analytics and support for the company's Agile BI initiative, which enables organizations to build BI applications more quickly, respond to business changes more easily and, according to Pentaho, expedite time-to-value at up to 90% less cost than traditional BI vendors.

Pentaho Data Integration-
With Pentaho Data Integration, Pentaho is redefining the way that BI applications are built and deployed. Utilizing Pentaho’s Agile BI approach, Pentaho Data Integration unifies the ETL, modeling and visualization processes into a single, integrated environment that enables developers and end-users to work seamlessly together.  The end result is that BI developers and end users can build BI applications more quickly, easily and at a small fraction of the cost of traditional solutions.

Pentaho Analysis Services (Mondrian)-
Pentaho Analysis Services Community Edition is also known as Mondrian. Mondrian is an Online Analytical Processing (OLAP) server that enables business users to analyze large quantities of data in real time. Users explore business data by drilling into and cross-tabulating information, with speed-of-thought response times to complex analytical queries. Pentaho Analyzer provides intuitive, interactive analytical reporting that lets non-technical business users quickly understand business information. As part of the enhanced functionality in Pentaho Analysis Enterprise Edition, Analyzer features:
· Web-based, drag-and-drop report creation
· Advanced sorting and filtering
· Drill through reports into the underlying data
· Chart visualizations including conditional stop-lighting

Pentaho Data Mining (Weka)-
Pentaho Data Mining Community Edition (CE) is also known as Weka. Pentaho Data Mining is a comprehensive set of tools for machine learning and data mining. Its broad suite of classification, regression, association rule and clustering algorithms can help you understand the business better and can also be used to improve future performance through predictive analytics.

This article focuses mainly on Pentaho Report Designer (PRD), Pentaho Data Integration (PDI) and the Pentaho BI Server. All three are readily available in the Pentaho BI Suite Community Edition (CE). The sections below look at each of them individually.

Pentaho Report Designer(PRD) -
Pentaho Reporting is a suite of tools for creating pixel-perfect reports. With Pentaho Reporting we can transform data into meaningful information tailored to the intended audience. We can create HTML, Excel, PDF, text or printed reports, and developers can also produce CSV and XML reports to feed other systems. It offers a simple interface and easy-to-use features for creating reports. It connects to a wide variety of data sources such as MySQL, Oracle, MongoDB, JavaBeans and many more. Its design options make it quick to turn an idea into a finished report without compromising on reliability. We can also add various kinds of charts for a better portrayal of the data. Reports made in PRD are dynamic: they can take input from the end user as parameters and generate the report on the fly with those values. A report can be designed in the following simple steps:

1.      Define a data source
2.      Determine the query to retrieve fields from the data source
3.      Design the report with the tools provided
4.      Export the report in the required format
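
The four steps above can be sketched in plain Python. This is only an illustration of the flow, not Pentaho's API: an in-memory SQLite database stands in for the data source, and the `sales` table, its rows and the `run_report` helper are all made up for the example.

```python
import csv
import io
import sqlite3

# Step 1: define a data source. An in-memory SQLite database stands in
# for MySQL, Oracle, etc.; the table and its rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.5), ("North", 30.0)],
)


def run_report(conn, min_amount):
    """Hypothetical helper covering steps 2 to 4 for one report run."""
    # Step 2: the query that retrieves the fields. min_amount plays the
    # role of an end-user parameter and is bound with a placeholder.
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales "
        "WHERE amount >= ? GROUP BY region ORDER BY region",
        (min_amount,),
    ).fetchall()
    # Step 3: "design" the report: a header row plus formatted detail rows.
    layout = [("Region", "Total")] + [(r, f"{t:.2f}") for r, t in rows]
    # Step 4: export in the required format (CSV in this sketch).
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(layout)
    return out.getvalue()


report = run_report(conn, min_amount=50)
print(report)
```

Changing `min_amount` and calling `run_report` again regenerates the report with the new parameter, which is the same idea as a parameterized report in PRD.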

Pentaho BI-Server:
The Pentaho BI Platform provides the architecture and infrastructure required to build solutions to business intelligence (BI) problems. The framework provides core services including authentication, logging, auditing, web services, and rules engines. The platform also includes a solution engine that integrates reporting, analysis, dashboards and data mining components. The modular design and plugin-based architecture allow all or part of the platform to be embedded into third-party applications by end users as well as OEMs. Through the Pentaho Server you can create ad hoc reports easily, in only a few simple steps:
1.      Define the data source
2.      Select the template
3.      Select the fields and drag and drop them into the specified bands; constraints and groups can also be added
4.      Customize the selected fields, for example by setting formats and applying filters, functions (such as count and sum), sorting and alignment
5.      Specify the orientation and paper size for the report, enter a description, and set the header and footer for the report and for each page
6.      The report is now ready. At the bottom there is a drop-down labelled Preview As with the options HTML/Excel/CSV; after choosing one of them, hit Go to view the report in the specified format.

Pentaho Data Integration(PDI):
Pentaho Data Integration Community Edition (PDI CE) is also known as Kettle. Pentaho Data Integration delivers powerful Extraction, Transformation and Loading (ETL) capabilities using an innovative, metadata-driven approach. With an intuitive, graphical, drag-and-drop design environment and a proven, scalable, standards-based architecture, Pentaho Data Integration is increasingly the choice for organizations over traditional, proprietary ETL or data integration tools. Pentaho Data Integration supports slowly changing dimensions and surrogate keys for data warehousing, allows data migration between databases and applications, is flexible enough to load giant datasets, and can take full advantage of cloud, clustered, and massively parallel processing environments. We can cleanse data using transformation steps that range from very simple to very complex. Pentaho Data Integration brings ETL, modeling and visualization together in one unified environment through the Spoon interface.
There are two main building blocks that drive the course of action: Transformations and Jobs.
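
To make the terms "slowly changing dimension" and "surrogate key" concrete, here is a minimal Python sketch of the common Type 2 approach, where a changed attribute closes the old row and adds a new version. It is a generic illustration and not how PDI implements it; the `apply_scd2` function, the field names and the customer data are all invented for the example.

```python
from datetime import date


def apply_scd2(dimension, natural_key, attributes, today):
    """Hypothetical Type 2 update: add a new version row on change."""
    # Find the currently valid row for this business key, if any.
    current = next(
        (row for row in dimension
         if row["natural_key"] == natural_key and row["valid_to"] is None),
        None,
    )
    if current is not None:
        if current["attributes"] == attributes:
            return current["surrogate_key"]  # nothing changed
        current["valid_to"] = today  # close the old version
    # Surrogate key: a meaningless sequence number, independent of the
    # natural (business) key, so each version gets its own identifier.
    new_key = len(dimension) + 1
    dimension.append({
        "surrogate_key": new_key,
        "natural_key": natural_key,
        "attributes": dict(attributes),
        "valid_from": today,
        "valid_to": None,
    })
    return new_key


customers = []
apply_scd2(customers, "C001", {"city": "Pune"}, date(2013, 1, 1))
apply_scd2(customers, "C001", {"city": "Mumbai"}, date(2013, 9, 20))
for row in customers:
    print(row["surrogate_key"], row["attributes"], row["valid_to"])
```

The dimension keeps the full history: the first row is closed on the day the city changed, and fact rows loaded afterwards would reference the second surrogate key.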

A transformation is a network of logical tasks called steps. Transformations are essentially data flows. For instance, suppose one developer produces a flat CSV file with an enormous amount of data, and a database developer has to load part of that data into a database, filtering out the rest, and also wants to keep track of the mismatched records in a file. As a transformation, this would be a file input step feeding a filter step, whose matching rows go to a table output step and whose mismatched rows go to a file output step.

The two main components associated with transformations are steps and hops. Steps are the building blocks of a transformation, for example a text file input or a table output, whereas hops are data pathways that connect steps together and allow schema metadata to pass from one step to another.
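
The CSV example above can be mimicked in a few lines of Python to show how steps and hops fit together. This is a sketch of the idea only, not Kettle code: the sample data, the function names and the filter rule (country `IN` with a whole-number amount) are assumptions made for illustration.

```python
import csv
import io

# Made-up input standing in for the flat CSV file.
CSV_TEXT = "id,country,amount\n1,IN,100\n2,US,250\n3,IN,abc\n4,IN,75\n"


def csv_input(text):
    """Step: read rows from CSV text as dictionaries."""
    yield from csv.DictReader(io.StringIO(text))


def filter_rows(rows, matched, mismatched):
    """Step: route each row to one of two outgoing hops."""
    for row in rows:
        # Assumed rule: keep rows for country IN with a whole-number amount.
        if row["country"] == "IN" and row["amount"].isdigit():
            matched.append(row)
        else:
            mismatched.append(row)


table_output = []  # stands in for the database table
reject_file = []   # stands in for the mismatch file

# Hop: the rows produced by the input step flow into the filter step,
# which sends them on to one of the two outputs.
filter_rows(csv_input(CSV_TEXT), table_output, reject_file)

print([r["id"] for r in table_output])
print([r["id"] for r in reject_file])
```

Each function plays the role of a step, and passing one step's rows to the next plays the role of a hop. The column names travel with every row, which loosely mirrors how hops carry schema metadata between steps.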

Jobs are workflow-like models for coordinating resources, execution, and dependencies of ETL activities. For instance, a simple job would load data from a data file: the job starts by waiting for the data file, then loads the data into the desired location; if the data loads successfully the job proceeds to success, otherwise it goes to the error log.
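
The job described above (wait for a file, load it, then branch to success or the error log) can be sketched in Python as follows. Again, this only illustrates the control flow and is not how a Kettle job is defined; `wait_for_file`, `load_file` and `run_job` are hypothetical names, and a temporary directory stands in for the real file location.

```python
import os
import tempfile
import time


def wait_for_file(path, timeout=1.0, poll=0.05):
    """Job entry: poll until the file exists or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll)
    return os.path.exists(path)


def load_file(path, target):
    """Job entry: copy the non-empty lines into the target list."""
    with open(path, encoding="utf-8") as handle:
        target.extend(line.strip() for line in handle if line.strip())


def run_job(path, target, error_log, timeout=1.0):
    """Run the entries in order; any failure is sent to the error log."""
    try:
        if not wait_for_file(path, timeout=timeout):
            raise FileNotFoundError(f"no data file at {path}")
        load_file(path, target)
    except OSError as exc:
        error_log.append(str(exc))
        return "error"
    return "success"


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.txt")
    with open(data_path, "w", encoding="utf-8") as handle:
        handle.write("row1\nrow2\n")
    loaded, errors = [], []
    ok_status = run_job(data_path, loaded, errors)
    err_status = run_job(os.path.join(tmp, "missing.txt"), [], errors, timeout=0.1)

print(ok_status, loaded)
print(err_status, len(errors))
```

Unlike a transformation, where rows stream through all steps at once, the entries here run one after another and the outcome of each decides which branch is taken next.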


In a nutshell, Pentaho is a very powerful platform for performing business intelligence operations. Please put suggestions and questions in the comments section; I will be more than happy to answer them.

