Saturday, September 3, 2016

Power Query: how to connect to Essbase and pull data

UPDATE5: ExoInsight 7.0.0 now makes working with Tableau much easier than before. You can access ExoInsight from inside Tableau!

UPDATE4: ExoInsight also makes working with OLAP data (now both Essbase and Analysis Services) easier in Tableau. Instead of struggling to get your Oracle Essbase data in a format that Tableau prefers, ExoInsight does all this heavy lifting for you. ExoInsight eliminates this challenge by presenting your Essbase data and metadata in a relational format, instantly and on-the-fly, unlocking it for downstream reporting tools such as Tableau. The challenges with Tableau’s built-in Oracle Essbase connector are well documented, not because the connector doesn’t do the job advertised, but because of the inherent discrepancies between OLAP and relational data. Tableau’s internal engine prefers data in row-and-column format, and this concept simply doesn’t apply to OLAP sources such as Oracle Essbase.

Enter ExoInsight. Instead of struggling to get your Oracle Essbase data in a format that Tableau prefers, ExoInsight does all this heavy lifting for you, formatting your Essbase data in such a way that makes it easy to work with.

UPDATE3: ExoInsight enables Power BI Direct Query capability to Essbase. But its data staging ability makes ExoInsight much more useful than as just a PBI connector. ExoInsight allows you to pull Essbase data at any level, including calculated members, into a SQL Server relational, flat format that's perfect for use in downstream processes. Easily schedule SQL Server jobs to have your Essbase data refreshed in ExoInsight. You can also stage Essbase data that doesn't change (last year's budget, for example) in ExoInsight rather than have to make yet another roundtrip to Essbase from Power BI.

UPDATE2: Now you don't even have to know MDX to use the Power BI connector for Essbase. This makes it much more accessible for people who don't know Essbase, but still need to pull live data directly from it, without ETL/data marts/data warehouses and the accompanying lag times.

UPDATE: We've now created a fully-supported Power BI connector for Essbase. It's significantly improved from the raw Power Query Excel workbook below.


As promised in my last post, I'm going to detail the process of connecting Power Query to Essbase and extracting data. Please keep in mind that I'm a Power Query neophyte. I'm sure there are better ways to accomplish some of the steps outlined below. Please post a comment with any advice/improvements, so we can all benefit.

First, let me start by saying that there is no magic involved. It's just rare that anyone on the Essbase side uses the requisite connection method - XMLA. This is likely because Essbase does a terrible job of implementing the XMLA protocol. It only does so partially, and even deviates from the standard in the parts it ostensibly supports. This is the reason why third-party tools are almost always unable to connect to Essbase - and after working with it for over a decade I think this is by design. Anyone coming from the full-featured, beautifully-implemented XMLA on the Microsoft Analysis Services side would naturally be completely stymied. For those used to Essbase, imagine using XMLA to create/modify cubes, measures, members, etc. This has been possible in MSAS since day one. Ok, enough of my Essbase XMLA ranting - for now anyway. On with the show...

The first piece of information you need is the XMLA url for your Essbase server. On my test server (running in VirtualBox) it looks like so: http://192.168.56.101:9000/aps/XMLA. You'll need to replace the 192.168.56.101 with the server's IP address or DNS name where Hyperion Provider Services is installed (this is usually not the same as the Essbase server). The port is 9000, which is standard for compact (i.e. development) installs, but is usually 13080 or 19000 in production installs. You may need to contact your Essbase administrator if none of the above gives you similar output (your version number could obviously be different) when entered in a web browser:



Power Query supports Web queries, which, of course, is what XMLA is (being SOAP). The key, therefore, is knowing the correct SOAP envelope attributes, as well as the Security node's wsse syntax to pass to the Essbase XMLA service. I've highlighted below the items most likely to cause an issue when trying to connect:
Xml.Tables(Web.Contents("http://192.168.56.101:9000/aps/XMLA",
  [
   Headers = [#"Content-Type"="text/xml; charset=utf-8"],
   Content=Text.ToBinary("
    <SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'
 xmlns:wsse='http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd'
 xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
 <SOAP-ENV:Header>
   <Security xmlns='wsse'>
     <UsernameToken>
       <Username>admin</Username>
       <Password>password123</Password>
     </UsernameToken>
   </Security>
 </SOAP-ENV:Header>
 <SOAP-ENV:Body>
 <Execute>
 <Command>
   <Statement>
    SELECT NON EMPTY { crossjoin([Year].members, {[Actual]}) } ON COLUMNS, 
      NON EMPTY {crossjoin(Descendants([Profit]), crossjoin(Descendants([South]), [Product].members)) } ON ROWS 
    FROM Sample.Basic WHERE ([Can])
   </Statement>
 </Command>
 <Properties>
   <PropertyList>
     <DataSourceInfo>Essbase;Data Source=epm</DataSourceInfo>
     <Content>SchemaData</Content>
     <Catalog>Sample</Catalog>
     <Format>Multidimensional</Format>
     <AxisFormat>TupleFormat</AxisFormat>
     <Content>SchemaData</Content>
     <Timeout>30000</Timeout>
   </PropertyList>
 </Properties>
 </Execute>
 </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>
  ")
 ]
)),

NOTES: The "Data Source" in the above <DataSourceInfo> element is the name of the Essbase server. The "Catalog" is the Essbase application, which must be the same as the one referenced in the MDX query in the <Statement> element.

Power Query provides excellent XML parsing capabilities. I've always had to use other languages (Java, Scala, JavaScript even) to parse XMLA's SOAP responses, but PQ handles it right out of the box. Navigating an XMLA response is as easy as pointing and clicking: 


XMLA returns data and metadata separately. In the above you can see Axes (metadata) and CellData. The hard part is marrying up the two of them. Specifically, CellData is returned like so:


Each data point has a CellOrdinal, which is not a row or column number. With all things XMLA, I always use the Microsoft documentation, which is orders of magnitude better, and more complete, than Oracle's. Here is the CellOrdinal explanation: "The axis reference for a cell can be calculated based on a CellOrdinal attribute value. Conceptually, cells are numbered in a dataset as if the dataset were a p-dimensional array, where p is the number of axes. Cells are addressed in row-major order." Yep, even that is better than Oracle's explanation, because there is no Essbase documentation at all of CellOrdinals.

The M code is pretty self-explanatory, so I'll let it do the talking below. The part dealing with CellOrdinals starts with the SetColumnNumberstep and goes up to, but doesn't include, MergeAxis0.

let
Source = Xml.Tables(Web.Contents(fnGetParameter("Provider Services URL"),
   [
   Headers = [#"Content-Type"="text/xml; charset=utf-8"],
   Content=Text.ToBinary("
    <SOAP-ENV:Envelope xmlns:SOAP-ENV='http://schemas.xmlsoap.org/soap/envelope/'
 xmlns:wsse='http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd'
 xmlns:xsd='http://www.w3.org/2001/XMLSchema' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'>
 <SOAP-ENV:Header>
   <Security xmlns='wsse'>
     <UsernameToken>
       <Username>" & fnGetParameter("Username") & "</Username>
       <Password>" & fnGetParameter("Password") & "</Password>
     </UsernameToken>
   </Security>
 </SOAP-ENV:Header>
 <SOAP-ENV:Body>
 <Execute>
 <Command>
   <Statement>
     " & fnGetParameter("MDX Query") & "
   </Statement>
 </Command>
 <Properties>
   <PropertyList>
     <DataSourceInfo>Essbase;Data Source=" & fnGetParameter("Essbase Server") & "</DataSourceInfo>
     <Content>SchemaData</Content>
     <Catalog>" & fnGetParameter("Essbase Application") & "</Catalog>
     <Format>Multidimensional</Format>
     <AxisFormat>TupleFormat</AxisFormat>
     <Content>SchemaData</Content>
     <Timeout>30000</Timeout>
   </PropertyList>
 </Properties>
 </Execute>
 </SOAP-ENV:Body>
 </SOAP-ENV:Envelope>
  ")
 ]
)),
Body = Source{0}[Table],
xmlAnalysis = Body{0}[Table],
ExecuteResponse = xmlAnalysis{0}[Table],
return = ExecuteResponse{0}[Table],
mddataset = return{0}[Table],
root = mddataset{0}[Table],
Axes = root{2}[Table],
Axis = Axes{0}[Table],
AxisTextType = Table.TransformColumnTypes(Axis,{{"Attribute:name", type text}}),
ExpandTuples = Table.ExpandTableColumn(AxisTextType, "Tuples", {"Tuple"}, {"Tuples.Tuple"}),
AxisOrdinalMap = Table.ExpandTableColumn(ExpandTuples, "Tuples.Tuple", {"Member", "Attribute:Ordinal"}, {"Tuples.Tuple.Member", "Tuples.Tuple.Attribute:Ordinal"}),
AddDimensionMembers = Table.AddColumn(AxisOrdinalMap, "DimensionMembers", each Text.Combine([Tuples.Tuple.Member][UName], ",")),
AxesTable = Table.AddColumn(AddDimensionMembers, "DimensionAliases", each Text.Combine([Tuples.Tuple.Member][Caption], ",")),
Axis0 = Table.SelectRows(AxesTable, each ([#"Attribute:name"] = "Axis0")),
Axis1Text = Table.SelectRows(AxesTable, each [#"Attribute:name"] <> "Axis0" and [#"Attribute:name"] <> "SlicerAxis"),
Axis1 = Table.TransformColumnTypes(Axis1Text,{{"Tuples.Tuple.Attribute:Ordinal", type number}}),
SlicerAxisText = Table.SelectRows(AxesTable, each ([#"Attribute:name"] = "SlicerAxis")),
SlicerAxis = Table.TransformColumnTypes(SlicerAxisText,{{"Tuples.Tuple.Attribute:Ordinal", type number}}),
CellData = root{3}[Table],
Cells = Table.ExpandTableColumn(CellData, "Table", {"Value", "FmtValue", "Attribute:CellOrdinal"}, {"Table.Value", "Table.FmtValue", "Table.Attribute:CellOrdinal"}),
#"Changed Type" = Table.TransformColumnTypes(Cells,{{"Table.Attribute:CellOrdinal", type number}}),
SetColumnNumber = Table.AddColumn(#"Changed Type", "Axis0", each Number.Mod([#"Table.Attribute:CellOrdinal"], Table.RowCount(Axis0))),
GroupedRows = Table.Group(SetColumnNumber, {"Axis0"}, {{"ColumnNum", each _, type table}}),
AddedClusteredIndex = Table.TransformColumns(GroupedRows, {"ColumnNum", each Table.AddIndexColumn(_,"ClusteredIndex",0,1)}),
ExpandClusteredColumn = Table.ExpandTableColumn(AddedClusteredIndex, "ColumnNum", {"Table.Value", "Table.Attribute:CellOrdinal", "ClusteredIndex"}, {"ColumnNum.Table.Value", "ColumnNum.Table.Attribute:CellOrdinal", "ColumnNum.ClusteredIndex"}),
#"Sorted Rows" = Table.Sort(ExpandClusteredColumn,{{"ColumnNum.Table.Attribute:CellOrdinal", Order.Ascending}}),
#"Removed Columns1" = Table.RemoveColumns(#"Sorted Rows",{"ColumnNum.Table.Attribute:CellOrdinal"}),
#"Changed Type2" = Table.TransformColumnTypes(#"Removed Columns1",{{"Axis0", type text}}),
MergeAxis0 = Table.NestedJoin(#"Changed Type2",{"Axis0"}, Axis0, {"Tuples.Tuple.Attribute:Ordinal"},"Columns",JoinKind.LeftOuter),
ExpandColumns = Table.ExpandTableColumn(MergeAxis0, "Columns", {"DimensionMembers", "DimensionAliases"}, {"Columns.DimensionMembers", "Columns.DimensionAliases"}),
MergeAxis1 = Table.NestedJoin(ExpandColumns,{"ColumnNum.ClusteredIndex"}, Axis1,{"Tuples.Tuple.Attribute:Ordinal"},"Rows",JoinKind.LeftOuter),
ExpandRows = Table.ExpandTableColumn(MergeAxis1, "Rows", {"DimensionMembers", "DimensionAliases"}, {"Rows.DimensionMembers.1", "Rows.DimensionAliases.1"}),
AddSlicerMemberNames = Table.AddColumn(ExpandRows, "SlicerMemberNames", each SlicerAxis[DimensionMembers]),
ExpandSlicerMemberNames = Table.ExpandListColumn(AddSlicerMemberNames, "SlicerMemberNames"),
AddSlicerMemberAliases = Table.AddColumn(ExpandSlicerMemberNames, "SlicerMemberAliases", each SlicerAxis[DimensionAliases]),
ExpandSlicerMemberAliases = Table.ExpandListColumn(AddSlicerMemberAliases, "SlicerMemberAliases"),
result = Table.RemoveColumns(ExpandSlicerMemberAliases,{"Axis0", "ColumnNum.ClusteredIndex"})
in
    result

The code can handle any combination of dimensions in the ROWS, COLUMNS, or SLICER axes. I haven't tested it with PAGE or CHAPTER, however. Also, it will return the "Default" alias, since XMLA returns this by default - whether you specify it in an axis or not. But if you use dimension properties to pull in other alias tables or dimension information (like LEVEL_NUMBER or GEN_NUMBER), you will need to modify the AxesTable step from [Caption] to the property name. 

The fnGetParameter query referenced above was taken verbatim from Ken Puls awesome Excelguru blog. It allows you to store variables and their values in an Excel sheet, then reference them from Power Query.

The code above generates the following output to Excel:



Finally, I've included a workbook containing all of the above that you can download here. Please let me know if you have suggestions for how to make the code better. I am astounded by how easy Power Query has made working with XMLA. At any step in the process above you can easily see what Power Query is doing. I will definitely be using PQ as my go-to Extract and Transform (of ETL) in the future.

No comments:

Post a Comment