Power BI Embedded Analytics — Part 1.1: Power BI Authoring Data Federation
This article is part of a four-part series. It covers data federation, including sources, connectors, connectivity, datasets, data modelling, data refresh, etc.
Join the DZone community and get the full member experience.
Join For FreeNote: This article dives into the details of Power BI Authoring Data Federation. This is a sub-part of the Power BI Embedded Analytics — Part 1 article, where Power BI Authoring and introduction to Power BI Embedded Analytics are discussed.
So, going through the discussion on Power BI Embedded Analytics — Part 1 is recommended.
You can also find Part 2, which covers Power BI Embedded, here.
Data Federation
Data Source
Data Sources that can be used with Power BI are categorized as follows:
- File
- Database
- Microsoft Fabric (Preview)
- Power Platform
- Azure
- Online Services
- Other
Types and List
A list of data sources is available here. A list of all the Power BI connectors that can be used to connect to the data sources is listed here.
Custom Connectors
The Power Query SDK is a set of tools designed to help you create Power Query connectors. These connectors are often referred to as custom connectors or Power Query extensions. Custom connectors let you create new data sources or customize and extend an existing source. Common use cases include:
- Creating a business analyst-friendly view for a REST API
- Providing branding for a source that Power Query supports with an existing connector (such as an OData service or ODBC driver)
- Implementing OAuth v2 authentication flow for a SaaS offering
- Exposing a limited or filtered view of your data source to improve usability
- Enabling DirectQuery for a data source using an ODBC driver. For further references, check out these links:
Connectivity Modes
Data source connection types primarily are:
Import (The Default Mode)
All data is imported/copied into Power BI, which is the fastest in terms of performance since all data resides with Power BI. This also requires polishing and cleansing the data, preferably following the star schema, since the dataset size is limited based on the subscription/license.
Key points:
- There's no concept of an Import model being partially loaded into memory. The entire model must be loaded to memory before Power BI can query the model, which can place pressure on available capacity resources, especially as the number and size of Import models grow.
- Model data is only as current as the latest refresh, and so Import models need to be refreshed, usually on a scheduled basis.
Direct Query
This mode is an alternative to Import mode. Models developed in DirectQuery mode don't import data. Only the schema is copied over; the data stays with the data source. Not all data sources support this.
Key points:
- When data volumes are too large, even when data reduction methods are applied, to load into a model, or, practically, to refresh.
- When reports and dashboards need to deliver near real-time data, beyond what can be achieved within scheduled refresh limits, scheduled refresh limits are eight times a day for shared capacity, and 48 times a day for Premium capacity.
- Import model size limits don't apply.
- Models don't require scheduled data refresh.
- Power Query/Mashup expressions and DAX formulas are limited to using only functions that can be transposed to native queries understood by the data source.
Composite
Composite mode can mix Import and DirectQuery modes, or integrate multiple DirectQuery data sources. Models developed in Composite mode support configuring the storage mode for each model table. The table storage mode can be configured as Import, DirectQuery, or Dual.
Live Connection
Live connection is a method that lets you build a report in Power BI Desktop without having to build a dataset for it. When you create your report in Power BI Desktop, it can be connected to an already published Power BI dataset. A live connection allows you to rely on existing data, which can be updated without accessing the report. With a live connection, we can connect to one of the following data sources:
- A dataset that already exists in the Power BI service
- An Azure Analysis Services (AAS) database
- An on-premises instance of SQL Server Analysis Services (SSAS)
The capability of a connector and the connection mode supported can be checked in the respective connector documentation page, as an example for MongoDB Atlas.
MongoDB Datasource Connectivity Walkthrough
For the purpose of this demonstration, we will use a MongoDB Atlas SQL connector against an Atlas federated database, which includes our data from a MongoDB Atlas cluster. We will use MongoDB Cloud with a free shared-tier cluster for configuring and setting up the MongoDB database and the Atlas federated database.The following steps demonstrate the process of connecting to a MongoDB database:
1. Sign up and log in to the MongoDB cloud (https://cloud.mongodb.com/).
2. Under Deployment -> Database, create a new cluster like the one below.
3. Move to the Collections tab and create a new database.
3. Import the data (from CSV or other sources).
4. Once the cluster is ready, click on Connect and choose Atlas SQL.
5. Use the Power BI Connector and copy the connection string that will be used in Power BI to connect to this instance.
6. Next, open Power BI Desktop and follow the Get Data wizard. Select Database from the categories on the left, select MongoDB Atlas SQL, and then select Connect.
7. In the MongoDB Atlas SQL window that appears, fill in the MongoDB URL and the federated Database name.
8. Next, enter your Atlas MongoDB Database access username and password, and select Connect.
Next, in the Navigator, select one or multiple elements to import and use in Power BI Desktop. Then select either Load to load the table in Power BI Desktop, or Transform Data to open the Power Query editor, where you can filter and refine the set of data you want to use, and then load that refined set of data into Power BI Desktop.
Once the data is imported into the dataset, save the file as a pbix file. Click Publish to publish to Power BI Service and resume authoring reports/dashboards against the created dataset in the Power BI Service (Web).
Dataset
A dataset is a collection of data that you import or connect to. Power BI lets you connect to and import all sorts of data sources and bring all of it together in one place. Dataset and data source are terms that are used synonymously, but they are two different things. A dataset contains information about the data source and data source credentials.
The dataset also often includes a subset of data copied from the data source. Within a dataset, we can mix-match different connection types against data sources, resulting in a composite model. Datasets are associated with workspaces, and a single dataset can be part of many workspaces. For each workspace, the listed dataset is a source of data available for one or more reports, and the dataset can contain data that comes from one or more sources.
Data Modeling
Data modeling in Power BI is a critical process that involves shaping and organizing your data to create a logical structure that can be used for analysis and reporting. It is a fundamental step in building effective and insightful reports and dashboards. Power BI has a dedicated Data Modeling view, which allows you to perform data modeling operations. Key concepts and steps in modeling:
Data Source Connection
Connect Power BI to data source(s) and import data.
Data Transformation
After importing data, Power Query (which is integrated into Power BI) can be used to perform data transformation tasks. Power Query Editor provides a user-friendly interface for cleaning, shaping, and preparing your data. You can filter rows, remove duplicates, pivot and unpivot data, merge tables, and apply various data transformations to make the data more usable.
Power Query is a tool for extract-transform-load (ETL), i.e., it lets you import and prepare your data for use. Contrasting with DAX, DAX is another programming language available for Power BI. DAX is used for data analysis rather than ETL. Power Query Editor is the graphical user interface to Power Query. Power Query M ("M" for short) is the functional programming language used in Power Query.
Custom/Calculated Data Entities
-
Custom/calculated tables: Tables are created in the dataset after importing data into the model from an external data source(s). But calculated tables let you add new tables based on data you've already loaded into the model. Instead of querying and loading values into your new table's columns from a data source, you create a Data Analysis Expressions (DAX) formula to define the table's values. Calculated tables are best for intermediate calculations and data you want to store as part of the model, rather than calculating on the fly or as query results. Calculated tables behave the same way as regular tables in terms of the applicability of operations on tables. For example, we can do a union or cross-join two existing tables and have a calculated table that represents those results.
-
Custom/calculated columns: These are columns you create in a table based on calculations or expressions. Calculated columns are computed during data import and are useful for adding additional data to your model. Calculated columns are created during the data import process. They are calculated and stored in the model, which can increase the memory footprint. Calculated columns can be thought of as static values associated with rows in a table.
-
Custom/calculated measures: Measures are calculations that are computed on the fly when used in reports and visuals. Measures are defined using the DAX (Data Analysis Expressions) language, and they enable you to perform aggregations, calculations, and KPIs. Measures are computed on the fly when you use them in visualizations, such as charts and tables. This means that they are not precomputed during data import and consume less memory. There are basically two types of measures:
-
Implicit measures: Implicit measures use a column from a data table by dragging the field into a visual in Power BI. They allow you to calculate Sum, Count, Average, Min, Max, and DistinctCount. When a field is in the values section, select the dropdown list to determine which summarization calculation you want to perform on the field. These types of measures work for basic tables and summaries, but are limited compared to explicit measures.
-
Explicit measures: Explicit measures require you to use the DAX formula language to explicitly write out the expression. These measures are better over an extended time and will allow you to create custom analytical metrics like Profit Margin YTD. You can create explicit measures by writing a formula in the editor. These measures offer the most flexibility and give you the power to use all the capabilities of DAX.
-
Relationships and Hierarchy
Relationships
- Auto-detect: Power BI Desktop attempts to find and create relationships for you. The relationship options Cardinality, Cross filter direction, and Make this relationship active are automatically set. Power BI Desktop looks at column names in the tables you're querying to determine if there are any potential relationships. If there are, those relationships are created automatically. If Power BI Desktop can't determine with a high level of confidence that there's a match, it doesn't create the relationship.
- Manual:
- On the Modeling tab, select Manage relationships > New.
- In the Create relationship dialog box, in the first table drop-down list, select a table. Select the column you want to use in the relationship.
- In the second table drop-down list, select the other table you want in the relationship. Select the other column you want to use, and then select OK By default, Power BI Desktop automatically configures the options Cardinality (direction), Cross filter direction, and Make this relationship active for the new relationship. However, these settings can be changed if necessary.
- Relationship options:
- Cardinality: The Cardinality option can have one of the following settings:
- Many to one (*:1): A many-to-one relationship is the most common, default type of relationship. It means the column in a given table can have more than one instance of a value, and the other related table, often known as the lookup table, has only one instance of a value.
- One to one (1:1): In a one-to-one relationship, the column in one table has only one instance of a particular value, and the other related table has only one instance of a particular value.
- One to many (1:*): In a one-to-many relationship, the column in one table has only one instance of a particular value, and the other related table can have more than one instance of a value.
- Many to many (:): With composite models, you can establish a many-to-many relationship between tables, which removes requirements for unique values in tables. It also removes previous workarounds, such as introducing new tables only to establish relationships.
- Cross filter direction: The Cross filter direction option can have one of the following settings:
- Both: For filtering purposes, both tables are treated as if they're a single table. The Both setting works well with a single table that has many lookup tables that surround it. An example is a sales actuals table with a lookup table for its department. This configuration is often called a star schema configuration (a central table with several lookup tables). However, if you have two or more tables that also have lookup tables (with some in common), then you wouldn't want to use the Both setting. To continue the previous example, in this case, you also have a budget sales table that records the target budget for each department. And, the department table is connected to both the sales and the budget table. Avoid the Both setting for this kind of configuration.
- Single: The most common, default direction, which means filtering choices in connected tables work on the table where values are being aggregated.
- Make this relationship active: When checked, the relationship serves as the active, default relationship. In cases where there's more than one relationship between two tables, the active relationship provides a way for Power BI Desktop to automatically create visualizations that include both tables.
- Cardinality: The Cardinality option can have one of the following settings:
- Manual:
Hierarchy
Hierarchies in Power BI are a way to organize data (related fields within a dataset) in a hierarchical structure, where one parent or first level is ranked over the other. This allows users to drill down from parent levels to lower levels in a specific order. Power BI Desktop can automatically detect some of these hierarchical relationships, especially in the case of date-related data.
Another common example of a hierarchy is a structure with a category, subcategory, and product. The following example showcases a BusinessUnits table where we can define hierarchy between the fields of this table.
This table has a hierarchy with three levels. The top-level is Division, followed by Group, and then BusinessUnit. This means that several BusinessUnits below belong to one Group, and several Groups belong to one Division.
- Data types: Here are some of the common data types in Power BI:
- Binary
- True/false
- Fixed decimal number
- Date
- Date/time
- Decimal number
- Text
- Time
- Whole number
- Data refresh: Data refresh is the process of updating the data in your Power BI report from the original data sources. It ensures that the information in your report is current and accurate. Data refresh is only applicable when the data source connection mode is Import; for other modes, Power BI is directly connected to the source, so no refresh is required. Data refresh can be done manually or scheduled to occur automatically. Here's how to perform a data refresh in Power BI:
- Manual refresh:
- Open your Power BI report in Power BI Desktop.
- Click on the "Refresh" button in the Home tab. This will update the data in your report based on the configured data sources.
- Scheduled refresh:
- For published reports and dashboards, you can set up scheduled refresh in the Power BI Service (Power BI Online).
- In Power BI Service, go to the dataset settings for your report.
- Configure the refresh frequency and specify the credentials for data sources that require authentication.
- Power BI Service will automatically refresh the data according to the schedule you define. The allowed frequency depends on the Power BI subscription.
- On-premises data gateway: If your data sources are located on-premises (behind a firewall), you can use the Power BI On-Premises Data Gateway to enable scheduled refresh for those data sources.
- Manual refresh:
The next article will dive into Power BI Embedded solutions, licensing, authentication, and APIs.
Opinions expressed by DZone contributors are their own.
Comments