Datasource

A Datasource is responsible for taking a query input string and returning its result in the form of a pandas.DataFrame instance. One instance of a Datasource may be used by multiple View instances.

Datasources are only initialized when they are explicitly declared within the report configuration. When a Datasource is initialized, any arguments included in the report configuration are passed as keyword arguments to its Datasource.init() function. (Note: this is not the same as Python’s built-in __init__().) Each Datasource is required to provide both Datasource.init(), which sets up the Datasource with any additional state, and Datasource.query(), which executes a given query and returns the results to the caller as a pandas.DataFrame.

To create your own Datasource, it is simplest to extend the Datasource class, though this is not required. It is required, however, that query() return a pandas.DataFrame instance, as this is the API contract with the View layer; it allows Views to use a variety of Datasources as seamlessly as possible.
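
As a minimal sketch of that contract, the class below does not extend Datasource and supplies only the two required methods. The class name InMemoryDatasource, its tables argument, and the manual call to init() are assumptions made for illustration; in a real report the framework performs the initialization.

```python
import pandas as pd


class InMemoryDatasource:
    """Hypothetical Datasource that serves rows from a dict of named tables."""

    def init(self, tables=None):
        # Receives the keyword arguments declared in the report configuration.
        if not tables:
            raise ValueError("'tables' is required")
        self.tables = tables

    def query(self, input):
        # This sketch treats the query string as a table name (an assumption).
        if input not in self.tables:
            raise ValueError(f"Unknown table '{input}'")
        # Whatever the backing store is, the result must be a DataFrame.
        return pd.DataFrame(self.tables[input])


# Simulate initialization by hand: configuration args become keyword arguments.
args = {"tables": {"users": [{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]}}
datasource = InMemoryDatasource()
datasource.init(**args)
users = datasource.query("users")
```

Because Views only rely on query() returning a pandas.DataFrame, a class like this could be swapped for one backed by a database or an API without changing the View.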

Example: a custom Datasource

This Datasource will interact with an HTTP/JSON API as its backing data store. The input query string is passed to the API as a query parameter. The JSON result is parsed into a pandas.DataFrame via pandas.DataFrame.from_records().

import pandas as pd
from kpireport.datasource import Datasource
import requests

class HTTPDatasource(Datasource):
    def init(self, api_host=None):
        if not api_host:
            raise ValueError("'api_host' is required")
        self.api_host = api_host

    def query(self, input):
        res = requests.get(self.api_host, params=dict(query=input))
        return pd.DataFrame.from_records(res.json())
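
To see what pandas.DataFrame.from_records() does with a typical JSON payload, the sketch below skips the HTTP call and parses a hard-coded string instead. The payload and its field names are made up; the only assumption carried over from the example is that the API returns a JSON array of flat objects.

```python
import json

import pandas as pd

# Hypothetical response body: a JSON array of flat objects.
payload = '[{"id": 1, "name": "ada"}, {"id": 2, "name": "bob"}]'

# json.loads yields a list of dicts, one dict per row.
records = json.loads(payload)

# Each dict becomes a row; the dict keys become the column names.
df = pd.DataFrame.from_records(records)
```

If the API wraps its rows in an envelope object rather than returning a bare array, the list of rows would need to be extracted before it is handed to from_records().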

As with all plugins, your plugin should register itself as an entry_point under the namespace kpireport.datasource. We name it http in this example. When your plugin is installed alongside the kpireport package, it should be automatically loaded for use when the report runs.

[options.entry_points]
kpireport.datasource =
    http = custom_module:HTTPDatasource
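
One way to check that the entry point was registered after installing your package is to list the entry points under the namespace using the standard library. This is a diagnostic sketch only and makes no claim about how kpireport itself discovers plugins; the helper name registered_plugins is made up.

```python
from importlib.metadata import entry_points


def registered_plugins(group="kpireport.datasource"):
    """Return the sorted names of entry points installed under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10 and later: entry points can be filtered by group.
        selected = eps.select(group=group)
    else:
        # Python 3.8/3.9: a plain dict mapping group names to entry points.
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


names = registered_plugins()
print(names)
```

If the package from this example is installed in the active environment, the printed list should include http; an empty list suggests the package metadata was not installed or the group name does not match.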

You can configure your Datasource in a report by declaring it in the datasources section. The plugin name must match the entry_point name (here, http). Any arguments given under the args key are passed to Datasource.init() as keyword arguments when the Datasource is initialized.

datasources:
  custom_datasource:
    plugin: http
    args:
      api_host: https://api.example.com/v1/search

You can name the Datasource as you wish (here, custom_datasource); Views can invoke your Datasource via this ID.
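
The idea of addressing a Datasource by its ID can be pictured as a lookup table that forwards the query string to the right instance. The sketch below is purely illustrative: DatasourceRegistry and EchoDatasource are hypothetical names, not kpireport classes, and the stand-in Datasource returns plain rows rather than a pandas.DataFrame to stay dependency-free.

```python
class DatasourceRegistry:
    """Hypothetical lookup table mapping configuration IDs to Datasources."""

    def __init__(self):
        self._by_id = {}

    def add(self, id, datasource):
        self._by_id[id] = datasource

    def query(self, id, input):
        # Route the query string to the Datasource registered under `id`.
        if id not in self._by_id:
            raise KeyError(f"No Datasource with ID '{id}' is configured")
        return self._by_id[id].query(input)


class EchoDatasource:
    """Stand-in Datasource; returns plain rows instead of a DataFrame."""

    def query(self, input):
        return [{"query": input}]


registry = DatasourceRegistry()
registry.add("custom_datasource", EchoDatasource())
rows = registry.query("custom_datasource", "status:open")
```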

Example: an RPC Datasource

Instead of passing the query input to another service/database, it is possible to implement your own Datasource that provides an RPC-like interface. This can allow you to custom-tailor your parsing and transformation of the result returned by the backing service, as well as create more complex “queries” that compose multiple calls to the backing service. You could even front multiple backing services as a single Datasource interface. Here is a fully-formed example:

from datetime import datetime, timedelta
import pandas as pd
from kpireport.datasource import Datasource
import requests

class RPCDatasource(Datasource):
    def init(self, users_api_host=None, activity_api_host=None):
        if not (users_api_host and activity_api_host):
            raise ValueError((
              "Both 'users_api_host' and 'activity_api_host' "
              "are required."))
        self.users_api_host = users_api_host
        self.activity_api_host = activity_api_host

    def query(self, input):
        """
        Treats the "input" parameter as the name of a separate function
        on the Datasource to invoke.
        """
        fn = getattr(self, input, None)
        if not callable(fn):
            raise ValueError(f"Query '{input}' is not supported")
        return fn()

    def get_all_users(self):
        """
        Return all users, regardless of their activity status.
        """
        users = requests.get(f"{self.users_api_host}/users")
        return pd.DataFrame.from_records(users.json())

    def get_active_users(self):
        """
        Return only the users active within the last month.
        """
        users = requests.get(f"{self.users_api_host}/users")
        user_ids = [u["id"] for u in users.json()]
        last_active_at = requests.get(
          f"{self.activity_api_host}/last_activity",
          params=dict(
            user_ids=",".join(str(i) for i in user_ids)
          )
        )
        # timedelta has no "months" argument; approximate a month as 30 days.
        one_month_ago = datetime.now() - timedelta(days=30)
        active_in_last_month = [
          a["user_id"]
          for a in last_active_at.json()
          if datetime.fromisoformat(a["last_active_at"]) > one_month_ago
        ]
        return pd.DataFrame.from_records([
          u for u in users.json() if u["id"] in active_in_last_month])

    def total_active_users(self):
        """
        Return just the total number of active users, not a full set
        of columnar data.
        """
        return self.get_active_users().count()
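
The filtering step inside get_active_users() can be tried in isolation, without either API. The sketch below uses made-up activity records and a hypothetical helper, recently_active_ids. It assumes timestamps are ISO 8601 strings with an explicit UTC offset, and it approximates "one month" as 30 days because datetime.timedelta has no months unit. Python raises a TypeError when a timezone-aware datetime is compared with a naive one, so the reference time here is timezone-aware as well.

```python
from datetime import datetime, timedelta, timezone


def recently_active_ids(activity, now, window=timedelta(days=30)):
    """Return the set of user IDs whose last activity falls inside `window`."""
    cutoff = now - window
    return {
        a["user_id"]
        for a in activity
        if datetime.fromisoformat(a["last_active_at"]) > cutoff
    }


# Made-up records in the shape the example expects from /last_activity.
activity = [
    {"user_id": 1, "last_active_at": "2024-03-01T12:00:00+00:00"},
    {"user_id": 2, "last_active_at": "2023-11-20T08:30:00+00:00"},
]

# A fixed, timezone-aware reference time keeps the result reproducible.
now = datetime(2024, 3, 10, tzinfo=timezone.utc)
active_ids = recently_active_ids(activity, now)  # {1}
```

Returning a set also makes the later membership test (u["id"] in ...) cheaper than scanning a list for every user.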

From within a View, you could then invoke your Datasource like this (we have given the Datasource the ID users here):

def render_html(self, j2):
    active_users = self.datasources.query("users", "get_active_users")
    # Do something with active users

Alternatively, an existing plugin such as kpireport.plugins.plot.SingleStat can consume the Datasource using only the report configuration:

views:
  active_users:
    title: Users active in last month
    plugin: single_stat
    args:
      datasource: users
      query: total_active_users
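
Because the query string in the RPC design is the name of a method, looking it up with getattr() also makes every other callable attribute of the object reachable, including init and query themselves. One possible refinement, sketched below with made-up names and plain lists in place of DataFrames to stay dependency-free, is to dispatch through an explicit table of supported queries.

```python
class DispatchingDatasource:
    """Hypothetical Datasource that only dispatches to registered queries."""

    def init(self):
        # Explicit allow-list: query name -> bound method.
        self._queries = {
            "get_all_users": self.get_all_users,
            "total_users": self.total_users,
        }

    def query(self, input):
        try:
            fn = self._queries[input]
        except KeyError:
            raise ValueError(f"Query '{input}' is not supported") from None
        return fn()

    def get_all_users(self):
        # Plain rows stand in for the DataFrame a real Datasource returns.
        return [{"id": 1}, {"id": 2}]

    def total_users(self):
        return len(self.get_all_users())


datasource = DispatchingDatasource()
datasource.init()
total = datasource.query("total_users")  # 2
```

With this layout, a query string such as "init" is rejected with a ValueError instead of re-running the setup code.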

Module: kpireport.datasource

class kpireport.datasource.Datasource(report, **kwargs)
Parameters
  • report (kpireport.report.Report) – the Report object.

  • id (str) – the Datasource ID declared in the report configuration.

  • **kwargs

    Additional datasource parameters, declared as args in the report configuration.

abstract init(**kwargs)

Initialize the datasource from the report configuration.

Parameters

**kwargs

Arbitrary keyword arguments, declared as args in the report configuration.

abstract query(input: str) → pandas.core.frame.DataFrame

Query the datasource.

Parameters

input (str) – The query string.

Returns

The query result.

Return type

pandas.DataFrame

exception kpireport.datasource.DatasourceError

A base class for errors originating from the datasource.