24 Jun, 2024 - About 4 minutes
Apache Superset
Intro
In this article I will go through the process of setting up Superset locally and working with DuckDB datasets.
Steps
- Install and configure a functional Superset instance
- Create a local or in-memory DuckDB database with sample data
- Validate the working datasets
We will use the datasets provided with the Superset examples, which come as JSON and CSV files, and create DuckDB tables from them.
Quickstart
pyenv install 3.7
Docker
If you would like to test with the Docker version, it brings most of the plugins and is quicker than setting up the Python environment correctly.
To do that, follow these steps:
Download Superset
git clone https://github.com/apache/superset
Start the docker containers with docker-compose
# Enter the repository you just cloned
Now head over to http://localhost:8088 and log in with the default account:
username: admin
Local Python Installation
I wanted to run some tests with DuckDB, and the existing Docker image does not ship the Python packages required for that, so I took the hard path.
First let’s create a Python environment for 3.9.
pyenv install 3.9
Download the Superset code
git clone git@github.com:apache/superset.git
Install the Python packages
pip install --upgrade pip
Let’s configure the default env
mkdir data
Generate the superset_config.py file with the following content:
# Superset specific config
NOTE: Make sure to replace the PATH_TO_YOUR_SUPERSET and SECRET_KEY values. You can generate a new secret with the following command: openssl rand -base64 42
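If openssl is not at hand, a comparable secret can be produced with Python's standard library. The sketch below is only an illustration and not part of the original setup: the helper name generate_secret_key is hypothetical, and it mirrors openssl rand -base64 42 by base64-encoding 42 random bytes.

```python
import base64
import secrets


def generate_secret_key(num_bytes: int = 42) -> str:
    """Return num_bytes of cryptographically secure randomness, base64-encoded.

    Hypothetical helper: with the default size it produces the same kind of
    value as `openssl rand -base64 42`.
    """
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Paste the printed value into SECRET_KEY in superset_config.py.
    print(generate_secret_key())
```

Generate the key once and keep it stable afterwards, since a new value on each start would defeat the purpose of configuring it.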
Create the data folder and start the initialization commands
mkdir data
You can now run Superset with the command:
superset run -p 8088 --with-threads --reload
Now you just need to access http://localhost:8088/ and start creating your datasets and charts.
Loading Data
When you run the command superset load_examples, it loads example data along with several charts and dashboards that let you explore the tool.
You can also load this data into DuckDB and try creating charts from those datasets:
git clone git@github.com:apache-superset/examples-data.git
Then run the following SQL after starting DuckDB:
create table airports as from READ_CSV_AUTO('./airports.csv.gz');
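Since the examples repository holds more than one file, typing one statement per file gets tedious. The following is a minimal sketch, assuming the files sit in the current directory and follow the name.csv.gz or name.json pattern; the helper names table_name_for and create_table_sql are hypothetical. It only prints statements in the same shape as the one above, which you can then paste into the DuckDB shell. Using read_json_auto for the JSON files is an assumption on my side, so check the output on a single file first.

```python
import re
from pathlib import Path
from typing import Optional

# DuckDB table functions that infer the schema for each file type.
READERS = {".csv": "read_csv_auto", ".json": "read_json_auto"}


def table_name_for(path: Path) -> str:
    """Derive a SQL-safe table name, e.g. airports.csv.gz -> airports."""
    stem = path.name.split(".")[0]
    name = re.sub(r"\W", "_", stem).lower()
    # Unquoted identifiers cannot start with a digit, so prefix those.
    return f"t_{name}" if name[:1].isdigit() else name


def create_table_sql(path: Path) -> Optional[str]:
    """Return a CREATE TABLE statement for path, or None if the type is unknown."""
    reader = next(
        (READERS[s.lower()] for s in path.suffixes if s.lower() in READERS),
        None,
    )
    if reader is None:
        return None
    # Double any single quote so the path stays a valid SQL string literal.
    location = path.as_posix().replace("'", "''")
    return f"create table {table_name_for(path)} as from {reader}('{location}');"


if __name__ == "__main__":
    # Print one statement per supported file in the current directory.
    for candidate in sorted(Path(".").iterdir()):
        statement = create_table_sql(candidate)
        if statement:
            print(statement)
```

Printing the SQL instead of executing it keeps the sketch free of extra dependencies and lets you review the table names before anything is created.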
To test the connection, add a new database connection using:
duckdb:///local.duckdb
Or, if you are using an in-memory database:
duckdb:///:memory:
You can test connectivity by going to Settings -> Database Connections.
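The two connection strings above follow the usual SQLAlchemy URI layout, where everything after the third slash is the database location. As a small sketch of that rule (the helper name duckdb_uri is hypothetical, not something Superset provides):

```python
from typing import Optional


def duckdb_uri(database: Optional[str] = None) -> str:
    """Build a SQLAlchemy-style URI for DuckDB.

    With no argument the in-memory form is returned; otherwise the URI
    points at the given database file.
    """
    return f"duckdb:///{database or ':memory:'}"


if __name__ == "__main__":
    print(duckdb_uri())                # in-memory database
    print(duckdb_uri("local.duckdb"))  # file relative to the working directory
```

An absolute path such as /home/me/local.duckdb ends up with four slashes in a row. That form is probably the safer choice, because a relative path is resolved against the directory Superset was started from.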
Next Steps
- Deep dive into the functionality of Superset, as the tool seems extremely complete in its chart options.
- Test with a dataset in DuckDB hybrid mode
Conclusion
Superset is an exceptionally powerful tool for Business Intelligence, offering numerous graphical options and configurations, including 3D charts. It is lightweight and fast, with an integrated API and support for various database connections. This article guides you through setting up a local environment, primarily for testing with DuckDB.
Using Docker is the most efficient way to start and explore Superset. However, setting it up directly with Python offers greater flexibility for testing other connectors, although managing Python dependencies can be challenging. Superset is designed to be configured for cloud environments using Kubernetes, so these steps are not intended for production environments. For a production setup, refer to the official documentation, which is very comprehensive.