
meltano

ETL Data Engineering

Meltano is a declarative data integration engine.

Intro
#

Meltano is a declarative data integration engine built for shipping data-powered features quickly, and one of its long-standing use cases is running it as an ELT platform.

Features
#

  • Meltano Hub has more than 600 connectors
  • Largest connector library of any EL tool
  • Modify connectors to your liking
  • In-flight filtering and hashing of PII
  • Detailed pipeline logs and alerting
  • Open source and cloud-agnostic

Installation
#

Set up the Python environment
#

Configure a Python 3.11 environment and activate it:

pyenv install 3.11
pyenv virtualenv 3.11 meltano
export PIP_REQUIRE_VIRTUALENV=true
pyenv activate meltano

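Then install Meltano with pipx, which must already be available on your system. pipx puts Meltano in its own isolated virtual environment, so the Python version reported in the output below may differ from the pyenv one.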

pipx install meltano

You should get something like this:

  installed package meltano 3.4.2, installed using Python 3.12.4
  These apps are now globally available
    - meltano
done! ✨ 🌟 ✨

You can check that Meltano is working by running meltano --version, which prints the installed version.

Testing
#

We’re going to take data from a “source”, namely GitHub, and extract a list of commits to one repository.

Create your Meltano Project
#

We need to initialize the project, which is where we register the plugins and the details of the pipelines.

meltano init meltano

You will get output similar to this:

Creating .meltano folder
created .meltano in /home/rramos/Development/local/meltano-test/meltano/.meltano
Creating project files...
  meltano/
   |-- meltano.yml
   |-- README.md
   |-- requirements.txt
   |-- output/.gitignore
   |-- .gitignore
   |-- extract/.gitkeep
   |-- load/.gitkeep
   |-- transform/.gitkeep
   |-- analyze/.gitkeep
   |-- notebook/.gitkeep
   |-- orchestrate/.gitkeep
Creating system database...  Done!

Your project has been created!

Meltano Environments initialized with dev, staging, and prod.
To learn more about Environments visit: https://docs.meltano.com/concepts/environments

Next steps:
  cd meltano
  Visit https://docs.meltano.com/getting-started/part1 to learn where to go from here

Add an Extractor
#

Now let's add a plugin to extract data from GitHub.


pipx install git+https://github.com/MeltanoLabs/tap-github.git

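Next we configure the plugin using interactive mode. meltano config only works for plugins registered in the project's meltano.yml, so run it from inside the project directory; if Meltano does not recognize tap-github, registering it with meltano add extractor tap-github should fix that.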

meltano config tap-github set --interactive
  • Update the value for the github auth token
  • Update the value of the start_date
  • Update the value of the repositories to consider

To check the resulting plugin configuration, run meltano config tap-github.
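
Before typing values into the interactive prompt, it can help to sanity-check them. The sketch below is plain Python and not part of Meltano or tap-github; it assumes start_date is written as an ISO 8601 date and that each repository is given as owner/name. The helper names and the sample values are placeholders for illustration.

```python
from datetime import datetime


def is_iso_date(value):
    """True if value parses as an ISO 8601 date, e.g. '2024-01-01'."""
    try:
        datetime.fromisoformat(value)
    except ValueError:
        return False
    return True


def is_repo_slug(value):
    """True if value looks like 'owner/name' with exactly one slash."""
    owner, slash, name = value.partition("/")
    return bool(owner and slash and name) and "/" not in name


# Placeholder values for illustration only (assumed formats, see above).
candidate = {
    "start_date": "2024-01-01",
    "repositories": ["MeltanoLabs/tap-github"],
}

problems = []
if not is_iso_date(candidate["start_date"]):
    problems.append("start_date is not an ISO 8601 date")
for repo in candidate["repositories"]:
    if not is_repo_slug(repo):
        problems.append(f"{repo!r} is not in owner/name form")

print(problems or "configuration values look plausible")
```

The auth token is deliberately left out of the check, since it should not be hard-coded in a script.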

Select Data to Extract
#

Now that the extractor is configured, we need to select which attributes to extract:

meltano select tap-github commits url
meltano select tap-github commits sha
meltano select tap-github commits commit_timestamp

This adds the selected attributes to meltano.yml.
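
Each meltano select call is stored as a pattern of the form entity.attribute in the extractor's select list. The short Python sketch below only illustrates that mapping for the three selections above; the helper name select_patterns is hypothetical, the printed layout is a simplified view rather than the exact file contents, and none of this is Meltano's own code.

```python
def select_patterns(selections):
    """Turn (entity, attribute) pairs into 'entity.attribute' patterns."""
    return [f"{entity}.{attribute}" for entity, attribute in selections]


# The three selections made with `meltano select` above.
patterns = select_patterns([
    ("commits", "url"),
    ("commits", "sha"),
    ("commits", "commit_timestamp"),
])

# Print a simplified view of the resulting `select` list.
print("select:")
for pattern in patterns:
    print(f"- {pattern}")
```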

Dummy Loader
#

Next, let's add a dummy loader that dumps the data into JSON Lines (JSONL) files:

meltano add loader target-jsonl --variant=andyh1203

This target requires zero configuration; it just writes the data to a JSONL file.
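
Once a pipeline has run (for example with meltano run tap-github target-jsonl, assuming both plugins are registered in the project), the result can be inspected with a few lines of Python. This is a minimal sketch, not part of Meltano: the path output/commits.jsonl and the helper names read_jsonl and summarize are assumptions, and the fields are the three attributes selected earlier.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Return one dict per non-empty line of a JSON Lines file."""
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                records.append(json.loads(line))
    return records


def summarize(records, fields=("sha", "commit_timestamp", "url")):
    """Keep only the selected attributes; missing ones become None."""
    return [{name: record.get(name) for name in fields} for record in records]


if __name__ == "__main__":
    # Assumed output location; adjust to wherever the loader wrote its file.
    output_file = Path("output/commits.jsonl")
    if output_file.exists():
        for row in summarize(read_jsonl(output_file)):
            print(row["commit_timestamp"], row["sha"], row["url"])
    else:
        print(f"{output_file} not found; run the pipeline first")
```

Reading the file line by line keeps the sketch usable for repositories with a long commit history.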

Conclusion
#

My initial impression is that the setup is quite messy and difficult to manage, especially when handling different Python versions. Each plugin may only be supported on a specific version, making it challenging to ensure compatibility across packages. This setup clearly requires some significant improvements.
