10 Jun, 2024 - About 5 minutes
MinIO
Intro
MinIO is an object storage solution that provides an Amazon Web Services S3-compatible API and supports all core S3 features. MinIO is built to deploy anywhere - public or private cloud, baremetal infrastructure, orchestrated environments, and edge infrastructure.
Features
Install
I installed the Arch package with pacman; check the documentation for other installation methods that suit your system.
sudo pacman -S minio
Launch the MinIO Server
Run the following from the system terminal or shell to start a local MinIO instance that uses the ./data folder to store the data.
mkdir data
minio server ./data
You should be greeted with information on the endpoints for the API, web UI and CLI.
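If you want to confirm from a script that the server is up, a minimal sketch is to call MinIO's unauthenticated liveness endpoint. The address http://127.0.0.1:9000 is an assumption (the default API port); use whatever the startup output printed. The helper names are made up for illustration.

```python
import urllib.error
import urllib.request


def liveness_url(endpoint: str) -> str:
    """Build the liveness-probe URL for a MinIO API endpoint."""
    return endpoint.rstrip("/") + "/minio/health/live"


def is_alive(endpoint: str, timeout: float = 2.0) -> bool:
    """Return True if the server answers the liveness probe with HTTP 200."""
    try:
        with urllib.request.urlopen(liveness_url(endpoint), timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx response.
        return False


if __name__ == "__main__":
    # Assumed endpoint: replace with the API address shown at startup.
    print(is_alive("http://127.0.0.1:9000"))
```

The probe only tells you the process is answering HTTP requests; it does not validate credentials or buckets.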
Testing locally
Tofu
The following repo provides one example of how to set up the required containers as code using the MinIO Terraform provider.
Clone the repo below; its plan creates a sample bucket and a dummy text file.
git clone git@github.com:rramos/tofu-minio.git
After applying that plan you will have a bucket called state-terraform-s3 with an object named text.txt.
Let's now look at the other ways to interact with the server.
Console
The MinIO Console is a rich graphical user interface that provides similar functionality to the mc (or mcli) command line tool.
You can access it via a browser.
NOTE: The port used by MinIO depends on the configuration specified when you started the service. To determine the port, check the output of the server startup command.
CLI
The MinIO Client mc (or mcli) command line tool provides a modern alternative to UNIX commands like ls, cat, cp, mirror, and diff with support for both filesystems and Amazon S3-compatible cloud storage services.
First set up an alias for your service (obtain the access key and secret key from the Console):
mcli alias set myio http://192.168.1.178:9000 ACCESS_KEY SECRET_KEY
After that we can check whether the service is operational with:
mcli admin info myio
You should see output similar to the following:
$ mcli admin info myio
We can now use the CLI to run traditional operations:
mcli cp file.txt myio/state-terraform-s3/file.txt
The full list of commands is available on the following URL
API
In this example we copy data through the API using the minio Python module. Let's start by installing the required pip package.
pip install minio
Next, generate an access key and secret in the Console and update the following Python script, minio_cp.py, with them:
from minio import Minio
You can test it by creating a new file2.txt and running the script.
echo "This is file2.txt" > file2.txt
NOTE: This example is purely illustrative and doesn't use a secure connection. It is not advisable to run it in production.
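For reference, a copy script along these lines could look roughly like the sketch below. It assumes the endpoint and bucket used earlier in this post and reads the credentials from two environment variables, MINIO_ACCESS_KEY and MINIO_SECRET_KEY, whose names are made up for illustration. It uses the Minio constructor and fput_object from the minio package.

```python
import os
import sys

# Assumptions: same endpoint and bucket as in the CLI section.
ENDPOINT = "192.168.1.178:9000"   # host:port, without the http:// scheme
BUCKET = "state-terraform-s3"


def object_name_for(path: str) -> str:
    """Use the file's base name as the object key."""
    return os.path.basename(path)


def upload(path: str) -> None:
    """Upload a local file to BUCKET under its base name."""
    # Imported here so the helper above can be used without the package installed.
    from minio import Minio

    client = Minio(
        endpoint=ENDPOINT,
        access_key=os.environ["MINIO_ACCESS_KEY"],   # hypothetical variable name
        secret_key=os.environ["MINIO_SECRET_KEY"],   # hypothetical variable name
        secure=False,  # plain HTTP: acceptable for a local test only
    )
    client.fput_object(
        bucket_name=BUCKET,
        object_name=object_name_for(path),
        file_path=path,
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    upload(sys.argv[1])
```

With the two variables exported, running python minio_cp.py file2.txt should place file2.txt in the bucket, which you can then check in the Console.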
Example
Here is one example with DuckDB that creates a table pointing to that storage, where the sample_data.csv file exists:
CREATE SECRET secret1 (
NOTE: Replace the ACCESS_KEY and SECRET with the ones provided in the Console.
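The same idea can be driven from Python. The sketch below renders a CREATE SECRET statement for an S3-compatible endpoint and then loads a CSV into a table. It assumes a DuckDB version with the secrets manager and the httpfs extension; the function names and the choice of path-style URLs without SSL are my assumptions for a local MinIO, not something taken from the original statement.

```python
def s3_secret_sql(name: str, key_id: str, secret: str, endpoint: str) -> str:
    """Render a DuckDB CREATE SECRET statement for an S3-compatible endpoint."""

    def quote(value: str) -> str:
        # Escape single quotes so the value is a valid SQL string literal.
        return "'" + value.replace("'", "''") + "'"

    return (
        f"CREATE SECRET {name} (\n"
        f"    TYPE S3,\n"
        f"    KEY_ID {quote(key_id)},\n"
        f"    SECRET {quote(secret)},\n"
        f"    ENDPOINT {quote(endpoint)},\n"
        f"    URL_STYLE 'path',\n"   # assumed: path-style URLs for a local MinIO
        f"    USE_SSL false\n"       # assumed: plain HTTP, as in the API example
        f");"
    )


def load_csv(table: str, url: str, secret_sql: str):
    """Create `table` from a CSV object such as s3://bucket/sample_data.csv."""
    import duckdb  # imported here so s3_secret_sql works without the package

    con = duckdb.connect()
    con.execute("INSTALL httpfs")
    con.execute("LOAD httpfs")
    con.execute(secret_sql)
    con.execute(f"CREATE TABLE {table} AS SELECT * FROM read_csv('{url}')")
    return con
```

For example, load_csv("sample", "s3://state-terraform-s3/sample_data.csv", s3_secret_sql("secret1", "ACCESS_KEY", "SECRET", "192.168.1.178:9000")) would be one way to call it, assuming the CSV lives in the bucket created earlier.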
Replication
MinIO supports server-side and client-side replication of objects between source and destination buckets.
Server-Side Bucket Replication
Configure per-bucket rules for automatically synchronizing objects between MinIO deployments. The deployment where you configure the bucket replication rule acts as the “source” while the configured remote deployment acts as the “target”.
Client-side Bucket Replication
Use the mc mirror (or mcli mirror) command to synchronize objects between buckets within the same S3-compatible cluster or between two independent S3-compatible clusters.
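To make the idea of client-side synchronization concrete, here is a rough sketch of the comparison step a mirroring client has to perform: list both buckets and work out which keys are missing or different on the target. This is an illustration only, not how mc is implemented. The function names are made up, and comparing ETags is a heuristic (they can differ for multipart uploads).

```python
from typing import Dict, List


def plan_mirror(source: Dict[str, str], target: Dict[str, str]) -> List[str]:
    """Return the keys to copy: missing on the target or with a different ETag."""
    return sorted(
        key for key, etag in source.items()
        if target.get(key) != etag
    )


def listing(client, bucket: str) -> Dict[str, str]:
    """Map object key -> ETag for a bucket, using the minio client's list_objects."""
    return {
        obj.object_name: obj.etag
        for obj in client.list_objects(bucket_name=bucket, recursive=True)
    }
```

Objects that exist only on the target are left alone in this sketch, so it models a one-way, additive sync.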
Replication of Delete Operations
MinIO supports replicating delete operations, where MinIO synchronizes deleting specific object versions and new delete markers. Delete operation replication uses the same replication process as all other replication operations.
Synchronous vs Asynchronous Replication
MinIO supports specifying either asynchronous (default) or synchronous replication for a given remote target.
Replication Process
MinIO uses a replication queuing system with multiple concurrent replication workers operating on that queue. MinIO continuously works to replicate and remove objects from the queue while scanning for new unreplicated objects to add to the queue.
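As a toy model of that description (not MinIO's actual implementation), the sketch below uses a shared queue, a scanner that enqueues unreplicated keys, and several worker threads that drain the queue. Plain dictionaries stand in for the source and target buckets, and all names are made up for illustration.

```python
import queue
import threading
from typing import Dict, Optional


def replicate(src: Dict[str, bytes], dst: Dict[str, bytes], workers: int = 4) -> None:
    """Copy every key missing from dst, using a queue drained by worker threads."""
    pending: "queue.Queue[Optional[str]]" = queue.Queue()
    lock = threading.Lock()

    def worker() -> None:
        while True:
            key = pending.get()
            if key is None:          # sentinel: no more work for this worker
                pending.task_done()
                return
            with lock:               # dst is shared between workers
                dst[key] = src[key]
            pending.task_done()

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()

    # "Scanner": enqueue every object that has not been replicated yet.
    for key in src:
        if key not in dst:
            pending.put(key)

    # One sentinel per worker, then wait for the queue to drain.
    for _ in threads:
        pending.put(None)
    pending.join()
    for t in threads:
        t.join()
```

A real system keeps scanning and replicating continuously; this version performs a single pass and then stops so that it is easy to reason about.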
Conclusion
MinIO grew to become one of the most broadly deployed object stores by focusing on what matters most to developers, architects and applications. The service can be configured in several ways depending on the scalability, replication and level of service that you want for your solution design. It also has the advantage that developers can deploy it locally and later extend it into a more robust setup, or even pay for it as a service if they don't want to deal with this type of operation.
The integration with several solutions and the support for several APIs show strong potential for MinIO to become a big player in the lakehouse architecture. I recommend reading the article "DuckDB and MinIO for a Modern Data Stack" if you would like to know more.
MinIO offers a variety of replication options, making it an excellent choice for multi-cloud solutions. These replication capabilities ensure high availability, data durability, and efficient disaster recovery across different cloud environments.