rramos.github.io

10 Jun, 2024 - About 5 minutes

minIO

Intro

MinIO is an object storage solution that provides an Amazon Web Services S3-compatible API and supports all core S3 features. MinIO is built to deploy anywhere - public or private cloud, baremetal infrastructure, orchestrated environments, and edge infrastructure.

Features

Install

Installed the Arch package with pacman but check the documentation for other methods based on your system.

sudo pacman -S minio

Launch the MinIO Server

Run the following command from the system terminal or shell to start a local MinIO instance using the ./data folder to store the data.

mkdir data
minio server ./data

You should be granted with information on the endpoints for the API, WebUI and CLI.

Testing locally

Tofu

The following repo provides one example where you can setup as code the required containers using the minio terraform provider.

Clone the following repo which will create a sample bucket and a dummy text_file

git clone git@github.com:rramos/tofu-minio.git
cd tofu-minio
tofu init
tofu plan
tofu apply

After applying that plan you will have a bucket called state-terraform-s3 with a object text.txt.

Let’s use other options now.

Console

The MinIO Console is a rich graphical user interface that provides similar functionality to the mc or mcli command line tool.
You can access it view browser

NOTE: The port used by MinIO depends on the configuration specified when you started the service. To determine the port, check the output of the server startup command

CLI

The MinIO Client mc or mcli command line tool provides a modern alternative to UNIX commands like ls, cat, cp, mirror, and diff with support for both filesystems and Amazon S3-compatible cloud storage services.

You should setup alias for your services (Obtain the service key and secret from the console)

mcli alias set myio http://192.168.1.178:9000 ACCESS_KEY SECRET_KEY

After that we can check if our service is operational with

mcli admin info myio

You should have a similar output

$ mcli admin info myio
● 192.168.1.178:9000
Uptime: 10 minutes
Version: 2024-06-06T09:36:42Z
Network: 1/1 OK
Drives: 1/1 OK
Pool: 1

┌──────┬────────────────────────┬─────────────────────┬──────────────┐
│ Pool │ Drives Usage │ Erasure stripe size │ Erasure sets │
│ 1st │ 85.9% (total: 953 GiB) │ 1 │ 1 │
└──────┴────────────────────────┴─────────────────────┴──────────────┘

27 B Used, 1 Bucket, 1 Object, 1 Version
1 drive online, 0 drives offline, EC:0

We can now use the cli to execute tradicional operations

mcli cp file.txt myio/state-terraform-s3/file.txt

The full list of commands is available on the following URL

API

In this example we will be copying data though the API using the minio module, let’s start by installing the required pip package.

pip install minio

Next generate a key/secret in the console and update the following python code minio_cp.py

from minio import Minio
from minio.error import S3Error

# MinIO server credentials
MINIO_URL = "127.0.0.1:9000" # or your MinIO server URL
ACCESS_KEY = "ACCESS_KEY"
SECRET_KEY = "SECRET_KEY"

# Initialize the MinIO client
minio_client = Minio(
MINIO_URL,
access_key=ACCESS_KEY,
secret_key=SECRET_KEY,
secure=False # Set to False if you are not using HTTPS
)

# File to upload
file_path = "./file2.txt"
bucket_name = "state-terraform-s3"
object_name = "file2.txt"

# Ensure the bucket exists
found = minio_client.bucket_exists(bucket_name)
if not found:
minio_client.make_bucket(bucket_name)
else:
print(f"Bucket '{bucket_name}' already exists")

# Upload the file
try:
minio_client.fput_object(
bucket_name, object_name, file_path
)
print(f"'{file_path}' is successfully uploaded as '{object_name}' to bucket '{bucket_name}'.")
except S3Error as err:
print(f"Error occurred: {err}")

You can test by generating a new file2.txt and executing the code.

echo "This is file2.txt > file2.txt"
python minio_cp.py

TODO: This example is merely explanatory and doesn’t use secure connection. It is not advisable to run it in production.

Example

one example with DuckDB creating a table pointing for that storage where sample_data.csv file exists

CREATE SECRET secret1 (
TYPE S3,
KEY_ID 'ACCESS_KEY',
SECRET 'SECRET_KEY',
REGION 'us-east-1',
ENDPOINT 'localhost:9000',
URL_STYLE 'path',
USE_SSL false
);

SELECT * FROM 's3://state-terraform-s3/sample_data.csv';

NOTE: Replace the ACCESS_KEY and SECRET from the ones provided in the console.

Replication

MinIO supports server-side and client-side replication of objects between source and destination buckets.

Server-Side Bucket Replication

Configure per-bucket rules for automatically synchronizing objects between MinIO deployments. The deployment where you configure the bucket replication rule acts as the “source” while the configured remote deployment acts as the “target”

Client-side Bucket Replication

Use the command process to synchronize objects between buckets within the same S3-compatible cluster or between two independent S3-compatible clusters

Replication of Delete Operations

MinIO supports replicating delete operations, where MinIO synchronizes deleting specific object versions and new delete markers. Delete operation replication uses the same replication process as all other replication operations.

Synchronous vs Asynchronous Replication

MinIO supports specifying either asynchronous (default) or synchronous replication for a given remote target.

Replication Process

MinIO uses a replication queuing system with multiple concurrent replication workers operating on that queue. MinIO continuously works to replicate and remove objects from the queue while scanning for new unreplicated objects to add to the queue.

Conclusion

MinIO grew to become the most broadly deployed object store on the planet by focusing on what mattered the most to developers, architects and applications. The service can be configured in several ways depending on the scalability, replication and level of service that you want for your solution design. Also brings the advantage of developers being able to deploy it locally and later extend for a more robust solution or even pay for that service if you don’t want to deal with this type of operation.

The integration with several solutions and support for several APIs show strong potential for being a Big Player on the Lakehouse Architecture. Recommend the reading of the article DuckDB and MinIO for a Modern Data Stack if you like to know more.

MinIO offers a variety of replication options, making it an excellent choice for multi-cloud solutions. These replication capabilities ensure high availability, data durability, and efficient disaster recovery across different cloud environments.

References

OLDER > < NEWER