pico.sh

Intro

We think of pico.sh as a hacker lab where we can experiment with new ways to interact with the web.

Features

Pico supports the following services:

  • prose.sh - A blog platform for hackers
  • pastes.sh - A pastebin for hackers
  • feeds.sh - An RSS email notification service

The paid version, called pico+, brings more services.

pico+

The paid version of pico, which brings extra services:

  • pgs.sh - 10GB asset storage
  • tuns.sh - Full access
  • imgs.sh - 2GB image registry storage
  • prose.sh - 1GB image storage
  • Beta access

Setup

Prose

Prose.sh allows you to upload GitHub-flavored Markdown and generates the HTML content to display; you just need to sync the data.

Create a post, e.g. ~/blog/hello-world.md:

# hello world!

This is my first blog post.

Check out some resources:

- [pico.sh](https://pico.sh)
- [antoniomika](https://antoniomika.me)
- [bower.sh](https://bower.sh)

Cya!

And just publish with rsync

rsync ~/blog/* prose.sh:/

There are some special files you can set up to customize the CSS or add a footer (see the example after this list):

  • _styles.css
  • _footer.md
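
For example, you could create a minimal footer and publish both special files the same way as your posts (a quick sketch; the footer content is just an illustration):

echo "Powered by [prose.sh](https://prose.sh)" > ~/blog/_footer.md
rsync ~/blog/_styles.css ~/blog/_footer.md prose.sh:/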

But that’s pretty much it

Check the following doc for more information

Pastes

You can also use the pastebin service

echo "foobar" | ssh pastes.sh

You can define an expiration:

echo "foobar" | ssh pastes.sh FILENAME expires=2023-12-12

It will generate a URL for your paste,
e.g. https://ruimsramos.pastes.sh/1709216080780412798

Conclusion

This article was about pico, a hacker lab service, as they advertise it.

It is extremely fast when you want a pastebin to share some data, or to quickly upload some markdown notes to prose, when you don't need to worry about setting up something fancy just to publish them and can focus on the writing.

The pico+ services like tuns.sh and imgs.sh also seem powerful; the latter could be useful if you want to integrate with GitHub Actions, for instance, but I didn't evaluate that version.

References

Parquet Compression

Introduction

I was reading this article, where Philippe Rivière and Éric Mauvière took a 200GB Parquet dataset and condensed it to 549kB.

This work touches on some very relevant points regarding data engineering procedures and best practices; I would suggest reading the article, as it explains in detail what they applied at each stage and how.

Use Case

This new fascinating dataset just dropped on Hugging Face. French public domain newspapers 🤗 references about 3 million newspapers and periodicals with their full text OCR’ed and some meta-data. The data is stored in 320 large parquet files. The data loader for this Observable framework project uses DuckDB to read these files (altogether about 200GB) and combines a minimal subset of their metadata — title and year of publication, most importantly without the text contents —, into a single highly optimized parquet file.

Undoubtedly, this dataset proves immensely valuable for training and processing large language models (LLMs).

Best Practices

I firmly believe that these best practices should be applied not only to Parquet but also to other columnar formats.

These are the key factors you should take into consideration:

1. Select only the columns that you will use

This is one of the simplest optimizations you can make. Remember that the data is stored column-wise, so selecting only the columns that matter not only filters very quickly but also significantly reduces the volume of data read.
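
As a minimal sketch of this step using the duckdb CLI (the file and column names here are illustrative, not the actual ones from the article):

duckdb -c "COPY (SELECT title, year FROM read_parquet('newspapers/*.parquet'))
  TO 'subset.parquet' (FORMAT PARQUET);"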

2. Apply the most appropriate compression algorithm

The majority of contemporary data formats support compression. When examining the most common ones for Parquet—such as LZO, Snappy, and Gzip—we observe several notable differences (ref: sheet)

For instance, gzip is not splittable, which means that if you process the data with a distributed engine like Spark, each file has to be decompressed by a single task instead of being read in parallel.

LZO strikes a better balance between speed and compression rate when compared to Snappy. In this specific case, I would also recommend exploring Brotli as the datasets seem to contain text. Choosing an effective algorithm is crucial.
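
Staying with DuckDB as an example, the codec is a one-line option when writing the file (ZSTD shown here; Brotli support depends on your DuckDB version, and the file names are still illustrative):

duckdb -c "COPY (SELECT * FROM read_parquet('subset.parquet'))
  TO 'subset_zstd.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);"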

3. Sort the data

While it may not seem immediately relevant, sorting the rows produces long runs of constant values across multiple columns, which improves the compression ratio achieved by the compression algorithm.
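
Sorting fits naturally into the same DuckDB sketch, since it is just an ORDER BY in the query that feeds the writer (names remain illustrative):

duckdb -c "COPY (SELECT title, year FROM read_parquet('newspapers/*.parquet') ORDER BY title, year)
  TO 'sorted.parquet' (FORMAT PARQUET, COMPRESSION ZSTD);"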

Thoughts

They took it a step further by implementing additional optimizations, such as increasing the row_group_size. What’s crucial to highlight here is the significant gains achievable through the application of good engineering practices, resulting in faster and more cost-effective processes.

Additionally, DuckDB is exceptionally fast for executing these types of processes. While I’m eager to test it out, unfortunately, I find myself short on both time and disk space!

References

Sendmail Relay Configuration

Intro

In this article I will go through the process of setting up Sendmail to relay email to the Mailjet service.

There are several options to setup relaying on your web hosting service, and also several providers that you can consider.

Incorporating the SMTP relay service with Mailjet allows you to take advantage of other services provided, such as campaign management.

Requirements

  • Access to your server and permissions to install software
  • An account on the Mailjet service
  • Permissions to change your domain's DNS records

MailJet

For this setup we are considering the Mailjet service, but you can use a different one.
Depending on the tier level, you will have different limitations.

The Free tier allows:

  • 200 emails per day
  • 1,500 contacts
  • 6,000 emails per month

It is a good starting point, and you can upgrade later if it makes sense.

DNS

SPF & DKIM are authentication systems that tell Internet Service Providers (ISPs), like Gmail and Yahoo, that incoming mail has been sent from an authorized system, and that it is not spam or email spoofing. To set Mailjet as an authorized sender and improve your deliverability, you need to modify your DNS records to include DKIM signature and SPF.

This document provides more detailed information

But basically you will need to include two TXT records:

  • type: TXT, host: @, value: "v=spf1 include:spf.mailjet.com ~all"

If you run a DNS query on your domain for TXT records, you should see that value:

dig -t TXT yourdomain.com

You also need to include the DKIM record; follow the instructions provided.
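
You can verify the DKIM record the same way once it is published; the selector below is an assumption, so use the exact record name Mailjet shows in your account:

dig -t TXT mailjet._domainkey.yourdomain.com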

Mailjet also provides an option to validate that the configuration is working properly.

Add Domains

You will also need to configure the allowed sending domains and validate senders.

You can do both at the following URL:

API Keys

The last step would be to create an API key for your service.

Go to the following URL and create a new key; note it down, as it will be required later.

Ok, now let’s configure our MTA

Configure Sendmail

For this setup you will need access to your hosting service and the ability to install software.

The following instructions are for an Ubuntu-based distribution.

Install packages

sudo apt-get install sendmail

Configuration

In this setup we will configure Sendmail to relay all email via SMTP, using the authentication credentials provided by the service.

Start by editing the file /etc/mail/sendmail.mc and add the following content at the end:

dnl # Default Mailer setup
MAILER_DEFINITIONS
define(`SMART_HOST', `in-v3.mailjet.com')dnl
define(`RELAY_MAILER_ARGS', `TCP $h 587')dnl
define(`ESMTP_MAILER_ARGS', `TCP $h 587')dnl
define(`confAUTH_OPTIONS', `A p')dnl
TRUST_AUTH_MECH(`EXTERNAL DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
define(`confAUTH_MECHANISMS', `EXTERNAL GSSAPI DIGEST-MD5 CRAM-MD5 LOGIN PLAIN')dnl
FEATURE(`authinfo',`hash -o /etc/mail/authinfo/smtp-auth.db')dnl
MAILER(`local')dnl
MAILER(`smtp')dnl

We need to set up authentication. Remember the API key that you created earlier: you will need to include the API_KEY and API_SECRET values in the file /etc/mail/authinfo/smtp-auth

sudo mkdir /etc/mail/authinfo
sudo nano /etc/mail/authinfo/smtp-auth

Add a single line with the following format:

AuthInfo: "U:root" "I:API_KEY" "P:API_SECRET"

Example:

AuthInfo: "U:root" "I:1233450786523741256e" "P:ety555qtfgdghsd88wrfer"

After this you need to run the following command to update the service configuration files:

make -C /etc/mail

And restart the Sendmail service:

systemctl restart sendmail

Test

In order to test you can execute the following command

echo "Test Email" | mail -s "Subject Here" recipient@example.com 

You can now check in the Mailjet Stats section whether your mail passed through.

Troubleshooting

You can check with the mailq command to see whether mail is being blocked, and the logs in /var/log/mail.log to spot any issues.
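
You can also watch the full relay dialogue, including the AUTH exchange with in-v3.mailjet.com, by invoking sendmail directly in verbose mode (the recipient address is a placeholder):

echo "Test body" | sendmail -v recipient@example.com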

Conclusion

In this article we went through the configuration of the Sendmail service to relay emails through Mailjet. It covers the necessary configuration in both DNS and the Mailjet service to ensure seamless email delivery from your web hosting server.

References

Redpanda

Intro

In this article I'll go through the Redpanda quickstart guide, spinning up a Redpanda cluster in Docker to evaluate it on Linux.

Requirements

Make sure you have docker and docker-compose installed.

Setup

For lightweight testing, we are going to start a single Redpanda broker.

Create the following docker-compose.yml file with the content:

version: "3.7"
name: redpanda-quickstart
networks:
  redpanda_network:
    driver: bridge
volumes:
  redpanda-0: null
services:
  redpanda-0:
    command:
      - redpanda
      - start
      - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
      # Address the broker advertises to clients that connect to the Kafka API.
      # Use the internal addresses to connect to the Redpanda brokers
      # from inside the same Docker network.
      # Use the external addresses to connect to the Redpanda brokers
      # from outside the Docker network.
      - --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092
      - --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
      # Address the broker advertises to clients that connect to the HTTP Proxy.
      - --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082
      - --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
      # Redpanda brokers use the RPC API to communicate with each other internally.
      - --rpc-addr redpanda-0:33145
      - --advertise-rpc-addr redpanda-0:33145
      # Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system.
      - --smp 1
      # The amount of memory to make available to Redpanda.
      - --memory 1G
      # Mode dev-container uses well-known configuration properties for development in containers.
      - --mode dev-container
      # Enable logs for debugging.
      - --default-log-level=debug
    image: docker.redpanda.com/redpandadata/redpanda:v23.3.5
    container_name: redpanda-0
    volumes:
      - redpanda-0:/var/lib/redpanda/data
    networks:
      - redpanda_network
    ports:
      - 18081:18081
      - 18082:18082
      - 19092:19092
      - 19644:9644
  console:
    container_name: redpanda-console
    image: docker.redpanda.com/redpandadata/console:v2.4.3
    networks:
      - redpanda_network
    entrypoint: /bin/sh
    command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
    environment:
      CONFIG_FILEPATH: /tmp/config.yml
      CONSOLE_CONFIG_FILE: |
        kafka:
          brokers: ["redpanda-0:9092"]
          schemaRegistry:
            enabled: true
            urls: ["http://redpanda-0:8081"]
        redpanda:
          adminApi:
            enabled: true
            urls: ["http://redpanda-0:9644"]
    ports:
      - 8080:8080
    depends_on:
      - redpanda-0

And start the execution with docker-compose up -d

Start Streaming

Let’s use the rpk command-line tool to create a topic, produce messages to it, and consume messages.

Get information about the cluster with the command

docker exec -it redpanda-0 rpk cluster info

Now let's create a topic called chat-room:

docker exec -it redpanda-0 rpk topic create chat-room

Produce messages to that topic:

docker exec -it redpanda-0 rpk topic produce chat-room
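
rpk reads messages from stdin, so besides typing them interactively (Ctrl+C to exit) you can pipe data in, for example:

echo "hello world" | docker exec -i redpanda-0 rpk topic produce chat-room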

Consume one message from the topic:

docker exec -it redpanda-0 rpk topic consume chat-room --num 1

You can also install rpk directly on your system and connect to the broker:

curl -LO https://github.com/redpanda-data/redpanda/releases/latest/download/rpk-linux-amd64.zip

Then unzip the file and put the rpk binary on your bin path, e.g. unzip rpk-linux-amd64.zip -d ~/.local/bin/
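
If your shell can't find or execute rpk afterwards, make sure the target directory is on your PATH and the binary is executable (adjust for your shell):

export PATH="$HOME/.local/bin:$PATH"
chmod +x ~/.local/bin/rpk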

You can test the connection to your broker with:

rpk cluster info -X brokers=127.0.0.1:19092

Generating Mock Data

Let's use the following command from our references to produce mock data.

Leave one terminal open with the following command

rpk topic consume Products -X brokers=127.0.0.1:19092

In a different terminal, create the following file schema.avsc:

{
  "type": "record",
  "name": "Products",
  "namespace": "exp.products.v1",
  "fields": [
    { "name": "id", "type": "string" },
    { "name": "productId", "type": ["null", "string"] },
    { "name": "title", "type": "string" },
    { "name": "price", "type": "int" },
    { "name": "isLimited", "type": "boolean" },
    { "name": "sizes", "type": ["null", "string"], "default": null },
    { "name": "ownerIds", "type": { "type": "array", "items": "string" } }
  ]
}

Make sure to install datagen

npm install -g @materializeinc/datagen

Create the following .env file

# Kafka Brokers
KAFKA_BROKERS=localhost:19092

# For Kafka SASL Authentication:
SASL_USERNAME=
SASL_PASSWORD=
SASL_MECHANISM=

# For Kafka SSL Authentication:
SSL_CA_LOCATION=
SSL_CERT_LOCATION=
SSL_KEY_LOCATION=

# Connect to Schema Registry if using '--format avro'
SCHEMA_REGISTRY_URL=
SCHEMA_REGISTRY_USERNAME=
SCHEMA_REGISTRY_PASSWORD=

Then execute the following command

datagen -s schema.avsc -n 10

And you just generated mock data based on the provided schema file.
Take a look at the following repo for more details on datagen.

Conclusion

Redpanda provides a very quick way to spin up a Kafka-compatible environment, which is especially good for developers. This article didn't go deep into performance evaluations of Kafka vs Redpanda, but their benchmarks are worth assessing if that means reducing your Kafka footprint.

I'll probably leave that for another article. I would also like to test the SASL options and the schema registry.

References

Strapi and React TODO Application

Intro

In this article I will build a Todo app with Strapi for the backend component and React as the frontend. The guide was originally written by Chigozie Oduah; check the reference links, as he has some very interesting articles about Strapi.

What is Strapi

Strapi is an open-source, Node.js-based headless CMS: you define your content types through an admin UI, and it automatically exposes them as REST APIs (GraphQL is available via a plugin).

Setup backend with Strapi

I will be using bun to set up packages due to its improved performance; check out their page if you want to know more.

Let’s start by creating our backend with the command

bunx create-strapi-app todo-list --quickstart

This should have created a new folder, todo-list, and the quickstart should have started the development server automatically.

You should now open http://localhost:1337/admin in your browser and create your admin account so that we can start creating a new collection.

If you need to restart the development environment, you can enter the todo-list folder and run

bun run develop

Building the Backend

Now, for our TODO application, let's create a collection.

  1. Navigate to Content-Type Builder
  2. Select Create new collection type
  3. Call it Todo

Strapi uses this name to reference this collection within our application. Strapi automatically uses the display name to fill the rest of the text boxes.

Create the following fields:

  • item : Type ( Text )

And hit Save. As our application will be a simple Todo list, that single field will do the job.

Add test entries

After the collection is created, we add some test entries.

  1. Go to content Manager
  2. Select the Todo collection and choose Create new entry
  3. After filling the item information you can Save and Publish

Repeat the previous step to have more entries.

Create API Endpoint for our collection

We create API endpoints for our frontend using the Todo collection. These endpoints allow the frontend to interact with our collection.

  1. Navigate to Settings
  2. Click on Roles under user permission & roles.
  3. Click on public to open the permissions given to the public.
  4. Toggle the Todo dropdown under Permissions and Select all to allow public access to our collection without auth.
  5. Hit Save

After performing these steps you should have a working API with the following endpoints:

  • Find (/api/todos GET): We use this endpoint to get all the items in our Todo collection.
  • Create (/api/todos POST): We use this endpoint to create a new item in our Todo collection.
  • Find one (/api/todos/:id GET): We use this endpoint to get a single item in our Todo collection.
  • Update (/api/todos/:id PUT): We use this endpoint to update an item in our Todo collection.
  • Delete (/api/todos/:id DELETE): We use this endpoint to delete an item in our Todo collection.
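
As a quick smoke test (assuming the default local port and the public permissions set above), you can hit the endpoints with curl:

# list all todos
curl http://localhost:1337/api/todos

# create a new todo
curl -X POST http://localhost:1337/api/todos \
  -H "Content-Type: application/json" \
  -d '{"data":{"item":"Buy milk"}}'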

Great, that was easy. Now let's set up our frontend React application to interact with these API endpoints.

Setup frontend React App

Now let's start the frontend application. In the parent folder, run the following command:

bunx create-react-app todo-frontend

Next create the following two files for the environment variables:

  • .env.development
REACT_APP_BACKEND=http://localhost:1337/
  • .env.production
REACT_APP_BACKEND=/

You can run the frontend application with the following command

bun run start

And open your browser at http://localhost:3000, which will show an empty React application.

Let's replace the App.js file with the following content:

import { useState, useEffect } from 'react';
import TodoItem from './TodoItem';
import './App.css';

function App() {
  const [todos, setTodos] = useState([]);
  const [newTodo, setNewTodo] = useState("");

  useEffect(() => {
    // update the list of todos
    // when the component is rendered for the first time
    update();
  }, []);

  // This function updates the component with the
  // current todo data stored in the server
  function update() {
    fetch(`${process.env.REACT_APP_BACKEND}api/todos`)
      .then(res => res.json())
      .then(todo => {
        setTodos(todo.data);
      })
  }

  // This function sends a new todo to the server
  // and then calls the update method to refresh the
  // component
  function addTodo(e) {
    e.preventDefault();
    let item = newTodo;
    let body = {
      data: {
        item
      }
    };

    fetch(`${process.env.REACT_APP_BACKEND}api/todos`, {
      method: "POST",
      headers: {
        'Content-type': 'application/json'
      },
      body: JSON.stringify(body)
    })
      .then(() => {
        setNewTodo("");
        update();
      })
  }

  return (
    <div className="app">
      <main>
        {/* we centered the "main" tag in our style sheet */}

        {/* This form collects the item we want to add to our todo, and sends it to the server */}
        <form className="form" onSubmit={addTodo}>
          <input type="text" className="todo_input" placeholder="Enter new todo" value={newTodo} onChange={e => setNewTodo(e.currentTarget.value)} />
          <button type="submit" className="todo_button">Add todo</button>
        </form>

        {/* This is a list view of all the todos in the "todos" state variable */}
        <div>
          {
            todos.map((todo, i) => {
              return <TodoItem todo={todo} key={i} update={update} />
            })
          }
        </div>

      </main>
    </div>
  )
}

export default App;

Create the file TodoItem.jsx with the following content:

import { useState } from "react";
import './App.css';

function TodoItem({ todo, update }) {

  // Our component uses the "edit" state
  // variable to switch between editing
  // and viewing the todo item
  const [edit, setEdit] = useState(false);
  const [newTodo, setNewTodo] = useState("");

  // This function changes the to-do that
  // is rendered in this component.
  // This function is called when the
  // form to change a todo is submitted
  function changeTodo(e) {
    e.preventDefault();
    let item = newTodo;
    let pos = todo.id;
    let body = {
      data: {
        item
      }
    };

    fetch(`${process.env.REACT_APP_BACKEND}api/todos/${pos}`, {
      method: "PUT",
      headers: {
        'Content-type': 'application/json'
      },
      body: JSON.stringify(body)
    })
      .then(() => {
        setEdit(false);
        update();
      })
  }

  // This function deletes the to-do that
  // is rendered in this component.
  // This function is called when the
  // button to delete a todo is clicked
  function deleteTodo(e) {
    e.preventDefault();
    let pos = todo.id;

    fetch(`${process.env.REACT_APP_BACKEND}api/todos/${pos}`, {
      method: "DELETE"
    })
      .then(() => {
        update();
      })
  }

  return <div className="todo">
    {/*
      The below toggles between two components
      depending on the current value of the "edit"
      state variable
    */}
    { !edit
      ? <div className="name">{todo.attributes.item}</div>
      : <form onSubmit={changeTodo}>
          <input className="todo_input" type="text" placeholder="Enter new todo" value={newTodo} onChange={e => setNewTodo(e.currentTarget.value)} />
          <button className="todo_button" type="submit">Change todo</button>
        </form>
    }
    <div>
      <button className="delete" onClick={deleteTodo}>delete</button>
      <button className="edit" onClick={() => {
        // this button toggles the "edit" state variable
        setEdit(!edit)

        // we add this snippet below to make sure that our "input"
        // for editing is the same as the one for the component when
        // it is toggled. This allows anyone using it to see the current
        // value in the element, so they don't have to write it again
        setNewTodo(todo.attributes.item)
      }}>edit</button>
    </div>
  </div>
}

export default TodoItem;

Also replace the App.css file with the following content:

.app {
  display: flex;
  justify-content: center;
  text-align: center;
}

.todo_input {
  height: 16px;
  padding: 10px;
  border-top-left-radius: 8px;
  border-bottom-left-radius: 8px;
  border: 2px solid blueviolet;
}

.todo_button {
  border: 2px solid blueviolet;
  background-color: transparent;
  height: 40px;
  border-top-right-radius: 8px;
  border-bottom-right-radius: 8px;
}

.todo {
  display: flex;
  justify-content: space-between;
  margin-top: 5px;
  font-weight: 700;
  margin-bottom: 5px;
  min-width: 340px;
}

.edit {
  width: 66px;
  font-weight: 700;
  background: blueviolet;
  border: none;
  border-top-right-radius: 5px;
  height: 33px;
  border-bottom-right-radius: 5px;
  color: white;
  font-size: medium;
}

.delete {
  width: 66px;
  font-weight: 700;
  background: white;
  border: 2px solid blueviolet;
  border-top-left-radius: 5px;
  height: 33px;
  color: blueviolet;
  border-bottom-left-radius: 5px;
  font-size: medium;
}

.form {
  padding-top: 27px;
  padding-bottom: 27px;
}

.name {
  max-width: 190.34px;
  text-align: left;
}

After this last update you should have a working todo app at http://localhost:3000/

Deployment

I've seen several articles where developers bundle the frontend application in Strapi's public folder to keep a single server installation, but according to Strapi this is not a good practice.

Conclusion

In this article we set up Strapi as the backend for a Todo list application, with a React frontend that takes advantage of the provided APIs using a headless architecture.

Strapi lets you quickly set up APIs for collections that can be defined and managed through the provided UI. This is very useful if you want to decouple the development process, or if you don't want to implement backend functionality from scratch.

Assessing the level of customization would require more extensive exploration. The back office allows you to create auth tokens, webhooks, SSO, and internationalization, and also has a marketplace area to add more functionality.

Also worth mentioning: you can leverage Strapi Cloud to deploy your production applications.

References