This article was about pico, a "hacker labs" service, as they advertise it.
It is extremely fast if you want to use the pastebin option to share some data, or quickly upload some markdown notes to prose, when you don't need to worry about setting up something fancy just to publish them and can focus on the writing.
The pro services like tuns.sh and imgs.sh also seem powerful, the latter one if you want to integrate with GitHub Actions, for instance, but I didn't evaluate that tier.
I was reading this article where Philippe Rivière and Éric Mauvière optimized 200GB of Parquet data down to 549kB.
This work touches on some very relevant points regarding Data Engineering procedures and best practices. I would suggest going through the article, as it explains in detail what they applied in each stage and how.
Use Case
This new fascinating dataset just dropped on Hugging Face. French public domain newspapers 🤗 references about 3 million newspapers and periodicals with their full text OCR’ed and some meta-data. The data is stored in 320 large parquet files. The data loader for this Observable framework project uses DuckDB to read these files (altogether about 200GB) and combines a minimal subset of their metadata — title and year of publication, most importantly without the text contents —, into a single highly optimized parquet file.
Undoubtedly, this dataset proves immensely valuable for training and processing Large Language Models (LLMs).
Best Practices
I firmly believe that these best practices should be applied not only to Parquet but also to other columnar formats.
These are the key factors you should take into consideration:
1. Select only the columns that you will use
This is one of the simplest optimizations you can make. Remember that the data is stored in a columnar way, so selecting only the columns that matter is not only quick to apply, it also significantly reduces the data volume.
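To make this concrete, here is a small stdlib-only Python sketch. The miniature "table" below (column names and sizes are made up for illustration) mimics the newspapers dataset, where the OCR'ed text column dwarfs the metadata; projecting only the metadata columns is trivial in a columnar layout and shrinks the output dramatically:

```python
import json

# Hypothetical miniature of the dataset: a columnar table where the
# "text" column is far larger than the metadata columns.
table = {
    "title": [f"Journal {i}" for i in range(1000)],
    "year":  [1850 + (i % 100) for i in range(1000)],
    "text":  ["lorem ipsum " * 200 for _ in range(1000)],  # full OCR text
}

# Keeping every column means serializing the huge "text" column too.
full_size = len(json.dumps(table).encode())

# Projecting only the columns we actually need (title and year) is
# just dropping the other keys: no row-by-row work required.
subset = {col: table[col] for col in ("title", "year")}
subset_size = len(json.dumps(subset).encode())

print(f"full: {full_size} bytes, projected: {subset_size} bytes")
assert subset_size < full_size  # the projection is much smaller
```

In Parquet the effect is even stronger, because a reader like DuckDB can skip the unwanted column chunks on disk entirely instead of deserializing and discarding them.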
2. Apply the most appropriate compression algorithm
The majority of contemporary data formats support compression. When examining the most common codecs for Parquet, such as LZO, Snappy, and Gzip, we observe several notable differences (ref: sheet).
For instance, a gzip file cannot be split, which means that if you process the data with a distributed engine like Spark, a single worker has to decompress each file in its entirety instead of sharing the work across the cluster.
LZO strikes a better balance between speed and compression ratio when compared to Snappy. In this specific case, I would also recommend exploring Brotli, as the dataset seems to contain mostly text. Choosing an effective algorithm is crucial.
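The speed-versus-ratio trade-off is easy to see even without Parquet. Parquet's own codecs (Snappy, LZO, Brotli) are not in Python's standard library, so as a rough sketch the stdlib gzip, bz2, and lzma codecs stand in for "fast" versus "strong" compressors on a repetitive text payload:

```python
import bz2
import gzip
import lzma
import time

# Repetitive text, loosely standing in for OCR'ed newspaper content.
payload = ("the quick brown fox jumps over the lazy dog " * 5000).encode()

for name, compress in [("gzip", gzip.compress),
                       ("bz2", bz2.compress),
                       ("lzma", lzma.compress)]:
    start = time.perf_counter()
    out = compress(payload)
    elapsed = time.perf_counter() - start
    ratio = len(payload) / len(out)
    print(f"{name:5s} ratio={ratio:7.1f}x time={elapsed * 1000:6.1f}ms")

# Stronger codecs (lzma here, Brotli in Parquet) squeeze text harder but
# cost more CPU; fast codecs (gzip here, Snappy/LZO in Parquet) trade
# compression ratio for speed.
```

The right choice depends on whether you optimize for write throughput, read throughput, or storage and transfer size.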
3. Sort the data
While it may not seem immediately relevant, sorting the rows in this manner produces long streaks of constant values across multiple columns, improving the compression ratio achieved by the compression algorithm.
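A quick stdlib sketch shows the effect: the same values compress far better once sorted, because identical values end up adjacent and become long, cheap back-references for the compressor (the synthetic titles below are made up for illustration):

```python
import gzip
import random

random.seed(42)
# 20,000 values drawn from only 50 distinct "titles".
titles = [f"newspaper-{random.randrange(50)}" for _ in range(20000)]

unsorted_bytes = "\n".join(titles).encode()
sorted_bytes = "\n".join(sorted(titles)).encode()

unsorted_size = len(gzip.compress(unsorted_bytes))
sorted_size = len(gzip.compress(sorted_bytes))

print(f"unsorted: {unsorted_size} bytes, sorted: {sorted_size} bytes")
# Sorting groups identical values into runs, so gzip does much better.
assert sorted_size < unsorted_size
```

Parquet benefits the same way: sorted columns compress into smaller column chunks, and run-length and dictionary encodings become far more effective.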
Thoughts
They took it a step further by implementing additional optimizations, such as increasing the row_group_size. What’s crucial to highlight here is the significant gains achievable through the application of good engineering practices, resulting in faster and more cost-effective processes.
Additionally, DuckDB is exceptionally fast for executing these types of processes. While I’m eager to test it out, unfortunately, I find myself short on both time and disk space!
In this article I will go through the process of setting up Sendmail to relay email through the Mailjet service.
There are several options to set up relaying on your web hosting service, and also several providers that you can consider.
Integrating the SMTP relay service with Mailjet allows you to take advantage of other provided services, such as campaign management.
Requirements
For this setup you will need to:
Have access to your server and permissions to install software
Have an account on the Mailjet service
Have permissions to change your domain DNS records
MailJet
For this setup we are considering the Mailjet service, but you can use a different one. Depending on the tier level, you will have different limitations.
The Free tier allows:
200 emails per day
1500 contacts
6000 emails per month
It is a good starting point, and you can upgrade later if it makes sense.
DNS
SPF & DKIM are authentication systems that tell Internet Service Providers (ISPs), like Gmail and Yahoo, that incoming mail has been sent from an authorized system, and that it is not spam or email spoofing. To set Mailjet as an authorized sender and improve your deliverability, you need to modify your DNS records to include DKIM signature and SPF.
We need to set up authentication. Remember the API key that you created earlier: you will need to include the information associated with the API_KEY and API_SECRET in the file /etc/mail/authinfo/smtp-auth.
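As a sketch, the file typically contains a single AuthInfo line for Mailjet's SMTP endpoint (in-v3.mailjet.com), using the API key as the user and the API secret as the password; the placeholder values below are assumptions you must replace with your own credentials:

```
AuthInfo:in-v3.mailjet.com "U:YOUR_API_KEY" "I:YOUR_API_KEY" "P:YOUR_API_SECRET" "M:PLAIN"
```

Since this file contains credentials, keep it readable only by root.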
After this, you need to run the following command to update the service configuration files:
make -C /etc/mail
And restart the sendmail service:
systemctl restart sendmail
Test
In order to test, you can execute the following command:
echo "Test Email" | mail -s "Subject Here" recipient@example.com
You can now check in the Mailjet Stats section whether your mail passed through.
Troubleshooting
You can check with the mailq command to see if there is mail being blocked, and the logs in /var/log/mail.log to understand if there is some issue.
Conclusion
In this article we went through the configuration of the Sendmail service to relay emails through the Mailjet service. It covered the necessary configuration in both DNS and Mailjet to ensure seamless email delivery from your web hosting server.
In this article I'll go through the Redpanda quickstart guide, spinning up a Redpanda cluster in Docker to evaluate it on Linux.
Requirements
Make sure you have docker and docker-compose installed.
Setup
For lightweight testing, we are going to start a single Redpanda broker.
Create the following docker-compose.yml file with the content:
```yaml
version: "3.7"
name: redpanda-quickstart
networks:
  redpanda_network:
    driver: bridge
volumes:
  redpanda-0: null
services:
  redpanda-0:
    command:
      - redpanda
      - start
      - --kafka-addr internal://0.0.0.0:9092,external://0.0.0.0:19092
      # Address the broker advertises to clients that connect to the Kafka API.
      # Use the internal addresses to connect to the Redpanda brokers
      # from inside the same Docker network.
      # Use the external addresses to connect to the Redpanda brokers
      # from outside the Docker network.
      - --advertise-kafka-addr internal://redpanda-0:9092,external://localhost:19092
      - --pandaproxy-addr internal://0.0.0.0:8082,external://0.0.0.0:18082
      # Address the broker advertises to clients that connect to the HTTP Proxy.
      - --advertise-pandaproxy-addr internal://redpanda-0:8082,external://localhost:18082
      - --schema-registry-addr internal://0.0.0.0:8081,external://0.0.0.0:18081
      # Redpanda brokers use the RPC API to communicate with each other internally.
      - --rpc-addr redpanda-0:33145
      - --advertise-rpc-addr redpanda-0:33145
      # Tells Seastar (the framework Redpanda uses under the hood) to use 1 core on the system.
      - --smp 1
      # The amount of memory to make available to Redpanda.
      - --memory 1G
      # Mode dev-container uses well-known configuration properties for development in containers.
      - --mode dev-container
      # Enable logs for debugging.
      - --default-log-level=debug
    image: docker.redpanda.com/redpandadata/redpanda:v23.3.5
    container_name: redpanda-0
    volumes:
      - redpanda-0:/var/lib/redpanda/data
    networks:
      - redpanda_network
    ports:
      - 18081:18081
      - 18082:18082
      - 19092:19092
      - 19644:9644
  console:
    container_name: redpanda-console
    image: docker.redpanda.com/redpandadata/console:v2.4.3
    networks:
      - redpanda_network
    entrypoint: /bin/sh
    command: -c 'echo "$$CONSOLE_CONFIG_FILE" > /tmp/config.yml; /app/console'
    environment:
      CONFIG_FILEPATH: /tmp/config.yml
      CONSOLE_CONFIG_FILE: |
        kafka:
          brokers: ["redpanda-0:9092"]
          schemaRegistry:
            enabled: true
            urls: ["http://redpanda-0:8081"]
        redpanda:
          adminApi:
            enabled: true
            urls: ["http://redpanda-0:9644"]
    ports:
      - 8080:8080
    depends_on:
      - redpanda-0
```
And start the execution with:
docker-compose up -d
Start Streaming
Let's use the rpk command-line tool to create a topic, produce messages to it, and consume messages.
Get information about the cluster with the command:
docker exec -it redpanda-0 rpk cluster info
Redpanda provides a very quick way to get a Kafka environment up and running, which is especially good for developers. This article didn't go deep into performance evaluations of Kafka vs Redpanda, but their benchmarks are worth assessing if that means reducing your Kafka footprint.
I will probably leave that for another article. I would also like to test the SASL options and the schema registry.
In this article I will build a Todo App with Strapi for the backend component and React as the frontend. The guide was originally written by Chigozie Oduah; check the reference links, as he has some very interesting articles about Strapi.
What is Strapi
Setup backend with Strapi
I will be using bun to manage packages due to its improved performance; check out their page if you want to know more.
Let’s start by creating our backend with the command
bunx create-strapi-app todo-list --quickstart
This should have created a new folder todo-list. You can run the following command in that folder to start your development server:
bun run develop
You should now access http://localhost:1337/admin in the browser and create your admin account, so that we can start creating a new collection.
If you need to restart the development environment you can enter the todo-list folder and run
bun run develop
Building the Backend
Now, for our Todo application, let's create a collection.
Navigate to Content-Type Builder
Select Create new collection type
Call it Todo
Strapi uses this name to reference this collection within our application. Strapi automatically uses the display name to fill the rest of the text boxes.
Create the following fields:
item: Type (Text)
And hit Save. As our application will be a simple Todo list, that single field will do the job.
Add test entries
After the collection is created, we add some test entries.
Go to Content Manager
Select the Todo collection and choose Create new entry
After filling in the item information, you can Save and Publish
Repeat the previous step to have more entries.
Create API Endpoint for our collection
We create API endpoints for our frontend using the Todo collection. These endpoints allow a frontend to interact with our collection.
Navigate to Settings
Click on Roles under Users & Permissions.
Click on Public to open the permissions given to the public.
Toggle the Todo dropdown under Permissions and choose Select all to allow public access to our collection without auth.
Hit Save
After performing these steps, you should be able to access the API at http://localhost:1337/api/todos.
Replace the content of src/App.js with the following (the imports and state declarations at the top are reconstructed so the snippet is self-contained):

```jsx
import { useState, useEffect } from "react";
import TodoItem from "./TodoItem";
import "./App.css";

function App() {
  // "todos" holds the list fetched from the server and
  // "newTodo" holds the value of the new-item input
  const [todos, setTodos] = useState([]);
  const [newTodo, setNewTodo] = useState("");

  useEffect(() => {
    // update the list of todos
    // when the component is rendered for the first time
    update();
  }, []);

  // This function updates the component with the
  // current todo data stored in the server
  function update() {
    fetch(`${process.env.REACT_APP_BACKEND}api/todos`)
      .then(res => res.json())
      .then(todo => {
        setTodos(todo.data);
      })
  }

  // This function sends a new todo to the server
  // and then calls the update method to update the
  // component
  function addTodo(e) {
    e.preventDefault();
    let item = newTodo;
    let body = { data: { item } };
    fetch(`${process.env.REACT_APP_BACKEND}api/todos`, {
      method: "POST",
      headers: { 'Content-type': 'application/json' },
      body: JSON.stringify(body)
    })
      .then(() => {
        setNewTodo("");
        update();
      })
  }

  return (
    <div className="app">
      <main>
        {/* we centered the "main" tag in our style sheet */}
        {/* This form collects the item we want to add to our todo,
            and sends it to the server */}
        <form className="form" onSubmit={addTodo}>
          <input type="text" className="todo_input" placeholder="Enter new todo"
                 value={newTodo} onChange={e => setNewTodo(e.currentTarget.value)} />
          <button type="submit" className="todo_button">Add todo</button>
        </form>
        {/* This is a list view of all the todos in the "todo" state variable */}
        <div>
          {
            todos.map((todo, i) => {
              return <TodoItem todo={todo} key={i} update={update} />
            })
          }
        </div>
      </main>
    </div>
  )
}

export default App;
```
Then create the file TodoItem.jsx with the following content:
```jsx
import { useState } from "react";

function TodoItem({ todo, update }) {
  // Our component uses the "edit" state
  // variable to switch between editing
  // and viewing the todo item
  const [edit, setEdit] = useState(false);
  const [newTodo, setNewTodo] = useState("");

  // This function changes the to-do that
  // is rendered in this component.
  // This function is called when the
  // form to change a todo is submitted
  function changeTodo(e) {
    e.preventDefault();
    let item = newTodo;
    let pos = todo.id;
    let body = { data: { item } };
    // send the updated item to the server, then refresh the list
    fetch(`${process.env.REACT_APP_BACKEND}api/todos/${pos}`, {
      method: "PUT",
      headers: { 'Content-type': 'application/json' },
      body: JSON.stringify(body)
    })
      .then(() => {
        setEdit(false);
        update();
      })
  }

  // This function deletes the to-do that
  // is rendered in this component.
  // This function is called when the
  // form to delete a todo is submitted
  function deleteTodo(e) {
    e.preventDefault();
    let pos = todo.id;
    fetch(`${process.env.REACT_APP_BACKEND}api/todos/${pos}`, {
      method: "DELETE"
    })
      .then(() => {
        update();
      })
  }

  return <div className="todo">
    {/* The below toggles between two components depending on
        the current value of the "edit" state variable */}
    {
      !edit
        ? <div className="name">{todo.attributes.item}</div>
        : <form onSubmit={changeTodo}>
            <input className="todo_input" type="text" placeholder="Enter new todo"
                   value={newTodo} onChange={e => setNewTodo(e.currentTarget.value)} />
            <button className="todo_button" type="submit">Change todo</button>
          </form>
    }
    <div>
      <button className="delete" onClick={deleteTodo}>delete</button>
      <button className="edit" onClick={() => {
        // this button toggles the "edit" state variable
        setEdit(!edit)
        // we add this snippet below to make sure that our "input"
        // for editing is the same as the one for the component when
        // it is toggled. This allows anyone using it to see the current
        // value in the element, so they don't have to write it again
        setNewTodo(todo.attributes.item)
      }}>edit</button>
    </div>
  </div>
}

export default TodoItem;
```
Also replace App.css file with the following content:
I've seen several articles where developers bundle the frontend application in Strapi's public folder to keep a single server installation, but according to Strapi this is not a good practice.
Conclusion
In this article we set up Strapi as the backend for a Todo list application, together with a React frontend that takes advantage of the provided APIs in a headless architecture.
Strapi allows you to quickly set up APIs for collections that can be defined and managed through the provided UI. This is very useful if you would like to decouple the development process, or if you don't want to implement backend functionality from scratch.
Assessing the level of customization would require more extensive exploration. The back office allows you to create auth tokens, webhooks, SSO and internationalization, and it also has a marketplace area to add more functionality.
It is also worth mentioning that you can leverage Strapi Cloud to deploy your production applications.