In this article I will build a Todo App with Strapi for the backend component and React as the frontend. The guide was originally written by Chigozie Oduah; check the reference links, as he has some very interesting articles about Strapi.
What is Strapi
Strapi is an open-source, Node.js-based headless CMS: you model collections through its admin UI and it exposes them through auto-generated REST APIs, which is exactly what we will rely on for this Todo application.
Setup backend with Strapi
I will be using bun to set up packages due to its improved performance; check out their page if you want to know more.
Let’s start by creating our backend with the command
bunx create-strapi-app todo-list --quickstart
This should have created a new folder named todo-list and, thanks to the --quickstart flag, built and started the development server for you.
You should now be able to access http://localhost:1337/admin in your browser and create your admin account so that we can start creating a new collection.
If you need to restart the development environment, enter the todo-list folder and run
bun run develop
Building the Backend
Now, for our Todo application, let's create a collection.
Navigate to Content-Type Builder
Select Create new collection type
Call it Todo
Strapi uses this name to reference the collection within our application, and automatically uses the display name to fill in the rest of the text boxes.
Create the following fields:
item: Type (Text)
And hit Save. As our application will be a simple Todo list, that single field will do the job.
Add test entries
After the collection is created, we add some test entries.
Go to Content Manager
Select the Todo collection and choose Create new entry
After filling in the item field, you can Save and Publish
Repeat the previous step to have more entries.
Create API endpoints for our collection
We create API endpoints for our frontend using the Todo collection. These endpoints allow the frontend to interact with our collection.
Navigate to Settings
Click on Roles under Users & Permissions.
Click on Public to open the permissions given to the public role.
Toggle the Todo dropdown under Permissions and check Select all to allow public access to our collection without authentication.
Hit Save
After performing these steps, you should be able to access the API.
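As a quick check (assuming the default local address and the todos API id that Strapi generates for the Todo collection, which the frontend code below also uses), you can query the endpoint directly:

curl http://localhost:1337/api/todos

This should return a JSON document with a data array containing the entries created earlier.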
On the React frontend (the backend URL is read from the REACT_APP_BACKEND environment variable), the main App component looks like this:

import { useState, useEffect } from "react";
import TodoItem from "./TodoItem";
import "./App.css";

function App() {
  // "todos" holds the list fetched from the server,
  // "newTodo" holds the value of the input field
  const [todos, setTodos] = useState([]);
  const [newTodo, setNewTodo] = useState("");

  useEffect(() => {
    // update the list of todos
    // when the component is rendered for the first time
    update();
  }, []);

  // This function updates the component with the
  // current todo data stored in the server
  function update() {
    fetch(`${process.env.REACT_APP_BACKEND}api/todos`)
      .then(res => res.json())
      .then(todo => {
        setTodos(todo.data);
      })
  }

  // This function sends a new todo to the server
  // and then calls the update method to update the
  // component
  function addTodo(e) {
    e.preventDefault();
    let item = newTodo;
    let body = { data: { item } };
    fetch(`${process.env.REACT_APP_BACKEND}api/todos`, {
      method: "POST",
      headers: { 'Content-type': 'application/json' },
      body: JSON.stringify(body)
    })
      .then(() => {
        setNewTodo("");
        update();
      })
  }

  return (
    <div className="app">
      <main>
        {/* we centered the "main" tag in our style sheet */}
        {/* This form collects the item we want to add to our todo,
            and sends it to the server */}
        <form className="form" onSubmit={addTodo}>
          <input
            type="text"
            className="todo_input"
            placeholder="Enter new todo"
            value={newTodo}
            onChange={e => setNewTodo(e.currentTarget.value)}
          />
          <button type="submit" className="todo_button">Add todo</button>
        </form>
        {/* This is a list view of all the todos in the "todos" state variable */}
        <div>
          {
            todos.map((todo, i) => {
              return <TodoItem todo={todo} key={i} update={update} />
            })
          }
        </div>
      </main>
    </div>
  )
}

export default App;
Next, create a TodoItem.jsx file with the following content:
import { useState } from "react";

function TodoItem({ todo, update }) {
  // Our component uses the "edit" state
  // variable to switch between editing
  // and viewing the todo item
  const [edit, setEdit] = useState(false);
  const [newTodo, setNewTodo] = useState("");

  // This function changes the to-do that
  // is rendered in this component.
  // It is called when the form to change
  // a todo is submitted
  function changeTodo(e) {
    e.preventDefault();
    let item = newTodo;
    let pos = todo.id;
    let body = { data: { item } };
    // NOTE: the request below was missing from the original snippet and is
    // reconstructed following the same pattern as addTodo and deleteTodo
    fetch(`${process.env.REACT_APP_BACKEND}api/todos/${pos}`, {
      method: "PUT",
      headers: { 'Content-type': 'application/json' },
      body: JSON.stringify(body)
    })
      .then(() => {
        setEdit(false);
        update();
      })
  }

  // This function deletes the to-do that
  // is rendered in this component.
  // It is called when the delete button
  // is clicked
  function deleteTodo(e) {
    e.preventDefault();
    let pos = todo.id;
    fetch(`${process.env.REACT_APP_BACKEND}api/todos/${pos}`, {
      method: "DELETE"
    })
      .then(() => {
        update();
      })
  }

  return <div className="todo">
    {/* The below toggles between two components depending on the
        current value of the "edit" state variable */}
    {
      !edit
        ? <div className="name">{todo.attributes.item}</div>
        : <form onSubmit={changeTodo}>
            <input
              className="todo_input"
              type="text"
              placeholder="Enter new todo"
              value={newTodo}
              onChange={e => setNewTodo(e.currentTarget.value)}
            />
            <button className="todo_button" type="submit">Change todo</button>
          </form>
    }
    <div>
      <button className="delete" onClick={deleteTodo}>delete</button>
      <button className="edit" onClick={() => {
        // this button toggles the "edit" state variable
        setEdit(!edit)
        // we add this snippet below to make sure that our "input"
        // for editing is the same as the one for the component when
        // it is toggled. This allows anyone using it to see the current
        // value in the element, so they don't have to write it again
        setNewTodo(todo.attributes.item)
      }}>edit</button>
    </div>
  </div>
}

export default TodoItem;
Also replace the App.css file with the following content:
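The original stylesheet is not reproduced here; below is a minimal sketch based only on the class names used in the components and the comment about centering the main tag, so treat it as a starting point rather than the original file.

.app {
  display: flex;
  justify-content: center;
}

.form {
  margin-bottom: 1rem;
}

.todo_input {
  padding: 0.5rem;
  margin-right: 0.5rem;
}

.todo_button {
  padding: 0.5rem 1rem;
}

.todo {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 0.5rem 0;
}

.name {
  margin-right: 1rem;
}

.delete,
.edit {
  margin-left: 0.5rem;
}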
I've seen several articles where developers bundle the frontend application into Strapi's public folder to keep a single server installation, but according to Strapi this is not good practice.
Conclusion
In this article we set up Strapi as the backend for a Todo list application, together with a React frontend that takes advantage of the provided APIs using a headless architecture.
Strapi allows you to quickly set up APIs for collections that can be defined and managed through the provided UI. This is very useful if you would like to decouple the development process, or if you don't want to implement backend functionality from scratch.
Assessing the level of customization would require more extensive exploration. The back office allows you to create auth tokens, webhooks, SSO, and internationalization, and also has a marketplace area to add more functionality.
It is also worth mentioning that you can leverage Strapi Cloud to deploy your production applications.
After these updates, run the dbt debug command to make sure the connection is working properly.
dbt debug
00:31:58  Running with dbt=1.7.6
00:31:58  dbt version: 1.7.6
00:31:58  python version: 3.11.6
00:31:58  python path: /home/rramos/Development/local/dbt/bin/python
00:31:58  os info: Linux-6.6.10-zen1-1-zen-x86_64-with-glibc2.38
00:31:58  Using profiles dir at /home/rramos/.dbt
00:31:58  Using profiles.yml file at /home/rramos/.dbt/profiles.yml
00:31:58  Using dbt_project.yml file at /home/rramos/Development/local/dbt/imdb/dbt_project.yml
00:31:58  adapter type: clickhouse
00:31:58  adapter version: 1.7.1
00:31:58  Configuration:
00:31:58    profiles.yml file [OK found and valid]
00:31:58    dbt_project.yml file [OK found and valid]
00:31:58  Required dependencies:
00:31:58   - git [OK found]
...
00:31:58  Registered adapter: clickhouse=1.7.1
00:31:58  Connection test: [OK connection ok]
If the connection test passed, you just need to create the model via dbt.
dbt run
You should see output similar to the following:
dbt run
00:38:13  Running with dbt=1.7.6
00:38:13  Registered adapter: clickhouse=1.7.1
00:38:13  Unable to do partial parsing because a project config has changed
00:38:15  Found 1 model, 6 sources, 0 exposures, 0 metrics, 421 macros, 0 groups, 0 semantic models
00:38:15
00:38:15  Concurrency: 1 threads (target='dev')
00:38:15
00:38:15  1 of 1 START sql view model `imdb`.`actor_summary` ............................. [RUN]
00:38:15  1 of 1 OK created sql view model `imdb`.`actor_summary` ........................ [OK in 0.17s]
00:38:15
00:38:15  Finished running 1 view model in 0 hours 0 minutes and 0.27 seconds (0.27s).
00:38:15
00:38:15  Completed successfully
00:38:15
00:38:15  Done. PASS=1 WARN=0 ERROR=0 SKIP=0 TOTAL=1
Query the model to test it:
SELECT *
FROM imdb_dbt.actor_summary
WHERE num_movies > 5
ORDER BY avg_rank DESC
Conclusion
In this article I went through the process of setting up a ClickHouse database and configuring dbt to build models on top of the IMDB test data for actors, directors, movies, etc.
These two systems work like a charm together. ClickHouse shows great performance for analytical queries, and dbt compiles and runs your analytics code against your data platform, enabling you and your team to collaborate on a single source of truth for metrics, insights, and business definitions.
I would like to extend this exercise by incorporating GitHub Actions that run dbt tests before promoting changes to production.
ClickHouse is a true column-oriented DBMS. Data is stored by columns, and during query execution data is processed by arrays (vectors or chunks of columns). Whenever possible, operations are dispatched on arrays, rather than on individual values. It is called "vectorized query execution" and it helps lower the cost of actual data processing.
Architecture
ClickHouse was initially built as a prototype to do just a single task well: to filter and aggregate data as fast as possible. That’s what needs to be done to build a typical analytical report, and that’s what a typical GROUP BY query does. The ClickHouse team has made several high-level decisions that, when combined, made achieving this task possible:
Column-oriented storage: Source data often contain hundreds or even thousands of columns, while a report can use just a few of them. The system needs to avoid reading unnecessary columns to avoid expensive disk read operations.
Indexes: Memory resident ClickHouse data structures allow the reading of only the necessary columns, and only the necessary row ranges of those columns.
Data compression: Storing different values of the same column together often leads to better compression ratios (compared to row-oriented systems) because in real data a column often has the same, or not so many different, values for neighboring rows. In addition to general-purpose compression, ClickHouse supports specialized codecs that can make data even more compact.
Vectorized query execution: ClickHouse not only stores data in columns but also processes data in columns. This leads to better CPU cache utilization and allows for SIMD CPU instructions usage.
Scalability: ClickHouse can leverage all available CPU cores and disks to execute even a single query. Not only on a single server but all CPU cores and disks of a cluster as well.
Attention to Low-Level Details
But many other database management systems use similar techniques. What really makes ClickHouse stand out is attention to low-level details. Most programming languages provide implementations for most common algorithms and data structures, but they tend to be too generic to be effective.
Setup
In order to install, run the following script:
curl https://clickhouse.com/ | sh
One can start the server with the following command
./clickhouse server
In a different terminal, let's start a client with:
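./clickhouse client

From the client we create a first table. The sketch below follows the quick-start example and matches the columns used in the INSERT further down; the exact types and primary key are assumptions:

CREATE TABLE my_first_table
(
    user_id UInt32,
    message String,
    timestamp DateTime,
    metric Float32
)
ENGINE = MergeTree
PRIMARY KEY (user_id, timestamp)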
The statement uses traditional SQL DDL, with one extra piece of information regarding the execution engine. The MergeTree option provides improved performance for managed tables, but there are also options to integrate with external systems such as BigQuery, S3, Kafka, PostgreSQL, …
Insert some data
INSERT INTO my_first_table (user_id, message, timestamp, metric) VALUES
    (101, 'Hello, ClickHouse!',                                 now(),       -1.0),
    (102, 'Insert a lot of rows per batch',                     yesterday(),  1.41421),
    (102, 'Sort your data based on your commonly-used queries', today(),      2.718),
    (101, 'Granules are the smallest chunks of data read',      now() + 5,    3.14159)
Query the table:
SELECT *
FROM my_first_table
ORDER BY timestamp
Now let's create a table over external data in S3, and then a materialized table from it using the MergeTree engine (a sketch of both statements is shown below).
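The original statements are not reproduced here; a sketch of the idea, with a placeholder bucket URL and a reduced column list (assumptions, not the actual dataset schema), could look like this:

-- Table backed directly by files in S3 (every query reads from the bucket);
-- the URL and column list are placeholders, not the original statement
CREATE TABLE trips_raw
(
    pickup_datetime DateTime,
    dropoff_datetime DateTime,
    pickup_ntaname String,
    trip_distance Float32,
    total_amount Float32
)
ENGINE = S3('https://my-bucket.s3.amazonaws.com/trips/trips_*.gz', 'TabSeparatedWithNames');

-- Materialize the same data into a local MergeTree table
-- so subsequent queries no longer touch the bucket
CREATE TABLE trips
ENGINE = MergeTree
ORDER BY pickup_datetime
AS SELECT * FROM trips_raw;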
SELECT DISTINCT(pickup_ntaname)
FROM trips_raw
LIMIT 10;
The S3 table engine supports parallel reads. Writes are only supported if the table definition does not contain glob patterns
You will need to configure access credentials in config.xml if you are using private data or need to write data. You can also define individual configuration files in the conf.d directory, like the following example.
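The example file itself is not included here; based on the S3 engine documentation, a credentials override in conf.d would look roughly like this (the endpoint name, URL, and keys are placeholders):

<clickhouse>
    <s3>
        <my_endpoint>
            <endpoint>https://my-bucket.s3.amazonaws.com/</endpoint>
            <access_key_id>ACCESS_KEY_ID</access_key_id>
            <secret_access_key>SECRET_ACCESS_KEY</secret_access_key>
        </my_endpoint>
    </s3>
</clickhouse>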
Core integrations: built or maintained by ClickHouse, they are supported by ClickHouse and live in the ClickHouse GitHub organization
Partner integrations: built or maintained, and supported by, third-party software vendors
Community integrations: built or maintained and supported by community members. No direct support is available besides the public GitHub repositories and community Slack channels
Also the available documentation seems well prepared.
Production
This article only scratches the surface of the available options to set up ClickHouse; one should read the Scale Out section to understand how best to deploy in production.
Conclusion
In this article we followed the quick-start guide to set up a ClickHouse server, loaded a public dataset with 10M records from S3, and performed a SELECT DISTINCT query on that table.
ClickHouse presents itself as a very interesting OLAP solution. If you are considering a solution for analytical reporting, this is something to have on your radar.
I also liked the fact that you can start small, with the service fully deployed and maintained by you, while keeping the capability to scale horizontally or move to the Cloud offering, which grants the support characteristics that are so often missing in open-source software.
The fact that there is a huge list of adopters and a company providing support and defining a roadmap for the product also brings reassurance.
The documentation provides several integration patterns that are worth checking.
In this article I will go through the steps to set up GitHub Actions to deploy Hexo pages to GitHub Pages on every push.
Github Actions
GitHub Actions makes it easy to automate all your software workflows, now with world-class CI/CD. Build, test, and deploy your code right from GitHub. Make code reviews, branch management, and issue triaging work the way you want.
Check the available documentation, as there are several interesting examples.
Hexo Setup
This tech notes site is maintained with Hexo, a Markdown blog framework.
The framework generates static HTML content from Markdown articles, and one needs to carry out the following steps to update content (summarized in the snippet after the list):
Create/Update Markdown content
Execute hexo generate to generate content
Validate the pages with hexo serve
Deploy to your hosting service with hexo deploy
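Put together, the manual flow that the GitHub Action below automates is simply:

hexo generate   # build the static HTML from the Markdown sources
hexo serve      # preview locally before publishing
hexo deploy     # push the generated site to the hosting service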
Setup
Include the following file, .github/workflows/hexo-deploy.yml, in your Hexo repo with this content:
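The snippet below only covers the later steps; a typical preamble for this kind of workflow (the workflow name, trigger branch, and action versions here are assumptions to adapt to your setup) looks like:

name: Deploy Hexo site

on:
  push:
    branches:
      - main

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout
        uses: actions/checkout@v2
        with:
          submodules: true

      - name: Use Node.js
        uses: actions/setup-node@v1
        with:
          node-version: "16.x"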
      # Caching dependencies to speed up workflows. (GitHub will remove any
      # cache entries that have not been accessed in over 7 days.)
      - name: Cache node modules
        uses: actions/cache@v1
        id: cache
        with:
          path: node_modules
          key: ${{ runner.os }}-node-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-node-

      - name: Install Dependencies
        if: steps.cache.outputs.cache-hit != 'true'
        run: npm ci

      # Deploy hexo blog website.
      - name: Deploy
        id: deploy
        uses: sma11black/hexo-action@v1.0.4
        with:
          deploy_key: ${{ secrets.HEXO_DEPLOY_KEY }}
          user_name: <GITHUB_USER>    # (or delete this input setting to use bot account)
          user_email: <GITHUB_EMAIL>  # (or delete this input setting to use bot account)
          commit_msg: ${{ github.event.head_commit.message }}

      # Use the output from the `deploy` step (use for test action)
      - name: Get the output
        run: |
          echo "${{ steps.deploy.outputs.notify }}"
Replace <GITHUB_USER> with your user account and <GITHUB_EMAIL> with your email address
Generate a new SSH key with the command ssh-keygen -t rsa -C "<GITHUB_EMAIL>", making sure to use your email account.
This step generates two files: a public key, which you need to configure on the destination repo as an allowed deploy key, and a private key, which you need to configure as a secret on the source repo (referenced as HEXO_DEPLOY_KEY in the workflow above).
Configure a personal access token and also register it as a secret on the source repository, named ACCESS_TOKEN.
That's it; you just need to start pushing changes.
NOTE: This assumes the Hexo source repository was already configured to publish to the destination GitHub Pages repository.
Multi repos
It is important to note that you cannot assign the same deploy key to several repositories.
That is why I used a personal token, but there should be better alternatives.
Conclusion
GitHub Actions is a really powerful CI/CD tool, and it works rather well for this type of static content generation.
I had several issues with Git submodules where the authentication was not passing. If you use the same approach for themes, you may end up in the same situation, and a token-based approach would be preferable.
Also, the SSH keys being bound to a single repo caused some initial confusion; there should be a better way to set up the authentication, but I didn't explore it in detail.
Also, a package-lock.json file is required for this to work, and it is advisable to keep your source repo private.
This workflow can certainly be improved, for example by including tests or grammar validation.
Fabric is a centralized product delivered by Microsoft as SaaS that combines several services such as data lake storage, orchestration, processing, visualization, and AI.
Billing is defined by the amount of processing and the amount of storage used.
OneLake is the storage solution, where all data is automatically indexed for discovery; lineage and governance are configured with the support of Purview.
Data can be virtualized from external storage locations across different cloud providers; no data duplication is needed, as these references work similarly to pointers.
Data is also stored in Delta format, guaranteeing ACID-compliant characteristics.
It integrates with Power BI for reporting, with the familiar look and feel.
Data Activator is the real-time processing component which triggers actions based on rules, like automated reports or procedures.
The product also brings Copilot features for Power BI, data science notebooks, and Data Factory cleaning processes.
Components
OneLake
Data Factory
Synapse
Data Activator
PowerBI
Integrations
Databricks integration through Delta UniForm; Unity Catalog also integrates with OneSecurity
Copilot
There are several Copilot integrations, but the one with Power BI is really interesting for producing quick visualizations from natural language prompts.
Conclusion
Not much to conclude here, as the product still needs to be tested in depth on my part. I have some concerns about vendor lock-in, but the fact that it sits on top of a data lake and supports open formats brings some reassurance.
One thing that also troubles me is that if a client just wants to use a partial feature, they would need to acquire the full solution.
For instance, D365 previously allowed clients to export data from that system to a data lake with a feature internal to the product, and now it relies on Fabric only. If this strategy is going to be followed for the rest of the MS portfolio (Data Factory, for example), then, despite improved stability or support, I find the lack of flexibility concerning, as it forces clients onto this solution.