Teaching the Elephant to Swim: In-Place Analytics with Kafka and pg_duckdb
Thursday, June 25 at 14:20–15:10
-
I am a co-founder of Baremon with more than 25 years of experience in database services, data governance and cloud technologies. My background is rooted in hands-on database administration and ETL, and has gradually evolved into leading and shaping data platforms as organisational complexity and scale have increased.
Today, I combine deep technical expertise with a pragmatic, forward-looking approach. I focus on building robust, secure and reliable data ecosystems, where performance, stability and informed decision-making are business-critical.
At Baremon, I contribute to both the technical and strategic development of data services, working with platforms such as Oracle, AWS, Collibra and Kubernetes across different industries.
I place strong emphasis on mentoring and knowledge sharing. This includes leading trainings, proofs of concept and customer enablement initiatives that help teams operate and evolve their data platforms with confidence.
-
Senior Consultant at Baremon, specialising in database services, data governance, and cloud transformation. I help organisations build secure, reliable, and future-ready data platforms across cloud and hybrid environments. My work combines hands-on technical expertise with a pragmatic, business-focused approach to delivering sustainable outcomes.
Modern real-time fraud detection stacks have become an exercise in infrastructure sprawl. A typical architecture involves PostgreSQL for transactions, Kafka for events, a vector database for embeddings, a data warehouse for analytics, and complex ETL pipelines or event-sourcing frameworks to keep everything in sync. This 'six-box' architecture introduces six points of failure and a significant consistency challenge.
In this session, we demonstrate a 'PostgreSQL-First' architecture that reduces this complexity by 70% without sacrificing performance or correctness. We argue that in high-stakes environments, where a single error can cost millions, PostgreSQL must remain the sovereign Source of Truth, while Kafka should serve strictly as a distributed commit log – not as the authority on state.
We will explore:
- The Write Path: Why we chose the Transactional Outbox pattern over standard CDC to maintain semantic control and domain-driven event boundaries.
- The Recovery Contract: Using the Consumer Inbox pattern and ON CONFLICT DO NOTHING in PostgreSQL to ensure idempotent processing and safe history replays.
- In-Place Intelligence: Replacing dedicated vector databases with pgvector for behavioural fingerprinting and using pg_duckdb on read replicas for zero-ETL, sub-second analytics.
- PostgreSQL 18 Internals: How native UUIDv7 keys, being time-ordered, avoid the B-tree fragmentation that random UUIDs cause in high-throughput streams, and how the new AIO subsystem removes structural performance ceilings on modern NVMe storage.
The talk concludes with a live demonstration using the Kaggle Credit Card Fraud dataset, where we will deliberately 'break' the system and show how the database-anchored architecture ensures deterministic recovery even under extreme pressure.
Takeaways:
Attendees will leave with a blueprint for building lean, explainable, and operationally 'boring' systems that leverage the latest PostgreSQL 18 features to replace fragmented best-of-breed stacks.