
Navigating Different Types of Databases and Why You Should Probably Just Go With Postgres

In previous blog posts, we've explored how to identify your first data and AI use case and provided a practical guide to getting started with data and AI. Now, it's time to tackle a fundamental question that every organization faces when embarking on their data journey: which database should you use?

For some organizations, it might even be a recurring question with every new project.

On top of that, the database landscape is vast and constantly evolving, with vendors promising revolutionary approaches and paradigm-shifting technologies. The latest entrants you may have seen around: Vector Databases like Pinecone and Weaviate.

It's easy to get caught in the hype cycle, chasing the latest trend instead of focusing on what delivers real business value.

So I figured I'd save you some time and write down my perspective. If you think this text is too long (and you are not an expert): just go with a PostgreSQL instance. It will take you far.

If you are interested in the why and how, keep reading!

The Database Landscape: An Overview

Let's explore the main types of databases you'll encounter, their strengths, weaknesses, and when they make sense for your business.

Relational Databases

Key examples: PostgreSQL, MySQL, Oracle, SQL Server

Data model: Organizes data into tables with rows and columns, using relationships between tables to model complex data. Follows the relational model introduced by Codd in 1970. This type of database has been around for a long time (and will be for the foreseeable future).

When to use:

  • Structured data with well-defined relationships

  • Applications requiring strong data consistency and ACID transactions

  • Business applications with complex querying needs

  • When data integrity is paramount

Relational databases have dominated the market since the 1980s—and for good reason. The combination of the relational model with SQL (Structured Query Language) provides a powerful, declarative way to work with data that has stood the test of time.
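
To make this concrete, here is a minimal sketch of the relational model in PostgreSQL; the tables and columns are made up for illustration:

```sql
-- Two tables linked by a foreign key; the database enforces the relationship.
CREATE TABLE customers (
    id   bigserial PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE orders (
    id          bigserial PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),
    total       numeric(10, 2) NOT NULL,
    created_at  timestamptz NOT NULL DEFAULT now()
);

-- A declarative SQL query across the relationship: total revenue per customer.
SELECT c.name, sum(o.total) AS revenue
FROM customers c
JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;
```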

Document Stores

Key examples: MongoDB, Couchbase

Data model: Stores semi-structured documents (typically JSON) that can contain nested data.

When to use:

  • Highly variable data structures that evolve frequently

  • Applications requiring quick iteration without rigid schemas

  • When flexibility is more important than complex joins

  • Rapid prototyping and development

Document databases gained popularity because they allow developers to quickly iterate without defining strict schemas upfront. However, as applications mature, they often need the core features that the relational model provides. This is why most document databases now offer SQL-like querying and ACID transactions.
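
As an aside, the same document-style model can be sketched in PostgreSQL itself with the JSONB type, which already hints at where this article is heading; the table and fields below are hypothetical:

```sql
-- A single JSONB column holds a nested, schema-less document per row.
CREATE TABLE events (
    id  bigserial PRIMARY KEY,
    doc jsonb NOT NULL
);

INSERT INTO events (doc)
VALUES ('{"type": "order", "customer": "acme", "items": [{"sku": "A1", "qty": 2}]}');

-- Query inside the document: containment (@>) and field extraction (->>).
SELECT doc->>'customer' AS customer
FROM events
WHERE doc @> '{"type": "order"}';
```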

Key-Value Stores

Key examples: Redis, Memcached, DynamoDB

Data model: Simple mapping of keys to values, with values typically treated as opaque blobs.

When to use:

  • Caching layers

  • Session stores

  • Real-time applications requiring sub-millisecond response times

  • Simple data structures with access primarily by a single key

Key-value stores excel at simple, high-throughput operations. They're perfect for use cases like caching or maintaining session data, but limited for complex queries or when you need to understand the structure of the stored values. And eventually you often will, for example to run analyses or to find occurrences of specific values.
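
If you only need simple key-based access but want to stay in one system, a key-value-style table in Postgres can be a reasonable middle ground. A minimal sketch, using a hypothetical cache table (UNLOGGED skips the write-ahead log, trading durability for speed, which is usually fine for a cache):

```sql
-- UNLOGGED: faster writes, but contents are lost after a crash.
CREATE UNLOGGED TABLE cache (
    key        text PRIMARY KEY,
    value      bytea NOT NULL,
    expires_at timestamptz NOT NULL
);

-- Upsert a value by key.
INSERT INTO cache (key, value, expires_at)
VALUES ('session:42', '\x0102', now() + interval '30 minutes')
ON CONFLICT (key) DO UPDATE
SET value = EXCLUDED.value, expires_at = EXCLUDED.expires_at;

-- Read by key, ignoring expired entries.
SELECT value FROM cache WHERE key = 'session:42' AND expires_at > now();
```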

Columnar / Column-Family Databases

Key examples: Apache Cassandra, HBase, ScyllaDB

Data model: Data organized into column families rather than rows, optimized for high write throughput and horizontal scaling.

When to use:

  • Write-heavy workloads with predictable query patterns

  • Time-series data with massive scale

  • When high write throughput and horizontal scaling are critical

  • When eventual consistency is acceptable

These databases were popularized by Google's Bigtable paper and excel at handling massive write volumes and scaling horizontally. They typically trade some consistency for availability and partition tolerance (following the CAP theorem).

Time-Series Databases

Key examples: InfluxDB, TimescaleDB

Data model: Optimized for data points indexed by time, often with specialized functions for time-based analysis.

When to use:

  • IoT data collection

  • Application and system monitoring

  • Financial market data

  • Any data where time is the primary axis

Time-series databases are specialized for handling data where time is the organizing principle, with optimizations for efficient storage and querying of time-ordered measurements. TimescaleDB even does this while keeping the data fully relational, which brings a lot of benefits.
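
A minimal sketch of what that looks like with the TimescaleDB extension, assuming it is installed; the metrics table and its columns are made up for illustration:

```sql
-- A regular relational table...
CREATE TABLE metrics (
    time      timestamptz NOT NULL,
    device_id int         NOT NULL,
    value     double precision
);

-- ...turned into a hypertable, automatically partitioned by time.
SELECT create_hypertable('metrics', 'time');

-- Time-based aggregation: average value per device in 5-minute buckets.
SELECT time_bucket('5 minutes', time) AS bucket,
       device_id,
       avg(value) AS avg_value
FROM metrics
GROUP BY bucket, device_id
ORDER BY bucket;
```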

Graph Databases

Key examples: Neo4j, ArangoDB

Data model: Represents data as nodes and edges in a graph, optimizing for relationship traversal.

When to use:

  • Highly interconnected data with complex relationships

  • Recommendation engines

  • Fraud detection

  • Network analysis

  • Knowledge graphs

Graph databases excel when relationships between entities are as important as the entities themselves.
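
To see why, consider a multi-hop traversal. In plain SQL it takes a recursive common table expression, sketched below with a hypothetical follows table; a graph query language such as Cypher expresses the same traversal as a single pattern, and graph databases store the data so that following edges stays cheap.

```sql
-- Hypothetical edge table: who follows whom.
CREATE TABLE follows (
    follower_id bigint NOT NULL,
    followee_id bigint NOT NULL
);

-- All accounts reachable from user 1 within three hops.
WITH RECURSIVE reachable AS (
    SELECT followee_id, 1 AS depth
    FROM follows
    WHERE follower_id = 1
    UNION
    SELECT f.followee_id, r.depth + 1
    FROM follows f
    JOIN reachable r ON f.follower_id = r.followee_id
    WHERE r.depth < 3
)
SELECT DISTINCT followee_id FROM reachable;
```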

Vector Databases

Key examples: Pinecone, Weaviate

Data model: Stores high-dimensional vectors (embeddings) with efficient similarity search.

When to use:

  • AI applications requiring semantic search

  • Recommendation systems based on embeddings

  • Image, audio, or text similarity matching

  • RAG (Retrieval Augmented Generation) systems

The newest entrants to the database scene, vector databases are purpose-built for AI applications that rely on embedding vectors from machine learning models.
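
To give an idea of the data model, here is a minimal sketch using the pgvector extension (discussed later in this article) rather than a dedicated vector database; the table is hypothetical and the 3-dimensional embeddings are purely illustrative, real embeddings typically have hundreds or thousands of dimensions:

```sql
CREATE EXTENSION IF NOT EXISTS vector;

-- Each row stores an embedding produced by an ML model.
CREATE TABLE documents (
    id        bigserial PRIMARY KEY,
    content   text NOT NULL,
    embedding vector(3) NOT NULL
);

INSERT INTO documents (content, embedding)
VALUES ('hello world', '[0.1, 0.2, 0.3]');

-- Nearest neighbours to a query embedding, using cosine distance (<=>).
SELECT content
FROM documents
ORDER BY embedding <=> '[0.1, 0.2, 0.25]'
LIMIT 5;
```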

Data Lakes & Data Lakehouses

Key examples: S3, Azure Storage Accounts, Databricks (Delta Lake), Snowflake, Amazon Redshift

Data model: Stores raw data in various formats (data lake), often with a layer on top providing SQL access and ACID guarantees (data lakehouse).

When to use:

  • When you need to store and analyze massive volumes of diverse data

  • When separating storage from compute is beneficial for cost management

  • For organizations with complex analytical needs spanning multiple data types

  • Data science and ML workloads on large datasets

Data lakes evolved from the MapReduce/Hadoop era, offering cost-effective storage for large volumes of data. Modern data lakehouses add structure and SQL capabilities, combining the best of data warehouses and data lakes. If you want a deep dive on this topic, read our Data Lakes blog here.

When in doubt, go for Postgres!

With all these options, why do we recommend Postgres as your starting point?

1. Versatility and Stability Through Extensions

PostgreSQL's extension ecosystem allows it to function as multiple database types:

  • The TimescaleDB extension turns PostgreSQL into a time-series database. Its performance is great, even compared to dedicated time-series databases like Influx. Wolk recently migrated a manufacturing client from Influx to Timescale, and the performance gains were significant.

  • pgvector enables vector similarity search for AI applications, making it a vector database. At Wolk, we have built RAG applications with this extension on a moderately sized Postgres instance, and the results were great!

  • AGE is an extension for PostgreSQL that lets users run graph queries on top of their existing relational database.

  • pg_duckdb allows for querying files, data lake(house) style, so you can combine relational data and file-based data in one place.

  • Citus is a PostgreSQL extension that transforms Postgres into a distributed database, so you can achieve high performance at any scale, mimicking a data warehouse.

All these extensions are open source and deployable on any Postgres instance; the sketch below shows how they are typically enabled. This extensibility means PostgreSQL can evolve as your needs change, without requiring you to migrate to entirely different systems.
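
Enabling an extension is usually a one-liner, provided it is installed on your instance or offered by your managed service; exact names and availability can vary by distribution, and some (like TimescaleDB and Citus) also need to be preloaded in the server configuration:

```sql
-- Each CREATE EXTENSION only succeeds if the extension is installed on the server.
CREATE EXTENSION IF NOT EXISTS timescaledb;  -- time-series
CREATE EXTENSION IF NOT EXISTS vector;       -- pgvector, similarity search
CREATE EXTENSION IF NOT EXISTS age;          -- Apache AGE, graph queries
CREATE EXTENSION IF NOT EXISTS pg_duckdb;    -- query files, data lake style
CREATE EXTENSION IF NOT EXISTS citus;        -- distributed Postgres

-- List what is enabled in the current database.
SELECT extname, extversion FROM pg_extension;
```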

2. Strong Compliance with SQL Standards

PostgreSQL implements a large portion of the SQL standard, making it easier to:

  • Find developers familiar with its query language

  • Port applications to or from other SQL databases

  • Trust query results to be correct and consistent

3. ACID Compliance and Data Integrity

PostgreSQL provides full ACID (Atomicity, Consistency, Isolation, Durability) compliance, ensuring your data remains consistent even during failures or concurrent operations.
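
In practice this means you can group related changes into one transaction: either all of them happen or none of them do. A minimal sketch, using a hypothetical accounts table:

```sql
-- Move 100 from one account to another; both updates commit or roll back together.
BEGIN;

UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;

-- If anything fails before this point, a ROLLBACK leaves both rows untouched.
COMMIT;
```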

4. Scalability Paths

While not automatically distributed like some NoSQL databases, PostgreSQL offers several paths to scale:

  • Vertical scaling (larger machines) works surprisingly well

  • Read replicas for scaling read operations

  • Extensions like Citus for horizontal scaling

  • Managed services like Azure Database for PostgreSQL or Amazon RDS

5. Cost-Effective and Open Source

PostgreSQL is free, open-source, and has a vibrant community and ecosystem, reducing vendor lock-in and licensing costs.

Starting Small: A Practical Approach

Following our philosophy of clearly scoped projects, here's how to approach database selection:

  1. Start with PostgreSQL for almost any data project

  2. Use extensions to add specialized functionality as needed

  3. Monitor performance and identify specific bottlenecks

  4. Consider specialized systems only when you've hit clear limitations

Common Database Migration Pitfalls

Before rushing to adopt specialized databases, be aware of these common pitfalls:

  1. Premature optimization: Choosing specialized databases before actually hitting performance limitations

  2. Underestimating migration costs: Data migration, code refactoring, and new operational processes all add hidden costs

  3. Overlooking operational complexity: Each new database type adds operational overhead for backups, monitoring, security, and maintenance

  4. Missing PostgreSQL features: Many organizations migrate to NoSQL databases without fully exploring PostgreSQL's capabilities

  5. Following hype cycles: Choosing databases based on trends rather than actual business requirements

The Wolk Database Decision Framework

When deciding which database to use, follow this simple framework:

  1. Start with PostgreSQL as your default choice

  2. Define your specific requirements clearly (consistency, scale, query patterns)

  3. Explore PostgreSQL extensions that address your specific needs

  4. Benchmark and test with realistic workloads before committing to migration

  5. Only add specialized databases when you have concrete evidence they're needed

By starting small with PostgreSQL and iterating as your needs evolve, you'll avoid unnecessary complexity while maintaining the flexibility to specialize when truly necessary. If you are completely sure that you need something on a bigger scale, have a look at our Introduction to Data Lakes.

Interested in more? Have a look at the “What Goes Around Comes Around... And Around...” talk by Andy Pavlo here.

Ready to start your database journey with a practical, value-driven approach?

Reach out to me at stijn@wolk.work, or send me a message on LinkedIn.


Stay up to date!

Subscribe to our newsletter, de Wolkskrant, to get the latest tools, trends and tips from the industry.