Neo4j for Beginners (Part 2): Cypher Fundamentals

In Part 1, we learned what makes graph databases different: data is stored as nodes (things) and relationships (connections), and those connections are first-class citizens in the database. We also wrote our first Cypher query to find patterns in banking data. Now it’s time to go deeper. In this article, you’ll learn how to create your own graph data, master essential Cypher clauses, and write the queries that make graphs truly powerful.

Graph Database Fundamentals: The Building Blocks
Setting Up Your Practice Environment
Creating Your Banking Graph
Essential Cypher Clauses
Wrap Up

Graph Database Fundamentals: The Building Blocks

Before we start creating data and writing queries, let’s establish a solid foundation by understanding the core concepts that make up Neo4j’s property graph model. These fundamentals will guide everything you do with graph databases.

The Property Graph Model

Neo4j uses a property graph database model. At its core, a graph data structure consists of nodes (discrete objects) that can be connected by relationships. Think of it like a network or mind map where circles represent things and arrows represent connections between those things. Here’s the simplest possible visualization.

(Node A) --[RELATIONSHIP]--> (Node B)

The Neo4j property graph model consists of five fundamental building blocks:

Nodes - Represent entities or discrete objects in your domain
Labels - Classify and group nodes into sets
Relationships - Connect nodes and describe how they’re related
Relationship Types - Classify relationships
Properties - Store data as key-value pairs on both nodes and relationships

Let’s explore each of these in detail.

Nodes: The Things in Your Graph

Nodes are used to represent entities: the discrete objects in your domain. In our banking context, customers, products, accounts, and transactions would all be nodes. Key characteristics of nodes:

A node can have zero or more labels
A node can have zero or more properties
Nodes can exist independently (a node doesn’t need relationships)

The simplest possible graph is a single node with no relationships:

(:Customer)

Nodes represent the “nouns” of your domain — the people, places, things, or concepts that matter to your application.

Labels: Classifying Your Nodes

Labels shape your domain by grouping nodes into sets. All nodes with a certain label belong to the same set, which allows you to perform operations on specific types of nodes. Key characteristics of labels:

A node can have zero to many labels
Labels can be added or removed at runtime
Labels are case-sensitive (:Person is different from :person)

Examples:

All customer nodes could be labeled :Customer
A person who is both a customer and an actor could have labels :Person:Customer:Actor
You can use labels for temporary states: :Suspended for suspended accounts, :Active for active products

In the example below, a single node has multiple labels to describe different dimensions:

(tom:Person:Actor:Customer)

This flexibility lets you query nodes from different perspectives. You might ask for “all Persons” in one query and “all Actors” in another, even if some nodes satisfy both criteria.
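
Because labels can be added or removed at runtime, a temporary state such as :Suspended can be toggled with SET and REMOVE. The sketch below assumes a :Customer node with id 'C001' already exists (one is created later in this article); run each statement on its own.

```cypher
// Add the :Suspended label to an existing customer
MATCH (c:Customer {id: 'C001'})
SET c:Suspended
RETURN labels(c) AS labelsAfterSet;

// Remove it again once the suspension ends
MATCH (c:Customer {id: 'C001'})
REMOVE c:Suspended
RETURN labels(c) AS labelsAfterRemove;
```

The node keeps its :Customer label and all of its properties throughout; only its membership in the :Suspended set changes.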

Relationships: The Connections Between Things

A relationship describes a connection between a source node and a target node. This is where graphs truly differ from relational databases: relationships are not inferred or calculated; they’re stored directly as first-class data structures. Key characteristics of relationships:

Connects exactly one source node to one target node
Always has a direction (though you can ignore direction in queries when appropriate)
Must have exactly one relationship type
Can have properties (key-value pairs)
A node can have a relationship to itself

Example relationship:

(alice:Customer)-[:USES {since: date('2020-01-15'), status: 'active'}]->(savings:Product)

This shows:

Source node: alice (a Customer)
Relationship type: USES
Direction: From alice to savings (indicated by ->)
Properties: since and status stored on the relationship itself
Target node: savings (a Product)

Relationships always have a direction, but you can traverse them in either direction or ignore direction entirely when querying. The direction is part of the data model and can carry semantic meaning: “Alice USES Product” is different from “Product USED_BY Alice,” even though they describe the same connection from different perspectives.

Important: A node can have a relationship to itself. For example, if Tom Hanks knows himself:

(tom:Person)-[:KNOWS]->(tom)
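
Coming back to direction: the stored direction does not limit how you read a relationship. The three queries below are a sketch that assumes :Customer and :Product nodes connected by USES relationships, like the ones built later in this article. Each returns the same customer-product pairs.

```cypher
// Follow USES in its stored direction: customer -> product
MATCH (c:Customer)-[:USES]->(p:Product)
RETURN c.name AS customer, p.name AS product;

// The same relationships, written from the product side
MATCH (p:Product)<-[:USES]-(c:Customer)
RETURN p.name AS product, c.name AS customer;

// Ignore direction entirely by leaving out the arrowhead
MATCH (c:Customer)-[:USES]-(p:Product)
RETURN c.name AS customer, p.name AS product;
```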

Relationship Types: Classifying Connections

Just as labels classify nodes, relationship types classify relationships. Every relationship must have exactly one type. Key characteristics:

A relationship has exactly one type (unlike labels, you can’t have multiple)
Types define the semantic meaning of the connection
Types are case-sensitive

Common relationship types in banking:

- :USES - Customer uses Product
- :OWNS - Customer owns Account
- :ISSUED_BY - Card issued by Bank
- :TRANSFERS_TO - Account transfers to Account

The relationship type tells you how two nodes are connected. This is crucial for traversals: when you follow relationships, you often want to follow only specific types. You might want to find “all products a customer USES” without including “all accounts they OWN.”
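
Restricting a traversal to specific types looks like this in practice. This is a sketch that assumes the USES relationships built later in this article; OWNS comes from the list above and is not part of that dataset, so the second query would simply return USES rows there.

```cypher
// Only follow USES relationships
MATCH (c:Customer)-[:USES]->(p:Product)
RETURN c.name AS customer, p.name AS product;

// Follow either USES or OWNS, and report which type matched
MATCH (c:Customer)-[r:USES|OWNS]->(target)
RETURN c.name AS customer, type(r) AS relationshipType, labels(target) AS targetLabels;
```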

Properties: Storing Data

Properties are key-value pairs that store data on both nodes and relationships. They’re where you put the actual information about your entities and connections. Key characteristics:

Properties are stored as key-value pairs
Both nodes and relationships can have properties
Values can be numbers, strings, booleans, temporal values (such as dates), spatial points, or homogeneous lists of these
Property keys are case-sensitive (name and Name are different properties)
Naming convention: Use camelCase for properties (firstName rather than first_name)

Supported data types:

Numbers:

CREATE (:Example {count: 42, price: 19.99})

Strings and booleans:

CREATE (:Example {name: 'Alice', active: true})

Dates and temporal types:

CREATE (:Customer {since: date('2020-01-15'), lastLogin: datetime()})

Arrays (homogeneous lists):

CREATE (:Example {
  tags: ['premium', 'verified'],
  scores: [95, 87, 92],
  active: [true, false, true]
})

Important: Properties are only stored when they have values. There’s no concept of null values being stored—if a property isn’t set, it simply doesn’t exist on that node or relationship.
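
A quick way to see this behaviour is to read a property that was never set. In the sketch below, nickname is a hypothetical property name chosen for illustration; it is never written, so reading it yields null and it does not show up among the node’s keys.

```cypher
// Create a node with a single property, then read one that was never set
CREATE (e:Example {name: 'Alice'})
RETURN e.name AS name,
       e.nickname AS nickname,                 // null: the property does not exist
       e.nickname IS NULL AS nicknameMissing,  // true
       keys(e) AS storedKeys                   // ['name']
```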

Setting Up Your Practice Environment

Neo4j Sandbox runs entirely in your browser—no installation required. Visit sandbox.neo4j.com, create a free account, and launch a blank sandbox. Within seconds, you’ll have a working Neo4j instance with the Browser interface ready to go. This is perfect if you want to start immediately without worrying about setup. If you’re using a sandbox that has sample data, or if you want to start fresh at any point, you can clear everything with:

MATCH (n) DETACH DELETE n

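This finds all nodes (MATCH (n)), removes every relationship attached to them (DETACH), and deletes the nodes themselves. It wipes the entire database, so only run it on a practice instance. It’s a handy way to reset before the exercises below.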

Creating Your Banking Graph

In Part 1, you queried existing data. Now you’ll learn how to create it yourself. Neo4j uses the CREATE clause to add nodes and relationships to your graph.

Creating Nodes

Let’s start by creating some customers. Each customer is a node with properties that describe them:

CREATE (alice:Customer {id: 'C001', name: 'Alice', since: date('2020-01-15')})

Let’s break down what’s happening here:

CREATE tells Neo4j we’re adding something new
(alice:Customer …) creates a node. The word before the colon (alice) is a variable we can use in this query, and the word after (:Customer) is a label that categorizes this node
{id: 'C001', name: 'Alice', since: date('2020-01-15')} are properties: key-value pairs that store information about this customer
The date() function creates a proper date object rather than just text

Run that query, then create a few more customers:

CREATE (bob:Customer {id: 'C002', name: 'Bob', since: date('2021-03-20')})
CREATE (carol:Customer {id: 'C003', name: 'Carol', since: date('2019-07-10')})
CREATE (david:Customer {id: 'C004', name: 'David', since: date('2022-06-05')})

Each CREATE clause adds one node to your graph. After a query runs, the Browser shows a confirmation summarizing what changed, such as the number of nodes created and properties set.
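
To confirm the customers were stored as intended, you can read them back. This small check uses only the label and properties created above.

```cypher
// List every customer created so far, ordered by id
MATCH (c:Customer)
RETURN c.id AS id, c.name AS name, c.since AS since
ORDER BY c.id
```

If each CREATE ran exactly once, this lists four rows, C001 through C004. More rows than that usually means a CREATE was run twice.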

Creating Products

Now let’s add the banking products these customers might use:

CREATE (savings:Product {code: 'SAV-001', name: 'Savings Account', category: 'Deposit'})
CREATE (checking:Product {code: 'CHK-001', name: 'Checking Account', category: 'Deposit'})
CREATE (credit:Product {code: 'CRD-001', name: 'Credit Card', category: 'Credit'})
CREATE (loan:Product {code: 'LON-001', name: 'Personal Loan', category: 'Credit'})
CREATE (mortgage:Product {code: 'MTG-001', name: 'Mortgage', category: 'Credit'})

Notice we’re using different properties here (code, name, category) because products need different information than customers. This flexibility is one of the strengths of graph databases—different node types can have completely different properties.

Creating Relationships

Now comes the interesting part: connecting customers to products. This is where graphs really shine. We need to find the nodes we want to connect, then create a relationship between them:

MATCH (alice:Customer {id: 'C001'})
MATCH (savings:Product {code: 'SAV-001'})
CREATE (alice)-[:USES {since: date('2020-01-15'), status: 'active'}]->(savings)

Let’s understand this step by step:

The first MATCH finds Alice (the customer we created earlier)
The second MATCH finds the Savings Account product
CREATE (alice)-[:USES …]-> (savings) creates a relationship from Alice to the Savings Account
The relationship type is USES (inside the square brackets)
The direction matters: the arrow -> points from customer to product, meaning “Alice uses Savings Account”
Relationships can have properties too: we’re storing when the relationship started and its current status

Add several more relationships to build out our dataset. Run each of the following queries separately, one block at a time:

MATCH (alice:Customer {id: 'C001'})
MATCH (credit:Product {code: 'CRD-001'})
CREATE (alice)-[:USES {since: date('2020-06-10'), status: 'active'}]->(credit)

MATCH (bob:Customer {id: 'C002'})
MATCH (checking:Product {code: 'CHK-001'})
CREATE (bob)-[:USES {since: date('2021-03-20'), status: 'active'}]->(checking)

MATCH (bob:Customer {id: 'C002'})
MATCH (credit:Product {code: 'CRD-001'})
CREATE (bob)-[:USES {since: date('2021-08-15'), status: 'active'}]->(credit)

MATCH (bob:Customer {id: 'C002'})
MATCH (loan:Product {code: 'LON-001'})
CREATE (bob)-[:USES {since: date('2023-01-10'), status: 'active'}]->(loan)

MATCH (carol:Customer {id: 'C003'})
MATCH (savings:Product {code: 'SAV-001'})
CREATE (carol)-[:USES {since: date('2019-07-10'), status: 'active'}]->(savings)

MATCH (carol:Customer {id: 'C003'})
MATCH (checking:Product {code: 'CHK-001'})
CREATE (carol)-[:USES {since: date('2019-07-10'), status: 'active'}]->(checking)

MATCH (carol:Customer {id: 'C003'})
MATCH (mortgage:Product {code: 'MTG-001'})
CREATE (carol)-[:USES {since: date('2020-11-20'), status: 'active'}]->(mortgage)

MATCH (david:Customer {id: 'C004'})
MATCH (credit:Product {code: 'CRD-001'})
CREATE (david)-[:USES {since: date('2022-06-05'), status: 'active'}]->(credit)

MATCH (david:Customer {id: 'C004'})
MATCH (loan:Product {code: 'LON-001'})
CREATE (david)-[:USES {since: date('2023-09-12'), status: 'active'}]->(loan)

After running all these queries, we’ve built a small but complete banking graph. We have customers, products, and the relationships between them—including metadata about when each relationship started and whether it’s currently active.

[Figure: visualisation of customers and products]

Verify Your Graph

Let’s make sure everything is there:

MATCH (c:Customer)-[u:USES]->(p:Product)
RETURN c.name, p.name, u.since
ORDER BY c.name, u.since

You should see all your customers, the products they use, and when they started using them. This is the graph you’ll query throughout the rest of this article.
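
Counting is another quick sanity check. The two statements below are a sketch that uses only the labels and relationship type created above; run them one at a time.

```cypher
// Count nodes per label combination
MATCH (n)
RETURN labels(n) AS labels, count(*) AS nodeCount;

// Count USES relationships between customers and products
MATCH (:Customer)-[u:USES]->(:Product)
RETURN count(u) AS usesCount;
```

If every CREATE in this section ran exactly once, expect 4 Customer nodes, 5 Product nodes, and 10 USES relationships. Any :Example nodes in the first result come from the property examples earlier in the article.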

From Relational to Graph: A Mental Model Shift

If you’re coming from a relational database background, you might be wondering how the graph model you just created compares to what you’re used to. Let’s make that connection explicit.

In relational databases, you would model our banking scenario with separate tables for Customers and Products, connected through a join table (often called Customer_Product or similar). That join table would contain foreign keys pointing to both tables, plus any relationship metadata like since and status. In Neo4j, we’ve eliminated that intermediate join table entirely. The relationship itself is the connection, and it carries the metadata directly. No foreign keys, no join table, no extra lookup step.

What you gain:

Direct connections: Relationships physically point from one node to another
Constant-time traversals: Following a relationship doesn’t require index lookups or table scans
Clearer model: What you sketch on the whiteboard is what you store in the database

This is what we mean by “index-free adjacency”: each node directly references its relationships, so the cost of a traversal depends on how many relationships you follow rather than on the total size of the database.
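
To make the contrast concrete, consider a question that would need the join table twice in a relational schema: which other customers share a product with Alice? In the graph built above, it is a single pattern that walks out to each product and back to the other customers.

```cypher
// Start at Alice, follow USES to each product, then follow USES back to other customers
MATCH (alice:Customer {id: 'C001'})-[:USES]->(p:Product)<-[:USES]-(other:Customer)
RETURN other.name AS customer, p.name AS sharedProduct
ORDER BY customer
```

Within one MATCH pattern, Cypher does not reuse the same relationship twice, so Alice does not appear as her own match.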

Essential Cypher Clauses

You now have data in your graph. To ask useful questions of it, we need to master the core Cypher clauses that filter, shape, and aggregate your results. These clauses are the building blocks of every query you’ll write.

WHERE: Filtering Your Results

The WHERE clause lets you filter nodes and relationships based on their properties. In Part 1, you saw simple property matching like {name: 'Alice'} directly in the MATCH pattern. WHERE is more flexible and handles complex conditions.

Find all customers using Credit products:

MATCH (c:Customer)-[:USES]->(p:Product)
WHERE p.category = 'Credit'
RETURN c.name, p.name

WHERE adds constraints to the pattern in the preceding MATCH, so only customer-product pairs where the product category equals 'Credit' appear in the results. You can combine multiple conditions:

MATCH (c:Customer)-[:USES]->(p:Product)
WHERE p.category = 'Credit' AND c.since > date('2021-01-01')
RETURN c.name, p.name, c.since

This finds customers who joined after January 1, 2021 and use Credit products. Use OR for alternatives:

MATCH (c:Customer)-[:USES]->(p:Product)
WHERE p.category = 'Credit' OR p.category = 'Deposit'
RETURN c.name, p.name, p.category
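
When the alternatives all test the same property, the IN operator is a more compact way to write the same filter. The query below is equivalent to the OR version above.

```cypher
// Match products whose category is any value in the list
MATCH (c:Customer)-[:USES]->(p:Product)
WHERE p.category IN ['Credit', 'Deposit']
RETURN c.name, p.name, p.category
```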

You can also filter on relationship properties:

MATCH (c:Customer)-[u:USES]->(p:Product)
WHERE u.status = 'active' AND u.since > date('2022-01-01')
RETURN c.name, p.name, u.since

This finds only active relationships that started after January 1, 2022. Notice we gave the relationship a variable (u) so we can reference its properties.

Comparison operators work as you’d expect:

MATCH (c:Customer)-[u:USES]->(p:Product)
WHERE u.since < date('2021-01-01')
RETURN c.name, p.name, u.since
ORDER BY u.since

The <> operator means “not equal”:

MATCH (c:Customer)-[:USES]->(p:Product)
WHERE p.category <> 'Deposit'
RETURN c.name, p.name

ORDER BY and LIMIT: Controlling Output

ORDER BY sorts your results, and LIMIT restricts how many rows you get back. These are essential when exploring data or building user-facing features. Sort customers alphabetically:

MATCH (c:Customer)-[:USES]->(p:Product)
RETURN c.name, p.name
ORDER BY c.name

The default order is ascending. Use DESC for descending:

MATCH (c:Customer)
RETURN c.name, c.since
ORDER BY c.since DESC

This shows newest customers first. You can sort by multiple fields:

MATCH (c:Customer)-[:USES]->(p:Product)
RETURN c.name, p.category, p.name
ORDER BY c.name, p.category

This sorts first by customer name, then by product category within each customer. LIMIT restricts results to the first N rows:

MATCH (c:Customer)-[:USES]->(p:Product)
RETURN c.name, p.name
LIMIT 5

Always use LIMIT when exploring your graph, especially as it grows. Without it, you might accidentally return millions of rows. Keep in mind that LIMIT without ORDER BY returns an arbitrary subset of rows.

Combine them for “top N” queries:

MATCH (c:Customer)
RETURN c.name, c.since
ORDER BY c.since DESC
LIMIT 3

This shows your three most recent customers.
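
For paging through results in a user-facing feature, LIMIT is usually paired with SKIP. The sketch below assumes a page size of two, an arbitrary choice for illustration, and fetches the second page of customers.

```cypher
// Page 2 of customers, two per page, in a stable alphabetical order
MATCH (c:Customer)
RETURN c.name, c.since
ORDER BY c.name
SKIP 2
LIMIT 2
```

With the four customers created earlier, this returns Carol and David. The ORDER BY matters here: without it, the pages are not guaranteed to be consistent between runs.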

COUNT and Aggregations

Aggregations let you compute summary statistics. The most common is COUNT, which counts how many times something appears. How many products does each customer use?

MATCH (c:Customer)-[:USES]->(p:Product)
RETURN c.name, COUNT(p) AS product_count
ORDER BY product_count DESC

The AS keyword creates an alias for the result column, so you can reference product_count in ORDER BY.

Key insight: When you use an aggregation function like COUNT, Cypher automatically groups results by all non-aggregated columns in your RETURN clause. Here, results are grouped by c.name, and products are counted within each group.

Other aggregation functions work similarly:

MATCH (c:Customer)-[u:USES]->(p:Product)
RETURN c.name, 
       COUNT(p) AS product_count,
       MIN(u.since) AS first_product,
       MAX(u.since) AS latest_product
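
The grouping key does not have to be the customer. Changing the non-aggregated column changes the groups, as in this sketch that groups the same pattern by product category instead.

```cypher
// Group by product category rather than by customer
MATCH (c:Customer)-[:USES]->(p:Product)
RETURN p.category AS category,
       count(*) AS relationships,       // USES relationships in this category
       count(DISTINCT c) AS customers   // different customers in this category
ORDER BY category
```

With the dataset built in this article, Credit has 6 relationships across 4 customers and Deposit has 4 relationships across 3 customers.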

COLLECT

COLLECT is particularly useful—it creates a list of values:

MATCH (c:Customer)-[:USES]->(p:Product)
RETURN c.name, COLLECT(p.name) AS products

This returns each customer with a list of all their product names, perfect for seeing someone’s complete product portfolio at a glance.

When RETURN contains only aggregations, there is nothing to group by, so the result is a single row. COUNT(*) counts the matched rows:

MATCH (c:Customer)-[:USES]->(p:Product {category: 'Credit'})
RETURN COUNT(*) AS credit_product_users

This counts the total number of customer-product relationships where the product is Credit, giving you a single number.

DISTINCT: Removing Duplicates

Sometimes patterns in your graph create duplicate results. DISTINCT removes them. Without DISTINCT, the query below would return the same customer once for every Credit product they use (in our data, Bob and David would each appear twice):

MATCH (c:Customer)-[:USES]->(p:Product)
WHERE p.category = 'Credit'
RETURN DISTINCT c.name

This returns each customer name only once, no matter how many Credit products they use.

You can also use DISTINCT with COUNT:

MATCH (p:Product)<-[:USES]-(c:Customer)
RETURN p.name, COUNT(DISTINCT c) AS unique_customers

This counts how many different customers use each product. Without DISTINCT, if a customer had multiple relationships to the same product (which shouldn’t happen in our model, but could in more complex graphs), they’d be counted multiple times.

WITH: Chaining Query Steps

WITH is one of Cypher’s most powerful clauses. It lets you chain multiple query steps together, passing results from one step to the next. Think of it as a pipeline. Find customers who use more than 2 products:

MATCH (c:Customer)-[:USES]->(p:Product)
WITH c, COUNT(p) AS product_count
WHERE product_count > 2
RETURN c.name, product_count

Here’s what happens:

First MATCH finds all customer-product relationships
WITH groups by customer and counts products, creating product_count
WHERE filters on that count (you can’t filter on aggregations in the first MATCH)
RETURN shows the results

Without WITH, you couldn’t filter on COUNT(p), because a WHERE attached to MATCH is evaluated before the aggregation in RETURN is computed.

WITH is also useful for transforming data mid-query:

MATCH (c:Customer)-[:USES]->(p:Product)
WITH c, COLLECT(p.name) AS products
WHERE SIZE(products) > 2
RETURN c.name, products

This collects all products into a list, then filters customers based on the size of that list.
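
WITH can also carry ORDER BY and LIMIT, which lets you narrow down a set of nodes before continuing the query. The sketch below picks the two most recent customers first, then collects the products for just those two.

```cypher
// Step 1: keep only the two customers who joined most recently
MATCH (c:Customer)
WITH c
ORDER BY c.since DESC
LIMIT 2
// Step 2: expand to products for the customers that survived step 1
MATCH (c)-[:USES]->(p:Product)
RETURN c.name AS customer, collect(p.name) AS products
```

With the data in this article, the two customers are David and Bob. Putting the LIMIT before the second MATCH means the product pattern is only expanded for those two nodes.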

If you want to use my version of the queries, there is a single file containing all the code: code

Wrap Up

We’ve come a long way in this tutorial. We started by understanding the five fundamental building blocks of Neo4j’s property graph model: nodes, labels, relationships, relationship types, and properties. We then built our own banking graph from scratch, creating customers, products, and the relationships between them. Finally, we covered the essential Cypher clauses that let you filter, sort, aggregate, and transform your graph data.

Written on December 24, 2025

Srini Ponugupaty