Data Science Series EP 6

Rushi Chudasama
Sep 22, 2021

Introduction to Neo4j and Gephi Tool

Neo4j:

Neo4j is a leading open-source graph database developed in Java. It is highly scalable and schema-free (NoSQL).

First, I run a hello-world query that creates two nodes, Neo4j and Hello World, and one relationship, says, between them.
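
A hello-world query of this kind could look like the sketch below. It is not necessarily the exact query from the screenshots: the labels Database and Message, the name property, and the variable names are assumptions made for illustration.

```cypher
// Create two nodes and one relationship in a single statement.
// Labels (Database, Message) and the name property are illustrative assumptions.
CREATE (db:Database {name: "Neo4j"})-[r:SAYS]->(msg:Message {name: "Hello World!"})
// Return everything that was created so the browser can draw it.
RETURN db, r, msg
```

Returning both nodes and the relationship lets the Neo4j Browser show the result in either the graph view or the table view.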

5.1 Running this query creates 2 nodes
5.2 The relationship is created by the same simple query
5.3 Table view of the nodes and the relationship

I use the Movies database here for demo purposes only; you can create your own database by clicking Create new. Start the Movies database and open it in the Neo4j Browser…

After that, load the Movies database into Neo4j, and the data appears in graph format.

In this database there are 9 Person nodes and 8 Movie nodes, with a total of 18 relationships between them. Use the commands below to find the total number of nodes, the labels, and the relationship types.

// find the total number of nodes
MATCH (n) RETURN count(n);
// find the labels in the database
CALL db.labels();
// find the types of relationships between nodes
CALL db.relationshipTypes();
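
The node and relationship totals quoted above can be checked against your own copy of the database. This is a minimal sketch; the column aliases are my own names, and the counts you get depend on which version of the Movies data you loaded.

```cypher
// Count nodes per label combination (e.g. Person, Movie).
MATCH (n)
RETURN labels(n) AS labels, count(n) AS nodes;

// Count all relationships, whatever their type.
MATCH ()-[r]->()
RETURN count(r) AS relationships;
```

If your client does not accept several statements at once, run the two statements separately.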

Querying the relationships between Person and Movie nodes shows how each person is connected to a movie: who produced it, and which role a person played in it.
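
One way to see these connections is to return the relationship type next to each person and movie. This is a sketch rather than the query used for the screenshots: it assumes the name and title properties used elsewhere in this post, plus a roles property on ACTED_IN relationships, as in the standard Movies dataset.

```cypher
// List how each person is connected to each movie.
// type(r) returns the relationship type, e.g. ACTED_IN or DIRECTED.
// r.roles is assumed to exist only on ACTED_IN relationships; it is null otherwise.
MATCH (p:Person)-[r]->(m:Movie)
RETURN p.name AS person, type(r) AS relationship, r.roles AS roles, m.title AS movie
ORDER BY movie, person
LIMIT 25;
```

If your database has a PRODUCED relationship type, as the standard Movies dataset does, replacing [r] with [:PRODUCED] restricts the result to producers.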

// query for the movies released in the 1990s
MATCH (nineties:Movie) WHERE nineties.released >= 1990 AND nineties.released < 2000 RETURN nineties.title;
// query to list all Tom Hanks movies
MATCH (tom:Person {name: "Tom Hanks"})-[:ACTED_IN]->(tomHanksMovies) RETURN tom,tomHanksMovies;
// query for the directors of Cloud Atlas
MATCH (cloudAtlas {title: "Cloud Atlas"})<-[:DIRECTED]-(directors) RETURN directors.name;

Advantages of Neo4j databases

  • Performance: In relational databases, query performance tends to suffer as the number and depth of relationships increase, because each extra hop usually means another join. In graph databases like Neo4j, relationships are stored alongside the data, so traversal performance stays high even as the amount of data grows significantly.
  • Flexibility: The structure and schema of a graph model can easily be adjusted as an application changes, and the data structure can be extended without breaking existing functionality.
  • Agility: Because the structure is easy to change, the data store can evolve along with your application.

Gephi:

Gephi is a free, open-source visualization application developed in Java. It is mainly used for visualizing, manipulating, and exploring networks and graphs from raw node and edge data. It is an excellent tool for data analysts and data science enthusiasts who want to explore and understand graphs. Its primary goal is to help the user form hypotheses, discover hidden patterns, and isolate structural singularities and faults during data sourcing.

For this demo I chose the simple karate.gml dataset and performed some basic Gephi operations on it. So let’s get started.

1. Open Gephi and click New Project. Then choose File->Open and load the dataset of your choice, as shown below. Once the dataset is loaded, Gephi shows the number of nodes and edges it contains as well as the type of the graph.

2. Below is how the nodes and edges are displayed when the data is first loaded.

3. Now we can represent the data in various layouts. In the left pane, open the Layout option, pick the layout of your choice, and click Run. In the image below I chose the ForceAtlas layout, which arranges the data as follows.

4. Next we can differentiate the nodes by a ranking such as In-Degree, Out-Degree, or Degree and show them in different colors. To do this, choose Nodes->Ranking at the top of the left pane and pick a ranking. In the image below In-Degree is chosen: red nodes have a lower in-degree than white nodes, and dark grey nodes have the highest in-degree.

5. The visualization becomes clearer when the nodes are also drawn in different sizes. For instance, in the image below nodes with a higher degree are larger than nodes with a lower degree, i.e. the dark grey nodes have a higher degree than the white and red nodes.

To vary the node size, select the Size option in the Appearance section of the left pane and enter the minimum and maximum node size you want. I set the Min size to 10 and the Max size to 30.

6. Next we generate a degree distribution graph for Degree, In-Degree, and Out-Degree and also get the average degree over all nodes. To generate it, open the Statistics tab in the right pane and run Average Degree in the Network Overview section.

A report is generated, and the degree columns are added to the data table.

7. To see the data table, select Window->Data Table in the top menu bar. The table appears as in the image above: after running Average Degree, columns for In-Degree, Out-Degree, and Degree have been added for each node.

8. From here we can try other functionality and other layouts in Gephi. In the image below I used the Noverlap layout.

Conclusion:

Tap here to learn more about Neo4j and Gephi.

Previous blogs about the Orange tool: Blog1, Blog2 & Blog3.

Final Note:

Thanks for reading! If you enjoyed this article, please hit the clap 👏 button as many times as you can. It would mean a lot and encourage me to keep sharing my knowledge. If you like my content, follow me on Medium; I will try to post as many blogs as I can.
