The Wika Network Dataset

Wika Network · Feb 9, 2022


How to sync Wika blockchain data with indexed databases using the Subquery framework.

Audience

  • Wika community members who want to use the blockchain data to analyze it or build on it.
  • Subquery users who want to learn from an example use case.

Overview

The Wika Network ETL repo provides an easy way to index the Wika blockchain data into 3 databases:

  • As tables: Postgres.
  • As a graph: Neo4j.
  • As documents: Elasticsearch.

It relies on Subquery and was developed by starting from its default scaffolding.

This article describes how we used Subquery to implement this use case, and walks through the databases that you can generate and keep in sync with the blockchain.

How we built on top of Subquery

Subquery provides a very intuitive workflow (see the Subquery documentation) to sync blockchain data with a Postgres database:

1. Initializing the project

This step is straightforward once the dependencies are installed:

subql init subql_wika
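
If you don't have the subql CLI yet, it ships as the @subql/cli package and can be installed globally:

npm install -g @subql/cli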

2. Updating the manifest file

In the file project.yaml we set the blockchain endpoint, the genesis hash of our testnet, and the starting block. The mapping section was kept to the defaults, but that's where the magic happens: it declares that each new block will be handled by the function handleBlock, each event by handleEvent, each call by handleCall, and so on.

specVersion: 0.2.0
name: subql_wika
version: 1.0.0
description: ''
repository: https://github.com/randombishop/wika_etl
schema:
  file: ./schema.graphql
network:
  endpoint: wss://testnode3.wika.network:443
  genesisHash: '0x59732b25bb635769e91a71f818c6d845b9bdcd371bb93d1512b1eacedb53d4be'
dataSources:
  - kind: substrate/Runtime
    startBlock: 1777550
    mapping:
      file: ./dist/index.js
      handlers:
        - handler: handleBlock
          kind: substrate/BlockHandler
        - handler: handleEvent
          kind: substrate/EventHandler
        - handler: handleCall
          kind: substrate/CallHandler

3. Defining our dataset

This is done in the file schema.graphql, where we defined 4 entities:

  • BlockInfo: id and sync date
  • UrlMetadata: to store the title, description and some metadata about each webpage.
  • LikeEvent and UrlRegisteredEvent: the core of our blockchain data.

type BlockInfo @entity {
  id: ID! # id is a required field
  blockNum: Int! @index(unique: true)
  syncDate: Date!
}

type UrlMetadata @entity {
  id: ID! # id is a required field
  title: String
  description: String
  image: String
  icon: String
  updatedAt: Date!
}

type LikeEvent @entity {
  id: ID! # id is a required field
  url: String! @index(unique: false)
  user: String! @index(unique: false)
  numLikes: Int!
  blockNum: Int!
}

type UrlRegisteredEvent @entity {
  id: ID! # id is a required field
  url: String! @index(unique: false)
  owner: String! @index(unique: false)
  active: Boolean!
  blockNum: Int!
}

The schema uses standard GraphQL syntax, which is worth getting familiar with if you're planning to use Subquery.

4. Mapping logic

The code to transform the data and load it into the indexed databases is pretty straightforward TypeScript code located in the src folder.

By default, Subquery provides data access functions to load into Postgres, so we added a plugins directory with the code that pulls metadata from web pages and writes to Neo4j and Elasticsearch; a simplified handler is sketched below.
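
To give a flavor of what the mapping logic looks like, here is a minimal sketch of an event handler writing one of our entities to Postgres. The pallet and method names ("urls", "Liked") and the event data layout are illustrative assumptions, not the repo's actual values:

import { SubstrateEvent } from "@subql/types";
import { LikeEvent } from "../types";

export async function handleEvent(event: SubstrateEvent): Promise<void> {
  const { event: { data, section, method } } = event;
  // Pallet and method names here are assumptions for illustration.
  if (section === "urls" && method === "Liked") {
    // Unique id: block number + event index within the block.
    const blockNum = event.block.block.header.number;
    const record = new LikeEvent(`${blockNum.toString()}-${event.idx}`);
    record.url = data[0].toString();
    record.user = data[1].toString();
    record.numLikes = Number(data[2].toString());
    record.blockNum = blockNum.toNumber();
    await record.save();
  }
}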

5. Additional changes

We also added Neo4j, Elasticsearch and Kibana to the services in docker-compose.yml (they all provide ready-to-use Docker images); a sketch of what this looks like follows.
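
For reference, the extra entries under services look roughly like this (image tags and ports here are illustrative rather than the exact ones pinned in the repo):

neo4j:
  image: neo4j:4.4
  ports:
    - 7474:7474   # Neo4j browser
    - 7687:7687   # Bolt protocol, used by neo4j-driver
elasticsearch:
  image: docker.elastic.co/elasticsearch/elasticsearch:7.16.3
  environment:
    - discovery.type=single-node
  ports:
    - 9200:9200
kibana:
  image: docker.elastic.co/kibana/kibana:7.16.3
  ports:
    - 5601:5601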

Also, one important thing to understand about Subquery: the mapping logic runs in a sandbox that is restricted by default to a minimal set of dependencies, and as of the current version there is no easy way to extend that list. Our workaround was to modify the following file directly inside the Docker image of the Subquery engine:

/usr/local/lib/node_modules/@subql/node/dist/indexer/sandbox.service.js

Finally, all dependencies were pinned to specific versions to facilitate reproducibility.

"devDependencies": {
"@polkadot/api": "7.5.1",
"@subql/cli": "0.19.0",
"@subql/types": "0.13.0",
"@types/chai": "4.3.0",
"@types/mocha": "9.1.0",
"chai": "4.3.4",
"cheerio": "1.0.0-rc.10",
"mocha": "9.1.4",
"neo4j-driver": "4.4.1",
"node-fetch": "2.6.7",
"typescript": "4.5.5"
}

And that's pretty much it. There are a few more small details that we changed after initializing the project, all documented in template_change_log.md, but the main points above are really all you need to know if you're planning to build an ETL with Subquery.

6. Building and running

There are 3 steps to start the ETL (sketched as commands after the list):

  • Generate the model classes
  • Compile the TypeScript code into JavaScript.
  • docker-compose up!
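
With the default Subquery scaffolding, these steps typically boil down to the following commands (the exact script names live in package.json, so defer to the repo if they differ):

yarn codegen        # generate the entity classes from schema.graphql
yarn build          # compile the TypeScript in src into dist/index.js
docker-compose up   # start the Subquery node, databases and query service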

You’ll find the exact how-to in the readme.md doc.

Wika Blockchain Data

With your ETL up and running, the blockchain data will be synced into Postgres, Neo4j and Elasticsearch, and here's what it looks like!

1. Postgres

Here are the tables you will find in Postgres (a sample query follows the list):

block_infos
like_events
url_metadata
url_registered_events
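
As a quick illustration, a SQL query along these lines would surface the most liked URLs (entity fields are mapped to snake_case columns; depending on your setup you may need to set search_path to the schema Subquery created):

SELECT url, SUM(num_likes) AS total_likes
FROM like_events
GROUP BY url
ORDER BY total_likes DESC
LIMIT 10;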

The Postgres data can also be queried using GraphQL via the query service, which should be running at localhost:3000.
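
For example, a query along these lines should return the latest like events (the connection and ordering fields are generated by Subquery from the entity definitions, so verify the exact spelling in the playground):

query {
  likeEvents(first: 5, orderBy: BLOCK_NUM_DESC) {
    nodes {
      url
      user
      numLikes
      blockNum
    }
  }
}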

2. Neo4j

Neo4j provides an interesting alternative for exploring the data from a graph perspective instead of tables:

  • Each user is represented by a node (User class) that includes the total number of likes sent.
  • Each URL is represented by a node (Url class) that includes the total number of likes received.
  • Likes are represented by the LIKES relationship, which stores the number of likes as well.
  • Ownership is represented by the OWNS relationship.

You can connect to the Neo4j browser at localhost:7474 and use the Cypher query language to explore the data set.
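
As an illustration, a Cypher query like this one lists the ten most liked URLs (the url and numLikes property names are assumptions based on the schema above; adjust them to whatever the plugin actually writes):

MATCH (:User)-[l:LIKES]->(p:Url)
RETURN p.url AS url, sum(l.numLikes) AS likes
ORDER BY likes DESC
LIMIT 10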

3. Elasticsearch

The docker-compose setup comes with a Kibana service (a front end for Elasticsearch), which can be accessed at localhost:5601.

You will find each Url represented as a document with the fields title, description, image and icon.
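
You can also hit Elasticsearch directly on its default port; something like the following searches URL documents by description (the index name urls is a guess for illustration, so check the plugin code or Kibana for the real one):

curl "localhost:9200/urls/_search?q=description:blockchain&pretty"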

That’s all folks!

Hope you found this Subquery use case helpful, and enjoy Wika’s data!

If you need help with this repo, or have any feedback, you can reach us on Discord or through GitHub issues.
