The Wika Network Dataset

Wika Network · Feb 9, 2022

How to sync Wika blockchain data with indexed databases using the SubQuery framework.

Audience

  • Wika community members who want to use the blockchain data to analyze or build.
  • Subquery users who want to learn from an example use case.

Overview

The Wika Network ETL repo provides an easy way to index the Wika blockchain data into 3 databases:

  • As tables: Postgres.
  • As a graph: Neo4J.
  • As documents: Elasticsearch.

It relies on SubQuery and was developed starting from its default scaffolding.

This article describes how we used SubQuery to implement this use case, and walks through the databases that you can generate and keep in sync with the blockchain.

How we built on top of Subquery

SubQuery proposes a very intuitive workflow (see the SubQuery documentation) to sync blockchain data with a Postgres database:

1. Initializing the project

Very straightforward once the dependencies are installed.
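
For the CLI itself, that typically means a global npm install (assuming you use npm; yarn works too):

npm install -g @subql/cli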

subql init subql_wika

2. Updating the manifest file

In the file project.yaml we set the blockchain endpoint, the genesis hash of our testnet, and the starting block. The mapping section was kept at the defaults, but that's where the magic happens: it declares that each new block will be handled by the function handleBlock, each event by handleEvent, and each call by handleCall.

specVersion: 0.2.0
name: subql_wika
version: 1.0.0
description: ''
repository: https://github.com/randombishop/wika_etl
schema:
  file: ./schema.graphql
network:
  endpoint: wss://testnode3.wika.network:443
  genesisHash: '0x59732b25bb635769e91a71f818c6d845b9bdcd371bb93d1512b1eacedb53d4be'
dataSources:
  - kind: substrate/Runtime
    startBlock: 1777550
    mapping:
      file: ./dist/index.js
      handlers:
        - handler: handleBlock
          kind: substrate/BlockHandler
        - handler: handleEvent
          kind: substrate/EventHandler
        - handler: handleCall
          kind: substrate/CallHandler

3. Defining our dataset

This is done in the file schema.graphql, where we defined 4 entities:

  • BlockInfo: id and sync date
  • UrlMetadata: to store the title, description and some metadata about each webpage.
  • LikeEvent and UrlRegisteredEvent: the core of our blockchain data.

type BlockInfo @entity {
  id: ID! # id is a required field
  blockNum: Int! @index(unique: true)
  syncDate: Date!
}

type UrlMetadata @entity {
  id: ID! # id is a required field
  title: String
  description: String
  image: String
  icon: String
  updatedAt: Date!
}

type LikeEvent @entity {
  id: ID! # id is a required field
  url: String! @index(unique: false)
  user: String! @index(unique: false)
  numLikes: Int!
  blockNum: Int!
}

type UrlRegisteredEvent @entity {
  id: ID! # id is a required field
  url: String! @index(unique: false)
  owner: String! @index(unique: false)
  active: Boolean!
  blockNum: Int!
}

The schema uses the GraphQL standard, which is important to get familiar with if you're planning to use SubQuery.

4. Mapping logic

The code that transforms the data and loads it into the indexed databases is straightforward TypeScript located in the src folder.

By default, SubQuery provides data access functions that load into Postgres, so we added a plugins directory with the code that pulls metadata from websites and writes to Neo4J and Elasticsearch.
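
To give an idea of the shape of that mapping code, here is a minimal sketch of an event handler in the SubQuery style. The pallet section and event names ("likes" / "UrlLiked") and the ID scheme are illustrative assumptions, not the exact code from the repo; LikeEvent is the model class generated from the schema above:

import { SubstrateEvent } from "@subql/types";
import { LikeEvent } from "../types";

export async function handleEvent(event: SubstrateEvent): Promise<void> {
  const { section, method, data } = event.event;
  // Hypothetical pallet and event names; the real ones live in the Wika runtime.
  if (section === "likes" && method === "UrlLiked") {
    const [url, user, numLikes] = data;
    const blockNum = event.block.block.header.number.toNumber();
    // Block number + event index gives a unique id for each event record.
    const record = new LikeEvent(`${blockNum}-${event.idx}`);
    record.url = url.toString();
    record.user = user.toString();
    record.numLikes = Number(numLikes.toString());
    record.blockNum = blockNum;
    await record.save();
  }
}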

5. Additional changes

We also added Neo4J, Elasticsearch and Kibana to the services in docker-compose.yml (they all provide ready-to-use Docker images).
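
As a sketch, the added services look something like this (image tags, ports and settings here are illustrative; the actual docker-compose.yml in the repo is the reference):

services:
  neo4j:
    image: neo4j:4.4
    ports:
      - 7474:7474   # browser
      - 7687:7687   # bolt
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.16.3
    environment:
      - discovery.type=single-node
    ports:
      - 9200:9200
  kibana:
    image: docker.elastic.co/kibana/kibana:7.16.3
    environment:
      - ELASTICSEARCH_HOSTS=http://elasticsearch:9200
    ports:
      - 5601:5601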

One important part to understand about SubQuery: the mapping logic runs in a sandbox, which is restricted by default to a minimal set of dependencies, and as of the current version there is no easy way to extend that list. The workaround was to modify the following file directly inside the Docker image of the SubQuery engine:

/usr/local/lib/node_modules/@subql/node/dist/indexer/sandbox.service.js

Finally, all dependencies were pinned to specific versions to facilitate reproducibility.

"devDependencies": {
"@polkadot/api": "7.5.1",
"@subql/cli": "0.19.0",
"@subql/types": "0.13.0",
"@types/chai": "4.3.0",
"@types/mocha": "9.1.0",
"chai": "4.3.4",
"cheerio": "1.0.0-rc.10",
"mocha": "9.1.4",
"neo4j-driver": "4.4.1",
"node-fetch": "2.6.7",
"typescript": "4.5.5"
}

And that's pretty much it. There are a few more little details that we changed after initializing the project, all documented in template_change_log.md, but the main ones above are really all you need to know if you're planning to build an ETL with SubQuery.

6. Building and running

There are 3 steps to start the ETL:

  • Generate the model classes.
  • Compile the TypeScript code into JavaScript.
  • docker-compose up!

You’ll find the exact how-to in the readme.md doc.
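
In a typical SubQuery scaffold, those three steps map to commands along these lines (the script names are the scaffolding defaults, so double-check them against the repo):

npm run codegen     # generate the model classes from schema.graphql
npm run build       # compile the TypeScript into dist/
docker-compose up   # start the indexer and all the databases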

Wika Blockchain Data

With your ETL up and running, the blockchain data will be synced into Postgres, Neo4J and Elasticsearch. Here is what it looks like!

1. Postgres

Here are the tables you will find in Postgres (their columns follow the entities defined in schema.graphql):

  • block_infos
  • like_events
  • url_metadata
  • url_registered_events

The Postgres data can also be queried using GraphQL; the query service should be running at localhost:3000.
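
For example, a query along these lines returns the latest like events (entity and field names follow the schema above; the nodes wrapper and the ordering enum are part of the API that SubQuery's query service generates):

query {
  likeEvents(first: 5, orderBy: BLOCK_NUM_DESC) {
    nodes {
      url
      user
      numLikes
      blockNum
    }
  }
}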

2. Neo4J

Neo4J provides an interesting alternative to explore the data from a graph perspective instead of tables:

  • Each user is represented by a node (User class) and includes the total number of likes sent.
  • Each URL is represented by a node (Url class) and includes the total number of likes received.
  • Likes are represented by the LIKES relationship, which stores the number of likes as well.
  • Ownership is represented by the OWNS relationship.

You can connect to the Neo4J browser at localhost:7474 and use the Cypher query language to explore the data set.
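
For example, this Cypher query pulls a sample of users together with the URLs they liked (the node labels and the LIKES relationship are exactly as described above):

MATCH (u:User)-[l:LIKES]->(p:Url)
RETURN u, l, p
LIMIT 25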

3. Elasticsearch

The docker-compose stack comes with a Kibana service (a front end for Elasticsearch), which can be accessed at localhost:5601.

You will find each Url represented as a document with the fields title, description, image and icon.
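
If you want to peek at the documents without going through Kibana, you can also query Elasticsearch directly; this searches across all indices, so there is no need to know the exact index name:

curl 'localhost:9200/_search?q=title:wika&pretty'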

That’s all folks!

Hope you found this Subquery use case helpful, and enjoy Wika’s data!

If you need help with this repo or have any feedback, you can reach us on Discord or through GitHub issues.
