DAO Data Grant Update #2
Monthly update for the Decentraland DAO-owned data aggregation grant.
October 5th, 2022
37,-5 (Atlas Norte)
By howieDoin – Business Development and Infrastructure at Atlas Corp
How we doin’, DCL community? In our last DAO Grant Update I spoke a little about the discovery and analysis we did at the onset of the project, so this time I’d like to share more tangible progress, as we’ve been busy building over the last month. In particular, I’d like to walk through the architectural decisions we’ve made as the codebase starts to take shape. We’ll begin posting some of this code in an open source repository later this month, so stay tuned.
High Level Architecture
As stated in the initial DAO Grant proposal, the “DAO DATA” platform is being built as a series of microservices. Each application is small and fit for purpose, and the applications can be run independently of one another (e.g. on different infrastructure). The code comprises four applications, each with its own installation and deployment process:
DAO_DATA_COLLECT: A set of microservices to collect data from each of the active DAO catalyst nodes. Expects an external MongoDB cluster to store the data.
DAO_DATA_QUERY: A set of microservices to provide a set of common queries and access to the DAO data to the general public.
DAO_DATA_DERIVE: A set of change-agent microservices to create derived metrics from the raw DAO data, allowing more performant queries downstream.
DAO_DATA_ARCHIVE: A set of microservices to archive data when its storage footprint outgrows the available capacity. Expects an external AWS Glacier instance.
To date, work on DAO_DATA_COLLECT has been completed, and DAO_DATA_QUERY is not far from completion.
There were no real blockers this month, though recent misinformation campaigns on Twitter misrepresenting the number of Daily Active Users in Decentraland have renewed our resolve to publish access to this data as quickly as possible.
DAO_DATA_COLLECT
The DAO_DATA_COLLECT application is architected to be as straightforward as possible to get up and running. The only requirements are a server with Docker installed and a MongoDB database resilient enough to handle the data throughput.
The application comes packaged with an install script that queries the catalyst smart contract, the official source of catalyst nodes supporting Decentraland, to get the current list of nodes, since we expect this list to change over time. The install process then generates the docker-compose file used to deploy the application, hosting one instance of the collection agent per node.
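To make the "one collection agent per node" idea concrete, here is a minimal sketch of how an install step could turn a list of catalyst nodes into a compose structure. It is not the actual install script: the image name, the environment variable names (CATALYST_URL, MONGO_URI) and the example node URLs are all placeholders, and the node list is passed in directly rather than read from the catalyst smart contract.

```python
import json
from urllib.parse import urlparse


def service_name(node_url: str) -> str:
    """Turn a catalyst URL into a compose-safe service name."""
    host = urlparse(node_url).hostname or node_url
    return "collect_" + "".join(c if c.isalnum() else "_" for c in host)


def build_compose(node_urls: list[str], mongo_uri: str,
                  image: str = "dao-data-collect:latest") -> dict:
    """Return a compose structure with one collection agent per unique node.

    The image name and environment variable names are hypothetical.
    """
    services = {}
    for url in sorted(set(node_urls)):  # de-duplicate the node list
        services[service_name(url)] = {
            "image": image,
            "restart": "unless-stopped",
            "environment": {"CATALYST_URL": url, "MONGO_URI": mongo_uri},
        }
    return {"services": services}


if __name__ == "__main__":
    # Placeholder node URLs; the real list would come from the smart contract.
    nodes = ["https://peer.example.org", "https://peer-2.example.org"]
    # Printed as JSON only to avoid a third-party YAML dependency in this sketch.
    print(json.dumps(build_compose(nodes, "mongodb://db:27017"), indent=2))
```

Re-running a step like this whenever the node list changes would add or drop agents without touching the rest of the deployment.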
This application is also designed to run redundantly. By configuring each deployment as either a “leader” or a “follower”, you can deploy two separate instances of the application to different servers; the “follower” will automatically begin collecting data if it detects that the “leader” is no longer doing so.
One of the major goals of this initiative is to reduce the number of requests sent to each catalyst node, so we don’t advise that everyone run this application themselves, as that would only make the current situation worse. Instead, we are aiming for complete transparency in how the data is collected, and are posting the code so that others can run it if we are ever unable to host the service ourselves (which we deem unlikely).
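The leader/follower behaviour described above can be sketched as a single decision function. This is only an illustration under assumptions: it supposes the follower can see the timestamp of the leader's most recent write (for example by reading it from the shared database), and the two-minute staleness threshold is an arbitrary placeholder rather than a project setting.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def should_collect(role: str,
                   last_leader_write: Optional[datetime],
                   now: datetime,
                   stale_after: timedelta = timedelta(minutes=2)) -> bool:
    """Decide whether this deployment should be collecting right now.

    A leader always collects. A follower collects only when the leader's
    most recent write is missing or older than `stale_after` (a
    hypothetical threshold).
    """
    if role == "leader":
        return True
    if role != "follower":
        raise ValueError(f"unknown role: {role!r}")
    if last_leader_write is None:
        return True  # the leader has never written anything
    return now - last_leader_write > stale_after


if __name__ == "__main__":
    now = datetime(2022, 10, 5, 12, 0, tzinfo=timezone.utc)
    print(should_collect("follower", now - timedelta(minutes=5), now))
```

Evaluating this check on every collection cycle means the follower also stands down again once the leader resumes writing.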
DAO_DATA_QUERY
The DAO_DATA_QUERY application can be run independently from the DAO_DATA_COLLECT application and provides endpoints that can be used to retrieve raw and derived data from the database. We’ve identified a number of core queries we aim to support at launch, most of which have been completed. DAO_DATA_QUERY will also support retrieval of derived data created by DAO_DATA_DERIVE, which is on the roadmap for the next update.
The query application currently does not require authorization, as it is open to all Decentraland users, though a configurable rate limiter is built in to prevent abuse of the system. The intention is for users who need catalyst node data to use these endpoints instead of the catalyst node endpoints, reducing strain on the Decentraland core infrastructure and allowing this server to provide content and analytics for user applications.
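For readers curious what a configurable rate limiter might look like, below is a minimal sliding-window sketch. It is not the limiter that ships with DAO_DATA_QUERY; the class name, the per-client keying and the two settings (max_requests, window_seconds) are assumptions chosen for illustration.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per client within `window_seconds`."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the limiter can be tested
        self._hits = defaultdict(deque)  # client id -> request timestamps

    def allow(self, client_id: str) -> bool:
        now = self.clock()
        hits = self._hits[client_id]
        # Drop timestamps that have fallen out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True


if __name__ == "__main__":
    limiter = SlidingWindowLimiter(max_requests=2, window_seconds=60)
    print([limiter.allow("203.0.113.7") for _ in range(3)])
```

Keying on something like the caller's IP address keeps the endpoints open to everyone while still capping how hard any single client can hit them.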
The following queries have been defined and will be available on initial release:
GET /islands - Returns a list of the latest islands data from all monitored catalyst nodes. This query is meant to be more efficient than querying each of the nodes separately.
GET /islands/<realm> - Returns islands data for a given node, as if the request had been sent to the catalyst server itself.
GET /users - Returns unique Decentraland players and a count per minute for the last hour.
GET /users/<x>/<y> - Returns unique Decentraland players for the given parcel coordinates x,y and a count per minute for the last hour.
GET /signed-fetch/<address>/<x>/<y>/<since>/<tolerance> - Signed fetch call replacement. Given a wallet address, an x,y parcel location, and the number of minutes to look back (e.g. 60 to check user presence over the last hour), this endpoint returns True or False depending on whether the user was in world. Note: tolerance is optional and may be omitted to check that a user was present on a specific parcel.
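As an illustration of what the two /users queries above compute, here is a rough sketch of the aggregation over raw sightings. The Sighting record, its field names and the shape of the returned dictionary are assumptions made for this example; the real endpoints may store and return the data differently.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from typing import Iterable, NamedTuple, Optional, Tuple


class Sighting(NamedTuple):
    """One observation of a player on a parcel (hypothetical record shape)."""
    address: str
    x: int
    y: int
    seen_at: datetime


def users_last_hour(sightings: Iterable[Sighting], now: datetime,
                    parcel: Optional[Tuple[int, int]] = None) -> dict:
    """Unique players and a per-minute count for the hour ending at `now`.

    Pass `parcel=(x, y)` to restrict the result to a single parcel.
    """
    cutoff = now - timedelta(hours=1)
    per_minute = defaultdict(set)
    for s in sightings:
        if not (cutoff < s.seen_at <= now):
            continue  # outside the one-hour window
        if parcel is not None and (s.x, s.y) != parcel:
            continue
        minute = s.seen_at.replace(second=0, microsecond=0)
        per_minute[minute].add(s.address.lower())  # compare addresses case-insensitively
    unique = set().union(*per_minute.values())
    return {
        "unique_users": sorted(unique),
        "count_per_minute": {m.isoformat(): len(a)
                             for m, a in sorted(per_minute.items())},
    }


if __name__ == "__main__":
    now = datetime(2022, 10, 5, 12, 0, tzinfo=timezone.utc)
    data = [Sighting("0xA", 37, -5, now - timedelta(minutes=30))]
    print(users_last_hour(data, now, parcel=(37, -5)))
```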
Future queries identified that require derived data:
/uptime/<node> - 30 day uptime information for a given catalyst node
/daily-active-users - Unique daily active users in Decentraland
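To show why these queries depend on derived data, here is a small sketch of how a daily-active-users figure could be derived from raw sightings. It assumes each raw record carries a wallet address and a timezone-aware timestamp, and that a "day" means a UTC calendar day; neither detail is confirmed by the project, and a real DAO_DATA_DERIVE job would write the result to its own collection rather than return a dictionary.

```python
from collections import defaultdict
from datetime import date, datetime, timezone
from typing import Dict, Iterable, Tuple


def daily_active_users(sightings: Iterable[Tuple[str, datetime]]) -> Dict[date, int]:
    """Count unique wallet addresses per UTC calendar day.

    `sightings` yields (address, seen_at) pairs with timezone-aware
    timestamps (an assumed input shape).
    """
    per_day = defaultdict(set)
    for address, seen_at in sightings:
        day = seen_at.astimezone(timezone.utc).date()
        per_day[day].add(address.lower())  # one player counts once per day
    return {day: len(addresses) for day, addresses in sorted(per_day.items())}


if __name__ == "__main__":
    sample = [
        ("0xA", datetime(2022, 10, 4, 23, 59, tzinfo=timezone.utc)),
        ("0xa", datetime(2022, 10, 4, 8, 0, tzinfo=timezone.utc)),
        ("0xB", datetime(2022, 10, 5, 0, 1, tzinfo=timezone.utc)),
    ]
    print(daily_active_users(sample))
```

Precomputing a small per-day table like this lets the query service answer from the derived table instead of scanning every raw record on each request.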
Next Steps
As mentioned, we’re about halfway through the code outlined by the architecture above. Throughout October we’ll be focusing on building DAO_DATA_DERIVE to create additional database collections with derived data, which will allow us to complete DAO_DATA_QUERY. After this, we’ll build the final microservice, DAO_DATA_ARCHIVE, which will back up and purge old data to help manage the cost of the infrastructure as the data begins to accumulate.
Once all code is complete, we can provision the production infrastructure, deploy the code, and provide access to the community. We look forward to sharing our continued progress in our next update one month from today.
-howieDoin