Data Horde: How the Annotation Worker ...Worked

So the annotation thing. You remember that, right? Well, here is how the worker seemed to function. Note that I'm getting this information from a brief cursory glance (and chatting with one of the devs). I know it works because I had three of them running at any given time. But how? Uh, *shrug*

Let's get started, shall we? So the worker (at omarroth/archive) the code starts by creating a new Worker class. This is our basic worker.

The run function creates a BatchProcess and calls its run. *sigh* So what does that do? Well it asks the server for a batch, pulls it up from a database, and retrieves the annotations for each of them ...which is done in yet another class, this one called AnnotationProcess.

So what does AnnotationProcess do? It does a request to YouTube to get the annotations. (The URL in the repository was changed after the fact. By me. Interesting.) How it gets those annotations is interesting: to make sure the worker is functioning properly, there is a trust system. A fresh worker won't actually get a new batch; it'll get one that's already been verified. As it gives more valid responses, it's more likely to get a new video. This way, the likelihood of getting garbage data is minimized, which is important for an archival project.

Once all the videos in a batch have been downloaded, they're verified with the server and then uploaded to DigitalOcean Spaces, a cloud storage service. This goes on ad infinitum until YouTube decides to pull the plug.

And that is what (I think) the annotation worker did.

- glmdgrielson

Data Horde

Saturday, February 29, 2020

How the Annotation Worker ...Worked

No comments:

Post a Comment