Let's get started, shall we? The worker code (at omarroth/archive) starts by creating a new Worker. This is our basic worker. Its run function creates a BatchProcess and calls that object's run function. *sigh* So what does that do? Well, it asks the server for a batch, pulls it up from a database, and retrieves the annotations for each video in it... which is done in yet another class, this one called AnnotationProcess.

So what does AnnotationProcess do? It makes a request to YouTube to get the annotations. (The URL in the repository was changed after the fact. By me. Interesting.)

How the worker gets those annotations deserves a closer look: to make sure a worker is functioning properly, there is a trust system. A fresh worker won't actually get a new batch; it'll get one that has already been verified. As it gives more valid responses, it becomes more likely to get a new video. This way, the likelihood of getting garbage data is minimized, which is important for an archival project.

Once all the videos in a batch have been downloaded, they're verified with the server and then uploaded to DigitalOcean Spaces, a cloud storage service. This goes on ad infinitum until YouTube decides to pull the plug.
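To make the chain of Worker, BatchProcess and AnnotationProcess (plus the trust system) a bit more concrete, here's a rough sketch in Python. It is NOT the project's code. The Server class, the "10% per accepted batch, capped at 90%" trust rule, the batch size of 2, and the fetch and storage stand-ins (for the YouTube request and the DigitalOcean Spaces upload) are all my own hypothetical choices, made only to show the idea.

```python
import random
from dataclasses import dataclass, field
from typing import Callable

Fetch = Callable[[str], str]  # video_id -> annotation data (stands in for the YouTube request)


@dataclass
class Server:
    """Hypothetical stand-in for the coordinating server; not the project's real API."""
    known: dict[str, str]   # video_id -> annotations that were already verified
    pending: list[str]      # video_ids that nobody has archived yet
    trust: dict[str, int] = field(default_factory=dict)

    def chance_of_new(self, worker: str) -> float:
        # Assumed rule: each accepted batch adds 10%, capped at 90%.
        return min(0.9, 0.1 * self.trust.get(worker, 0))

    def get_batch(self, worker: str, rng: random.Random, size: int = 2) -> list[str]:
        if self.pending and rng.random() < self.chance_of_new(worker):
            batch, self.pending = self.pending[:size], self.pending[size:]
            return batch
        # Low-trust workers get videos whose correct answer is already known.
        return rng.sample(sorted(self.known), k=min(size, len(self.known)))

    def verify(self, worker: str, results: dict[str, str]) -> bool:
        # Known videos must match the stored answer; new ones can't be checked.
        ok = all(self.known.get(vid, data) == data for vid, data in results.items())
        self.trust[worker] = (self.trust.get(worker, 0) + 1) if ok else 0
        return ok


class AnnotationProcess:
    """Retrieves the annotations for a single video."""

    def __init__(self, fetch: Fetch):
        self.fetch = fetch

    def run(self, video_id: str) -> str:
        return self.fetch(video_id)


class BatchProcess:
    """Gets one batch, fetches every video in it, verifies, then uploads."""

    def __init__(self, server: Server, worker: str, fetch: Fetch,
                 storage: dict[str, str], rng: random.Random):
        self.server, self.worker, self.fetch = server, worker, fetch
        self.storage, self.rng = storage, rng

    def run(self) -> bool:
        batch = self.server.get_batch(self.worker, self.rng)
        results = {vid: AnnotationProcess(self.fetch).run(vid) for vid in batch}
        if not self.server.verify(self.worker, results):
            return False  # rejected batches are not uploaded
        self.storage.update(results)  # stand-in for the upload to DigitalOcean Spaces
        return True


class Worker:
    def __init__(self, name: str, server: Server, fetch: Fetch,
                 storage: dict[str, str], seed: int = 0):
        self.name, self.server, self.fetch, self.storage = name, server, fetch, storage
        self.rng = random.Random(seed)

    def run(self, rounds: int) -> None:
        # The real worker loops indefinitely; `rounds` keeps the sketch finite.
        for _ in range(rounds):
            BatchProcess(self.server, self.name, self.fetch, self.storage, self.rng).run()
```

In this sketch, a worker whose fetch returns correct data keeps passing the checks on known videos, so its trust climbs and it becomes increasingly likely to be handed pending videos. A worker that returns garbage fails every check, so its trust stays at zero and it never touches the pending list at all.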
And that is what (I think) the annotation worker did.
- glmdgrielson