Data Horde: 2020

Saturday, March 7, 2020

Taking a Screenshot

Screenshots, while easily faked, often serve as important records of the past. They are often used as records of chats and website user interfaces. Here’s how to take one:

iOS/iPadOS:

If the device has a home button: Quickly press the home and sleep/wake buttons. A screenshot will be saved to your camera roll. (Newer versions of iOS/iPadOS offer the ability to edit/share a screenshot before saving.)

If the device does not have a home button: Quickly press the volume up and sleep/wake buttons. A screenshot will be saved to your camera roll. (Newer versions of iOS/iPadOS offer the ability to edit/share a screenshot before saving.)

Windows:

All versions: Press the Print Screen (PrtScn) button. A screenshot will be copied to your clipboard and can be pasted in an image editor, Word document, PowerPoint presentation, etc.

Windows 7/8/8.1: These users can also use the pre-installed Snipping Tool.

Windows 10: In addition to the Snipping Tool, these users can also use the more modern Snip and Sketch app. Tip: in the Settings app, under Ease of Access, in Keyboard, you can check “Use the PrtScn button to open screen snipping” to use the PrtScn button to take a screenshot with the more robust Snip and Sketch app instead of copying the image to the clipboard. (I use this feature.)

Alternatively, you can use the third-party Greenshot app to take screenshots.

macOS:

All versions: Press Shift-Command-3. After a short delay, a screenshot will be saved to the desktop.

macOS Mojave or newer: These users can also use the Screenshot app to take screenshots.

Android:

Press and hold the power button until a menu pops up. Then press “Screenshot”. Otherwise, try pressing and holding the Power and Volume Down buttons for a few seconds. A screenshot icon will appear at the top of your screen. If these instructions don’t work, try looking for instructions specific to your device, as the steps can vary between devices.

—tech234a

For Linux users, there are a couple of packages you could use. The GNOME solution is GNOME Screenshot which lets you take a screenshot with the PrtScr button, though you do have to navigate through a menu to do so. Hit the print screen button, wait a few seconds, and it asks you where to save this image. (Useful for people who don't have a Paint analogue anywhere on their system.)

The KDE solution is called Spectacle which seems to function similarly. (I have no familiarity with this so I'm guessing here.)
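Both tools can also be driven from a terminal, which is handy for scripting. Here is a minimal sketch, assuming the `--file` option of GNOME Screenshot and the `--background`, `--fullscreen` and `--output` options of Spectacle are available in your installed version (check each tool's `--help` first). The helper name `screenshot_cmd` is made up for this example, and it only prints the command rather than taking a screenshot:

```shell
# Print a command that saves a full-screen screenshot to the given file.
# screenshot_cmd is a hypothetical helper; it only prints, it does not capture.
screenshot_cmd() {
    tool=$1
    file=$2
    case "$tool" in
        gnome-screenshot) echo "gnome-screenshot --file=$file" ;;
        spectacle)        echo "spectacle --background --fullscreen --output $file" ;;
        *)                echo "unknown tool: $tool" >&2; return 1 ;;
    esac
}

screenshot_cmd gnome-screenshot shot.png
screenshot_cmd spectacle shot.png
```

Copy the printed line into your terminal to actually take the screenshot.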

- glmdgrielson

Friday, March 6, 2020

Goodbye, Blogger!

So, this page may or may not be our last post on this blog. Only been at it for two months and we're already giving up on it.

Well, not quite. You see, we've moved! We've found ~~some new overlords~~ a partner over at Gaming Alexandria who will be helping us expand, reach out, and do other fun things!

So ah, this blog may or may not be seeing too much love any more. Ah well.

Move to the Data Horde!

- glmdgrielson

Monday, March 2, 2020

Weekly Summary 03/02/2020

Get ready to dramatically throw off your coats, and say "welcome" to spring! As a wise man once said: "Spring is the time of plans and projects". Everyone (including us here at Data Horde) has a few things in the works, so get hyped!

But first, it's worth mentioning a couple of websites that are expected to shut down within this month:

The self-proclaimed "tech tabloid" The Inquirer and the music database freeDB are scheduled for shutdown on the 31st of March. Both sites are slowly becoming more and more unusable, but a good portion of The Inquirer survives as Wayback Machine snapshots and there are a few freeDB mirrors, although a few programs which are hard-coded to access freeDB directly are expected to break.

Moving onto more hopeful news...

The Brave web browser is adding a new built-in feature for detecting websites that have gone offline, so that's the good old 404 and its siblings. Upon receiving an appropriate error, Brave will offer a button which will take you to the Wayback Machine to see if it can find any snapshots. In a world where one-click Google and Wikipedia are almost expected to be bundled with everything, we can only hope this built-in feature will inspire change across the industry to make all of our lives better.

You can read more details about this on the Internet Archive Blog: https://blog.archive.org/2020/02/25/brave-browser-and-the-wayback-machine-working-together-to-help-make-the-web-more-useful-and-reliable/

Finally, there's been a recent status update by BlueMaxima of Flashpoint. His Medium article starts with a big thank you to all the media outlets and people online covering the project.

"As a result. Flashpoint broke several records in terms of things like monthly unique visits. In fact, we had more unique people visit Flashpoint’s website in this month than the entirety of 2019. My servers were so stressed from people downloading Flashpoint that an automatic abuse switch flipped in my web hoster. That was fun to deal with!"

The rest of the article talks about a whole bunch of things, ranging from technical improvements to rarities added to the collection such as the late Chuck Jones' Thomas Timberwolf. The full article is a must read:
https://medium.com/bluemaximas-flashpoint/flashpoint-status-update-march-2020-6b1bd57e5df0

That's about it for this week...

Stay tuned for next week's summary, where we'll hopefully see each other again same time next week at EST 9 AM!

Saturday, February 29, 2020

How the Annotation Worker ...Worked

So the annotation thing. You remember that, right? Well, here is how the worker seemed to function. Note that I'm getting this information from a brief cursory glance (and chatting with one of the devs). I know it works because I had three of them running at any given time. But how? Uh, *shrug*

Let's get started, shall we? So in the worker (at omarroth/archive), the code starts by creating a new Worker class. This is our basic worker.

The run function creates a BatchProcess and calls its run. *sigh* So what does that do? Well it asks the server for a batch, pulls it up from a database, and retrieves the annotations for each of them ...which is done in yet another class, this one called AnnotationProcess.

So what does AnnotationProcess do? It does a request to YouTube to get the annotations. (The URL in the repository was changed after the fact. By me. Interesting.) How it gets those annotations is interesting: to make sure the worker is functioning properly, there is a trust system. A fresh worker won't actually get a new batch; it'll get one that's already been verified. As it gives more valid responses, it's more likely to get a new video. This way, the likelihood of getting garbage data is minimized, which is important for an archival project.
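To make the trust idea a bit more concrete, here is a toy sketch of how such a decision could look. Everything in it is an assumption made for illustration: the name `batch_kind`, the 0 to 100 trust scale, and the rule itself are not taken from the actual server code, which this post does not describe in that much detail.

```shell
# Decide what kind of batch a worker receives, given a trust score.
# All names and numbers here are hypothetical; the real server's rules are unknown.
# trust: 0-100; roll: a number 0-99 standing in for a random draw.
batch_kind() {
    trust=$1
    roll=$2
    if [ "$roll" -lt "$trust" ]; then
        echo new        # trusted enough this time: hand out unverified work
    else
        echo verified   # otherwise re-check a batch whose answer is already known
    fi
}

batch_kind 0 50    # a fresh worker (trust 0) can only get a verified batch
batch_kind 90 50   # a well-established worker usually gets new work
```

The point of the sketch is only that a worker with no track record is always checked against known answers, while a worker with a good record spends most of its time on new videos.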

Once all the videos in a batch have been downloaded, they're verified with the server and then uploaded to DigitalOcean Spaces, a cloud storage service. This goes on ad infinitum until YouTube decides to pull the plug.

And that is what (I think) the annotation worker did.

- glmdgrielson

Monday, February 24, 2020

Weekly Summary 02/24/2020

This week was rather calm, with mostly positive developments.

A quick update on the Bethesda forums from last week: The shutdown has been rescheduled to March 9th, so that's 2 more weeks from now to grab anything you still haven't backed up.

Besides that, the Internet Archive recently digitized a large collection of Russian books. You can read more on their blogpost: https://blog.archive.org/2020/02/21/russian-book-covers/.

Stay tuned for next week's summary, where we'll hopefully see each other again same time next week at EST 9 AM!

Monday, February 17, 2020

Weekly Summary 02/17/2020

The focus of this week is a new archiving project documenting information about the recent coronavirus outbreak.

The Internet Archive and the IIPC are working together to collect resources regarding the recent Novel Coronavirus (Covid-19) outbreak. The goal here is not just to collect academic research information but also to document containment efforts and social effects. Any of the following are of interest:

Coronavirus origins
Information about the spread of infection
Regional or local containment efforts
Medical/Scientific aspects
Social aspects
Economic aspects
Political aspects

So go ahead, feel free to share news articles or social media posts as well, even if you don't have in-depth information on the matter.

If you would like to nominate any websites or web content covering the outbreak for archiving you can use this form here: https://forms.gle/iAdvSyh6hyvv1wvx9. The collection will later be made available to all at the IIPC's archive-it page: https://archive-it.org/home/IIPC, where you can already find some interesting examples from previous collections on the Olympics or the European Refugee Crisis to give you an idea of the goal here.

If perhaps you would like to share offline content such as photos of pamphlets you may have encountered at school or in the workplace, the Internet Archive encourages you to upload to a collection of your own, preferably tagging it as 'coronavirus' so people will be able to find it in the future. Here's a featured example: https://archive.org/details/2019-nCoV

For more information see https://blog.archive.org/2020/02/13/archiving-information-on-the-novel-coronavirus-covid-19/ and https://netpreserveblog.wordpress.com/2020/02/13/cdg-collection-novel-coronavirus/.

In other news, the old Bethesda forums (https://forums.bethsoft.com) are expected to be shutting down in a few hours. The forums had been read-only for the last 2 years or so, and there are quite a few snapshots. The community has already moved onto the aptly named bethesda.net/community.

That's about it for this week.

Stay tuned for next week's summary, where we'll hopefully see each other again same time next week at EST 9 AM!

Saturday, February 15, 2020

Keep Circulating the Tapes

If you recognize the title, chances are you know what I'm about to talk about. For those of you who don't, the phrase appeared in the credits of the first four seasons of a show called Mystery Science Theater 3000.

If you're not aware of the series' premise and wondering why they're encouraging something like this, note that one, it was on a network that at the time was just barely getting started, namely, Comedy Central. Two, the premise of the show is that it features actual B-movies. Naturally, this means that getting the rights to episodes of their own show can be a bit of a hassle. Reasons for this range from Toho deciding no, you can't have your movie to the rights holder being so paranoid she won't even let it be shown at conventions.

So recording episodes and sending them to your friends became a part of being a fan of the show. There's even a mention in the episode covering The Magic Sword of a parent sending tapes to their son as a means of connecting with him. And of course, there are plenty of uploads all over YouTube.

- glmdgrielson, a big nerd

Monday, February 10, 2020

Weekly Summary 02/10/2020

The main event this week has been recovery efforts following the fire that broke out near the Museum of Chinese in America (MOCA) back in January.

MOCA is deeply saddened and shocked by the devastating fire at Chinatown’s beloved 70 Mulberry. The MOCA team stayed on site until hoses stopped last night. We have reached out to emergency conservators. Thank you for outpouring of community support re: MOCA archives. pic.twitter.com/QqD1vFU5kO
— Nancy Yao Maasbach (@YaoMaasbach) January 24, 2020

Initial estimates of how much material had survived the ordeal were rather pessimistic, with reports claiming that whatever artifacts the fire had not reached, the water used to put it out had. Conservators from museums across New York and other volunteers have, however, decided to give Sun Tzu a run for his money by attempting to recover the likely drenched documents.

The current status report: An estimated one third of the archives has been retrieved from the building, and 80% of this retrieved batch has been afflicted with moisture but appears salvageable. In addition, 35,000 digitized objects have been recovered from back-ups. For further updates you can visit MOCA's own page http://www.mocanyc.org/visit/ or follow MOCA's Twitter.

This damaged scroll was recovered from our archives. It says Museum of Chinese in the Americas - MOCA’s name before 2009. During transfer to our recovery area, it was accidentally placed upside down. Seeing it that way, it feels like the fire has turned MOCA’s world upside down. pic.twitter.com/Q0hAEi14bv
— Museum of Chinese in America (@mocanyc) February 10, 2020

MOCA is an active advocate for archive digitization, and in fact they had an event, "Digitization Days", scheduled for this week (which has obviously been cancelled) where attendees would get a chance to digitize family photos, documents, VHS tapes, cassette tapes, and even vinyl!

A GoFundMe has been set up, and readers in the New York area who might be interested in volunteering should contact firerecovery@mocanyc.org.

In other news, this week The Eye had quite an unusual hardware test. A fairly large (over 500 GB) collection of assorted magazines is being served "temporarily", with it being stated that they would likely later be handed off to the Internet Archive for safekeeping.

These aren't anything too old: mostly magazines from 2019, but also a few from this year. You can expect to find anything from Vogue to three different regional editions of the Financial Times.

Stay tuned for next week's summary, where we'll hopefully see each other again same time next week at EST 9 AM!

Saturday, February 8, 2020

MSN Messenger is Back!

Think about the first instant messaging service that you used. Chances are it was discontinued many years ago and replaced by today's giants such as Skype or Discord. Except that there is a chance someone revived it!

Introducing the Escargot MSN Server. Since 2017, you can use your favourite (supported) version of MSN Messenger once again, whether it's for nostalgic reasons or just for fun. Follow the instructions, insist that your friends use Escargot and voilà! You're back in 2005.

Some versions have more bugs than others, and not every function is supported yet, but regular instant messaging is in working order.

If you want to find out more, you can visit their website: https://escargot.log1p.xyz/

Monday, February 3, 2020

Weekly Summary 02/03/2020

A warm welcome to a cold February!
This week we have some updates on Yahoo Groups, and a couple of shutdowns.

We recently passed the deadline for the shutdown of Yahoo Groups, but as of me writing this it seems that the "get my data" services are still functional. Currently Archive Team's own Tracker reports 2.76 TB of data to have been saved. The "Save Yahoo Groups" team (aka Yahoo Groups Fandom Rescue Project, aka Yahoo Gedden) hasn't fully tallied up their data yet, but have counted the number of groups they've downloaded so far to be about 84,655 and the number of groups they are still retrieving from to be about 38,000 for a total of 123K groups! We did a more detailed report on the situation earlier this week which you can read here.

A few days ago, the SRPG/card game hybrid Duelyst was announced to be shutting down later in February:

We are saddened to announce that the Duelyst servers will be shutting down permanently on February 27th, 2020 at 3:00 PM PST. Saying goodbye to any game is never easy, and it’s always hardest when you love the game as much as we all love Duelyst. We’re incredibly proud of the effort our developers have put in on Duelyst over the years, and even more proud of the amazing community of friends and gamers that have enjoyed Duelyst.

We want to thank all of you for being a part of the Duelyst family, and for making our time working on Duelyst the amazing adventure it has been.

As a final thank you, the prices of in-game items have been lowered significantly and refunds are being offered for recent in-game currency purchases.

Seeing as the game requires a server connection it will be as good as dead. There is already discussion on fan revivals/sequels on the r/duelyst subreddit for anyone interested.

Finally, the goth social network Vampire Freaks was shut down two days ago, on February 1st. The site's shutdown had been announced months ago, so it likely didn't come as a surprise to much of the community. A portion of the diaspora of users seem to have migrated to the Discord Server of r/goth for anyone who might be looking to find a few old friends.

Stay tuned for next week's summary, where we'll hopefully see each other again same time next week at EST 9 AM!

Saturday, February 1, 2020

Why is this important? Or, the story of Victor Trembly

So let's start off with a story. It's 2013 and I've just found out Homestar Runner made a reappearance. Strong Bad gets his name wrong and calls him Homestuck. I check that out and eventually I decide to check out the forums. I get acquainted and even start my own little thing called NetHacked, starring Victor Trembly. (Female, by the way. Name and gender were picked by two different people.) This of course died a quick and swift death. Then I retool it into "Victor has gotten loose and I need to get him back". This is moderately successful in the sense that I have some sort of audience. I had just dropped forcing him to watch MM8 cutscenes, MST3K-style, before all of a sudden, the forums disappeared. As far as I know, there was no backup. This is not a good thing. Preventing that sort of nonsense is why we're here. Because people put effort into things and they deserve to stick around.

@
Victor says hi.

- glmdgrielson

Thursday, January 30, 2020

Saving Private Groups: This Time the Mission is the Fan

In a little more than 24 hours, Yahoo Groups will be biting the dust. If you were once a member of a community, or perhaps own and/or know an owner of a restricted or private group which you'd like to save, I urge you to contact archiver1.fandom@gmail.com or join the "Save Yahoo Groups" Discord Server (https://discord.gg/DyCNddf). Without further ado, the Fandom Rescue Story...

Ah, the late 90's! A time with dial-up internet, people ranting about wasting too much time in front of their televisions instead of on their phones, and this book called Harry Potter about witches and wizards or something. It was a different time in many ways and it's a bit frightening how fast things have changed considering how chronologically recent it was. And yet, some things were quite similar, but the way one went about doing them was kind of different. Take for instance what you would do in your free time...

So you want to socialize online in 1997 huh? Unfortunately you don't really have anything like Twitter or Reddit; maybe you could go on Usenet or IRC, if you don't mind having an unreliable chat log or none at all. Then why not join a mailing list? It's perfect for discussing lizards with herpetology nerds or sharing your Thundercats fan-fiction with fans of the show.


This age, the age of the mailing list, was a time when many online communities flourished, particularly fandom. What had previously been restricted to (maga)"zines" and conventions had finally started to gain traction online. A mailing list at the time was a luxury akin to the Slack/Discord servers of today: you could start a group with people who had a common interest without having to go through the tedious process of setting up new hardware, and people got messages and notifications delivered straight to their inboxes. This rapid notification system allowed for communities that would no longer "sleep"; you would have updates almost 24/7. And as one can imagine, having such a party that never ends was incredibly addictive. Through word of mouth, mailing list providers such as OneList, EGroups and finally Yahoo! Groups soared in popularity.

"Our Group has been chosen to participate in the Yahoo! Groups Beta Program and all of the features that Yahoo! is contemplating have been incorporated into our Group. Threads can now be linked as Conversations and are searchable. Posting photos and links is now much easier."

- Post on a group blissfully unaware of their eventual demise

Time, however, was cruel to the mailing list. The last giant to survive the era was the aforementioned Yahoo! Groups, which tried to modernize with its web interface but was unable to keep up with the rapid growth of technology at the time. Eventually users began migrating to newer websites, and by 2015 the website resembled a ghost town.

(Image taken from: https://www.archiveteam.org/index.php?title=Yahoo!_Groups)

Still, many fan communities traced their origins to the mailing lists, with older members sometimes recounting terms, stories or jokes that originated in those days to the newer members. It's safe to say that these groups left behind quite a legacy-- which Verizon (Media) recently decided to wipe off the face of the earth.

In mid-October of 2019, it was announced that Yahoo Groups would be shut down. What followed was outrage. Although it was true that most of the former user base of Yahoo Groups had indeed moved on to other platforms, members of the early online fandom community did see what was at stake and were some of the first people to spring into action.

On October 22nd 2019 Tumblr user zhie started a Discord Server "Save Yahoo Groups" (link above), the same day Morgandawn started a Tumblr blog: https://yahoo-geddon.tumblr.com/. These two outlets combined together to form Fandom's Sortie against Verizon's Yahoo Groups Siege.

'...People here have been doing massive numbers of searches for fandom groups. Of course, some of us already belonged to fandom groups, and in some cases we have people coming in, saying, "These are really great groups that I think should be saved."

As far as I know, there has been no formal archiving project for fandom Yahoo Groups prior to this. During the time that Yahoo Groups was most active, there were fan fiction archives that sometimes duplicated what was at Yahoo Groups. But an enormous amount of fandom content at Yahoo Groups has never been archived.'

- Yahoo Groups Archiving Volunteer

While Archive Team had also gotten involved right off the bat, they stated their goal to be grabbing as many public groups as possible. The fandom community, on the other hand, wanted to ensure the survival of their own groups, some of which had restricted access (were publicly visible but an invite was needed to join) or were private (not publicly visible).

The two teams worked in tandem, with Archive Team providing tools and logistics for backing up the data and the SYG team working to sniff out the more obscure fandom groups and establish contacts with the restricted/private group owners. Of course, both teams played a tremendous role in publicizing the whole event, even managing to secure an extension to give people more time to back up their data.

Both Archive Team and the SYG team made a number of lists of the groups they found. Again keeping to their own focus, Archive Team set out to grab the data from their public group lists, while the SYG team split their group list into tabs, which volunteers would claim and try to gain access to.

Many hours of searching, exchanging emails and sleepless nights later, terabytes of data have been rescued from certain destruction. Archive Team's own Tracker reports 2.76 TB of data to have been saved. The SYG team hasn't fully tallied up their data yet, but have counted the number of groups they've retrieved and/or are retrieving from to be around 123K!

The Yahoo Groups Story is a fine tale which shows how different teams with complementary abilities and backgrounds can work together to accomplish things neither could have done as well on their own. If you too would like to become a part of this story, you can head on over to the Discord server and see if you can reach any of the owners that they're looking for.

Monday, January 27, 2020

Weekly Summary 01/27/2020

2020 is both the start of a new decade and the conclusion of the previous one. Perhaps now is as good a time as ever to reflect on how things are going. Are we getting by? Is everything ok? Do we have any financial issues that require immediate attention? I'm speaking for all of us here-

and also companies...

This week most news is going to be on shutdowns...

A number of Tetris mobile games are going down. In what is thought to be a licensing change, EA's Tetris® and Tetris® Blitz games were announced to be shutting down in April of 2020. In fact, the downloads on the Play Store seem to have already gone offline. N3TWORK Inc., who appear to be the new licensee, seem to have uploaded their own version. This version is also available on the App Store together with EA's Tetris® (relisted as Tetris® 2011) and Tetris® Blitz.

EA left a short message on the App Store versions of the game as their statement:

We have had an amazing journey with you so far but sadly, it is time to say goodbye. As of April 21, 2020, EA’s Tetris®/ Tetris® Blitz app will be retired, and will no longer be available to play.

Kindly note that you will still be able to enjoy the game and use any existing in-game items until April 21, 2020. We hope you have gotten many hours of enjoyment out of this game and we appreciate your ongoing support. Thank you!

Another game that's going silent is SingStar... It was a series of karaoke games for the PlayStation 2 (later also PS3/4). Although the games will still be playable, servers and other online features such as the in-game store will be shut down.

The SingStar Team released a bittersweet goodbye message on their website:

Announcement

After 15 incredible years, we have made the difficult decision to shut down the SingStore servers on 31 January 2020. After that date, you will still be able to enjoy your downloaded songs, but all online functionality and network features will no longer be available, and you will not be able to purchase any new songs from the SingStore. Any SingStar content you have shared to SingStar.com will be deleted. For PS3 users, please check that any digital purchases you have made and would like to keep have been downloaded onto your console before 31 January 2020. If you are a PS3 user and delete any content after that date, you won’t be able to redownload it.

Of course, any SingStar discs you have previously purchased will continue to work in offline mode, though you will not be able to obtain all trophies. You can read more about the digital store’s closure here: https://www.playstation.com/en-gb/legal/gameservers

We’ve loved watching the community grow, and have lots of fond memories from working on SingStar. Your support over the years has meant the world.

Thank you and lots of love,

The SingStar team

While on the subject of music, the playlist sharing platform 8tracks is another record that's been scratched... Its shutdown was revealed in a menacingly titled blogpost, "To everything there is a season", by David Porter, one of the site's creators. It covers many topics, from the site's inception to the rise of modern music streaming.

Archive Team was pretty fast to catch on, and virtually the entire site has already been saved.

It's fortunate to see the site will be able to leave some form of a legacy. To quote a portion of the blogpost:

We — the remaining team at 8tracks — all think it’s still to hard to find playlists with a “soul behind the music.” User programmed playlists on Spotify and YouTube are great, but they remain relatively hard to navigate to find the best ones for a particular person’s taste, time or place. And there’s not (as yet) an ecosystem to allow curators to flourish. There’s still work to be done.

Last, but not least, Flashpoint rolled out their 7.1 update just a few hours ago so feel free to check that out.

Stay tuned for next week's summary, where we'll hopefully see each other again same time next week at EST 9 AM!

Saturday, January 25, 2020

Tutorial: Back Up a Web Page or Web Site

Let's say there is a website you visit on a regular basis that contains useful information that you might not want to lose access to. This could happen for a variety of reasons, from the site owner deciding not to pay for hosting any more to a company deciding to replace the site with one that is supposed to be improved, but is instead more complicated.

Backing Up a Webpage for Yourself and Others with the Internet Archive Wayback Machine

The Internet Archive is a nonprofit organization dedicated to preserving digital data. One of their projects, called the Wayback Machine, keeps copies of webpages dating all the way back to 1996, and is considered trusted for citing a web page as it existed at a particular moment in time.

From the Wayback Machine homepage, you can use the large text box near the top center of the page to search for sites and pages that have already been archived. Entering a specific URL will open a calendar showing all of the times that specific webpage has been archived, and you can click on one of those dates to view the webpage as it existed at that moment in time. You can also prepend https://web.archive.org/*/ to a URL to view the list of archived versions of that page.
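If you look up pages often, the prefix trick is easy to wrap in a small shell function. This is just a sketch: the function name `wayback_calendar_url` and the example address are made up for illustration, and the function only prints the link.

```shell
# Print the Wayback Machine calendar URL for a page.
# The function name is hypothetical; the URL prefix is the one described above.
wayback_calendar_url() {
    printf 'https://web.archive.org/*/%s\n' "$1"
}

wayback_calendar_url 'https://example.com/page.html'
```

Paste the printed address into your browser to see the list of archived versions.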

To back up a webpage and add it to the list of archived versions on the calendar page, you can use the small text box in the bottom right portion of the Wayback Machine homepage, or go directly to the Wayback Machine's Save Page Now tool. On that page, you can optionally check the "Save outlinks" box to also save the pages that are linked from the page URL you have entered. Once you click the "Save Page" button, the page you have specified will immediately and permanently be saved to the Wayback Machine for yourself and others to access.
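Saving a page can also be scripted. The sketch below assumes that Save Page Now accepts the target address appended to https://web.archive.org/save/ and that curl is installed; the function names are invented for this example, and the second function is defined but not called, so nothing is submitted until you run it yourself.

```shell
# Build the Save Page Now request URL for a page.
# Assumption: the tool accepts the target URL appended to https://web.archive.org/save/
save_page_now_url() {
    printf 'https://web.archive.org/save/%s\n' "$1"
}

# Ask the Wayback Machine to capture a page (needs curl and network access).
# save_page_now is a hypothetical helper and is not called here.
save_page_now() {
    curl --silent --show-error --location --output /dev/null "$(save_page_now_url "$1")"
}

save_page_now_url 'https://example.com/page.html'
```

To actually request a capture, you would run something like save_page_now 'https://example.com/page.html' and then check the calendar page to confirm the snapshot appeared.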

Note: While some sites do block the Wayback Machine from displaying archived pages from their site, most do not.

Backing Up an Entire Website for Yourself and Others to the Internet Archive Wayback Machine

If you are aware of an important site that is disappearing soon and would like to save it to allow the public to access in the future, you can contact Archive Team on IRC by joining the #archiveteam channel on the HackInt IRC network. They have special tools that can download entire sites and then upload them to be accessible in the Wayback Machine.

Backing Up an Entire Website for Yourself Using WGET

If you want to back up a copy of a website for personal archiving or offline access purposes, you can use an open-source command-line utility called WGET. Most Linux distributions have it installed by default, and it is also available for macOS, Windows, and other platforms.

When you use this method, you are downloading a copy of a web site to your hard drive, and therefore will need enough disk space to store your copy of the website. In addition, your archive might not be considered a trusted citation by others in the future because it is easy to modify the content of the website you downloaded.

To start, open a command line to the folder to which you want to download your site. Then, you can type the following command:

wget --recursive --page-requisites --convert-links [website homepage address]

The --recursive switch tells WGET to search for links within a page, download them, then search those pages and download the links in those pages until all linked-to pages have been downloaded from that site.

The --page-requisites switch tells WGET to download any style, image, script, or other files that are needed to correctly display a webpage.

The --convert-links switch tells WGET to adjust the links in the pages after they are downloaded so they will work on your computer. Links to pages on the website are replaced with relative links on your computer.

Additional useful command switches:
--no-check-certificate is useful when your certificate trust store has problems or a site no longer has a valid HTTPS configuration. It turns off certificate verification, so use it only for sites you already trust.
--no-parent prevents pages that are above your starting page's directory from being downloaded. Useful if the site you want to download shares a domain with other sites.
--timestamping allows you to update a previously-downloaded site by downloading only the pages on the server that are newer than the pages on your computer.
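--mirror is shorthand for several switches suited to mirroring: it turns on --recursive and --timestamping, sets the recursion depth to infinite, and keeps FTP directory listings.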
--span-hosts allows links to other sites to also be recursively downloaded. Be careful with this option, as you might end up downloading a large portion of the internet!

You can run wget --help for a full list of command switches.
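
As a rough illustration of how these switches combine, here is a small Python sketch that assembles the command described above and hands it to WGET. The function names and the example address are hypothetical, and it assumes wget is installed and on your PATH.

```python
import shlex
import subprocess


def build_wget_command(url, *, no_parent=False, timestamping=False,
                       span_hosts=False, no_check_certificate=False):
    """Return the wget argument list for a personal site backup (hypothetical helper)."""
    # The three switches from the basic command are always included.
    command = ["wget", "--recursive", "--page-requisites", "--convert-links"]
    if no_parent:
        command.append("--no-parent")             # stay below the starting directory
    if timestamping:
        command.append("--timestamping")          # only fetch pages newer than local copies
    if span_hosts:
        command.append("--span-hosts")            # follow links to other sites (careful!)
    if no_check_certificate:
        command.append("--no-check-certificate")  # skip HTTPS certificate validation
    command.append(url)
    return command


def run_backup(url, directory=".", **options):
    """Run wget inside `directory` and return its exit code."""
    command = build_wget_command(url, **options)
    print("Running:", shlex.join(command))
    return subprocess.run(command, cwd=directory, check=False).returncode
```

For example, run_backup("https://example.com/blog/", no_parent=True, timestamping=True) would refresh an earlier copy of just that section of a site. Building the argument list separately from running it makes it easy to print and inspect the command first.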

Conclusion

There are many easy ways to save a copy of a webpage or website for the future. If you want to save a single webpage and optionally the webpages that are linked in that webpage, save it to the Internet Archive Wayback Machine. If there is an important website that may disappear soon, notify Archive Team. If you want a copy of a website for yourself, use WGET. The important thing is that you remember to archive the information before it is gone forever!

Monday, January 20, 2020

Weekly Summary 01/20/2020

This will be our very first weekly summary, part of a series where we talk about recent developments in the archiving community. Since this is our first one, we're going to keep the timeframe a bit wider than just a week.

So to kick things off, let's start with the single event that gave internet archiving the most publicity it's received in quite a bit: the Yahoo! Groups shutdown panic.

Yahoo! Groups was a combination of a mailing list and web forum. Many communities (particularly fandoms) were frequent users of it in its heyday. On October 16th of 2019, Yahoo announced that after 12 days content uploads to the site would be disabled and that in less than a month the site would be going down.

Immediately, archivists and community members began sharing lists of important public groups, and people sprang right into action! Unfortunately, a lot of precautions were being taken against the archiving efforts, such as a ban on Archive Team members who'd made accounts in order to mass-download data from the groups. This, however, generated a lot of backlash, receiving coverage on Motherboard, NPR and even in the Washington Post.

A rarity for this sort of thing, these efforts culminated in getting Yahoo to extend its deadlines to give people a more reasonable amount of time:

We have extended the deadline for Yahoo Groups and will now process ALL requests to download data that are submitted before 11:59 PM PT on Jan 31, 2020 (originally Dec 14). As long as the request meets this deadline, the content will not be deleted until the download is complete.
— Yahoo Customer Care (@YahooCare) December 10, 2019

Archiving efforts are still ongoing, and Archive Team's project tracker currently reports about 2830 GB of data backed up. Be sure to check the Archive Team page for more details. We also plan to do a feature or two on the whole ordeal ourselves, hopefully in the coming days.

Closing out last year, BlueMaxima's Flashpoint and Lost Media Wiki both posted their end-of-year updates...

The ominously named Flashpoint 7.0 "Eight Thousand Hours" was released, its name a reference to the approximate number of hours remaining until the discontinuation of Adobe Flash Player, although the time remaining as of this post is perhaps closer to 7200 hours. The project for preserving online games, animations etc. made with Flash is still ongoing, and you should definitely check https://bluemaxima.org/flashpoint/ where you can find a lot more information as well as a link to their Discord. 7.0 features over 8000 new games.

Fun fact: Did you know that there was actually a scrapped Toy Story movie?

It was going to involve Andy's toys going to Taiwan to get a malfunctioning Buzz repaired. A draft for the script was among the found media featured in Lost Media Wiki's last noticeboard for the year. Other interesting items include the final episode of The Hanna-Barbera Happy Hour, a Japan-exclusive (20-year-old) DLC for Sonic Adventure and three episodes from a puppet show called "Binyah Binyah!" with a very short run on Nick Jr.

As we enter 202X, we have some signs that archiving frequency and awareness have gone up. Recently Brewster Kahle tweeted an impressive statistic on Internet Archive's Wayback Machine:

Wayback Machine just grew to 881,352,519,000 web URL's. That is 881 Billion. For every one that becomes important in the news or in someones personal world, we crawl and store millions of others just-in-case. go @internetarchive
— Brewster Kahle (@brewster_kahle) January 10, 2020

On the subject of Tweets, archive.is is a popular choice for archiving "Web 2.0" sites, since it also saves graphical copies without having to change to a lighter layout. This makes it an alternative to the Wayback Machine in some regards. It has become especially popular for archiving Twitter and Reddit threads thanks to its "saving things as is" reputation.

However, a couple of weeks ago users began to notice that they weren't able to save certain Twitter pages, and a somewhat humorous exchange took place on the archive.is blog:

Finally, let's talk a bit about TV news archiving. For our US readers, Recorder: The Marion Stokes Project has recently begun screening. It's a documentary on the story of a TV producer/activist who, over the course of 33 years, built up one of the largest TV news recording collections to date, starting in the midst of the 1979 Iranian hostage crisis and ending around the 2012 Sandy Hook massacre.

Recorder: The Marion Stokes Project Trailer from Matt Wolf on Vimeo

The documentary was directed by Matt Wolf and premiered at the 2019 Tribeca Film Festival.
You might be able to catch it this week in San Francisco, Asheville or Washington. Check out their website for more details.

Following Stokes' passing, the collection was donated by her son to the Internet Archive, who've since been working on its digitization so it can be accessible to all. This is a great time to check out the TV News archives which have been a bit more overshadowed compared to website and book archives on the same site.

Showcasing the worth of these archives, the GDELT project recently made a tool utilizing the Internet Archive's Third Eye API.

As a first hack at using @kalevleetaru's new Chyron Explorer tool for @TVNewsArchive https://t.co/Hj2Wznezo4
I looked at differential sample word usage in all-news networks. CNN prefers “terror” in their Chyrons more than FOX, and FOX uses “terrorist” more than CNN pic.twitter.com/nQX5AvHydJ
— R Macdonald (@r_macdonald) January 7, 2020

What are chyrons? They are the bits of motionless text you see at the bottom of your screen, often accompanied by an emphatic "Breaking News". The tool, which was endorsed by the Internet Archive's own Roger Macdonald, allows people to query various metrics on chyrons, such as word frequency.

And that brings us to the end of this week's summary. I hope this was an enjoyable read for you all; if nothing else, I can feel good about having taught people what on earth a chyron is.

Stay tuned for next week's summary; hopefully we'll see each other again at the same time, 9 AM EST!

Thursday, January 16, 2020

Bonjour

Hello everyone, I am Doritos Man. How did I end up here, you may ask? Well, let me tell you a story...

Earlier last year, I realised that YouTube was about to delete all of their annotations. That would mean that many videos would lose important information and other videos would lose their interactivity.

After I made efforts to get people to archive as many annotations as possible with the help of screen recorders, a YouTube user contacted me.

Eventually I found myself on a Discord server with people dedicated to archiving annotations before the deadline (the right way, by saving the annotation codes rather than simply recording the screen).

This experience taught me the importance of archiving. Without it, some (possibly important) information could be lost forever. And this is why I am here today. I am here to help in the process of archiving content to avoid losing any data that we might regret not having in the future.

Seeing the decline of Adobe Flash Player is also very sad to me. I, like many of you, grew up playing Flash games on sites. Luckily, many teams of archivists are already working on saving as many games as possible before it's too late. Without them, where would the Flash games be after the Flash websites remove the games due to incompatibility? Memories would be lost. Games would never be played again. Also, kudos to Newgrounds for creating the Newgrounds Player, which gives users the ability to play Flash content again on their platform!

That's it for now. I am not the best at writing; I hope my introduction wasn't too bad.

See you soon.

Wednesday, January 15, 2020

Post-it Notes

So I'm looking around and I find an automatic level of Yoshi's Island on YouTube by some guy --wait, no, Some Guy. I follow him for a good while. There's a minor crisis when one of the features of his videos is announced to be discontinued: YouTube annotations. This was how he delivered commentary. Those going away would be a bit of a sticking point.

Flash forward to about a year later. He's now using VideoPad and griping about the inability to go back and fix his typos (of which there are about 400 as of this current writing). A new project is started for a SMW hack called Magical Crystals. But there's an ominous note in the description: annotations are going to completely disappear. I of course took this news spectacularly well. And by that I mean I have a backup of all of his videos on a flash drive.

I opened up youtube-dl and got six channels in their entirety out of fear: the previously referenced SomeGuy712x, PinkKittyRose, qzecwx which I'm surprised I remembered how to spell, Shinryu, Kit, and SuperMetalSonic360. It was only after doing that that I realized I was wasting time and storage space. Good job, me.

I then opened up a repository collecting various channels' worth of annotation data, which you can find here. I looked around for other efforts to save annotations. (This is the only reason I have any sort of social media, namely a Reddit account, by the way.) And that is when I found the Discord server now known as the Internet Trash Heap. They had a worker to find and archive annotation files. I naturally ran three of them. We saved 1.4 billion videos' worth. (Yes, I did have to look that figure up.)

In the end, they now survive on Invidious, thanks in part to my only merged PR. I feel a certain level of pride about this. Certainly why Fukkireta brings back memories for me.

- glmdgrielson

——

Over the past year, a lot of progress has been made in restoring YouTube annotations.

When YouTube removed annotations on January 15, 2019, they removed the XML data for them, but not the code needed to render them. Because of this, I wrote a Firefox extension called AnnotationsReloaded, which replaced the response to the request for annotation data with the archived data. This worked until YouTube removed the rendering code from their player, which had started to break in August and was almost completely gone by the end of September. Because of this, glmdgrielson, afrmtbl, myself, and others began to work on re-implementing annotation rendering with new JS code. This code is now used in Invidious and in a new browser extension, Annotations Restored.

Also, an API has been added to Invidious for retrieving annotation data. A Reddit user by the name of Archivist has also collected some YouTube metadata, so we may be able to further expand the collection of data available on the API in the future.
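
As a rough sketch of what a request to that API might look like: as far as the Invidious API documentation describes it, instances serve annotation data at /api/v1/annotations/ followed by the video ID, with a source parameter of "archive" or "youtube". Treat that path and parameter as assumptions to check against the documentation of the instance you use; the instance address and video ID below are placeholders, and the helper name is made up.

```python
from urllib.parse import quote, urlencode


def annotations_url(instance, video_id, source="archive"):
    """Build an annotation-data URL for an Invidious instance (hypothetical helper)."""
    # Assumed values for the `source` parameter; verify against the instance's docs.
    if source not in ("archive", "youtube"):
        raise ValueError("source must be 'archive' or 'youtube'")
    base = instance.rstrip("/")  # tolerate a trailing slash on the instance address
    query = urlencode({"source": source})
    return f"{base}/api/v1/annotations/{quote(video_id, safe='')}?{query}"


# Placeholder instance and video ID, for illustration only.
print(annotations_url("https://invidious.example", "abcdefghijk"))
```

Fetching the resulting address with any HTTP client should then return the archived annotation data for that video, if the instance has it.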

- tech234a

——

The thing about the YouTube annotation removal was that it was so unnatural that even a year later the layout for it has yet to be updated to signify their removal.

General (vocal) consensus at the time was either neutral, or in favor of the removal. The feature was likened to the relic that video responses were, a feature whose removal was also celebrated for its outdatedness (at least in contrast to the rest of the website) and frequent abuse.

Yet it would seem that behind the vocal cheering was silent weeping... That common silence is strangely what brought us all together, which is kind of ironic when you think about how annotations themselves were at one point just as valid an alternative to comments for communication between YouTube users. Going back to video responses: after their removal, collaborators began linking their videos or channels through annotations instead.

The number of links between annotations I found was scary. Back in January of 2019, while trying to make a tool of my own for annotation back-up, I remember starting on a list of chuggaaconroy videos and, only by crawling from annotation link to annotation link, being able to scan maybe 100 thousand videos or more, which unfortunately weren't backed up because of some stupid oversights. But even with what we did recover, I'm fairly certain it's possible to see that interlinkage.

So in fact, those of us freaking out in that large web of thousands of videos all knew each other, maybe like neighbors, we just hadn't met yet... Funny enough my own biggest contribution was perhaps bringing some publicity to the whole endeavor.

The last few days remaining... I thought I was alone until I was able to hit the jackpot with a certain post I made on Reddit. It was just a set of tools I'd made so people could back things up locally or on the Wayback Machine. They weren't professional, to say the least, with the .exe versions actually being broken. Yet I was able to stir much discussion in the comments, helping a lot of people -myself included- find out about several independent archiving efforts, including the group who would later go down in history as the Internet Trash Heap.

That thread as far as I'm concerned is a treasure in its own right, and even if the actual topic that post was about is no longer relevant it really reflects the zeitgeist of those anxious days.

I'm glad to report those are just pleasant memories now...

- the Mad Programer

——

I, unfortunately, did not contribute much directly to archiving annotations. I first found out annotations were going to be removed completely from a tweet by Neil Cicierega about how he was editing the descriptions of an interactive video project he made so that the project would work without annotations. Being someone who already cared about loss of digital heritage, I was quite worried about all the information which would surely go away. I tried in vain to beg YouTube via social media, but obviously this didn't do anything. However, I did find the Discord server dedicated to mitigating this problem by gathering up all the annotations before they could be deleted. In the end, while not all annotations could be grabbed, the project was still able to scan billions of videos and thus saved a sizable portion of them, especially ones from popular channels. While I don't especially miss the feature (though YouTube hasn't really added anything new which could replace their functionality), I am glad that this chunk of Internet history was not lost forever.

-A.S.t.R.I.

Tuesday, January 14, 2020

Hey there

Hey there,

I'm the mad programer, though I do go by many names online. As one of the founders of the blog I might as well give you all a short backstory...

As seems to be the case for most of us founders, my story too begins on YouTube. Let's all go back to the start of the last decade... So there I was, reading through comments about how in a little game called Pokemon Platinum there was a glitch that made the game time count up to 998 hours and 59 minutes before immediately jumping to 999 hours 59 minutes and staying there forever. Someone mentioned how there was a video for it but the uploader had privated (later delisted? and even later yet reintroduced) it. On that momentous occasion I was introduced to the Wayback Machine, a website that could send me way back to see websites and videos from the past. Since this was long before the fear of online immortality really hit companies, videos from the time period (circa 2010) still work, although they require Flash (which might not work for much longer). If you were to try a more recent video, it'll probably just catch the related videos and comment section if you're lucky.

After that I went about my life... Looking for something to do on long school bus trips, I began binge-reading Wikipedia. Obscure movies, weird art movements, random villages, a whole bunch of history and languages... You'd think someone who'd gotten so involved in the website would have started contributing, but if anything this period made me into a wiki-skeptic. For one reason or another I felt that anything I contributed would be insignificant, as I wouldn't have much to contribute beyond what I'd learned from Wikipedia, and even that would be challenged by other users who had this same website as their primary source. This was my second encounter with the Wayback Machine. Skimming through the references on Wiki pages, I discovered many interesting websites, some of which were now offline. To keep them alive, people would link to snapshots on the Wayback Machine. I'm not sure if I discovered the trove that was archive.org from these citations or from messing around on the Wayback Machine, but either way I'd found myself a new library where I didn't feel any such social alienation. The next thing I knew, I was picking up old books from the 18th-19th century that time had long forgotten.

Yes, archive.org had now become my new pastime, but still I was merely a reader. I wasn't doing much to contribute, except maybe for a handful of times that I'd archived a page or two I wanted to keep on life-support. Mine was a slow descent down the rabbit hole, that is, until the YouTube annotations mess, which really gave me the drive to dive into the thick of it. Since then I've made small tools of my own (mostly outdated) and at the very least sacrificed some of my computer's power as manpower for projects hosted by other people.

Data Horde is my way of giving back to a community that really sheltered me through some turbulent times in my life. I can only hope I'll do a half-decent job.

Nice to meet you all...

The Data Horde

From behind the mountain, beckons a dark specter

Stretching thinly as it grows ever closer

The people are overrun with fright and terror

Yet the army to liberate them draws near

In an age where even the greatest chore can be made into a convenience one would expect that life would be effortless. Perhaps in some ways it has indeed become so, but with that effortlessness comes laziness... When one no longer needs to, why should they? Thus was born a pandemic of our times, a disease of negligence and apathy...

Everyone and everything competes for worth and attention. Who you'll spend your day with today, what works you need to finish, what songs you'll listen to, what books you'll read, what albums you'll skim through... In a fair competition it is assumed that everyone plays by the rules; sad to say, this isn't one of those. Cheating is encouraged, anything goes. No time for chat, no money for books, no knowledge on that subject, no time, no money, no knowledge...

Perhaps it's better to throw some things away, to rid oneself of burden. That man who spoke those words is no longer me, that girl in the photo was never me to begin with... A collective disdain for the past makes them trail blindly after a future that they can never reach, as by the time they've gotten there it has no worth left to claim. Good riddance! May those moments rot as they disintegrate, buried deep within the sands of time! Yet others weep, as they've gone too far and can no longer find their way back. They were unfortunate enough to neglect leaving a trail, or to leave a trail of breadcrumbs that the creatures of the woods made their evening snack.

From this chaos emerged an order, from the victims heroes...

Data Horde is a blog we're starting to help promote data preservation, particularly online. Although there are many small contingents who define themselves as archivists or similar, most act as lone wanderers helping save whatever is in their vicinity. To remedy this, we set out on our mission with two main goals:

To help unite these independent groups by acting as an intermediary platform for them.
To inform others about archival projects and how they might get into a field which currently has a very steep learning curve, or at the very least to teach them what they can do to spare themselves trouble in the first place.

We'll periodically be posting news related to archivist groups such as

new tools and technologies
announcements regarding online -or perhaps even offline- information that is at risk of disappearing (servers shutting down, websites removing features or content based on abrupt policy changes)
ongoing projects, talking about who is coordinating them and how one can get involved
and results of completed projects such as community revivals

As for the more educational content, we plan to post

tutorials

on how to recover lost data which has been archived
on how to use data archive APIs in tools and products you as a developer might be interested in designing
on how to use available archiving tools for preserving data
on recording equipment
on storage
on redistribution of data

guides for sub-communities which have specific focusses

Archive Team which is more oriented on preserving websites
archive.org which also puts an emphasis on digitization of offline data
Lost Media Wiki which works to recover lost media by hunting recordings

interviews with members from the archiving community
essays and documentaries for encouraging discussion on more complex concepts that aren't discussed much outside of the field -or even inside it in some cases- which we'd like to make the general population more conscious of

If any of this sounds interesting to you, well, stay tuned... We're out to save you, and hopefully soon you too will be able to find some small way you can save someone else's world.

I'm The Mad Programer (single m) and I'm honored to welcome you all into The Horde!

Saturday, January 11, 2020

Introducing Myself

Hi!

I'm tech234a, and I look forward to being a contributor to this blog! I've been interested in archiving for a while now, and think it would be cool to help keep the community informed about the world of archiving.

The major projects I have led include the archiving of Google+ Comments placed on Blogger blogs as well as the archiving of video and audio content from G Suite Training/Synergyse. I have also helped out in developing tools to help access YouTube annotations after they were removed. (I made a Firefox extension called AnnotationsReloaded, but it doesn't work any more. I have contributed to the Annotations Restored project, which does work.) I have also helped out with data discovery on plays.tv and various other websites.

I am an active participant in the Internet Trash Heap Discord server, and I also follow Archive Team activities on IRC fairly regularly.

Anyway, I look forward to helping bring you quality content from all parts of the world of archiving, through regular summary posts as well as through project feature posts.

I encourage you to stay up-to-date with the latest archiving news by following this blog with your RSS reader.

Hope to see you soon!

Friday, January 10, 2020

Hello World!

My name is glmdgrielson. I am here with tech234a and themadprogramer. Our job is to make sure the stuff you love doesn't get thrown away because some admin with a severe lack of foresight forgot to make a backup. Preservation is important, dang it! We'll be talking about upcoming crises, projects that need attention, and who knows what when we run out of material. Our mission is simple: to get this message out there and to make sure stuff doesn't disappear. Also, go check out the Archive Team!