Backing up your AssemblerGames PMs/Conversations before the site goes down

2049 Donator
Donator
Registered
Joined
May 31, 2019
Messages
330
Reaction score
322
Points
63
If, like me, you've been pretty active in private conversations on AG, you might have tens of pages of conversations, some of them with tens of pages of comments in them. That's potentially many thousands of replies. Some very important information and files might be in that mess, and you might feel anxious about losing them forever. XenForo doesn't let you back up your PMs, probably so someone can sell an add-on that does it, but fear not: everything that you can access can be backed up.

Instead of copying thousands of posts by hand and still missing the deadline, here's a mostly automated procedure. It could easily be adapted to back up your own favorite threads/PMs too if you want a personal backup of a few things.

********

You'll need the wget, find, and sed programs. They should be easy to obtain on Linux and macOS. Windows users might have to look into Cygwin, Windows Subsystem for Linux, or alternatives. Just google it.™
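
Before starting, it may be worth confirming that all three tools are actually on your PATH. A minimal sketch; the function name check_tools is made up for this example and is not part of the procedure itself:

```bash
#!/bin/bash
# Hypothetical helper (not part of the original procedure): print every
# command from the argument list that cannot be found on PATH.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing: $tool"
            missing=1
        fi
    done
    # Returns 0 when everything was found, 1 otherwise.
    return "$missing"
}

# Usage: check_tools wget find sed && echo "All tools found"
```

If anything is reported missing, install it through your package manager (or Cygwin's installer) before going further.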

  1. Install Firefox and the export cookies addon: https://addons.mozilla.org/firefox/addon/export-cookies-txt/
  2. Log in to AG, making sure to tick "stay logged in". Use the addon to export the cookies for AG to a text file; mine is called "cookies-assemblergames-com.txt".
  3. Create a new folder, place the cookies file inside and open a console/terminal in the same folder.
  4. Run the following command:
    wget -mkEpnp --execute robots=off --load-cookies=cookies-assemblergames-com.txt https://assemblergames.com/conversations/
  5. WAIT!
  6. Rename the newly created "assemblergames.com" folder to A.
  7. Run the following command:
    wget -mkEp --execute robots=off -I/attachments/ -I/data/ -I/conversations --load-cookies=cookies-assemblergames-com.txt https://assemblergames.com/conversations/
  8. WAIT SOME MORE!
  9. Rename the folder "assemblergames.com" to B.
  10. Create a third folder called Final, copy the content of B to it.
  11. Copy the content of A over Final, overwriting/merging everything that was already there from B.
  12. Change the _bH variable to ./ (current dir) in every HTML file. On Linux and probably OSX:
    find ./Final -type f -exec sed -i -e 's/_bH = "https:\/\/assemblergames\.com\/";/_bH = ".\/";/g' {} \;
  13. Fix links of attachments:
    find ./Final/conversations/ -type f -exec sed -i -e 's/"https:\/\/assemblergames\.com\/attachments\//"\.\.\/\.\.\/attachments\//g' {} \;
  14. Profits $$$
  15. Like and subscribe!
You should now have an offline backup of your conversations/PMs in the Final folder. The main html file to open with your browser is Final/conversations/index.html
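
If the Final folder turns out to hold little more than a login page, a likely suspect is the cookies file from step 2. Here's a rough sketch of a check, assuming the export is in the tab-separated Netscape cookies.txt format that wget's --load-cookies reads, and that the session cookie is the one named xf_session mentioned further down this thread. The function name has_cookie is made up, and it uses awk on top of the three tools listed above:

```bash
#!/bin/bash
# Hypothetical helper: succeed if a Netscape-format cookies file contains a
# cookie with the given name for the given domain.
# Fields are tab-separated: domain, flag, path, secure, expiry, name, value.
has_cookie() {
    # $1 = cookies file, $2 = domain, $3 = cookie name
    awk -F '\t' -v d="$2" -v n="$3" '
        /^#/ { next }                                   # skip comment lines
        NF >= 7 && ($1 == d || $1 == "." d) && $6 == n { found = 1 }
        END { exit !found }
    ' "$1"
}

# Usage:
#   has_cookie cookies-assemblergames-com.txt assemblergames.com xf_session \
#       || echo "No session cookie found, export the cookies again"
```

This only tells you the cookie is present in the file, not that the session is still valid on the server.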

When you click on an attachment, it'll open a basic file-browsing page listing an index.html file. That file is actually your attachment: right-click it, choose "save as", and give it a proper filename/extension.
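
If you have a lot of attachments, the renaming can probably be scripted. This is only a sketch: it assumes attachment folders are named name-ext.id (for example photo-jpg.1234), which is the pattern the C# fixer later in this thread relies on, and both function names are made up. It copies rather than moves, so the existing links keep working.

```bash
#!/bin/bash
# Sketch: derive a sensible filename from an attachment folder name.
# Assumes folders look like <name>-<ext>.<id>, e.g. "photo-jpg.1234".
attachment_filename() {
    local dir base
    dir=$(basename "$1")
    base=${dir%.*}                      # drop the trailing ".<id>"
    case "$base" in
        *-*) printf '%s.%s\n' "${base%-*}" "${base##*-}" ;;  # last "-" becomes "."
        *)   printf '%s\n' "$base" ;;   # no "-" found: leave the name alone
    esac
}

# Copy each attachment's index.html to a properly named file beside it.
rename_attachments() {
    # $1 = attachments folder, e.g. ./Final/attachments
    find "$1" -type f -name index.html | while IFS= read -r f; do
        d=$(dirname "$f")
        case "$(basename "$d")" in
            *.*) cp "$f" "$d/$(attachment_filename "$d")" ;;  # skip folders without ".<id>"
        esac
    done
}

# Usage: rename_attachments ./Final/attachments
```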

Good luck!

Here's the script I tested on Linux; it worked fine for me. It took a few hours to scrape everything and my backup ended up being around 600 MB, 60 MB compressed.

Code:
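#!/bin/bash

# Pass 1: mirror the conversation pages themselves (-np: never go above /conversations/).
wget -mkEpnp --execute robots=off --load-cookies=cookies-assemblergames-com.txt https://assemblergames.com/conversations/
mv "assemblergames.com" A

# Pass 2: mirror again, this time also following links into /attachments/ and /data/.
wget -mkEp --execute robots=off -I/attachments/ -I/data/ -I/conversations --load-cookies=cookies-assemblergames-com.txt https://assemblergames.com/conversations/
mv "assemblergames.com" B

# Merge both passes into Final; files from A overwrite those from B.
mkdir Final
cp -rf B/* Final/
cp -rf A/* Final/

# Point the _bH base-address variable at the current directory in every file.
find ./Final -type f -exec sed -i -e 's/_bH = "https:\/\/assemblergames\.com\/";/_bH = ".\/";/g' {} \;

# Make the attachment links relative.
find ./Final/conversations/ -type f -exec sed -i -e 's/"https:\/\/assemblergames\.com\/attachments\//"\.\.\/\.\.\/attachments\//g' {} \;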
 
Well-known member
Registered
Joined
May 30, 2019
Messages
57
Reaction score
123
Points
33
Thanks for posting this. I'll also be contributing a process in the next day or two that uses a custom build of httrack for Windows, plus a quick fixer tool afterwards to do some cleanup and repairs. I've used that process to back up my own PMs successfully, and am currently finishing a full backup of assemblergames (with 0th bit) and all external image links, which I believe will finish sometime today. That'll provide an easy option for Windows users.
 
2049 Donator
plus a quick fixer tool afterwards to do some cleanup and repairs
I'm all ears if you want to detail those fixes. I don't know much about html and my procedure here is very crude. I also have very little free time until D-day, so I'd take pointers for sure!
 
Well-known member
I'm all ears if you want to detail those fixes. I don't know much about html and my procedure here is very crude. I also have very little free time until D-day, so I'd take pointers for sure!
Well, I basically just take httrack, which is possibly some of the worst code ever written, and patch it to make it not totally suck by:
-Fixing some catastrophic performance issues that make it basically run forever normally
-Making it actually support pages with unicode characters in the URL without running off the rails
-Adding the concept of priorities so I can get it to (for example) scan all the forum index pages up front first to completion
-Fixing and addressing a few dozen other things that can cause it to spear off track and mirror half the internet
-Applying other changes I made last year when mirroring AG that I've forgotten about

Once that's done, I can mirror the site in a fairly clean fashion, and httrack takes care of most of the hard work in mirroring the content and fixing links. I still need to monitor it closely as the scanning rules are vague and not fine-grained enough, so it can still decide to mirror an entire remote site if a .jpg ends up being a 404 html page with a link back to the root for example, but I can deal with that as it happens. Afterwards, I fix attachments by renaming them to actually be the original file types rather than binary content in html pages and fix the links, and disable the javascript code on AG pages that breaks all the links in a local mirror. After that, it's basically done, and the result is a browsable snapshot with all local content from the site in question and a select list of remote content included (primarily images).
 
2049 Donator
Afterwards, I fix attachments by renaming them to actually be the original file types rather than binary content in html pages and fix the links, and disable the javascript code on AG pages that breaks all the links in a local mirror. After that, it's basically done, and the result is a browsable snapshot with all local content from the site in question and a select list of remote content included (primarily images).
That's the part I'm interested in. If you have code, I'd love to see it.

Why don't you use wget? Surely there's a Windows version somewhere?
 
Well-known member
I wrote a tool in C# to do most of the heavy lifting. It's throwaway code, but here you go:
Code:
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace UpdateAssemblerGamesLinks
{
    class Program
    {
        static void Main(string[] args)
        {
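            // Bail out with a usage hint instead of crashing when no path is given.
            if (args.Length < 1)
            {
                Console.WriteLine("Usage: UpdateAssemblerGamesLinks <path to mirror root>");
                return;
            }
            string rootPath = args[0];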

            string assemblerGamesPath = Path.Combine(rootPath, "assemblergames.com");
            string attachmentPath = Path.Combine(assemblerGamesPath, "attachments");

            Console.WriteLine("Renaming attachments");
            string[] attachmentFiles = Directory.GetFiles(attachmentPath, "index.html", SearchOption.AllDirectories);
            Console.WriteLine("Found {0} files", attachmentFiles.Length);

            int filesPerIncrement = attachmentFiles.Length / 100;
            int fileNoPerStep = 0;
            int currentFileNo = 0;
            foreach (string filePath in attachmentFiles)
            {
                if (fileNoPerStep == filesPerIncrement)
                {
                    Console.WriteLine("{0}%", currentFileNo / (filesPerIncrement > 0 ? filesPerIncrement : 1));
                    fileNoPerStep = 0;
                }
                ++currentFileNo;
                ++fileNoPerStep;

                string attachmentDirectoryPath = Path.GetDirectoryName(filePath);
                string attachmentDirectoryName = Path.GetFileName(attachmentDirectoryPath);
                string attachmentNewFileName = Path.GetFileNameWithoutExtension(attachmentDirectoryPath);
                if (!attachmentDirectoryName.Contains('.'))
                {
                    continue;
                }
                int indexOfSeparator = attachmentNewFileName.LastIndexOf('-');
                if (indexOfSeparator < 0)
                {
                    Console.WriteLine("Warning: Failed to locate extension separator for \"{0}\" in path \"{1}\". Assuming extension only.", attachmentNewFileName, attachmentDirectoryPath);
                    attachmentNewFileName = "." + attachmentNewFileName;
                }
                else
                {
                    StringBuilder stringBuilder = new StringBuilder(attachmentNewFileName);
                    stringBuilder[indexOfSeparator] = '.';
                    attachmentNewFileName = stringBuilder.ToString();
                }
                string attachmentNewFilePath = Path.Combine(attachmentDirectoryPath, attachmentNewFileName);
                attachmentNewFilePath = @"\\?\" + attachmentNewFilePath;

                try
                {
                    File.Move(filePath, attachmentNewFilePath);
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Exception on File.Move for file \"{0}\" to \"{1}\": {2}", filePath, attachmentNewFilePath, ex.ToString());
                    continue;
                }
            }

            Console.WriteLine("Editing html files");
            Regex regexAttachment = new Regex(@"\.\./attachments/(.+)-(.+)\.([0-9]+)/index\.html", RegexOptions.Compiled);
            Regex regexPhotobucket = new Regex(@"\.\./\.\./\.\./(.+)\.photobucket\.com/albums/", RegexOptions.Compiled);
            Regex regexSentinelFix = new Regex(@"data-baseurl=""(.+)page-\{\{sentinel\}\}""", RegexOptions.Compiled);
            Regex regexPollViewResultsDisable = new Regex(@"<input type=""button"" value=""View Results"" class=""button OverlayTrigger JsOnly"" data-href=""(.+)/poll/results"" />", RegexOptions.Compiled);
            string matchStringAddressRebase = @"if (_b && _b.href != _bH) _b.href = _bH;";
            string matchStringAddressRebaseNew = @"<!--if (_b && _b.href != _bH) _b.href = _bH;-->";
            string matchStringHTTrackMarker = @"<!-- Added by HTTrack --><meta http-equiv=""content-type"" content=""text/html;charset=UTF-8"" /><!-- /Added by HTTrack -->";
            string[] files = Directory.GetFiles(assemblerGamesPath, "*.html", SearchOption.AllDirectories);
            Console.WriteLine("Found {0} files", files.Length);

            filesPerIncrement = files.Length / 100;
            fileNoPerStep = 0;
            currentFileNo = 0;
            foreach (string filePath in files)
            {
                if (fileNoPerStep == filesPerIncrement)
                {
                    Console.WriteLine("{0}%", currentFileNo / (filesPerIncrement > 0 ? filesPerIncrement : 1));
                    fileNoPerStep = 0;
                }
                ++currentFileNo;
                ++fileNoPerStep;

                string fileContents;
                try
                {
                    fileContents = File.ReadAllText(filePath, Encoding.UTF8);
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Exception on File.ReadAllText for file \"{0}\": {1}", filePath, ex.ToString());
                    continue;
                }

                int directoryNestingDepth = filePath.Replace(assemblerGamesPath, "").Split(new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar }, StringSplitOptions.RemoveEmptyEntries).Length - 1;
                string replaceStringInsertFavicon = matchStringHTTrackMarker + "\n" + String.Format(@"<link rel=""shortcut icon"" href=""{0}favicon.ico"">", String.Concat(Enumerable.Repeat("../", directoryNestingDepth)));
                string fileContentsNew = fileContents;
                fileContentsNew = regexAttachment.Replace(fileContentsNew, @"../attachments/$1-$2.$3/$1.$2");
                fileContentsNew = regexPhotobucket.Replace(fileContentsNew, @"http://$1.photobucket.com/albums/");
                fileContentsNew = regexSentinelFix.Replace(fileContentsNew, @"data-baseurl=""page-{{sentinel}}.html""");
                fileContentsNew = regexPollViewResultsDisable.Replace(fileContentsNew, @"<!--<input type=""button"" value=""View Results"" class=""button OverlayTrigger JsOnly"" data-href=""$1/poll/results"" />-->");
                fileContentsNew = fileContentsNew.Replace(@"<noscript><a href=""poll/results.html"" class=""button"">View Results</a></noscript>", @"<a href=""poll/results.html"" class=""button"">View Results</a>");
                fileContentsNew = fileContentsNew.Replace(matchStringAddressRebase, matchStringAddressRebaseNew);
                fileContentsNew = fileContentsNew.Replace(matchStringHTTrackMarker, replaceStringInsertFavicon);

                try
                {
                    File.WriteAllText(filePath, fileContentsNew);
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Exception on File.WriteAllText for file \"{0}\": {1}", filePath, ex.ToString());
                    continue;
                }
            }

            Console.WriteLine("Complete");
            Console.ReadLine();
        }
    }
}

There is some manual work apart from this (mostly on the main index page) but not much. As for why I use httrack, well it's the devil I know. Been using it for over a decade, and know how to make it do what I want it to do. As I said, it's horrible code, but it's also been battle tested plenty, and I've fixed most of the bugs/issues that caused me grief when using it in anger. I can't say how my version compares to wget, as I've never really used it.
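
For anyone on the wget route, the base-address fix that the tool applies can probably be approximated with sed. A rough sketch, assuming the statement appears in your pages exactly as it does in the C# source above; disable_rebase is a made-up name, and it swaps the statement for a script comment rather than the HTML comment the C# tool uses:

```bash
#!/bin/bash
# Sketch: neutralise the script statement that resets the page's base
# address, which is what breaks relative links in a local mirror.
disable_rebase() {
    # $1 = HTML file to patch; goes through a temp file so "sed -i" isn't needed
    local tmp
    tmp=$(mktemp) || return 1
    sed 's|if (_b && _b\.href != _bH) _b\.href = _bH;|/* base rebase disabled */|g' "$1" > "$tmp" \
        && cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Usage:
#   find ./Final -type f -name '*.html' | while IFS= read -r f; do disable_rebase "$f"; done
```

Run it on a copy of your mirror first, since it rewrites files in place.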
 
Well-known member
Oh, and here are the args I used to do full mirrors of assemblergames:
Code:
"https://assemblergames.com" -%N0 --cache=0 -O "D:\Emulation\Websites\assemblergamesFinal11" -c32 -#L0 --disable-security-limits --max-rate=0 -%c0 --depth=2000000000 --robots=0 --keep-alive --near --retries=0 --display --quiet -F "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.117 Safari/537.36" -ad.doubleclick.net/* -mime:application/foobar -*.zip -*.tar -*.tgz -*.gz -*.rar -*.z -*.exe -*.7z -*.ace -*.RAR -*.bz2 -*.lzh -*.sit -*.mov -*.mpg -*.mpeg -*.avi -*.asf -*.divx -*.mp4 -*.mp3 -*.mp2 -*.rm -*.wav -*.vob -*.qt -*.vid -*.ac3 -*.wma -*.wmv -*.ogg -*.flac -*.cue -*.pdf -*.bin -https://assemblergames.com/account/* -https://assemblergames.com/find-new/* -https://assemblergames.com/forums/-/* -https://assemblergames.com/login/* -https://assemblergames.com/logout/* -https://assemblergames.com/online/* -https://assemblergames.com/watched/* -https://assemblergames.com/posts/* -https://assemblergames.com/search/* -https://assemblergames.com/threads/*/reply?* -https://assemblergames.com/threads/*/#post-* -https://assemblergames.com/threads/*/add-reply -https://assemblergames.com/watched/* -https://assemblergames.com/recent-activity/* -https://assemblergames.com/lost-password/* -https://assemblergames.com/misc/location-info* -https://assemblergames.com/misc/quick-navigation-menu?* -https://assemblergames.com/misc/style?* -https://assemblergames.com/threads/*/#navigation -https://assemblergames.com/members/*/followers -https://assemblergames.com/members/*/following -https://assemblergames.com/members/*/trophies -https://assemblergames.com/members/*/#* -https://assemblergames.com/members/*/recent-activity -https://assemblergames.com/members/*/recent-content -https://assemblergames.com/forums/*/?* -https://assemblergames.com/forums/*/watch -https://assemblergames.com/threads/*/#* -https://assemblergames.com/goto/* +https://assemblergames.com/attachments/* -https://assemblergames.com/threads/*/watch-confirm
-https://assemblergames.com/posts/*/like -https://assemblergames.com/threads/*/reply?* -https://assemblergames.com/posts/*/report -https://assemblergames.com/conversations/*/report -https://assemblergames.com/conversations/*/reply?* -https://assemblergames.com/conversations/*/message?* -https://assemblergames.com/conversations/*/leave -https://assemblergames.com/conversations/*/toggle-starred -https://assemblergames.com/conversations/*/toggle-read -https://assemblergames.com/conversations/add -https://assemblergames.com/conversations/*/invite -https://assemblergames.com/conversations/*/edit -https://assemblergames.com/conversations/*/delete -https://assemblergames.com/members/*/report -https://assemblergames.com/members/*/ignore -https://assemblergames.com/members/*/follow?* -https://assemblergames.com/account/* -https://assemblergames.com/forums/*/create-thread -https://assemblergames.com/threads/*/tags -https://twitter.com/intent/tweet?* -https://assemblergames.com/profile-posts/*/like -https://assemblergames.com/profile-posts/*/comment -https://assemblergames.com/profile-posts/*/report -https://assemblergames.com/profile-posts/*/delete -https://assemblergames.com/profile-posts/* -https://assemblergames.com/attachments/do-upload.json?* -https://assemblergames.com/attachments/upload?* -https://assemblergames.com/*/mark-read?* -*.thingiverse.com/* -abload.de/* -sparbote.de/* -channelf.se/* -github.com/* -*.excite.co.jp/* -*.freeforums.net/* -www.dropbox.com/* -cozumel.ucoz.es/* -www.sega-16.com/* -*.fbcdn.net/* -geekologie.com/* -www.emutalk.net/* -exs.cx/*
There's more going on than this internally, as I hard-coded URL priorities to do multi-pass scanning, skipping less important files until the most important ones are done (i.e., first css, then forums, then threads, then data, then attachments, then members, etc). I've fiddled with limits internally to ensure it doesn't hammer the servers while still being as aggressive as possible. Since I'm scanning content off-site, I also have to monitor it closely as it scans, particularly in the last pass when it's ripped the local content and is focusing on external content. If it starts spiralling off into the ether, I blacklist the bad domain mid-scan to ignore any scraped links, then drop it from the final file content.
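
Whatever tool does the mirroring, one rough way to spot a domain worth excluding is to count how often each external host shows up in the pages already downloaded. This is not how the patched httrack does it internally, just a sketch with a made-up function name; it relies on grep -o and sed -E, which GNU and BSD userlands both provide:

```bash
#!/bin/bash
# Sketch: count how often each host is referenced in mirrored HTML pages.
# Prints "<count> <host>" lines, busiest host first.
list_hosts() {
    # $1 = root folder of the mirror
    find "$1" -type f -name '*.html' -exec cat {} + \
        | grep -oE "https?://[^/\"' <>]+" \
        | sed -E 's|^https?://||' \
        | sort | uniq -c | sort -rn \
        | awk '{ print $1, $2 }'      # normalise uniq's padded output
}

# Usage: list_hosts ./Final | head -n 20
```

A host with a suspiciously high count that has nothing to do with the forum is a candidate for an exclusion filter like the ones in the argument list above.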
 
Well-known member
Here are my instructions for backing up your PMs on Windows:

1. Download and extract the following archive: http://nemesis.exodusemulator.com/AssemblerGames/AssemblerGamesBackupPMs.zip
2. Log in to the forum at assemblergames.com
3. Obtain your session token. You can do this using the Firefox addon mentioned by FamilyGuy. An easier way if you're using Google Chrome is to hit "F12" to open the debug console, go to the "Network" tab, refresh assemblergames.com, then select it from the top of the list. Scroll down on the panel on the right to the "cookie:" section and take the value for the "xf_session" entry. See the image below for a visual guide:
[Image: AssemblerGamesSessionToken.png]

Note: be careful with your session token; it's almost as good as your password for getting access to your account (no, the token value shown in that image isn't still valid).
4. Open the "Output\cookies.txt" file in the extracted contents of the zip file you downloaded, and replace the "00000.." value with the value you got from your cookies in the step above.
5. Run the "Backup.cmd" script in the root of the downloaded file. This should mirror the PM content and related content (such as attachments, images, etc) into the "Output" directory. Once the mirroring is done, a cleanup process will run to fix some links and other issues.

And that's it. That'll give you an offline-browseable version of your PMs. Enjoy.
 
Donator
Donator
Registered
Joined
Jun 1, 2019
Messages
114
Reaction score
94
Points
28
@Nemesis I keep getting the following error in Windows 7 64:

"The procedure entry Point CreateFile2 could not be located in the dynamic link library KERNEL32.dll".

Any help would be much appreciated~
 
Member
Registered
Joined
May 31, 2019
Messages
13
Reaction score
0
Points
1
Thank you for the tip.
I'll give this a try later.
 
Donator
Donator
Registered
Joined
Sep 1, 2018
Messages
87
Reaction score
62
Points
18
@Nemesis I keep getting the following error in Windows 7 64:

"The procedure entry Point CreateFile2 could not be located in the dynamic link library KERNEL32.dll".

Any help would be much appreciated~
Had the same issue; here's the workaround that *seems* to be working:
Download from:
the file labeled: httrack_x64-noinst-3.49.2.zip
Extract it to the same directory where you extracted AssemblerGamesBackupPMs.zip, replacing any files as needed.
Run the Backup.cmd script.
 
Well-known member
I can do a Win7 compatible build in a day or so, I'm travelling right now. In the interim, the regular httrack build referenced above should work reasonably well for such a small limited mirror operation, as long as you don't use unicode characters in any of your conversation titles.
 
Member
Joined
Jun 5, 2019
Messages
6
Reaction score
2
Points
3
Thanks. It worked well with cygwin.

50mb unpacked, 1,5mb packed.
 
2049 Donator
Thanks. It worked well with cygwin.

50mb unpacked, 1,5mb packed.
You're welcome and I'm glad I could help.

For the sake of safety, I'd suggest also backing up on Windows using @Nemesis 's script, in case either method misses something. Better safe than sorry with backups!
 
Donator
Donator
Registered
Joined
Jun 5, 2019
Messages
34
Reaction score
43
Points
18
Thanks a lot.
httrack tool works perfectly on win10.:D
 
Well-known member
Community Contributor
Registered
Joined
Jun 3, 2019
Messages
179
Reaction score
189
Points
43
Had the same issue; here's the workaround that *seems* to be working:
Download from:
the file labeled: httrack_x64-noinst-3.49.2.zip
Extract it to the same directory where you extracted AssemblerGamesBackupPMs.zip, replacing any files as needed.
Run the Backup.cmd script.


I tried all that, and get this on Windows 7

Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
such as username/password authentication for websites mirrored in this project
do not share these files/folders if you want these information to remain private

14:55:47 Warning: * security warning: !!! BYPASSING SECURITY LIMITS - MONITOR THIS SESSION WITH EXTREME CARE !!!
14:55:48 Warning: file not stored in cache due to bogus state (broken size, expected 6133 got 475): https://assemblergames.com/conversations
14:55:48 Error: "Forbidden" (403) at link https://assemblergames.com/conversations (from primary/primary)
14:55:48 Warning: No data seems to have been transferred during this session! : restoring previous one!
 
Well-known member
@Greg2600 That's what happens if you haven't set your session cookie correctly. Check the instructions again and make sure you modify the "cookies.txt" file as listed. I'd keep your browser window open with your account logged in when you run the backup.
 
Well-known member
Community Contributor
I must have fouled something up when I pasted the session ID in. Did it all over again and it worked. Fabulous tool.
 