Saturday, January 29, 2011

MAF User Manual

MAF saves a web page, or multiple web pages in several tabs, to a single "web archive" file. MAF, short for Mozilla Archive Format, is a free add-on for the free Firefox and SeaMonkey browsers.

MAF also provides other enhancements to the browser's standard save system. For a quick overview, see the features page.

Note: This manual covers the latest version of Mozilla Archive Format. Some features may not be available in the version from the Firefox Add-ons website.

Introduction

MAF archives are a convenient, cross-platform way to preserve web pages. They store all the text, images, and other resources of a web page in a single file. When a MAF web archive is moved or renamed, the saved pages are unchanged.

The MAF add-on can also convert pages saved in MAFF format to other formats, such as ordinary web pages or the Microsoft MHTML format used by Microsoft's Internet Explorer browser.

Saving web archives

MAF provides two new file type options in the Save As dialog box:

MAF "Save As" Dialog Menu
Web Archive, MAFF zipped

This option saves one or more pages inside a single Mozilla Archive Format File, or MAFF archive. MAFF archives are compressed using the universal, cross-platform ZIP specification for saving multiple files in one archive.
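Because a MAFF archive is an ordinary ZIP file, any ZIP library can produce one. The Python sketch below, using only the standard `zipfile` module, writes one top-level folder per saved page, each containing an `index.html` and an `index.rdf` metadata file. This layout follows our reading of the MAFF specification and should be treated as an assumption; the helper `write_maff` and the folder names `page1` and `page2` are hypothetical, and the RDF content is a placeholder rather than what MAF actually writes.

```python
import io
import zipfile

def write_maff(pages):
    """Return the bytes of a ZIP archive with one top-level folder per page.

    `pages` maps a folder name (hypothetical, e.g. "page1") to a dict of
    relative file names and their byte contents.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for folder, files in pages.items():
            for name, data in files.items():
                # Every file of a page lives under that page's own folder.
                archive.writestr(f"{folder}/{name}", data)
    return buffer.getvalue()

# Two "tabs" saved into the same archive; the RDF text is only a placeholder.
maff_bytes = write_maff({
    "page1": {
        "index.html": b"<html><body>First tab</body></html>",
        "index.rdf": b"<?xml version='1.0'?><!-- placeholder metadata -->",
    },
    "page2": {
        "index.html": b"<html><body>Second tab</body></html>",
        "index.rdf": b"<?xml version='1.0'?><!-- placeholder metadata -->",
    },
})
```

Keeping each page in its own folder is what lets a single archive hold several tabs without their resource names colliding.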

MAFF archives can be opened in the browser. If multiple tabs were saved, opening a MAFF archive opens all the tabs, exactly the way they were when they were saved.

It is possible to view the original location from which the page was saved. The contents of the archive, including any embedded media files, can be inspected and extracted using any ZIP utility, such as the free 7-Zip.
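A scripting language works just as well as a ZIP utility. This Python sketch uses the standard `zipfile` module to list the pages in an archive (one per top-level folder, assuming the usual MAFF layout) and to extract everything to a folder. To keep it self-contained, it builds a tiny archive in memory; with a real file you would pass a path such as the hypothetical `"example.maff"` instead.

```python
import io
import tempfile
import zipfile
from pathlib import Path

def list_pages(maff):
    """Return the sorted top-level folder names, one per saved page."""
    with zipfile.ZipFile(maff) as archive:
        return sorted({name.split("/", 1)[0] for name in archive.namelist()})

def extract_all(maff, destination):
    """Extract every file in the archive under `destination`."""
    with zipfile.ZipFile(maff) as archive:
        archive.extractall(destination)

# Build a small stand-in archive in memory so the example runs on its own.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("page1/index.html", "<p>one</p>")
    archive.writestr("page2/index.html", "<p>two</p>")

pages = list_pages(sample)
out_dir = Path(tempfile.mkdtemp())
extract_all(sample, out_dir)
```

The same `extract_all` call is one way to unpack a multi-page MAFF archive before converting its pages individually.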

The Mozilla Archive Format extension generates MAFF archives using the fast, native ZIP implementation embedded in the Mozilla browser. The resulting files are usually smaller than the equivalent MHTML archives, and opening this kind of file is faster.

However, Microsoft's Internet Explorer browser cannot open MAFF files, so MAF can also save web pages in Microsoft's MHTML format.

Web Archive, MHTML

This web archive format, also known as MHT, is used by Microsoft's Internet Explorer browser. This option saves a single page inside a MIME HTML file, or MHTML archive.

MHTML files are encoded, not compressed. The encoding usually increases the size of the saved media files compared to the original. At present, the contents of an MHTML archive can be decoded by only a limited number of web browsers, or by using special utilities. However, the MHTML archive format has the advantage that it can be shared with those who use only Internet Explorer.
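Both points can be illustrated with Python's standard `base64` and `email` modules. The sketch first measures the cost of Base64, a common transfer encoding for binary parts in MHTML, which turns every 3 bytes into 4 characters. It then decodes a small MHTML document, showing that a general-purpose MIME parser can act as one of the "special utilities". The sample message is hand-written for illustration, not a file produced by MAF or Internet Explorer.

```python
import base64
import email
from email import policy

# Base64 turns every 3 input bytes into 4 ASCII characters (about 33% larger).
image_bytes = bytes(300)
encoded = base64.b64encode(image_bytes)
overhead = len(encoded) / len(image_bytes)

# A minimal, hand-written MHTML document: one HTML part and one image part.
MHTML = (
    b"MIME-Version: 1.0\r\n"
    b'Content-Type: multipart/related; boundary="BOUNDARY"; type="text/html"\r\n'
    b"\r\n"
    b"--BOUNDARY\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Location: http://www.example.org/\r\n"
    b"\r\n"
    b'<html><body><img src="logo.png"></body></html>\r\n'
    b"--BOUNDARY\r\n"
    b"Content-Type: image/png\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Content-Location: http://www.example.org/logo.png\r\n"
    b"\r\n"
    + base64.b64encode(b"not really a PNG") + b"\r\n"
    b"--BOUNDARY--\r\n"
)

message = email.message_from_bytes(MHTML, policy=policy.default)
parts = {}
for part in message.iter_parts():
    # get_payload(decode=True) undoes the Content-Transfer-Encoding.
    parts[str(part["Content-Location"])] = part.get_payload(decode=True)
```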

Additional information saved in web archives

When you save a page as a web archive, the following additional information about the save operation is stored in the archive:

  • The original location from which the page was saved. This normally matches what is displayed in the address bar of the browser.

  • The date and time the page was saved.

  • The title of the page, if present.

  • The character set in effect at the time the page was saved.

    If the character set was changed manually using the View » Character Encoding menu item, the custom choice is remembered. This allows the document to be displayed correctly when it is reopened from the archive, even if it contains international characters.

If you re-save an already archived page to a different file, the save time and location from the original archive are preserved.
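In MAFF archives, this information is kept in an `index.rdf` file inside each page's folder. The Python sketch below reads such a file with the standard `xml.etree.ElementTree` module. The MAF namespace URI and the property names (`originalurl`, `title`, `archivetime`, `charset`) reflect our recollection of the MAFF specification and are assumptions to verify there; the sample RDF is hand-written.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
MAF_NS = "http://maf.mozdev.org/metadata/rdf#"  # assumed MAF namespace URI

SAMPLE_RDF = """<?xml version="1.0"?>
<RDF:RDF xmlns:MAF="http://maf.mozdev.org/metadata/rdf#"
         xmlns:RDF="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <RDF:Description RDF:about="urn:root">
    <MAF:originalurl RDF:resource="http://www.example.org/"/>
    <MAF:title RDF:resource="Example Domain"/>
    <MAF:archivetime RDF:resource="Sat, 29 Jan 2011 10:00:00 GMT"/>
    <MAF:charset RDF:resource="UTF-8"/>
  </RDF:Description>
</RDF:RDF>"""

def read_metadata(rdf_text):
    """Map each MAF property name to the value of its RDF:resource attribute."""
    root = ET.fromstring(rdf_text)
    prefix = "{" + MAF_NS + "}"
    metadata = {}
    for element in root.iter():
        if element.tag.startswith(prefix):
            name = element.tag[len(prefix):]
            metadata[name] = element.get("{" + RDF_NS + "}resource")
    return metadata

metadata = read_metadata(SAMPLE_RDF)
```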

Opening web archives

After MAF is installed, web archives can be opened using the File » Open File... menu choice or using drag-and-drop, as with any saved web page.

Under Windows, MAF can also create file associations that open MAF web archives by double-clicking the file names in Windows Explorer.

Viewing information about archived pages

View Archive

By default, when you are displaying an archived page, an additional icon appears in the address bar of the browser. You can click the icon to display the following information about the archived page:

  • The original location the page was saved from, if available.
  • The date and time of the save operation, if available.

The original location is a link. You can click it with the left mouse button to open the original page in the same tab, or you can use the appropriate key combinations to open the link in a new tab or a new window.

From the popup panel with the information on the page, you can also access the Archives dialog, which provides additional information on all the archives that have been opened during the current browsing session.

The icon can also be displayed in the status bar. You can control the visibility and position of the icon from the interface preferences.

Integration with other extensions

One of the key features of Mozilla Archive Format is that it integrates not only with the core of the browser, but with other extensions as well, to provide a smooth user experience.

Some of these extensions, like UnMHT, must be installed separately; other extensions, like Save Complete, are embedded and updated together with MAF.

Multiple Tab Handler, by Shimoda Hiroshi

This extension adds a multiple selection interface and a new context menu to the Firefox tab bar.

MAF integrates with the tab selection context menu and adds an entry to save the selected tabs in an archive. For MHTML archives, MAF creates multiple files, while for MAFF archives all the tabs are saved in a single file.

Save Complete, by Stephen Augenstein

The Save Complete extension is integrated with MAF, but must be enabled from the preferences.

This extension replaces the system used by the browser to save complete web pages. The new system correctly handles style sheets that reference image files, which otherwise would not be saved, causing some pages to appear differently.
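The images in question are referenced through CSS `url(...)` values, so they never appear in the page's HTML. As a rough illustration only, not Save Complete's actual code, this Python sketch finds such references with a regular expression and resolves them against the style sheet's address using `urllib.parse.urljoin`. The regular expression is deliberately simplified and ignores CSS escapes and comments.

```python
import re
from urllib.parse import urljoin

# Matches url(foo.png), url('foo.png') and url("foo.png"); simplified on purpose.
CSS_URL = re.compile(r"""url\(\s*(['"]?)([^'")]+)\1\s*\)""")

def css_resources(css_text, stylesheet_url):
    """Return absolute URLs of the resources referenced by a style sheet."""
    return [urljoin(stylesheet_url, match.group(2))
            for match in CSS_URL.finditer(css_text)]

css = """
body { background: url("images/bg.png") repeat-x; }
.logo { background-image: url(../logo.gif); }
@import url('print.css');
"""
found = css_resources(css, "http://www.example.org/styles/main.css")
```

Relative references are resolved against the style sheet's own URL, not the page's, which is one reason a naive save can miss them.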

File Title, by Pavel Cvrcek

The functionality of the File Title extension is also available from the MAF preferences.

This extension replaces the default file name suggested in the Save As dialog box with the title of the page being saved.

Title Save, by gm

This extension is similar to File Title, but it does not change the default behavior; instead, it adds a new item to the File menu that uses the title of the page instead of the file name in the Save As dialog box.

You can use the new command to save MAFF and MHTML archives too.

You may install this extension if you want to selectively use the page title instead of the original file name. In this case, ensure that the browser's standard naming strategy is selected in the MAF preferences; otherwise, the title of the page might be used in all cases.

UnMHT, by Arai

This extension adds new options to the File menu to save MHTML archives, and also provides other advanced features.

If UnMHT is installed, you can continue to use MAF to create and open MAFF archives, while MHTML archives are opened with UnMHT.

Converting previously saved pages to other file formats

You probably already have some web pages saved among your local files. These pages are often stored as file / folder pairs (like Page.html and Page_files), and you may want to convert them to a web archive format for easier maintenance.

You may also want to convert saved pages from one web archive format to another, for example from MHTML to MAFF to save disk space, or vice versa to achieve compatibility with Internet Explorer.

Converting single pages

Converting a single page that was previously saved locally is as easy as opening the page in the browser and resaving it in another file format. The Mozilla Archive Format extension handles the details of the conversion process, and preserves the information about the original source, if available.

When converting a web page that is not stored in an archive, the following information is preserved:

  • The date and time of the original save operation is obtained from the local file's last modification time.
  • The original location is obtained from the special comment some browsers embed in the page when they save it. If not available, the local address is used.

    If the page was saved with Internet Explorer, the original location is stored like <!-- saved from url=(0023)http://www.example.org/ -->.

    If the page was saved using the standalone or the integrated Save Complete extension for Firefox, the original location is stored like <!-- Source is http://www.example.org/ -->.

    If the page was saved using SeaMonkey or Firefox without the Save Complete extension, the original location is not available.
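The rules above can be sketched in Python with the standard `re`, `datetime`, and `pathlib` modules. The two regular expressions are our own and target exactly the comment forms quoted above; the fallback to the local file address and the use of the modification time mirror the description, but this is an illustration, not MAF's implementation.

```python
import re
from datetime import datetime, timezone
from pathlib import Path

# Internet Explorer: <!-- saved from url=(0023)http://www.example.org/ -->
IE_COMMENT = re.compile(r"<!--\s*saved from url=\(\d+\)(\S+?)\s*-->", re.IGNORECASE)
# Save Complete: <!-- Source is http://www.example.org/ -->
SAVE_COMPLETE_COMMENT = re.compile(r"<!--\s*Source is (\S+?)\s*-->")

def original_location(html, local_path):
    """Return the URL from a 'saved from' comment, else the local file URL."""
    for pattern in (IE_COMMENT, SAVE_COMPLETE_COMMENT):
        match = pattern.search(html)
        if match:
            return match.group(1)
    # No comment found (e.g. saved by Firefox or SeaMonkey): use the local address.
    return Path(local_path).resolve().as_uri()

def original_save_time(local_path):
    """Approximate the save time with the file's last modification time (UTC)."""
    return datetime.fromtimestamp(Path(local_path).stat().st_mtime, tz=timezone.utc)
```

For example, `original_location(text, "Page.html")` returns `http://www.example.org/` for a page carrying either comment shown above.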

When converting a web archive to another archive format, all the information that is supported by the destination file format is preserved.

When saving an archived page as a complete page outside of an archive, if the integrated Save Complete extension is enabled, the original source location is stored in a comment inside the saved page.

Converting multiple pages

If you have many saved pages that you want to convert to another file format, you can use the Saved Pages Conversion Wizard. You can start the wizard using the Tools » Mozilla Archive Format » Convert Saved Pages menu item. If the Mozilla Archive Format submenu is hidden, you must first enable it from the interface preferences.

The wizard allows you to convert all the pages located in one folder, optionally including all its subfolders, automating the task of opening each page and saving it using another file format. When using the conversion wizard, keep the following in mind:

  • The wizard operates on multiple files, but the results for each file are equivalent to converting a single page by opening and saving it manually. The same information about the original location is preserved, and the fidelity of the resulting page depends on the destination file format and the current preferences.

  • For best results, it is recommended that you enable the Save Complete component before starting the conversion.

  • If you want to convert from MHTML to another file format, like MAFF, and you have installed the UnMHT extension, you must disable it for the duration of the conversion process.

  • The wizard only operates on one page for each file. If you want to convert from a multi-page MAFF archive to another file format, you should extract the archive first, using an ordinary ZIP utility. If you want, you can then convert the resulting complete web pages to MHTML using the conversion wizard.

  • If you are converting from a web archive format, ensure you have enough free space in your temporary folder, since the archives are normally extracted to the temporary folder before conversion. If you need to convert many pages and don't have enough free space, you may want to convert only some of them at a time, and restart the browser between each conversion batch. You can also move the temporary folder to a different drive in the advanced preferences.

  • In some cases, the automatic conversion of complex web pages may fail. These pages may need manual conversion.
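Before converting a large batch of MAFF archives, you can estimate whether the temporary folder has room for them. This Python sketch, a helper of our own rather than part of MAF, adds up the uncompressed sizes recorded in each ZIP archive and compares the total with the free space reported for a folder. It defaults to the system temporary directory, on the assumption that MAF's default temporary folder lives there; pass your own path if you moved it.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

def uncompressed_size(maff_path):
    """Sum the uncompressed sizes of all entries in a MAFF (ZIP) archive."""
    with zipfile.ZipFile(maff_path) as archive:
        return sum(info.file_size for info in archive.infolist())

def fits_in_temp(maff_paths, temp_dir=None):
    """Return (needed_bytes, free_bytes, fits) for a batch of archives."""
    temp_dir = temp_dir or tempfile.gettempdir()
    needed = sum(uncompressed_size(path) for path in maff_paths)
    free = shutil.disk_usage(temp_dir).free
    return needed, free, needed <= free

# Demo with one small archive written to a fresh temporary directory.
demo = Path(tempfile.mkdtemp()) / "demo.maff"
with zipfile.ZipFile(demo, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("page1/index.html", "x" * 10_000)
needed, free, fits = fits_in_temp([demo])
```

If `fits` is false, splitting the folder into smaller batches, as suggested above, keeps each run within the available space.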

Selecting which files to convert

First, you must select the source and destination file formats, and the source folder in which to look for source files. You can choose to search the subfolders of the selected folder as well, or to convert only the files that are placed directly inside it.

The selected source format determines how the wizard will look for source files. The MAFF and MHTML web archive formats are recognized by their extension, respectively .maff and either .mht or .mhtml. Complete web pages are recognized because they have an associated support folder, for example Page.html and Page_files, but also Page (without extension) and Page_files. Web pages saved as single files, without support folders, are recognized by their extension only.

If you are using your browser in a language other than English, the recognition of additional support folder suffixes will be enabled. For example, if you are using your browser in French, a support folder named Page_fichiers can be recognized, in addition to Page_files. If you previously saved pages using a browser in a different language than the current one, the support folder names may not be recognized correctly.
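The recognition rules can be sketched in Python with `pathlib`. Here the support folder suffixes are hard-coded to `_files` and the French `_fichiers` from the example above, whereas MAF derives additional suffixes from the browser's language; the format names `"maff"`, `"mhtml"` and `"complete"` are labels chosen for this illustration, and single-file web pages are not covered.

```python
import tempfile
from pathlib import Path

# Support folder suffixes: "_files" plus a localized variant (assumed list).
SUPPORT_SUFFIXES = ("_files", "_fichiers")
ARCHIVE_EXTENSIONS = {"maff": {".maff"}, "mhtml": {".mht", ".mhtml"}}

def find_sources(folder, source_format, recursive=True):
    """Yield source files for "maff", "mhtml" or "complete" pages."""
    folder = Path(folder)
    candidates = folder.rglob("*") if recursive else folder.glob("*")
    for path in sorted(candidates):
        if not path.is_file():
            continue
        if source_format in ARCHIVE_EXTENSIONS:
            # Web archives are recognized by their extension alone.
            if path.suffix.lower() in ARCHIVE_EXTENSIONS[source_format]:
                yield path
        elif source_format == "complete":
            # Page.html or Page (no extension) needs a sibling Page_files folder.
            if any((path.parent / (path.stem + suffix)).is_dir()
                   for suffix in SUPPORT_SUFFIXES):
                yield path

# Demo folder: one complete page, one lone HTML file, one MHTML archive.
demo = Path(tempfile.mkdtemp())
(demo / "Page.html").write_text("<html></html>")
(demo / "Page_files").mkdir()
(demo / "Other.html").write_text("<html></html>")
(demo / "Archive.MHT").write_text("")
complete = [p.name for p in find_sources(demo, "complete")]
mhtml = [p.name for p in find_sources(demo, "mhtml")]
```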

The selected destination format determines how the wizard will assign the output file names. The extension in the source file name, if present, is always replaced with the correct extension for the destination file format. For MHTML, the advanced preferences determine whether the .mht or .mhtml extension is used.

The next step consists of selecting the destination folder. If you want, you can place the converted files in a different folder from the original files. This is particularly useful if you are converting from a read-only source, like a CD-ROM or a DVD. The original folder structure is always preserved, so that if a source file is located in a subfolder of the original folder, the converted file will be located in a subfolder with the same name in the destination folder.

You may also choose to place the converted files near the original files. Each converted file will be placed in the same folder as its original, with the same file name, but with a different extension. In this case, you may want to move the original out of the way, by selecting a folder that will be used as a bin for the original files that have been successfully converted.
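The naming rules described above, replacing the extension and preserving the relative folder structure, fit in a few lines of Python. In this sketch, `destination_path` is a hypothetical helper; placing converted files next to the originals is simply the case where the destination root equals the source root.

```python
from pathlib import Path

def destination_path(source, source_root, destination_root, new_extension):
    """Map a source file to its converted file, keeping relative subfolders.

    Any extension on the source name is replaced; a name without one
    gets `new_extension` appended.
    """
    relative = Path(source).relative_to(source_root)
    return Path(destination_root) / relative.with_suffix(new_extension)
```

For example, `destination_path("/src/a/b/Page.html", "/src", "/out", ".maff")` gives `/out/a/b/Page.maff`, and a page named `Page` with no extension becomes `Page.maff`.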

If you are converting from the MAFF file format and the use of the "jar:" protocol is enabled in the advanced preferences, you will not be able to move the source files to another folder, since the browser will lock the files in place until it is closed. If you want to use this feature when converting from MAFF to another format, you should disable the use of the "jar:" protocol for the duration of the conversion process.

The conversion wizard will never delete or overwrite the source files. Since the converted pages may not be entirely faithful to the original, you should always keep a backup of your source files available, even after a successful conversion.

Finally, the source folder is scanned to locate the original files. Depending on how many files are present in the source, this operation may require some time. If you are working with large folder trees, you may want to repeat the wizard multiple times, converting one subfolder at a time.

Before the actual conversion begins, you have the option of fine-tuning your selection, and you can verify that the source files have been identified correctly. In the list of files, in addition to the source file name, support folder name and subfolder, you may display other columns like the full source, destination and bin paths.

If for any reason the destination file or support folder is already present, or if a file or support folder is already present in the folder where the source file would be moved after conversion, the source file name will appear in the list, but the selection checkbox will be disabled. This often indicates that the source file was converted successfully during a previous run of the wizard.
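The rule that disables a checkbox can be expressed as "every path the conversion would create must not exist yet". The Python sketch below is our own illustration of that rule; the helper names `conversion_targets` and `is_selectable` are hypothetical and not part of MAF.

```python
import tempfile
from pathlib import Path

def conversion_targets(source, destination, source_support=None,
                       destination_support=None, bin_folder=None):
    """List the paths a conversion would create (hypothetical helper)."""
    targets = [Path(destination)]
    if destination_support is not None:
        targets.append(Path(destination_support))
    if bin_folder is not None:
        # Moving the original into the bin folder must not overwrite anything.
        targets.append(Path(bin_folder) / Path(source).name)
        if source_support is not None:
            targets.append(Path(bin_folder) / Path(source_support).name)
    return targets

def is_selectable(targets):
    """A source file can be selected only if none of its targets exists yet."""
    return not any(path.exists() for path in targets)

# Demo: "Page.maff" is left over from an earlier run, "New.maff" is not.
root = Path(tempfile.mkdtemp())
(root / "Page.maff").write_bytes(b"")
already_converted = not is_selectable(conversion_targets(root / "Page.html", root / "Page.maff"))
ready = is_selectable(conversion_targets(root / "New.html", root / "New.maff"))
```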

Completing the conversion

After you have selected the files to be converted, click the Finish button to start the conversion process. Depending on the number of files, this process may require some time.

You can cancel the conversion at any time by closing the wizard or by using the Back button. Canceling the operation may require some time.

When the operation is finished, you can see the count of how many files have been successfully converted and how many files failed. The icon near each file name indicates its current status: not selected, already converted, waiting for conversion, currently converting, conversion failed, or conversion succeeded.

Detailed information about the reasons for conversion failures is available in the Error Console, accessible from the Tools » Error Console menu item.

If you are satisfied with the results, click the Finish button to close the window. You may also use the Back button to retry the conversion process with the same settings, or to change your selection and repeat the process with different folders.

Preferences

The default settings in effect after installation are enough to allow correct loading and saving of both MAFF and MHTML archives. To enable or disable the integrated Save Complete extension, customize the interface, or modify advanced aspects of page loading and archiving, you can change the extension's preferences.

The preferences dialog can be accessed from the Tools » Mozilla Archive Format » Preferences menu item or from the button in the archive information popup. If the Mozilla Archive Format submenu or the icon that displays the popup is hidden, you can still open the preferences dialog using the Options button in the Add-ons dialog, available from the Tools » Add-ons menu choice.

Main preferences

When saving complete web page contents:

This preference controls which method is used to find all the web resources (images, subpages, ...) that are included in the web page being saved. This step is preliminary to archiving all the resources in MAFF or MHTML format.

If the saved pages look very different from their original versions, you may change this preference to achieve a better result.

  1. Use browser's standard save system. (default) With this setting, the web pages are saved by the browser. How much of the web page is actually saved depends on the version of the browser being used.

  2. Preserve scripts and source using Save Complete. Allows more content to be saved, thanks to the integrated Save Complete extension written by Stephen Augenstein. This save mode attempts to preserve the dynamic features of the page by keeping all the scripts and the original page source code, but content generated by scripts may be missing from the resulting page. Note that if you also have a standalone version of Save Complete installed, MAF will continue to use the integrated one.

    Improvements to Save Complete are periodically included in new versions of MAF.

  3. Take a faithful snapshot of the page. This is the most accurate save mode, as it captures the current state of the page and creates an exact replica, including the current values of form fields, as well as video and audio embedded in the page. This save mode works especially well for pages that make extensive use of scripts and use dynamic technologies like AJAX.

    The resulting page will be static, as scripts are disabled by the save operation to preserve the integrity of the result when it is displayed again.

Note that the selected component will be used not only when saving archives, but also when saving complete pages using the File » Save Page As... » Save as Type: Web Page, complete menu choice.

Create MHTML files fully compatible with other browsers

When this preference is enabled, Mozilla Archive Format creates MHTML files according to the original specification, allowing any browser to open the archives correctly, even for very complex pages. If this preference is disabled, MAF generates a specific MHTML variant that opens much more quickly in Firefox or SeaMonkey, even for very large documents, but that other browsers cannot display with proper formatting if the saved page contains nested CSS stylesheets or inner frames.

MAF is only able to create compatible MHTML files using the integrated Save Complete component. If the Save Complete component is disabled, only the MAF-specific MHTML variant is available.

For the suggested file name:

This preference controls which method is used to select the default file name in the Save As dialog box.

  1. Use browser's standard naming strategy (default) - With this setting, the Mozilla Archive Format extension does not alter the current behavior, which is determined by the browser or by other installed extensions. If no other extension affecting this behavior is installed, the original name of the file will be preferred to the title of the page.

  2. Use the title of the page whenever possible - With this setting, the title of the page will be preferred to the original file name. This is done for all HTML and XHTML pages, unless the server you are downloading the page from explicitly asked the browser to use a specific file name. Note that if other extensions affecting this behavior are installed, this setting may not work as expected.
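Using a title as a file name requires some cleanup, since titles can contain characters that file systems reject. The Python sketch below is one possible approach, not necessarily what MAF or File Title does: it replaces characters that Windows forbids in file names, collapses whitespace, trims trailing dots and spaces, and falls back to the original file name when the title is empty. The `max_length` limit is an arbitrary assumption.

```python
import re

# Characters not allowed in Windows file names, plus ASCII control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

def suggested_file_name(title, original_name, extension, max_length=100):
    """Prefer the page title; fall back to the original file name."""
    cleaned = INVALID_CHARS.sub("_", title or "")
    cleaned = " ".join(cleaned.split()).strip(" .")
    if not cleaned:
        # No usable title: reuse the original name without its extension.
        cleaned = original_name.rsplit(".", 1)[0] or "index"
    return cleaned[:max_length] + extension
```

For example, `suggested_file_name('News: "Big" day?', "index.html", ".maff")` returns `News_ _Big_ day_.maff`.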

Save extended metadata in MAFF archives

With this preference enabled, additional page information such as history, text zoom and scroll position is saved for each page. There is currently no preference to restore this saved information.

Interface preferences

Show Mozilla Archive Format icon in:

You can control the visibility and position of the icon that provides access to the additional information about an archived page.

  1. Address bar - Always display the icon in the address bar. If the current page is not saved in an archive, the icon is grayed out, but you can still use it to access the Archives dialog or the preferences.
  2. Address bar, for archived pages only (default) - Keep the icon hidden during normal browsing, and display it only when viewing a page that is saved in an archive.
  3. Status bar - Display the icon in the status bar. If the current page is not stored in an archive, the icon is grayed out, but you can still use it to access the Archives dialog or the preferences. This option is recommended if you are using a theme that is not compatible with additional icons in the address bar.
  4. None - Do not display the icon. Note that if you hide the icon, you can still access the MAF preferences from the Tools menu or the Add-ons dialog.

Show Mozilla Archive Format menu items in:

You can select which menus will display the Mozilla Archive Format items. If you disable the Tab Bar Context Menu option but enable the Page Context Menu option, the tab-related menu items will appear in the page context menu instead of the tab bar.

Show these additional menu items:

The Save In Archive option enables displaying the additional Save Page In Archive and Save Frame In Archive menu items near the Save Page As and Save Frame As standard items, across all menus. These items open a special Save As dialog reserved for saving in archives. They are useful if you routinely use the standard Save As dialog to save only the text of a page, and need a separate option to save a page in an archive without changing the selection in the file type drop-down list.

The Save Page In Archive and Save Frame In Archive menu items are always available under the Tools » Mozilla Archive Format menu, if visible, regardless of this preference.

File associations

This preferences pane allows you to create or refresh file associations on Windows, for the MAFF and MHTML formats separately.

File associations are always created explicitly for the current user of the system. In addition, if the current user has administrative privileges, default file associations for all users are also created.

File associations are not removed when uninstalling MAF or the browser itself.

Advanced preferences

Temporary folder

This preference allows you to customize the location of the temporary files required to open and save the web archives.

If unspecified, this location defaults to the maftemp subdirectory of the system temporary directory.

If customized, the absolute path to the specified location is remembered. The contents of the selected folder will be lost if the Clear temporary folder when browser exits option is selected. There is usually no need to customize the temporary folder unless you use different browser profiles on the same computer at the same time.

Clear temporary folder when browser exits

This option is enabled by default. If disabled, the contents of the temporary directory are preserved after the browser exits, and must be cleaned up manually.
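The two temporary folder preferences can be modeled in a few lines of Python. This is an illustration of the described behavior, not MAF's code: `maf_temp_folder` returns the custom folder if one is set, otherwise a `maftemp` subfolder of the system temporary directory, and `clear_folder` removes the folder's contents while keeping the folder itself. Registering the cleanup with `atexit` stands in for "when browser exits"; both helper names are hypothetical.

```python
import atexit
import shutil
import tempfile
from pathlib import Path

def maf_temp_folder(custom=None):
    """Return the custom folder if set, else <system temp>/maftemp."""
    return Path(custom) if custom else Path(tempfile.gettempdir()) / "maftemp"

def clear_folder(folder):
    """Remove everything inside `folder`, keeping the folder itself."""
    folder = Path(folder)
    if not folder.is_dir():
        return
    for child in folder.iterdir():
        if child.is_dir() and not child.is_symlink():
            shutil.rmtree(child, ignore_errors=True)
        else:
            child.unlink(missing_ok=True)

def register_cleanup(folder, clear_on_exit=True):
    """Clear the folder when the process exits, if the option is enabled."""
    if clear_on_exit:
        atexit.register(clear_folder, folder)
```

Because `clear_folder` deletes everything inside the folder, pointing a custom temporary folder at a directory that holds other files would lose them, which is why the warning above matters.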

Rewrite absolute URLs in open archives

With this preference enabled, the archived web pages would be processed as they finish loading in tabs. The processing would replace absolute links to resources in the page with local resources in the archive (if possible). This would allow users to browse linked pages in an archive seamlessly.

Use the "jar:" protocol to access the contents of MAFF archives

If this preference is enabled, when you open a MAFF archive its contents will be accessed directly using the "jar:" protocol, without being extracted.

However, if you enable this option, the archive files you open will be locked, and you will be unable to move, rename or delete them until the browser is closed.

Save using the .mhtml file extension instead of .mht by default

If this option is selected, and you do not type a file extension in the Save As dialog box or file extensions are hidden, the full .mhtml extension is appended to the file name of MHTML archives, instead of the more common .mht extension.
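The rule amounts to "append the preferred extension only when no MHTML extension is present". This Python sketch is a simplified model of that rule, with `mhtml_file_name` as a hypothetical helper and `prefer_mhtml` standing in for this preference.

```python
from pathlib import Path

def mhtml_file_name(typed_name, prefer_mhtml=False):
    """Append the default MHTML extension only when none was typed."""
    if Path(typed_name).suffix.lower() in (".mht", ".mhtml"):
        return typed_name
    return typed_name + (".mhtml" if prefer_mhtml else ".mht")
```

For example, `mhtml_file_name("report")` gives `report.mht`, while `mhtml_file_name("report", prefer_mhtml=True)` gives `report.mhtml`.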

Display welcome window at next startup

The welcome dialog is usually displayed only when MAF is installed for the first time. All the options that are set by the welcome dialog are also available from the preferences, so it is usually not necessary to display the dialog again. This option is provided only as a convenience for translators who need to proofread the text in the welcome dialog multiple times.

Internal configuration settings

These configuration settings are not available from the preferences dialog, but only from the about:config page. Usually, they should not be changed unless there is a specific reason, and non-default settings may adversely impact functionality or performance.

extensions.maf.open.maff.ignorecharacterset

When this setting is enabled, the character set specified for pages saved inside MAFF archives is ignored, instead of being enforced when the page is displayed. Enabling this option may be useful to troubleshoot internationalization issues, but will cause saved pages to be displayed incorrectly in most cases.

extensions.maf.save.maff.compression

Controls the compression level to use when saving files in a MAFF archive.

  1. dynamic (default) - Use maximum compression for all files, but do not re-compress media files.
  2. best - Use maximum compression for all files.
  3. none - Store all the files uncompressed.
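The three levels can be mimicked with Python's standard `zipfile` module, which makes the difference between them concrete. In this sketch, "dynamic" stores files whose extensions suggest already-compressed media and deflates everything else at the highest level, "best" deflates everything, and "none" stores everything. The list of media extensions is our own assumption; MAF's actual list may differ.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed list of already-compressed media types; MAF's own list may differ.
MEDIA_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".mp3", ".mp4", ".ogg", ".webm", ".zip"}

def compress_type_for(name, level):
    """Pick a ZIP compression method for one entry under the given level."""
    if level == "none":
        return zipfile.ZIP_STORED
    if level == "dynamic" and PurePosixPath(name).suffix.lower() in MEDIA_EXTENSIONS:
        return zipfile.ZIP_STORED  # do not re-compress media files
    return zipfile.ZIP_DEFLATED

def build_archive(files, level="dynamic"):
    """Return ZIP bytes for `files` (name -> bytes) using the chosen level."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, data in files.items():
            archive.writestr(name, data,
                             compress_type=compress_type_for(name, level),
                             compresslevel=9)  # maximum deflate level
    return buffer.getvalue()
```

Skipping media files costs little space, because JPEG, PNG and similar formats are already compressed, and it saves time when saving large pages.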

More documentation

This document provided the essential user documentation for the extension. Technical documentation about the internals of Mozilla Archive Format and the MAFF file format is available in separate documents: the API documentation and the MAFF specification.