Hypertext Markup Language
| types | Markup, Text (plain) |
| preferred | ⚠️ under conditions |
HTML is the core technology of the web: it turned the pre-WWW internet of 1991 into the World Wide Web.
It is essentially a document markup language, in which you can store text together with its formatting and layout.
In earlier times, the layout and formatting options were basic; nowadays both are as sophisticated as those of the printed page.
In order to achieve the richest formatting, HTML works together with another standard for styles: CSS.
The ultimate reference for all these technologies is the Mozilla Developer Network (MDN).
HTML files may contain CSS styles inside, or they may refer to other files for their styling.
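To see which of these styling approaches a given file uses, it can help to survey it programmatically. The following is a minimal sketch using only Python's standard `html.parser` module; the names `StyleSurvey` and `survey_styles` are illustrative and not part of any existing tool.

```python
from html.parser import HTMLParser


class StyleSurvey(HTMLParser):
    """Record how an HTML document gets its CSS."""

    def __init__(self):
        super().__init__()
        self.embedded_blocks = 0    # number of <style> elements
        self.inline_attributes = 0  # number of style="..." attributes
        self.linked_sheets = []     # href values of <link rel="stylesheet">

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "style":
            self.embedded_blocks += 1
        if "style" in attributes:
            self.inline_attributes += 1
        # rel may hold several space-separated keywords, e.g. "alternate stylesheet"
        rel = (attributes.get("rel") or "").lower().split()
        if tag == "link" and "stylesheet" in rel and attributes.get("href"):
            self.linked_sheets.append(attributes["href"])


def survey_styles(html_text):
    """Parse html_text and return a StyleSurvey with the counts filled in."""
    survey = StyleSurvey()
    survey.feed(html_text)
    survey.close()
    return survey
```

A file for which `linked_sheets` is non-empty depends on other files for its appearance, which matters when deciding what else needs to be preserved alongside it.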
The World Wide Web is one of the most dynamic corners of IT. Standards are always on the move, and new patterns of organizing web content replace the best practices of yesterday.
At one extreme, HTML files may contain complete programs, documentation, and installation procedures; at the other, they may consist of long stretches of text with horribly redundant layout codes.
Most HTML in the world has not been typed by humans, but has been generated by software.
HTML in the archive
One scenario in which HTML files enter the archive is when a website gets archived. In that case, it is not the individual HTML files that should be judged for their long-term preservability, but rather the website as an integral system.
In this scenario, it is preferable to archive the source code of the website as well, not only the end result.
Nowadays, there are many popular ways to write HTML documentation as generated static pages. The source code resides in a software repository, and the generated pages are served by a service such as Read the Docs or GitHub Pages.
Another scenario is when HTML files with substantial content have been captured from a legacy system or from other sources. Where possible, scan each file for references to external files, rescue those files as well, and store them in a directory structure that matches the way they are referenced. Then preserve the resulting directory in a TAR file.
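The scan-and-rescue procedure can be sketched in Python with the standard library alone. This is an illustration under stated assumptions, not a complete tool: it assumes the referenced files already sit next to the HTML file at the relative paths used in its `href`, `src`, `poster`, and `data` attributes, and it ignores references made from inside CSS (`url(...)`) or `srcset` attributes. The names `ReferenceCollector`, `local_references`, and `archive_with_references` are hypothetical.

```python
import tarfile
from html.parser import HTMLParser
from pathlib import Path, PurePosixPath
from urllib.parse import unquote, urlsplit

# Attributes that commonly point at other files (an assumed, non-exhaustive set).
REFERENCE_ATTRIBUTES = {"href", "src", "poster", "data"}


class ReferenceCollector(HTMLParser):
    """Collect the raw values of reference-carrying attributes."""

    def __init__(self):
        super().__init__()
        self.references = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in REFERENCE_ATTRIBUTES and value:
                self.references.append(value)


def local_references(html_text):
    """Return the relative file paths referenced by the HTML, in order, without duplicates."""
    collector = ReferenceCollector()
    collector.feed(html_text)
    collector.close()
    paths = []
    for ref in collector.references:
        parts = urlsplit(ref)
        # Skip absolute URLs (http:, mailto:, data:, ...) and pure fragments like "#top".
        if parts.scheme or parts.netloc or not parts.path:
            continue
        path = unquote(parts.path)
        # Site-absolute paths cannot be resolved without knowing the web root.
        if path.startswith("/"):
            continue
        if path not in paths:
            paths.append(path)
    return paths


def archive_with_references(html_file, tar_file):
    """Write html_file and the local files it references to an uncompressed TAR file.

    Returns the references that could not be included, so they can be
    chased up by hand.
    """
    html_file = Path(html_file)
    root = html_file.parent
    text = html_file.read_text(encoding="utf-8", errors="replace")
    unresolved = []
    with tarfile.open(str(tar_file), "w") as tar:
        tar.add(str(html_file), arcname=html_file.name)
        for ref in local_references(text):
            target = root / ref
            # Paths that climb out of the directory are reported, not archived.
            if ".." in PurePosixPath(ref).parts or not target.is_file():
                unresolved.append(ref)
            else:
                tar.add(str(target), arcname=ref)
    return unresolved
```

Inside the TAR file, each rescued file keeps the relative path under which the HTML refers to it, so the links still resolve after extraction. The returned list shows which references still need to be located.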