Many programs and desktops use the MIME system[MIME] to represent the types of files. Frequently, it is necessary to work out the correct MIME type for a file. This is generally done by examining the file's name or contents, and looking up the correct MIME type in a database.
It is also useful to store information about each type, such as a textual description of it, or a list of applications that can be used to view or edit files of that type.
For interoperability, it is useful for different programs to use the same database so that different programs agree on the type of a file and information is not duplicated. It is also helpful for application authors to only have to install new information in one place.
This specification attempts to unify the MIME database systems currently in use by GNOME[GNOME], KDE[KDE] and ROX[ROX], and provide room for future extensibility.
KDE determines a file's type from its name using .desktop files with Type=MimeType, one file per type. The files are arranged in the filesystem to mirror the two-level MIME type hierarchy.
The syntax is very similar to other .desktop files, with Name=, Comment= etc.
Example file:
[Desktop Entry]
Encoding=UTF-8
MimeType=application/x-kword
Comment=KWord
Comment[af]=kword
[... etc. other translations ]
Icon=kword
Type=MimeType
Patterns=*.kwd;*.kwt;
X-KDE-AutoEmbed=false

[Property::X-KDE-NativeExtension]
Type=QString
Value=.kwd
KDE does not have a separate system for specifying extension matches, but uses case-sensitive glob patterns for everything.
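For illustration only, here is a small Python sketch of case-sensitive glob matching against a Patterns= value like the one in the example file. The helper names are hypothetical, and the standard library's fnmatch.fnmatchcase is assumed to be close enough to KDE's own glob matcher for this purpose.

```python
from fnmatch import fnmatchcase


def parse_patterns(value: str) -> list[str]:
    """Split a Patterns= value such as '*.kwd;*.kwt;' into glob patterns.

    Hypothetical helper, not part of any KDE API.
    """
    # The trailing ';' leaves an empty item, which is dropped.
    return [p for p in value.split(";") if p]


def kde_match(filename: str, patterns: list[str]) -> bool:
    """Return True if any pattern matches, comparing case-sensitively."""
    return any(fnmatchcase(filename, p) for p in patterns)


patterns = parse_patterns("*.kwd;*.kwt;")
print(kde_match("report.kwd", patterns))  # True
print(kde_match("REPORT.KWD", patterns))  # False: no case folding
```

Because everything is matched case-sensitively in this scheme, an upper-case name such as REPORT.KWD is only recognised if a pattern for it is listed explicitly.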
A single file stores all the rules for recognising files by content. This is almost identical to file(1)'s magic.mime database file, but without the encoding field.
The format is described in the file itself as follows:
# The format is 4-5 columns:
# Column #1: byte number to begin checking from, ">" indicates continuation
# Column #2: type of data to match
# Column #3: contents of data to match
# Column #4: MIME type of result
GNOME uses the gnome-vfs library to determine the MIME type of a file. This library loads name-to-type rules from files with a '.mime' extension in a system-wide directory (set at install time), merging them with those in the user's directory. It loads textual descriptions for the types from files in the same directories, ending with '.keys'. The file gnome-vfs.mime in the system directory is always loaded first (allowing everything else to override it). The file user.mime in the user's directory is always loaded last, making these settings take precedence over all others.
The format of the .mime files is described as follows:
# Mime types as provided by the GNOME libraries for GNOME.
#
# Applications can provide more mime types by installing other
# .mime files in the PREFIX/share/mime-info directory.
#
# The format of this file is:
#
# mime-type
#     ext[,prio]: list of extensions for this mime-type
#     regex[,prio]: a regular expression that matches the filename
#
# more than one ext: and regex: fields can be present.
#
# prio is the priority for the match, the default is 1. This is required
# to distinguish composed filenames, for example .gz has a priority of 1
# and .tar.gz has a priority of 2 (thus a file having the filename
# something.tar.gz will match the mime-type for tar.gz before the mime-type
# for .gz
#
# The values in this file are kept in alphabetical order for convenience.
# Please maintain this when adding new types. Also consider adding a
# human-readable description to gnome-vfs.keys when adding a new type here.
#
# Also do please not add illegal mime types, observe the mime standard when
# adding new types.
When looking up the type for a file, gnome-vfs looks first for an exact-case match for the extension, then an all upper-case match, then an all lower-case match. If no matches are found, or there is no '.' in the name, then the regular expression matches are checked: first the rules with priority 2, then those with priority 1. The modification time on the mime-info directories is used to detect changes.
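The lookup order just described can be sketched in Python as follows. This is not gnome-vfs code: the function names and the text/x-readme type are hypothetical, the extension is assumed to be the text after the last '.', ext priorities are ignored, and re.search stands in for whatever regular expression engine gnome-vfs actually uses.

```python
import re


def parse_mime_file(text):
    """Parse gnome-vfs .mime text into an extension table and regex rules.

    Assumes mime-type lines start in column zero and field lines are indented.
    """
    ext_map = {}      # extension -> MIME type
    regex_rules = []  # (priority, compiled regex, MIME type)
    current = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line[0].isspace():
            current = line.strip()  # start of a new mime-type entry
            continue
        key, _, value = line.strip().partition(":")
        name, _, prio = key.partition(",")
        priority = int(prio) if prio else 1  # the default priority is 1
        if name == "ext":
            for ext in value.split():
                ext_map[ext] = current  # ext priorities are ignored in this sketch
        elif name == "regex":
            regex_rules.append((priority, re.compile(value.strip()), current))
    return ext_map, regex_rules


def gnome_vfs_lookup(filename, ext_map, regex_rules):
    """Return a MIME type for filename in the order described above, or None."""
    if "." in filename:
        ext = filename.rsplit(".", 1)[1]
        # Exact case first, then all upper-case, then all lower-case.
        for candidate in (ext, ext.upper(), ext.lower()):
            if candidate in ext_map:
                return ext_map[candidate]
    # No extension match (or no '.'): priority-2 regexes before priority-1 ones.
    for wanted in (2, 1):
        for priority, regex, mime in regex_rules:
            if priority == wanted and regex.search(filename):
                return mime
    return None


sample = """
text/x-readme
    regex,2: ^README
text/plain
    ext: txt
    regex: ^[A-Z]+$
"""
exts, regexes = parse_mime_file(sample)
print(gnome_vfs_lookup("notes.TXT", exts, regexes))  # text/plain (lower-case match)
print(gnome_vfs_lookup("README", exts, regexes))     # text/x-readme (priority 2 first)
```

Note that "README" matches both regular expressions; the priority-2 rule is tried first, so it wins.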
The .keys files contain type-to-description rules, eg:
application/msword
	description=Microsoft Word document
	[de]description=Microsoft Word-Dokument
	...
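A rough Python parser for the .keys layout shown above follows. It assumes that type lines start in the first column, that field lines are indented, and that an optional [lang] prefix marks a translation; parse_keys and describe are hypothetical names, not part of gnome-vfs.

```python
import re

# Matches 'key=value' or '[lang]key=value' (layout assumed from the example).
FIELD = re.compile(r"^(?:\[(?P<lang>[^\]]+)\])?(?P<key>[^=]+)=(?P<value>.*)$")


def parse_keys(text):
    """Return {mime_type: {(key, lang): value}}; lang is None for untranslated text."""
    result = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        if not line[0].isspace():
            current = line.strip()  # a new MIME type
            result.setdefault(current, {})
            continue
        m = FIELD.match(line.strip())
        if m and current is not None:
            result[current][(m.group("key"), m.group("lang"))] = m.group("value")
    return result


def describe(keys, mime_type, lang=None):
    """Prefer the translation for lang, falling back to the untranslated description."""
    fields = keys.get(mime_type, {})
    return fields.get(("description", lang)) or fields.get(("description", None))


keys = parse_keys("""
application/msword
    description=Microsoft Word document
    [de]description=Microsoft Word-Dokument
""")
print(describe(keys, "application/msword", "de"))  # Microsoft Word-Dokument
print(describe(keys, "application/msword", "fr"))  # Microsoft Word document
```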
Guidelines for writing descriptions can be found in the mime-descriptions-guidelines.txt file.
The format for magic entries is defined as:
# The format of magic entries is:
#
# offset_start[:offset_end] pattern_type pattern [&pattern_mask] type
#
# <offset_start> and <offset_end> are decimal numbers (file offsets).
#
# <pattern_type> is (byte | short | long | string | date | beshort |
#     belong | bedate | leshort | lelong | ledate).
#
# <pattern> is an ASCII string with non-printable characters escaped
# as hex or octal escape sequences, and spaces and other important
# whitespace escaped with '\'.
#
# <pattern_mask> is a string of hex digits. The mask must be the same
# length as the pattern.
#
# <type> is a valid MIME type.
#
# Order magic patterns such that ambiguous ones (such as
# application/x-ms-dos-executable) are at the end of the list and
# therefore get applied last.
#
# Avoid rules that require a seek deep into the examined file. If you
# must, locate such rules at the end of the list so that they get
# applied last
#
# When designing new document formats, make them easily recognizable
# by defining a sufficiently unique magic pattern near the document
# start. A good pattern is at least four bytes long and contains one
# or two non-printable characters so that text files won't be
# misidentified.
ROX searches MIME-info directories in CHOICESPATH (~/Choices/MIME-info:/usr/local/share/Choices/MIME-info:/usr/share/Choices/MIME-info by default). Files from earlier directories override those in later ones, but the order within a directory is not specified.
The files are in the same format as GNOME, except:
There are no .keys files, so files of all extensions are loaded.
The priority is ignored.
A case-sensitive match is tried first, then a lower-case match. No upper-case match is tried.
Multiple extensions are allowed. Eg:
application/x-compressed-postscript
	ext: ps.gz eps.gz
When looking up the type for a file, ROX starts with the first '.' and tries a case-sensitive match of the remaining text against the extensions. Then it tries again with the filename in lower-case. It then tries again from the second '.', and so on. If no type is found, it tries the regular expressions.
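The ROX lookup just described can be sketched in Python as follows. rox_lookup is a hypothetical name and this is not ROX's code; the extension table and regular expressions are assumed to be loaded already, and re.search is used for the regular expression step.

```python
import re


def rox_lookup(filename, ext_map, regex_rules):
    """ROX-style lookup: ext_map maps extensions such as 'ps.gz' to MIME types;
    regex_rules is a list of (compiled regex, MIME type). Priorities are ignored.
    """
    dot = filename.find(".")
    while dot != -1:
        rest = filename[dot + 1:]
        if rest in ext_map:  # case-sensitive match first
            return ext_map[rest]
        if rest.lower() in ext_map:  # then a lower-case match
            return ext_map[rest.lower()]
        dot = filename.find(".", dot + 1)  # move on to the next '.'
    for regex, mime in regex_rules:
        if regex.search(filename):
            return mime
    return None


exts = {"ps.gz": "application/x-compressed-postscript", "gz": "application/x-gzip"}
regexes = [(re.compile(r"^Makefile$"), "text/x-makefile")]
print(rox_lookup("paper.PS.GZ", exts, regexes))  # application/x-compressed-postscript
print(rox_lookup("a.b.gz", exts, regexes))       # application/x-gzip
print(rox_lookup("Makefile", exts, regexes))     # text/x-makefile
```

Starting from the first '.' means that the longest multi-part extension is tried first, which is how ps.gz wins over gz without any priority field.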
ROX has no rules for determining a file's type from its contents.
In discussions about these systems, it was clear that the differences between the databases were simply a result of them being separate, and not due to any fundamental disagreements between developers. Everyone is keen to see them merged.
This spec proposes:
A standard way for applications to install new MIME related information.
A standard way of getting the MIME type for a file.
A standard way of getting information about a MIME type.
Standard locations for all the files, and methods of resolving conflicts.
Further, the existing databases have been merged into a single package [SharedMIME].
There are two important requirements for the way the MIME database is stored:
Applications must be able to extend the database in any way when they are installed, to add both new rules for determining type, and new information about specific types.
It must be possible to install applications in /usr, /usr/local and the user's home directory (in the normal Unix way) and have the MIME information used.
The directories to be used to store the files in the database are:
/usr/share/mime/
/usr/local/share/mime/
~/.mime/
In the rest of this document, paths shown with the prefix <MIME> indicate the files should be loaded from all the directories listed above. For example, “Load all the <MIME>/text/html.xml files” means to load /usr/share/mime/text/html.xml, /usr/local/share/mime/text/html.xml, and ~/.mime/text/html.xml (if they exist).
Where the information from these files is conflicting, information from directories lower in the list takes precedence.
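A minimal Python sketch of the <MIME> convention follows, assuming the three directories listed above and using pathlib. mime_paths expands a relative path to the files that exist, and resolve picks the value coming from the highest-precedence directory; both function names are hypothetical.

```python
from pathlib import Path

# The three database directories listed above, in order; where information
# conflicts, directories later in this list take precedence.
MIME_DIRS = [
    Path("/usr/share/mime"),
    Path("/usr/local/share/mime"),
    Path.home() / ".mime",
]


def mime_paths(relative, dirs=MIME_DIRS):
    """Expand '<MIME>/relative' into the files that exist, lowest precedence first."""
    return [d / relative for d in dirs if (d / relative).is_file()]


def resolve(values):
    """Given one value per directory (None if unset), return the winning one."""
    winner = None
    for value in values:
        if value is not None:
            winner = value  # a later directory overrides an earlier one
    return winner


for path in mime_paths("text/html.xml"):
    print(path)
```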
Any file named Override.xml takes precedence over all other files in the same packages directory. Tools which let the user edit the database should edit the file ~/.mime/packages/Override.xml.
Each application that wishes to contribute to the MIME database will install a single XML file, named after the application, into one of the three <MIME>/packages/ directories (depending on where the user requested the application be installed). After installing, uninstalling or modifying this file, the application MUST run the update-mime-database command, which is provided by the freedesktop.org shared database[SharedMIME].
update-mime-database takes a single argument: the mime directory containing the packages subdirectory that was modified. It scans all the XML files in the packages subdirectory, combines the information in them, and creates a number of output files:
<MIME>/globs (contains a mapping from extension to MIME type)
<MIME>/magic (contains a mapping from file contents to MIME type)
<MIME>/MEDIA/SUBTYPE.xml (one file for each MIME type, giving details about the type)
Generating these files serves several purposes. First, it allows applications to quickly get the data they need without parsing all the source XML files (the base package alone is over 700K). Second, it allows the database to be used for other purposes (such as creating /etc/mime.types, if desired). Third, it allows some validation to be performed on the input data, and removes the need for other applications to carefully check the input for errors themselves. The formats of these generated files and of the source files in packages are explained in the following sections.
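The sketch below shows one way an installer script written in Python might follow this procedure: copy the package file into a packages directory, then run update-mime-database on the enclosing mime directory. Only update-mime-database comes from this specification; install_package, its run parameter (which lets a test substitute a fake command runner) and the file name myapp.xml are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def install_package(xml_file, mime_dir, run=subprocess.run):
    """Install one application's XML file and regenerate the database.

    Hypothetical helper; mime_dir is e.g. /usr/share/mime or ~/.mime.
    """
    packages = Path(mime_dir) / "packages"
    packages.mkdir(parents=True, exist_ok=True)
    target = packages / Path(xml_file).name
    shutil.copyfile(xml_file, target)
    # The command is given the mime directory itself, not its packages subdirectory.
    run(["update-mime-database", str(mime_dir)], check=True)
    return target


# Example (per-user install):
# install_package("myapp.xml", Path.home() / ".mime")
```

Uninstalling would be the mirror image: delete the file from packages, then run the same command again.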
Each application provides only a single XML source file, which is installed in the packages directory as described above. This file is an XML file whose document element is named mime-info and whose namespace URI is http://www.freedesktop.org/standards/shared-mime-info. All elements described in this specification MUST have this namespace too.
The document element may contain zero or more mime-type child nodes, in any order, each describing a single MIME type. Each element has a type attribute giving the MIME type that it describes.
Each mime-type node may contain any combination of the following elements, and in any order:
glob elements have a pattern attribute. Any file whose name matches this pattern will be given this MIME type (subject to conflicting rules in other files, of course).
magic elements contain a list of match elements, any of which may match, and have an optional priority attribute that applies to all of the contained rules. Low numbers should be used for more generic types (such as 'gzip compressed data') and higher values for specific subtypes (such as a word processor format that happens to use gzip to compress the file). The default priority value is 50.
Each child element can be any of string, host16, host32, big16, big32, little16, little32 or byte. Each of these elements has offset, type, value and, optionally, mask attributes. Each element corresponds to one line of file(1)'s magic.mime file. They can be nested in the same way to provide the equivalent of continuation lines.
action elements introduce an action that can be performed on files of this type. There may be several actions for each type. The format for this element has not yet been decided.
comment elements give a human-readable textual description of the MIME type. There may be many of these elements with different xml:lang attributes to provide the text in multiple languages.
Applications may also define their own elements, provided they are namespaced to prevent collisions. Unknown elements are copied directly to the output XML files like comment elements.
Here is an example source file, named diff.xml:
<?xml version="1.0"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
  <mime-type type="text/x-diff">
    <comment>Differences between files</comment>
    <comment xml:lang="af">verskille tussen lêers</comment>
    ...
    <magic priority="50">
      <string offset="0" value="diff "/>
      <string offset="0" value="*** "/>
      <string offset="0" value="Common subdirectories: "/>
    </magic>
    <glob pattern="*.diff"/>
    <glob pattern="*.patch"/>
  </mime-type>
</mime-info>
In practice, common types such as text/x-diff are provided by the freedesktop.org shared database. Also, only new information needs to be provided, since this information will be merged with other information about the same type.
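To make the namespace requirement concrete, the sketch below reads a source file with Python's xml.etree.ElementTree and collects the glob patterns and top-level string magic rules for each type. read_source and q are hypothetical helper names; nested match elements and the other match types (host16, big32 and so on) are deliberately skipped, and the '...' placeholder in the diff.xml example would have to be removed before that file could be parsed.

```python
import xml.etree.ElementTree as ET

NS = "http://www.freedesktop.org/standards/shared-mime-info"


def q(tag):
    """Qualify a tag name with the shared-mime-info namespace."""
    return f"{{{NS}}}{tag}"


def read_source(xml_text):
    """Collect glob patterns and top-level string magic rules per MIME type."""
    root = ET.fromstring(xml_text)
    if root.tag != q("mime-info"):
        raise ValueError("document element must be mime-info in the shared-mime-info namespace")
    result = {}
    for mt in root.findall(q("mime-type")):
        info = result.setdefault(mt.get("type"), {"globs": [], "magic": []})
        info["globs"] += [g.get("pattern") for g in mt.findall(q("glob"))]
        for magic in mt.findall(q("magic")):
            priority = int(magic.get("priority", "50"))  # default priority is 50
            for rule in magic.findall(q("string")):  # other match elements skipped here
                info["magic"].append((priority, int(rule.get("offset")), rule.get("value")))
    return result


source = """<?xml version="1.0"?>
<mime-info xmlns='http://www.freedesktop.org/standards/shared-mime-info'>
  <mime-type type="text/x-diff">
    <magic priority="50">
      <string offset="0" value="diff "/>
      <string offset="0" value="*** "/>
    </magic>
    <glob pattern="*.diff"/>
    <glob pattern="*.patch"/>
  </mime-type>
</mime-info>"""
print(read_source(source)["text/x-diff"]["globs"])  # ['*.diff', '*.patch']
```

An element without the namespace (a bare mime-info, for instance) is rejected, which is one example of the validation update-mime-database can perform on behalf of other applications.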
These files have a mime-type element as the root node. The format is as described above. They are created by merging all the mime-type elements from the source files and creating one output file per MIME type. Each file may contain information from multiple source files. The magic and glob elements will have been removed.
The example source file given above would (on its own) create an output file called <MIME>/text/x-diff.xml containing the following:
<?xml version="1.0" encoding="utf-8"?>
<mime-type xmlns="http://www.freedesktop.org/standards/shared-mime-info" type="text/x-diff">
  <!--Created automatically by update-mime-database. DO NOT EDIT!-->
  <comment>Differences between files</comment>
  <comment lang="af">verskille tussen lêers</comment>
  ...
</mime-type>
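The merge step can be sketched with Python's xml.etree.ElementTree. This sketch assumes the source files are passed in directory order and simply concatenates the children of each mime-type element, dropping glob and magic; merge_sources, output_path and write_output are hypothetical names, and the generated DO NOT EDIT comment is omitted.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

NS = "http://www.freedesktop.org/standards/shared-mime-info"
# glob and magic information goes to the globs and magic files instead.
DROPPED = {f"{{{NS}}}glob", f"{{{NS}}}magic"}


def merge_sources(xml_texts):
    """Merge mime-type elements from several source files into one element per type."""
    merged = {}
    for text in xml_texts:
        root = ET.fromstring(text)
        for mt in root.findall(f"{{{NS}}}mime-type"):
            mime = mt.get("type")
            if mime not in merged:
                merged[mime] = ET.Element(f"{{{NS}}}mime-type", {"type": mime})
            for child in mt:
                if child.tag not in DROPPED:
                    merged[mime].append(child)  # comments and unknown elements are copied
    return merged


def output_path(mime):
    """'text/x-diff' -> 'text/x-diff.xml', relative to the <MIME> directory."""
    return mime + ".xml"


def write_output(mime_dir, mime, element):
    """Write one merged element to <mime_dir>/MEDIA/SUBTYPE.xml."""
    ET.register_namespace("", NS)  # serialise with a default xmlns (global setting)
    path = Path(mime_dir) / output_path(mime)
    path.parent.mkdir(parents=True, exist_ok=True)
    ET.ElementTree(element).write(str(path), encoding="utf-8", xml_declaration=True)
```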
This is a simple list of lines containing a MIME type and pattern, separated by a colon. For example:
# This file was automatically generated by the
# update-mime-database command. DO NOT EDIT!
...
text/x-diff:*.diff
text/x-diff:*.patch
...
KDE's glob system replaces GNOME's and ROX's ext/regex fields, since it is trivial to detect a pattern in the form '*.ext' and store it in an extension hash table internally. The full power of regular expressions was not being used by either desktop, and glob patterns are more suitable for filename matching anyway.
Applications MUST first try a case-sensitive match, then a case-insensitive one. This is so that main.C will be seen as a C++ file, but IMAGE.GIF will still use the *.gif pattern.
If several patterns match then the longest pattern SHOULD be used. In particular, files with multiple extensions (such as Data.tar.gz) MUST match the longest sequence of extensions (eg '*.tar.gz' in preference to '*.gz'). Literal patterns (eg, 'Makefile') must be matched before all others. It is acceptable to match patterns of the form '*.text' before other wildcarded patterns (that is, to special-case extensions using a hash table).
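The globs rules above can be sketched in Python as follows. This is an illustration, not a reference implementation: parse_globs, glob_lookup and the helpers are hypothetical names, fnmatchcase from the standard library stands in for a glob matcher, the case-insensitive pass is approximated by lower-casing both the name and the patterns, the optional extension hash table is not used, and ties between equally long patterns go to the rule that appears first in the file.

```python
from fnmatch import fnmatchcase


def parse_globs(text):
    """Parse 'MIME-type:pattern' lines from a globs file, skipping comments."""
    rules = []
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        mime, _, pattern = line.partition(":")
        rules.append((mime, pattern))
    return rules


def _is_literal(pattern):
    return not any(ch in pattern for ch in "*?[")


def _best_match(name, rules):
    # Literal patterns (eg 'Makefile') are matched before all others.
    for mime, pattern in rules:
        if _is_literal(pattern) and pattern == name:
            return mime
    # Otherwise prefer the longest matching pattern ('*.tar.gz' over '*.gz').
    best = None
    for mime, pattern in rules:
        if fnmatchcase(name, pattern) and (best is None or len(pattern) > len(best[1])):
            best = (mime, pattern)
    return best[0] if best else None


def glob_lookup(filename, rules):
    """Case-sensitive pass first, then a case-insensitive one."""
    found = _best_match(filename, rules)
    if found is None:
        lowered = [(mime, pattern.lower()) for mime, pattern in rules]
        found = _best_match(filename.lower(), lowered)
    return found


rules = parse_globs("""# generated
text/x-c++src:*.C
text/x-csrc:*.c
image/gif:*.gif
application/x-gzip:*.gz
application/x-compressed-tar:*.tar.gz
text/x-makefile:Makefile
""")
print(glob_lookup("main.C", rules))       # text/x-c++src
print(glob_lookup("IMAGE.GIF", rules))    # image/gif
print(glob_lookup("Data.tar.gz", rules))  # application/x-compressed-tar
```

The MIME type names in the sample globs text are illustrative only.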
There may be several rules mapping to the same type. They should all be merged. If the same pattern is defined twice, then they MUST be ordered by the directory the rule came from, as described above.
Common types (such as MS Word Documents) will be provided in the X Desktop Group's package, which MUST be required by all applications using this specification. Since each application will then only be providing information about its own types, conflicts should be rare.
These files have a similar format to file(1)'s magic.mime file.
Each line may be either a comment (starting with '#'), a new type (starting with '[') or a rule to match (anything else).
Type lines are in the form "[" PRIORITY ":" TYPE "]".
Match lines are in the form ">"* START [":" END] TAB TYPE ["&" MASK] TAB VALUE. The offsets may be a range in the form START:END. The rule is considered to match if there is a match at either of these offsets, or at any offset in-between. The line may start with zero or more ">" characters, as for the normal file syntax. Whitespace in the value (after the tab) is significant, and fields are separated by exactly one tab character.
The above example would create a magic file with these contents:
# This file was automatically generated by the
# update-mime-database command. DO NOT EDIT!
...
[50:text/x-diff]
0	string	diff 
0	string	*** 
0	string	Common subdirectories: 
...
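One way to read and apply the generated magic file is sketched below in Python. It rests on several assumptions that this section does not settle: only the string type is implemented; the value is taken as raw UTF-8 text after the second tab, with no escape sequences; a mask is written as hex digits (as in the gnome-vfs format) and compared byte-wise; and a rule with '>' continuation lines is treated as matching only if one of its children also matches. Rule, parse_magic and sniff are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Rule:
    start: int
    end: int
    kind: str
    value: bytes
    mask: bytes | None = None
    children: list[Rule] = field(default_factory=list)

    def test(self, data: bytes) -> bool:
        """Check this rule alone at every offset from start to end inclusive."""
        if self.kind != "string":
            return False  # only 'string' is handled in this sketch
        n = len(self.value)
        for off in range(self.start, self.end + 1):
            chunk = data[off:off + n]
            if len(chunk) < n:
                break  # ran past the end of the data
            if self.mask is None:
                if chunk == self.value:
                    return True
            elif all((c & m) == (v & m) for c, v, m in zip(chunk, self.value, self.mask)):
                return True
        return False

    def matches(self, data: bytes) -> bool:
        # Assumed nesting rule: if a line has '>' children, one of them must match too.
        return self.test(data) and (not self.children or any(c.matches(data) for c in self.children))


def parse_magic(text: str) -> list[tuple[int, str, list[Rule]]]:
    """Parse the generated magic file into (priority, type, top-level rules) sections."""
    sections: list[tuple[int, str, list[Rule]]] = []
    stack: list[Rule] = []  # stack[i] is the most recent rule at nesting depth i
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        if line.startswith("["):  # "[" PRIORITY ":" TYPE "]"
            priority, _, mime = line[1:-1].partition(":")
            sections.append((int(priority), mime, []))
            stack = []
            continue
        depth = len(line) - len(line.lstrip(">"))
        offsets, kind_mask, value = line[depth:].split("\t", 2)
        start, _, end = offsets.partition(":")
        kind, _, mask = kind_mask.partition("&")
        rule = Rule(int(start), int(end or start), kind, value.encode("utf-8"),
                    bytes.fromhex(mask) if mask else None)
        del stack[depth:]
        (stack[-1].children if stack else sections[-1][2]).append(rule)
        stack.append(rule)
    return sections


def sniff(data: bytes, sections) -> str | None:
    """Return the type of the highest-priority section with a matching rule."""
    for _priority, mime, rules in sorted(sections, key=lambda s: -s[0]):
        if any(rule.matches(data) for rule in rules):
            return mime
    return None


magic = parse_magic("[50:text/x-diff]\n0\tstring\tdiff \n0\tstring\t*** \n")
print(sniff(b"diff -u old new\n", magic))  # text/x-diff
```

Splitting each line with split("\t", 2) keeps any whitespace inside the value, in line with the rule that whitespace after the tab is significant.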
The system described in this document is intended to allow different programs to see the same file as having the same type. This is to help interoperability. The type determined in this way is only a guess, and an application MUST NOT trust a file based simply on its MIME type. For example, a downloader should not pass a file directly to a launcher application without confirmation simply because the type looks 'harmless' (eg, text/plain).
Do not rely on two applications getting the same type for the same file, even if they both use this system. The spec allows some leeway in implementation, and in any case the programs may be following different versions of the spec.
The MIME database is NOT intended to store user preferences. Users should never edit the database. If they wish to make corrections or provide MIME entries for software that doesn't provide these itself, they should do so by means of the Override.xml file mentioned in the section called “Directory layout”. Information such as "text/html files need to be opened with Mozilla" should NOT go in the database.
However, using extension elements introduced by additional namespaces (like a GNOME namespace), the database may be used to store static information, such as "Galeon is the GNOME default text/html browser".
[GNOME] The GNOME desktop, http://www.gnome.org
[KDE] The KDE desktop, http://www.kde.org
[ROX] The ROX desktop, http://rox.sourceforge.net
[DesktopEntries] Desktop Entry Specification, http://www.freedesktop.org/standards/desktop-entry-spec.html
[SharedMIME] Shared MIME-info Database, http://www.freedesktop.org/standards/shared-mime-info.html