Apache HTTP Server Version 2.4
Description: | Associates the requested filename's extensions with the file's behavior (handlers and filters) and content (mime-type, language, character set and encoding) |
---|---|
Status: | Base |
Module Identifier: | mime_module |
Source File: | mod_mime.c |
This module is used to assign content metadata to the content
selected for an HTTP response by mapping patterns in the
URI or filenames to the metadata values. For example, the filename
extensions of content files often define the content's Internet
media type, language, character set, and content-encoding. This
information is sent in HTTP messages containing that content and
used in content negotiation when selecting alternatives, such that
the user's preferences are respected when choosing one of several
possible contents to serve. See mod_negotiation for more information
about content negotiation.
The directives AddCharset, AddEncoding, AddLanguage and AddType
are all used to map file extensions onto the metadata for that file.
Respectively, they set the character set, content-encoding,
content-language, and media-type (content-type) of documents. The
directive TypesConfig is used to specify a file which also maps
extensions onto media types.
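As an illustration, a minimal configuration sketch using these directives might look like the following. The extensions and the path to the types file are assumptions chosen for the example, not values required by mod_mime.

```apache
# Hypothetical mappings; adjust extensions and values to your site.
AddCharset  UTF-8      .utf8
AddEncoding x-gzip     .gz
AddLanguage fr         .fr
AddType     image/gif  .gif

# Assumed location of the shared extension-to-media-type table.
TypesConfig conf/mime.types
```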
In addition, mod_mime may define the handler and filters that
originate and process content. The directives AddHandler,
AddOutputFilter, and AddInputFilter control the modules or scripts
that serve the document. The MultiviewsMatch directive allows
mod_negotiation to consider these file extensions when testing
Multiviews matches.
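A short sketch of how these directives could be combined is shown below. The extensions are assumptions for the example, and the handler and filter named here are only available when the modules that provide them are loaded.

```apache
# Run files with a .cgi extension as CGI scripts (needs a CGI module loaded).
AddHandler cgi-script .cgi

# Pass .shtml files through the server-side include filter (needs mod_include).
AddOutputFilter INCLUDES .shtml

# Let Multiviews also match files whose extra extensions only name
# handlers or filters.
MultiviewsMatch Handlers Filters
```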
While mod_mime associates metadata with filename extensions, the
core server provides directives that are used to associate all the
files in a given container (e.g., <Location>, <Directory>, or <Files>)
with particular metadata. These directives include ForceType,
SetHandler, SetInputFilter, and SetOutputFilter. The core directives
override any filename extension mappings defined in mod_mime.
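The container-based approach could be sketched as follows. The directory path and URL path are hypothetical and serve only to illustrate the idea.

```apache
# Hypothetical directory: serve every file in it as image/gif,
# regardless of its extension.
<Directory "/var/www/images">
    ForceType image/gif
</Directory>

# Hand every request under this URL path to the server-status handler
# (provided by mod_status), whatever the extension mappings say.
<Location "/server-status">
    SetHandler server-status
</Location>
```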
Note that changing the metadata for a file does not change the value
of the Last-Modified header. Thus, previously cached copies may still
be used by a client or proxy, with the previous headers. If you change
the metadata (language, content type, character set or encoding) you
may need to 'touch' affected files (updating their last modified date)
to ensure that all visitors receive the corrected content headers.
Files can have more than one extension; the order of the extensions
is normally irrelevant. For example, if the file welcome.html.fr maps
onto content type text/html and language French then the file
welcome.fr.html will map onto exactly the same information. If more
than one extension is given that maps onto the same type of metadata,
then the one to the right will be used, except for languages and
content encodings. For example, if .gif maps to the media-type
image/gif and .html maps to the media-type text/html, then the file
welcome.gif.html will be associated with the media-type text/html.
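The media-type example above corresponds to a configuration along these lines (a sketch; the two mappings are the ones assumed in the text):

```apache
AddType image/gif .gif
AddType text/html .html

# welcome.gif.html carries two media-type extensions; the rightmost
# one (.html) wins, so the file is served as text/html.
```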
Languages and content encodings are treated cumulatively, because one
can assign more than one language or encoding to a particular
resource. For example, the file welcome.html.en.de will be delivered
with Content-Language: en, de and Content-Type: text/html.
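A sketch of the mappings that the welcome.html.en.de example presumes is given below. The extension names are assumptions that mirror the filename.

```apache
AddType     text/html .html
AddLanguage en        .en
AddLanguage de        .de

# welcome.html.en.de accumulates both language extensions:
#   Content-Language: en, de
#   Content-Type: text/html
```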
Care should be taken when a file with multiple extensions gets
associated with both a media-type and a handler. This will usually
result in the request being handled by the module associated with the
handler. For example, if the .imap extension is mapped to the handler
imap-file (from mod_imagemap) and the .html extension is mapped to the
media-type text/html, then the file world.imap.html will be associated
with both the imap-file handler and text/html media-type. When it is
processed, the imap-file handler will be used, and so it will be
treated as a mod_imagemap imagemap file.
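The situation described above would arise from a configuration such as this sketch, which simply restates the two mappings assumed in the text:

```apache
AddHandler imap-file .imap
AddType    text/html .html

# world.imap.html matches both mappings. The imap-file handler takes
# precedence, so the file is processed as an imagemap rather than
# served as an ordinary HTML page.
```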
If you would prefer only the last dot-separated part of the filename
to be mapped to a particular piece of metadata, then do not use the
Add* directives. For example, if you wish to have the file
foo.html.cgi processed as a CGI script, but not the file bar.cgi.html,
then instead of using AddHandler cgi-script .cgi, use
<FilesMatch "[^.]+\.cgi$">
    SetHandler cgi-script
</FilesMatch>
A file of a particular media-type can additionally be encoded in a
particular way to simplify transmission over the Internet. While this
usually will refer to compression, such as gzip, it can also refer to
encryption, such as pgp, or to an encoding such as UUencoding, which
is designed for transmitting a binary file in an ASCII (text) format.
The HTTP/1.1 RFC (RFC 2616), section 14.11, puts it this way:
The Content-Encoding entity-header field is used as a modifier to the media-type. When present, its value indicates what additional content codings have been applied to the entity-body, and thus what decoding mechanisms must be applied in order to obtain the media-type referenced by the Content-Type header field. Content-Encoding is primarily used to allow a document to be compressed without losing the identity of its underlying media type.
By using more than one file extension (see section above about multiple file extensions), you can indicate that a file is of a particular type, and also has a particular encoding.
For example, you may have a file which is a Microsoft Word document,
which is pkzipped to reduce its size. If the .doc extension is
associated with the Microsoft Word file type, and the .zip extension
is associated with the pkzip file encoding, then the file
Resume.doc.zip would be known to be a pkzip'ed Word document.
Apache sends a Content-Encoding header with the resource, in order to
tell the client browser about the encoding method.
Content-Encoding: pkzip
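The Resume.doc.zip example might be configured as in the sketch below. The media type application/msword and the extension spellings are assumptions made for illustration; the text above only says that .doc is associated with the Word file type.

```apache
# Assumed mappings for the Resume.doc.zip example.
AddType     application/msword .doc
AddEncoding pkzip              .zip

# Resume.doc.zip is then served with
#   Content-Type: application/msword
#   Content-Encoding: pkzip
```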
In addition to file type and the file encoding, another important piece of information is what language a particular document is in, and in what character set the file should be displayed. For example, the document might be written in the Vietnamese alphabet, or in Cyrillic, and should be displayed as such. This information, also, is transmitted in HTTP headers.
The character set, language, encoding and mime type are all used in
the process of content negotiation (see mod_negotiation) to determine
which document to give to the client, when there are alternative
documents in more than one character set, language, encoding or mime
type. All filename extension associations created with AddCharset,
AddEncoding, AddLanguage and AddType directives (and extensions listed
in the MimeMagicFile) participate in this selection process. Filename
extensions that are only associated using the AddHandler,
AddInputFilter or AddOutputFilter directives may be included or
excluded from matching by using the MultiviewsMatch directive.
To convey this further information, Apache optionally sends a
Content-Language header, to specify the language that the document is
in, and can append additional information onto the Content-Type header
to indicate the particular character set that should be used to
correctly render the information.
Content-Language: en, fr
Content-Type: text/plain; charset=ISO-8859-1
The language specification is the two-letter abbreviation for the
language. The charset is the name of the particular character set
which should be used.
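As a sketch, headers like the ones shown above could result from mappings such as the following. The extensions and the filename are hypothetical.

```apache
# Hypothetical extensions for a plain-text document in two languages.
AddType     text/plain .txt
AddLanguage en         .en
AddLanguage fr         .fr
AddCharset  ISO-8859-1 .latin1

# A file named notes.txt.en.fr.latin1 would be expected to carry
#   Content-Language: en, fr
#   Content-Type: text/plain; charset=ISO-8859-1
```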
Description: | Maps the given filename extensions to the specified content charset |
---|---|
Syntax: | AddCharset charset extension [extension] ... |
Context: | server config, virtual host, directory, .htaccess |
Override: | FileInfo |
Status: | Base |
Module: | mod_mime |
The AddCharset directive maps the given filename extensions to the
specified content charset (the Internet registered name for a given
character encoding). charset is the media type's charset parameter for
resources with filenames containing extension. This mapping is added
to any already in force, overriding any mappings that already exist
for the same extension.
AddLanguage ja .ja
AddCharset EUC-JP .euc
AddCharset ISO-2022-JP .jis
AddCharset SHIFT_JIS .sjis
Then the document xxxx.ja.jis will be treated as being a Japanese
document whose charset is ISO-2022-JP (as will the document
xxxx.jis.ja). The AddCharset directive is useful both for informing
the client about the character encoding of the document, so that the
document can be interpreted and displayed appropriately, and for
content negotiation, where the server returns one of several documents
based on the client's charset preference.
The extension argument is case-insensitive and can be specified with or without a leading dot. Filenames may have multiple extensions and the extension argument will be compared against each of them.
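The matching rules for the extension argument can be illustrated with a small sketch. The .utf8 extension and the filenames are assumptions made for the example.

```apache
# These three lines are equivalent ways of writing the same mapping:
# the leading dot is optional and case is ignored.
AddCharset UTF-8 .utf8
AddCharset UTF-8 utf8
AddCharset UTF-8 .UTF8

# Each extension of a file is checked, so both index.utf8.html and
# index.html.utf8 pick up charset UTF-8.
```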