FULL PRODUCT VERSION :
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)
Reproduced with a default clone of OpenJDK jdk8u
FULL OS VERSION :
Linux pm-cluster-rhel7-1b 3.10.0-327.28.3.el7.x86_64 #1 SMP Fri Aug 12 13:21:05 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
(can be reproduced on any 64-bit Linux flavour)
A DESCRIPTION OF THE PROBLEM :
We have an application server that generates code when applications are deployed. Recently it started failing during deployment of a large application with "java.lang.OutOfMemoryError: Metaspace" errors. Experimenting with the command-line metaspace configuration flags made no difference. The only thing that did work was to disable CMS entirely, but this is not a practical long-term solution.
To determine the root cause, we investigated using a clone of the OpenJDK jdk8u Mercurial repository and made local JDK builds with extra debug logging. The bug was eventually tracked down to an implementation error in hotspot/src/share/vm/memory/metaspace.cpp: ChunkManager::list_index() returns the wrong index for a humongous class metadata chunk whose size happens to equal that of a non-class metadata medium chunk (8K words).
Chunk sizes are specified as follows (from metaspace.cpp):
enum ChunkSizes {    // in words.
  ClassSpecializedChunk = 128,
  SpecializedChunk = 128,
  ClassSmallChunk = 256,
  SmallChunk = 512,
  ClassMediumChunk = 4 * K,
  MediumChunk = 8 * K
};
list_index() is a static method that returns the index of an appropriate freelist:
ChunkIndex ChunkManager::list_index(size_t size) {
  switch (size) {
    case SpecializedChunk:
      assert(SpecializedChunk == ClassSpecializedChunk,
             "Need branch for ClassSpecializedChunk");
      return SpecializedIndex;
    case SmallChunk:
    case ClassSmallChunk:
      return SmallIndex;
    case MediumChunk:
    case ClassMediumChunk:
      return MediumIndex;
    default:
      assert(size > MediumChunk || size > ClassMediumChunk,
             "Not a humongous chunk");
      return HumongousIndex;
  }
}
The code shows that when an 8K-word class metadata chunk is requested, this method reports it as a medium chunk rather than a humongous chunk, because 8K words matches MediumChunk. As a result, a 4K-word chunk is taken from the medium chunk freelist, if any are available there, and it is too small to hold the 8K words of data needed. The allocation fails and is retried a couple of times, then a GC is initiated and the allocation is tried again. It fails for the same reason each time, eventually causing the java.lang.OutOfMemoryError.
The error *only* occurs when there are free chunks available on the medium chunk freelist. If there aren't any there, new chunks *of the correct size* are allocated from virtual memory space and all is well.
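The collision described above can be shown in a small standalone model. This is only a sketch: it copies the enum values quoted from metaspace.cpp but is not the HotSpot code, and list_index_by_size(), list_index_for(), the is_class parameter and check_model() are hypothetical names introduced for illustration. The space-aware lookup is one possible way to disambiguate the two spaces, not the patch mentioned at the end of this report.

```cpp
#include <cassert>
#include <cstddef>

// Standalone model of the chunk-size lookup; NOT the HotSpot sources.
constexpr std::size_t K = 1024;

enum ChunkSizes : std::size_t {  // in words, values as quoted in the report
  ClassSpecializedChunk = 128,
  SpecializedChunk = 128,
  ClassSmallChunk = 256,
  SmallChunk = 512,
  ClassMediumChunk = 4 * K,
  MediumChunk = 8 * K
};

enum ChunkIndex { SpecializedIndex, SmallIndex, MediumIndex, HumongousIndex };

// Mirrors the reported logic: the size alone selects the freelist, so an
// 8K-word request maps to MediumIndex regardless of which space it is for.
ChunkIndex list_index_by_size(std::size_t size) {
  switch (size) {
    case SpecializedChunk: return SpecializedIndex;
    case SmallChunk:
    case ClassSmallChunk:  return SmallIndex;
    case MediumChunk:
    case ClassMediumChunk: return MediumIndex;
    default:               return HumongousIndex;
  }
}

// Hypothetical space-aware variant (an assumption, not the actual fix):
// compare only against the sizes that belong to the requesting space.
ChunkIndex list_index_for(std::size_t size, bool is_class) {
  const std::size_t specialized = is_class ? ClassSpecializedChunk : SpecializedChunk;
  const std::size_t small       = is_class ? ClassSmallChunk : SmallChunk;
  const std::size_t medium      = is_class ? ClassMediumChunk : MediumChunk;
  if (size == specialized) return SpecializedIndex;
  if (size == small)       return SmallIndex;
  if (size == medium)      return MediumIndex;
  return HumongousIndex;   // anything else is treated as humongous
}

// Checks the behaviour the report describes for an 8K-word class request.
void check_model() {
  assert(list_index_by_size(8 * K) == MediumIndex);         // the misclassification
  assert(list_index_for(8 * K, true) == HumongousIndex);    // class space: humongous
  assert(list_index_for(8 * K, false) == MediumIndex);      // non-class space: medium
  assert(list_index_for(4 * K, true) == MediumIndex);       // class space: medium
}
```

In this model the size-only lookup sends an 8K-word class request to the same freelist index as a genuine 4K-word class medium chunk, which is the condition under which the undersized chunk is handed out.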
THE PROBLEM WAS REPRODUCIBLE WITH -Xint FLAG: Did not try
THE PROBLEM WAS REPRODUCIBLE WITH -server FLAG: Yes
REGRESSION.  Last worked in version 7u80
STEPS TO FOLLOW TO REPRODUCE THE PROBLEM :
Load a class requiring an 8K class metadata chunk when there are 4K chunks available on the medium chunk freelist.
EXPECTED VERSUS ACTUAL BEHAVIOR :
Expected: The class should load successfully
Actual: A java.lang.OutOfMemoryError: Metaspace error occurs
ERROR MESSAGES/STACK TRACES THAT OCCUR :
Metaspace debug log showing a (failed) request for 8061 words being "satisfied" using a 4096 word chunk:
SpaceManager::grow_and_allocate for 8061 words 2627 words used 1469 words left
Metadata humongous allocation:
  word_size 0x0000000000001f7d
  chunk_word_size 0x0000000000002000
    chunk overhead 0x0000000000000005
ChunkManager::free_chunks_get: free_list 0x00007f57c00a3fc0 head 0x0000000104729c00 size 4096
ChunkManager::chunk_freelist_allocate: 0x00007f57c00a3f80 chunk 0x0000000104729c00  size 4096 count 292 Free chunk total 1285504  count 609
SpaceManager::add_chunk: 8) Metachunk: bottom 0x0000000104729c00 top 0x0000000104729c28 end 0x0000000104731c00 size 4096
    used 5 free 4091
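The figures in the log above can be checked with a short calculation. This is a sketch under an assumption: the requested size plus the chunk overhead is rounded up to a multiple of the smallest chunk size (128 words), which reproduces the logged chunk_word_size but is not quoted from the HotSpot source. align_up() and humongous_chunk_words() are hypothetical helper names.

```cpp
#include <cstddef>

// Round value up to the next multiple of alignment (alignment must be > 0).
constexpr std::size_t align_up(std::size_t value, std::size_t alignment) {
  return (value + alignment - 1) / alignment * alignment;
}

// Hypothetical helper: chunk size in words for a humongous request, assuming
// rounding to a 128-word granule (an assumption, see the text above).
constexpr std::size_t humongous_chunk_words(std::size_t word_size,
                                            std::size_t overhead,
                                            std::size_t granule) {
  return align_up(word_size + overhead, granule);
}

// Values taken from the log: word_size 0x1f7d, overhead 5, chunk_word_size 0x2000.
static_assert(0x1f7d == 8061, "logged word_size is the 8061-word request");
static_assert(humongous_chunk_words(0x1f7d, 5, 128) == 0x2000,
              "8061 + 5 rounds up to 8192 words, the logged chunk_word_size");
static_assert(4096 - 5 == 4091, "free words in the 4096-word chunk, as logged");
static_assert(0x1f7d > 4096 - 5, "the request cannot fit in the chunk handed out");
```

The request therefore needs an 8192-word chunk, while the chunk taken from the freelist has only 4091 free words.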
REPRODUCIBILITY :
This bug can be reproduced often.
---------- BEGIN SOURCE ----------
Once the issue was understood an attempt was made to create a standalone test case that could reproduce it, but that effort has so far failed.
---------- END SOURCE ----------
CUSTOMER SUBMITTED WORKAROUND :
Disabling CMS GC is the only known effective workaround.
A patch against OpenJDK that fixes the issue has been written, but it is too big to fit here.