Problems with Java garbage collector and memory

Question

I am having a really weird issue with a Java application.

Essentially, it is a web page that uses Magnolia (a CMS system); there are 4 instances in the production environment. Sometimes the CPU goes to 100% in a Java process.

So, the first approach was to take a thread dump and check the offending threads. What I found was weird:

"GC task thread#0 (ParallelGC)" prio=10 tid=0x000000000ce37800 nid=0x7dcb runnable 
"GC task thread#1 (ParallelGC)" prio=10 tid=0x000000000ce39000 nid=0x7dcc runnable 

OK, that is pretty weird; I have never had a problem with the garbage collector like that. So the next thing we did was to activate JMX and inspect the machine with jvisualvm: heap memory usage was really high (95%).

Naive approach: increase the memory so the problem takes more time to appear. Result: on the restarted server with increased memory (6 GB!) the problem appeared 20 hours after restart, while on other servers with less memory (4 GB!) that had been running for 10 days, the problem still took a few more days to reappear. Also, I tried using the Apache access log from the failing server with JMeter to replay the requests against a local server in an attempt to reproduce the error... that did not work either.

Then I investigated the logs a little bit more and found these errors:

info.magnolia.module.data.importer.ImportException: Error while importing with handler [brightcoveplaylist]:GC overhead limit exceeded
at info.magnolia.module.data.importer.ImportHandler.execute(ImportHandler.java:464)
at info.magnolia.module.data.commands.ImportCommand.execute(ImportCommand.java:83)
at info.magnolia.commands.MgnlCommand.executePooledOrSynchronized(MgnlCommand.java:174)
at info.magnolia.commands.MgnlCommand.execute(MgnlCommand.java:161)
at info.magnolia.module.scheduler.CommandJob.execute(CommandJob.java:91)
at org.quartz.core.JobRunShell.run(JobRunShell.java:216)
at org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:549)
    Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded

Another example

    Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:2894)
    at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:117)
    at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:407)
    at java.lang.StringBuilder.append(StringBuilder.java:136)
    at java.lang.StackTraceElement.toString(StackTraceElement.java:175)
    at java.lang.String.valueOf(String.java:2838)
    at java.lang.StringBuilder.append(StringBuilder.java:132)
    at java.lang.Throwable.printStackTrace(Throwable.java:529)
    at org.apache.log4j.DefaultThrowableRenderer.render(DefaultThrowableRenderer.java:60)
    at org.apache.log4j.spi.ThrowableInformation.getThrowableStrRep(ThrowableInformation.java:87)
    at org.apache.log4j.spi.LoggingEvent.getThrowableStrRep(LoggingEvent.java:413)
    at org.apache.log4j.AsyncAppender.append(AsyncAppender.java:162)
    at org.apache.log4j.AppenderSkeleton.doAppend(AppenderSkeleton.java:251)
    at org.apache.log4j.helpers.AppenderAttachableImpl.appendLoopOnAppenders(AppenderAttachableImpl.java:66)
    at org.apache.log4j.Category.callAppenders(Category.java:206)
    at org.apache.log4j.Category.forcedLog(Category.java:391)
    at org.apache.log4j.Category.log(Category.java:856)
    at org.slf4j.impl.Log4jLoggerAdapter.error(Log4jLoggerAdapter.java:576)
    at info.magnolia.module.templatingkit.functions.STKTemplatingFunctions.getReferencedContent(STKTemplatingFunctions.java:417)
    at info.magnolia.module.templatingkit.templates.components.InternalLinkModel.getLinkNode(InternalLinkModel.java:90)
    at info.magnolia.module.templatingkit.templates.components.InternalLinkModel.getLink(InternalLinkModel.java:66)
    at sun.reflect.GeneratedMethodAccessor174.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:622)
    at freemarker.ext.beans.BeansWrapper.invokeMethod(BeansWrapper.java:866)
    at freemarker.ext.beans.BeanModel.invokeThroughDescriptor(BeanModel.java:277)
    at freemarker.ext.beans.BeanModel.get(BeanModel.java:184)
    at freemarker.core.Dot._getAsTemplateModel(Dot.java:76)
    at freemarker.core.Expression.getAsTemplateModel(Expression.java:89)
    at freemarker.core.BuiltIn$existsBI._getAsTemplateModel(BuiltIn.java:709)
    at freemarker.core.BuiltIn$existsBI.isTrue(BuiltIn.java:720)
    at freemarker.core.OrExpression.isTrue(OrExpression.java:68)

Then I found out that this problem occurs when the garbage collector is using a ton of CPU but is not able to free much memory.

OK, so it is a problem with the MEMORY that manifests itself in the CPU, so if the memory usage problem is solved, the CPU should be fine. I took a heap dump; unfortunately, it was just too big to open (the file was 10 GB). Anyway, I ran the server locally, loaded it a little bit, and took a heap dump. After opening it, I found something interesting:

There are a TON of instances of

AbstractReferenceMap$WeakRef  ==> Takes 21.6% of the memory, 9 million instances
AbstractReferenceMap$ReferenceEntry  ==> Takes 9.6% of the memory, 3 million instances

In addition, I have found a Map which seems to be used as a "cache" (horrible but true). The problem is that this map is NOT synchronized and it is shared among threads (being static). The problem could be not only concurrent writes but also the fact that, with the lack of synchronization, there is no guarantee that thread A will see the changes done to the map by thread B. However, I am unable to link this suspicious map to the leak using the Eclipse Memory Analyzer, as it does not use AbstractReferenceMap; it is just a normal HashMap.
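
To illustrate the pattern (a hypothetical sketch, not the actual code): the minimal fix for the thread-safety part would be a ConcurrentHashMap, although note that even a thread-safe cache still leaks if nothing ever evicts its entries:

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of the hazard described above: a static,
    // unsynchronized HashMap shared across threads guarantees neither
    // atomicity nor visibility of updates.
    public class NodeCache {
        // Unsafe original (as described): a plain static HashMap.
        // Minimal fix: ConcurrentHashMap is safe for concurrent access,
        // but an unbounded cache can still leak if entries are never evicted.
        private static final Map<String, Object> CACHE = new ConcurrentHashMap<>();

        public static Object get(String key) {
            return CACHE.get(key);
        }

        public static void put(String key, Object value) {
            CACHE.put(key, value);
        }
    }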

Unfortunately, we do not use those classes directly (obviously the code uses them, but not directly), so I seem to have hit a dead end.

The problems for me are:

  1. I cannot reproduce the error
  2. I cannot figure out where the hell the memory is leaking (if that is the case)

Any ideas at all?

Best Answer

The 'no-op' finalize() methods should definitely be removed, as they are likely to make any GC performance problems worse. But I suspect that you have other memory leak issues as well.

Advice:

  • First get rid of the useless finalize() methods.

  • If you have other finalize() methods, consider getting rid of them. (Depending on finalization to do things is generally a bad idea ...)

  • Use a memory profiler to try to identify the objects that are being leaked, and what is causing the leakage. There are lots of SO questions and other resources on finding leaks in Java code.


Now to your particular symptoms.

First of all, the place where the OutOfMemoryErrors were thrown is probably irrelevant.

However, the fact that you have huge numbers of AbstractReferenceMap$WeakRef and AbstractReferenceMap$ReferenceEntry objects is a strong indication that something in your application, or the libraries it is using, is doing a huge amount of caching ... and that the caching is implicated in the problem. (The AbstractReferenceMap class is part of the Apache Commons Collections library. It is the superclass of ReferenceMap and ReferenceIdentityMap.)

You need to track down the map object (or objects) that those WeakRef and ReferenceEntry objects belong to, and the (target) objects that they refer to. Then you need to figure out what is creating it/them, and why the entries are not being cleared in response to the high memory demand.

  • Do you have strong references to the target objects elsewhere (which would stop the WeakRefs from being broken)? (See the sketch after this list.)

  • Is/are the map(s) being used incorrectly, so as to cause a leak? (Read the javadocs carefully ...)

  • Are the maps being used by multiple threads without external synchronization? That could result in corruption, which potentially could manifest as a massive storage leak.
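
To illustrate the first point, here is a minimal sketch using the JDK's WeakHashMap rather than the Commons Collections map (the pinning mechanics are the same): a weak entry can only be cleared once nothing else strongly references the key.

    import java.util.Map;
    import java.util.WeakHashMap;

    // Sketch: a strong reference held elsewhere prevents a weakly-referenced
    // map entry from ever being cleared, so the "cache" grows without bound.
    public class WeakRefPinning {
        private static final Map<Object, String> CACHE = new WeakHashMap<>();

        public static void main(String[] args) {
            Object key = new Object();
            CACHE.put(key, "payload");

            // While 'key' is strongly reachable, no amount of GC pressure
            // will clear this entry.
            System.gc();
            System.out.println(CACHE.size()); // prints 1

            key = null;  // drop the last strong reference ...
            System.gc(); // ... and the entry becomes eligible for clearing
            System.out.println(CACHE.size()); // typically 0 now (not guaranteed)
        }
    }

In the Eclipse Memory Analyzer, "Path to GC Roots" (excluding weak/soft references) on one of the referent objects should show what is pinning it.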


Unfortunately, these are only theories and there could be other things causing this. And indeed, it is conceivable that this is not a memory leak at all.


Finally, there is your observation that the problem is worse when the heap is bigger. To me, this is still consistent with a Reference / cache-related issue.

  • Reference objects are more work for the GC than regular references.

  • When the GC needs to "break" a Reference, that creates more work; e.g. processing the Reference queues.

  • Even when that happens, the resulting unreachable objects still can't be collected until the next GC cycle at the earliest.

So I can see how a 6 GB heap full of References would significantly increase the percentage of time spent in the GC ... compared to a 4 GB heap, and that could cause the "GC Overhead Limit" mechanism to kick in earlier.

But I reckon that this is an incidental symptom rather than the root cause.

Answer 2

There are a number of possibilities, perhaps some of which you've explored.

It's definitely a memory leak of some sort.

If your server has user sessions, and your user sessions aren't expiring or being disposed of properly when the user is inactive for more than X minutes/hours, you will get a buildup of used memory.

If you have one or more maps of something that your program generates, and you don't clear old/unneeded entries from the map, you could again get a buildup of used memory. For example, I once considered adding a map to keep track of process threads so that a user could get info from each thread, until my boss pointed out that finished threads were never removed from the map, so if the user stayed logged in and active, they would hold onto those threads forever.
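
A minimal sketch of that anti-pattern (names hypothetical):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    // Hypothetical sketch of the leak described above: threads are registered
    // so users can query them, but finished threads are never unregistered,
    // so the map keeps every Thread (and everything it references) reachable.
    public class ThreadRegistry {
        private static final Map<Long, Thread> THREADS = new ConcurrentHashMap<>();

        public static void register(Thread t) {
            THREADS.put(t.getId(), t);
        }

        // The missing piece: without calls to this when a thread finishes,
        // the map grows forever.
        public static void unregister(Thread t) {
            THREADS.remove(t.getId());
        }
    }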

You should try doing a load test on a non-production server where you simulate normal usage of your app by large numbers of users. Maybe even limit the server's memory to less than usual.

Good luck, memory issues are a pain to track down.

Answer 3

My guess is that you have an automated import running which invokes some instance of ImportHandler. That handler is configured to make a backup of all the nodes it is going to update (I think this is the default option), and since you probably have a lot of data in your data type, and since all of this is done in-session, you run out of memory. Try to find out which import job it is and disable backup for it.

HTH, Jan

Answer 4

With a difficult debugging problem, you need to find a way to reproduce it. Only then will you be able to test experimental changes and determine if they make the problem better or worse. In this case, I'd try writing loops that rapidly create & delete server connections, that create a server connection and rapidly send it memory-expensive requests, etc.

Once you can reproduce it, try reducing the heap size to see if you can reproduce it faster. But do that second, since a small heap might not hit the "GC overhead limit", which means the GC is spending excessive time (98% by some measure) trying to recover memory.
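
For instance (a sketch; the jar name, dump path, and sizes are placeholders), you could start the test instance with a deliberately small heap and have it write a heap dump when it fails:

    java -Xmx512m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/dumps -jar yourapp.jar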

For a memory leak, you need to figure out where in the code it's accumulating references to objects. E.g. does it build a Map of all incoming network requests? A web search https://www.google.com/search?q=how+to+debug+java+memory+leaks shows many helpful articles on how to debug Java memory leaks, including tips on using tools like the Eclipse Memory Analyzer that you're using. A search for the specific error message https://www.google.com/search?q=GC+overhead+limit+exceeded is also helpful.

The no-op finalize() methods shouldn't cause this problem, but they may well exacerbate it. The doc on finalize() reveals that having a finalize() method forces the GC to determine twice that the instance is unreferenced (before and after calling finalize()).
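
To make that concrete, here is a minimal sketch (class name hypothetical) of the kind of no-op finalizer in question; the fix is simply to delete the method:

    // Anti-pattern sketch: overriding finalize() makes every instance
    // finalizable, so the GC must determine twice that it is unreachable
    // (before and after finalization) before its memory can be reclaimed.
    public class CachedItem {
        @Override
        protected void finalize() throws Throwable {
            super.finalize(); // does nothing useful -- just delete this method
        }
    }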

So once you can reproduce the problem, try deleting those no-op finalize() methods and see if the problem takes longer to reproduce.

It's significant that there are many AbstractReferenceMap$WeakRef instances in memory. The point of a weak reference is to refer to an object without forcing it to stay in memory. AbstractReferenceMap is a Map that lets one make the keys and/or values be weak references or soft references. (The point of a soft reference is to try to keep an object in memory but let the GC free it when memory gets low.) Anyway, all those WeakRef instances in memory are probably exacerbating the problem but shouldn't keep the referenced Map keys/values in memory. What are they referring to? What else is referring to those objects?

Answer 5

You say that you have already tried jvisualvm to inspect the machine. Maybe try it again, like this:

  • This time look at the "Sampler -> Memory" tab.

  • It should tell you which (types of) objects occupy the most memory.

  • Then find out where such objects are usually created and removed.
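
If attaching jvisualvm is inconvenient, a class histogram from the command line gives similar information (a sketch; <pid> is the Java process id, and -histo:live forces a full GC first so that only live objects are counted):

    jmap -histo:live <pid> | head -n 30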

Answer 6

It appears that your memory leaks are emanating from your arrays. If an object is logically removed from an array but the slot still points at it, the garbage collector cannot tell that the object is no longer needed, so it will not be collected and its memory will not be released. My advice is: when you do remove an object from an array, assign null to the former object's position, so the garbage collector can see that it is no longer referenced and reclaim it. I doubt this will be your exact problem, but it is always good to know these things and to check whether this is your problem.
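
As a minimal sketch of that advice (hypothetical names, in the spirit of the classic Effective Java stack example):

    import java.util.Arrays;
    import java.util.EmptyStackException;

    // Sketch: null out an array slot when its element is logically removed,
    // so the garbage collector can reclaim the object it pointed to.
    public class ArrayStack {
        private Object[] elements = new Object[16];
        private int size = 0;

        public void push(Object e) {
            if (size == elements.length)
                elements = Arrays.copyOf(elements, 2 * size);
            elements[size++] = e;
        }

        public Object pop() {
            if (size == 0)
                throw new EmptyStackException();
            Object result = elements[--size];
            elements[size] = null; // clear the obsolete reference
            return result;
        }
    }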

It is also good to assign null to an object reference when you need to remove it / clean it up. This is because the finalize() method is sketchy and evil, and sometimes will not be called by the garbage collector. The best workaround for this is to call it (or another, similar cleanup method) yourself. That way, you are assured that garbage cleanup was performed successfully. As Joshua Bloch said in his book Effective Java, 2nd edition, Item 7, page 27: avoid finalizers. "Finalizers are unpredictable, often dangerous and generally unnecessary". You can see the section here.

Because there is no code displayed, I cannot tell whether any of these methods would be useful for you, but it is still worth knowing these things. Hope these tips help you!

Answer 7

Try a tool that locates the leaks in your source code, such as plumbr.

Answer 8

  • A lot of times, 'weird' errors can be caused by Java agents plugged into the JVM. If you have any agents running (e.g. JRebel/LiveRebel, New Relic, JProfiler), try running without them first.
  • Weird things can also happen when running the JVM with non-standard parameters (-XX); certain combinations are known to cause problems. Which parameters are you using currently?
  • The memory leak could also be in Magnolia itself; have you tried googling "magnolia leak"? Are you using any 3rd-party Magnolia modules? If possible, try disabling/removing them.

The problem might be connected to just one part of your application. You can try reproducing the problem by "replaying" your access logs on your staging/development server.

If nothing else works, if it were me, I would do the following:

  • try to replicate the problem on an "empty" Magnolia instance (without any of my code)
  • try to replicate the problem on an "empty" Magnolia instance (without 3rd-party modules)
  • try to upgrade all software (Magnolia, 3rd-party modules, JVM)
  • finally, try to run the production site with YourKit and try to find the leak

Answer 9

As recommended above, I'd get in touch with the devs of Magnolia, but meanwhile:

You are getting this error because the GC doesn't collect much on a given run:

The concurrent collector will throw an OutOfMemoryError if too much time is being spent in garbage collection: if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown.
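
As an aside, that safeguard can be switched off (a sketch; the jar name is a placeholder), though doing so usually just trades the early OutOfMemoryError for a machine stuck in GC:

    java -XX:-UseGCOverheadLimit -jar yourapp.jar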

Since you can't change the implementation, I would recommend changing the configuration of the GC so that it runs less frequently, making it less likely to fail in this way.

Here is an example config, just to get you started on the parameters; you will have to figure out your sweet spot. The GC logs will probably be of help for that.

My VM params are as follows: -Xms6G -Xmx6G -XX:MaxPermSize=1G -XX:NewSize=2G -XX:MaxTenuringThreshold=8 -XX:SurvivorRatio=7 -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled -XX:+CMSPermGenSweepingEnabled -XX:CMSInitiatingOccupancyFraction=60 -XX:+HeapDumpOnOutOfMemoryError -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -Xloggc:logs/gc.log

This article was translated from Stack Overflow; the original question is at http://stackoverflow.com/questions/22081505