
Understanding Weak References

Category: Dev Stories/Java, 2017. 10. 22. 01:00

This post is about weak references.

Weak, soft, and phantom references are much easier to follow once you understand how Java (JVM) memory is structured.

I will write up the JVM memory layout properly in a later post.


Below is a well-organized article on weak references that I found a while ago, with my own rough notes (marked with "->") after each section.


Strong references

First I need to start with a refresher on strong references. A strong reference is an ordinary Java reference, the kind you use every day. For example, the code:

StringBuffer buffer = new StringBuffer();

creates a new StringBuffer() and stores a strong reference to it in the variable buffer. Yes, yes, this is kiddie stuff, but bear with me. The important part about strong references -- the part that makes them "strong" -- is how they interact with the garbage collector. Specifically, if an object is reachable via a chain of strong references (strongly reachable), it is not eligible for garbage collection. As you don't want the garbage collector destroying objects you're working on, this is normally exactly what you want.

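-> Strong references are the ordinary references we use every day.

When a StringBuffer is declared as in the example above, the object sits on a strongly reachable chain for as long as buffer itself is reachable, so the GC will not collect it.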


When strong references are too strong

It's not uncommon for an application to use classes that it can't reasonably extend. The class might simply be marked final, or it could be something more complicated, such as an interface returned by a factory method backed by an unknown (and possibly even unknowable) number of concrete implementations. Suppose you have to use a class Widget and, for whatever reason, it isn't possible or practical to extend Widget to add new functionality.

What happens when you need to keep track of extra information about the object? In this case, suppose we find ourselves needing to keep track of each Widget's serial number, but the Widget class doesn't actually have a serial number property -- and because Widget isn't extensible, we can't add one. No problem at all, that's what HashMaps are for:

serialNumberMap.put(widget, widgetSerialNumber);

This might look okay on the surface, but the strong reference to widget will almost certainly cause problems. We have to know (with 100% certainty) when a particular Widget's serial number is no longer needed, so we can remove its entry from the map. Otherwise we're going to have a memory leak (if we don't remove Widgets when we should) or we're going to inexplicably find ourselves missing serial numbers (if we remove Widgets that we're still using). If these problems sound familiar, they should: they are exactly the problems that users of non-garbage-collected languages face when trying to manage memory, and we're not supposed to have to worry about this in a more civilized language like Java.

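-> Because Widget cannot be extended, a HashMap is used to attach a serial number to each widget.

The map holds a strong reference to the widget, though, and that is where the trouble starts. When a widget is no longer needed we have to remove its entry ourselves; if we forget, we get a memory leak, and if we remove it too early, the serial number goes missing.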
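
To make the bookkeeping burden concrete, here is a minimal sketch of the HashMap approach. The Widget class, the StrongSerialNumbers registry, and the serial number strings are all made up for illustration; the point is only that every entry stays in the map until someone remembers to call release().

```java
import java.util.HashMap;
import java.util.Map;

class StrongSerialNumbers {
    /** Hypothetical stand-in for a class we cannot extend. */
    static final class Widget { }

    // The map holds a strong reference to every key it contains.
    private final Map<Widget, String> serialNumberMap = new HashMap<>();

    void register(Widget widget, String serialNumber) {
        serialNumberMap.put(widget, serialNumber);
    }

    /** Must be called by hand at exactly the right moment. */
    void release(Widget widget) {
        serialNumberMap.remove(widget);
    }

    int size() {
        return serialNumberMap.size();
    }

    /** Registers three widgets but only releases one of them. */
    static int leakedEntries() {
        StrongSerialNumbers registry = new StrongSerialNumbers();
        Widget kept = new Widget();
        registry.register(kept, "SN-1");
        registry.register(new Widget(), "SN-2"); // caller keeps no reference
        registry.register(new Widget(), "SN-3"); // caller keeps no reference
        registry.release(kept);
        // The two widgets nobody else references are still pinned by the map.
        return registry.size();
    }

    public static void main(String[] args) {
        System.out.println("entries still held: " + leakedEntries());
    }
}
```

The two unreleased widgets are unreachable from the rest of the program, yet the map alone keeps them (and their serial numbers) alive.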

Another common problem with strong references is caching, particularly with very large structures like images. Suppose you have an application which has to work with user-supplied images, like the web site design tool I work on. Naturally you want to cache these images, because loading them from disk is very expensive and you want to avoid the possibility of having two copies of the (potentially gigantic) image in memory at once.

Because an image cache is supposed to prevent us from reloading images when we don't absolutely need to, you will quickly realize that the cache should always contain a reference to any image which is already in memory. With ordinary strong references, though, that reference itself will force the image to remain in memory, which requires you (just as above) to somehow determine when the image is no longer needed in memory and remove it from the cache, so that it becomes eligible for garbage collection. Once again you are forced to duplicate the behavior of the garbage collector and manually determine whether or not an object should be in memory.

-> The problem with strong references when caching images: an image held by the cache never leaves memory unless we evict it ourselves.


Weak references

A weak reference, simply put, is a reference that isn't strong enough to force an object to remain in memory. Weak references allow you to leverage the garbage collector's ability to determine reachability for you, so you don't have to do it yourself. You create a weak reference like this:

WeakReference<Widget> weakWidget = new WeakReference<Widget>(widget);

and then elsewhere in the code you can use weakWidget.get() to get the actual Widget object. Of course the weak reference isn't strong enough to prevent garbage collection, so you may find (if there are no strong references to the widget) that weakWidget.get() suddenly starts returning null.

To solve the "widget serial number" problem above, the easiest thing to do is use the built-in WeakHashMap class. WeakHashMap works exactly like HashMap, except that the keys (not the values!) are referred to using weak references. If a WeakHashMap key becomes garbage, its entry is removed automatically. This avoids the pitfalls I described and requires no changes other than the switch from HashMap to a WeakHashMap. If you're following the standard convention of referring to your maps via the Map interface, no other code needs to even be aware of the change.

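-> With weak references, the job of deciding reachability can be left to the GC.

Create a weak reference as shown above and call get() to retrieve the object. Because the object becomes collectable once no strong references to it remain, get() may suddenly start returning null.

For the widget serial number case, switching to WeakHashMap lets entries be dropped from memory without any extra bookkeeping. (The rest of the code is barely affected.)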
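
Here is a small sketch of both ideas, WeakReference and WeakHashMap, side by side. Widget and the serial number string are hypothetical. The first method shows the part that is guaranteed: while a strong reference exists, nothing disappears. The second method only requests a collection with System.gc(), which the JVM is free to ignore, so its result is printed rather than promised.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

class WeakWidgetDemo {
    /** Hypothetical stand-in for a class we cannot extend. */
    static final class Widget { }

    /** While a strong reference exists, weak references keep working. */
    static boolean survivesWhileStronglyHeld() {
        Widget widget = new Widget();
        WeakReference<Widget> weakWidget = new WeakReference<>(widget);

        // Keys are held weakly; values are held strongly, so a value must
        // not refer back to its own key or the entry can never go away.
        Map<Widget, String> serialNumbers = new WeakHashMap<>();
        serialNumbers.put(widget, "SN-0001");

        System.gc(); // only a request; widget is still strongly reachable
        return weakWidget.get() == widget
                && "SN-0001".equals(serialNumbers.get(widget));
    }

    /** Once no strong reference is left, get() may start returning null. */
    static boolean clearedAfterGc() throws InterruptedException {
        WeakReference<Widget> weakWidget = new WeakReference<>(new Widget());
        for (int i = 0; i < 50 && weakWidget.get() != null; i++) {
            System.gc();      // a hint, not a command
            Thread.sleep(10);
        }
        return weakWidget.get() == null;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("alive while strongly held: " + survivesWhileStronglyHeld());
        System.out.println("cleared after dropping it: " + clearedAfterGc());
    }
}
```

In real code, always copy the result of get() into a local variable and null-check that variable, since two consecutive get() calls can return different answers.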

Reference queues

Once a WeakReference starts returning null, the object it pointed to has become garbage and the WeakReference object is pretty much useless. This generally means that some sort of cleanup is required; WeakHashMap, for example, has to remove such defunct entries to avoid holding onto an ever-increasing number of dead WeakReferences.

The ReferenceQueue class makes it easy to keep track of dead references. If you pass a ReferenceQueue into a weak reference's constructor, the reference object will be automatically inserted into the reference queue when the object to which it pointed becomes garbage. You can then, at some regular interval, process the ReferenceQueue and perform whatever cleanup is needed for dead references.

-> Once a weak reference starts returning null, its object has been judged garbage, so this is the point where cleanup code should run.

If a ReferenceQueue is passed to the weak reference's constructor, the state of the object can be tracked. A reference showing up in the queue means its object has been judged garbage, so check the queue and then run the cleanup.
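
The cleanup loop could be sketched roughly as follows. TrackedRef, the live map, and the serial numbers are names I made up. Since GC timing cannot be forced, the demo calls enqueue() by hand to stand in for what the collector would do when a widget becomes garbage; real code would just call drain() periodically.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

class ReferenceQueueDemo {
    /** Hypothetical stand-in for a class we cannot extend. */
    static final class Widget { }

    /** A weak reference that remembers which bookkeeping entry it owns. */
    static final class TrackedRef extends WeakReference<Widget> {
        final String serialNumber;

        TrackedRef(Widget referent, String serialNumber, ReferenceQueue<Widget> queue) {
            super(referent, queue);
            this.serialNumber = serialNumber;
        }
    }

    final ReferenceQueue<Widget> queue = new ReferenceQueue<>();
    // Holds the reference objects themselves, not the widgets.
    final Map<String, TrackedRef> live = new HashMap<>();

    void track(Widget widget, String serialNumber) {
        live.put(serialNumber, new TrackedRef(widget, serialNumber, queue));
    }

    /** Removes the entries whose widgets have been reported dead. */
    int drain() {
        int removed = 0;
        Reference<? extends Widget> ref;
        while ((ref = queue.poll()) != null) { // poll() never blocks
            live.remove(((TrackedRef) ref).serialNumber);
            removed++;
        }
        return removed;
    }

    static boolean simulate() {
        ReferenceQueueDemo demo = new ReferenceQueueDemo();
        Widget a = new Widget();
        Widget b = new Widget();
        demo.track(a, "SN-1");
        demo.track(b, "SN-2");

        // Simulation only: normally the GC enqueues the reference itself.
        demo.live.get("SN-1").enqueue();

        int removed = demo.drain();
        // a and b are used here so both stay strongly reachable throughout.
        return removed == 1 && demo.live.containsKey("SN-2")
                && demo.live.size() == 1 && a != b;
    }

    public static void main(String[] args) {
        System.out.println("cleanup behaved as expected: " + simulate());
    }
}
```

Subclassing WeakReference to carry a key is a common way to know what to clean up, because by the time the reference is dequeued the widget itself can no longer be retrieved.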

Different degrees of weakness

Up to this point I've just been referring to "weak references", but there are actually four different degrees of reference strength: strong, soft, weak, and phantom, in order from strongest to weakest. We've already discussed strong and weak references, so let's take a look at the other two.

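-> There are actually four strengths of reference: strong, soft, weak, and phantom. The three kinds weaker than a strong reference are soft, weak, and phantom.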

Soft references

A soft reference is exactly like a weak reference, except that it is less eager to throw away the object to which it refers. An object which is only weakly reachable (the strongest references to it are WeakReferences) will be discarded at the next garbage collection cycle, but an object which is softly reachable will generally stick around for a while.

SoftReferences aren't required to behave any differently than WeakReferences, but in practice softly reachable objects are generally retained as long as memory is in plentiful supply. This makes them an excellent foundation for a cache, such as the image cache described above, since you can let the garbage collector worry about both how reachable the objects are (a strongly reachable object will never be removed from the cache) and how badly it needs the memory they are consuming.

-> A weakly reachable object is discarded at the next GC cycle. A softly reachable object, on the other hand, is kept or dropped depending on how much memory is left (it is dropped when free memory runs low).
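
A soft-reference image cache might look something like the sketch below. SoftImageCache, the loader function, and the file name are assumptions for illustration, with a byte[] standing in for a decoded image. When the GC clears an entry, get() simply loads the image again.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class SoftImageCache {
    private final Map<String, SoftReference<byte[]>> cache = new HashMap<>();
    private final Function<String, byte[]> loader; // hypothetical disk loader
    int loads = 0; // how many times the loader actually ran

    SoftImageCache(Function<String, byte[]> loader) {
        this.loader = loader;
    }

    byte[] get(String path) {
        SoftReference<byte[]> ref = cache.get(path);
        // Copy into a local first: the referent can vanish between calls.
        byte[] image = (ref != null) ? ref.get() : null;
        if (image == null) {
            // Never cached, or the GC reclaimed it under memory pressure.
            image = loader.apply(path);
            loads++;
            cache.put(path, new SoftReference<>(image));
        }
        return image;
    }

    static boolean demo() {
        SoftImageCache images = new SoftImageCache(path -> new byte[1024]);
        byte[] first = images.get("logo.png");
        byte[] second = images.get("logo.png");
        // first is still strongly held here, so the entry cannot have been
        // cleared and the second call must be a cache hit.
        return first == second && images.loads == 1;
    }

    public static void main(String[] args) {
        System.out.println("second lookup was a cache hit: " + demo());
    }
}
```

This sketch never removes stale SoftReference objects from the map; a fuller version would pair it with a ReferenceQueue to do that.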

Phantom references

A phantom reference is quite different than either SoftReference or WeakReference. Its grip on its object is so tenuous that you can't even retrieve the object -- its get() method always returns null. The only use for such a reference is keeping track of when it gets enqueued into a ReferenceQueue, as at that point you know the object to which it pointed is dead. How is that different from WeakReference, though?

-> A phantom reference differs from both soft and weak references. Its get() always returns null, and it is normally used only to find out whether the object has died.

The difference is in exactly when the enqueuing happens. WeakReferences are enqueued as soon as the object to which they point becomes weakly reachable. This is before finalization or garbage collection has actually happened; in theory the object could even be "resurrected" by an unorthodox finalize() method, but the WeakReference would remain dead. PhantomReferences are enqueued only when the object is physically removed from memory, and the get() method always returns null specifically to prevent you from being able to "resurrect" an almost-dead object.

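-> The key difference is when the reference is enqueued. A weak reference is enqueued as soon as the object is judged weakly reachable, which is before finalize() has run and before the memory has been reclaimed.

If finalize() is badly written and creates a new strong reference, the object can be "resurrected". That leaves us in the silly situation where the reference is already in the queue but the object is alive again, which drags the system down...

A phantom reference, by contrast, is enqueued only after the object has been physically removed from memory, and its get() always returns null, so a silly finalize() cannot bring the object back. (In practice phantom references are used as a replacement for finalize(), so you would not override finalize() in that case.)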
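
The difference in get() is easy to check for yourself. The sketch below (PhantomGetDemo is just a name I picked) creates a weak and a phantom reference to the same live object and compares what each one hands back.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

class PhantomGetDemo {
    static boolean check() {
        Object target = new Object();
        ReferenceQueue<Object> queue = new ReferenceQueue<>();
        WeakReference<Object> weak = new WeakReference<>(target, queue);
        // A PhantomReference has no constructor without a queue.
        PhantomReference<Object> phantom = new PhantomReference<>(target, queue);

        boolean weakSeesTarget = weak.get() == target;      // object is alive
        boolean phantomHidesTarget = phantom.get() == null; // null by design
        boolean nothingEnqueued = queue.poll() == null;     // still strongly held

        // target is used here so it stays strongly reachable until the end.
        return weakSeesTarget && phantomHidesTarget && nothingEnqueued
                && target != null;
    }

    public static void main(String[] args) {
        System.out.println("phantom get() hides a live object: " + check());
    }
}
```

Because get() is useless, a phantom reference only makes sense together with a ReferenceQueue, which is why the constructor insists on one.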

What good are PhantomReferences? I'm only aware of two serious cases for them: first, they allow you to determine exactly when an object was removed from memory. They are in fact the only way to determine that. This isn't generally that useful, but might come in handy in certain very specific circumstances like manipulating large images: if you know for sure that an image should be garbage collected, you can wait until it actually is before attempting to load the next image, and therefore make the dreaded OutOfMemoryError less likely.

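-> So when would you actually use a phantom reference?

First, when you need to confirm for certain that an object has been removed from memory (very specific situations). For example, when large images have to be loaded one after another (load image -> load image -> load image ...), you can check that the previous image has really been removed from memory, and wait until it has, before loading the next one. That makes an OOM a little less scary. (I have actually seen this: OOMs caused by image loading on low-end devices...)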
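
As a rough sketch of that "wait before loading the next image" idea: ImageSlot and its method names are hypothetical, and a byte[] stands in for an image. The caller watches the old image, drops its own strong reference, and then waits a bounded time for the phantom reference to show up in the queue. System.gc() is only a request, so a false result is always possible and the caller has to decide what to do then.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;

class ImageSlot {
    private final ReferenceQueue<byte[]> queue = new ReferenceQueue<>();
    private PhantomReference<byte[]> previous;

    /** Start watching an image that the caller is about to drop. */
    void watch(byte[] image) {
        previous = new PhantomReference<>(image, queue);
    }

    /**
     * Waits up to timeoutMillis for the watched image to be reported dead.
     * Returns true if nothing is being watched or the image is gone.
     */
    boolean awaitReclaimed(long timeoutMillis) {
        if (previous == null) {
            return true;
        }
        System.gc(); // a hint only; the JVM may ignore it
        try {
            // remove(0) would block forever, so force a positive timeout.
            Reference<? extends byte[]> ref = queue.remove(Math.max(1, timeoutMillis));
            if (ref == previous) {
                previous = null;
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt
        }
        return false;
    }

    public static void main(String[] args) {
        ImageSlot slot = new ImageSlot();
        byte[] image = new byte[8 * 1024 * 1024];
        slot.watch(image);
        image = null; // drop the only strong reference
        System.out.println("previous image reclaimed: " + slot.awaitReclaimed(500));
    }
}
```

The timeout matters: waiting forever on a collection that may never be triggered would just trade an OOM for a hang.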

Second, PhantomReferences avoid a fundamental problem with finalization: finalize() methods can "resurrect" objects by creating new strong references to them. So what, you say? Well, the problem is that an object which overrides finalize() must now be determined to be garbage in at least two separate garbage collection cycles in order to be collected. When the first cycle determines that it is garbage, it becomes eligible for finalization. Because of the (slim, but unfortunately real) possibility that the object was "resurrected" during finalization, the garbage collector has to run again before the object can actually be removed. And because finalization might not have happened in a timely fashion, an arbitrary number of garbage collection cycles might have happened while the object was waiting for finalization. This can mean serious delays in actually cleaning up garbage objects, and is why you can get OutOfMemoryErrors even when most of the heap is garbage.

-> Second, phantom references avoid the long-standing problem with finalization (mentioned briefly above).

If finalize() is written so that it creates a new strong reference, the object is resurrected. finalize() is called because the object was judged garbage, but since the object may have come back to life, it can only be collected in a later GC cycle.

Needing at least two GC cycles is a problem in itself, but there is also no telling when finalization will actually run, so the object can sit waiting for it across any number of GC cycles, which seriously delays reclaiming the memory. This is also why an OOM can occur even though the GC keeps running.

With PhantomReference, this situation is impossible -- when a PhantomReference is enqueued, there is absolutely no way to get a pointer to the now-dead object (which is good, because it isn't in memory any longer). Because PhantomReference cannot be used to resurrect an object, the object can be instantly cleaned up during the first garbage collection cycle in which it is found to be phantomly reachable. You can then dispose whatever resources you need to at your convenience.

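-> A phantom reference is enqueued only once the memory has been released, so the object cannot be resurrected through it. A phantomly reachable object can therefore be collected right away, in the first GC cycle that finds it.

Work that has to happen after an object is gone can be moved out of finalize(): use a phantom reference to confirm that the object has been removed, and run that work afterwards.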
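
A finalize()-free cleanup could be sketched like this. PhantomCleanup, ResourceRef, and the cleanup Runnable are made-up names, and the demo calls enqueue() by hand to simulate the GC reporting the owner as dead. One thing to watch: the cleanup action must not capture the owner object, or the owner can never become phantom reachable.

```java
import java.lang.ref.PhantomReference;
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.util.HashSet;
import java.util.Set;

class PhantomCleanup {
    /** Phantom reference that carries the work to do once the owner is gone. */
    static final class ResourceRef extends PhantomReference<Object> {
        private final Runnable cleanup;

        ResourceRef(Object owner, ReferenceQueue<Object> queue, Runnable cleanup) {
            super(owner, queue);
            this.cleanup = cleanup;
        }
    }

    private final ReferenceQueue<Object> queue = new ReferenceQueue<>();
    // The reference objects must stay reachable themselves, otherwise they
    // are collected along with the owner and never reach the queue.
    private final Set<ResourceRef> pending = new HashSet<>();

    ResourceRef register(Object owner, Runnable cleanup) {
        ResourceRef ref = new ResourceRef(owner, queue, cleanup);
        pending.add(ref);
        return ref;
    }

    /** Runs the cleanup for every owner reported dead so far. */
    int runPendingCleanups() {
        int ran = 0;
        Reference<?> ref;
        while ((ref = queue.poll()) != null) {
            ResourceRef resource = (ResourceRef) ref;
            pending.remove(resource);
            resource.cleanup.run();
            ran++;
        }
        return ran;
    }

    static boolean simulate() {
        PhantomCleanup cleanups = new PhantomCleanup();
        int[] closed = {0};
        Object owner = new Object();
        ResourceRef ref = cleanups.register(owner, () -> closed[0]++);

        int before = cleanups.runPendingCleanups(); // owner alive: nothing to do
        ref.enqueue(); // simulation only: normally done by the GC
        int after = cleanups.runPendingCleanups();

        // owner is used here so it stays strongly reachable until the end.
        return before == 0 && after == 1 && closed[0] == 1 && owner != null;
    }

    public static void main(String[] args) {
        System.out.println("cleanup ran exactly once: " + simulate());
    }
}
```

Unlike finalize(), the cleanup here runs on whichever thread calls runPendingCleanups(), so the program decides when and where the work happens.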

Arguably, the finalize() method should never have been provided in the first place. PhantomReferences are definitely safer and more efficient to use, and eliminating finalize() would have made parts of the VM considerably simpler. But, they're also more work to implement, so I confess to still using finalize() most of the time. The good news is that at least you have a choice.

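-> Using phantom references is much safer and more efficient than using finalize(), and leaving finalize() out would have made the VM simpler. They do take more work to implement, though.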

Conclusion

I'm sure some of you are grumbling by now, as I'm talking about an API which is nearly a decade old and haven't said anything which hasn't been said before. While that's certainly true, in my experience many Java programmers really don't know very much (if anything) about weak references, and I felt that a refresher course was needed. Hopefully you at least learned a little something from this review.

Source: http://weblogs.java.net/blog/2006/05/04/understanding-weak-references
