Here's my understanding of how sampler streaming works:
Streaming samplers get their efficiency from:
a. Loading the initial "attack" of each sample into RAM (say, the first 100 ms) and playing it from RAM. In this model, the attacks of all samples that could possibly be triggered are preloaded so they can be accessed immediately (which also means that you'll get a better "response" when playing - different from generic latency, but that's another discussion).
I believe that some samplers (Kontakt?) let you set how much of each sample is preloaded. This lets you scale your memory usage up or down.
b. If the sound is longer than the preloaded attack (e.g., you hold the key down for 4 seconds), the remainder of the samples for the keys that are in use (being played) is streamed from disk into RAM and played from there; it is then released by the sampler or the OS (depending on the model).
In most cases only part of a long sample will stay in memory, but there are potential optimizations that may be made to help performance.
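To make (a) and (b) concrete, here's a rough Python sketch of the idea. It is only a toy model, not how Kontakt or any other sampler is actually implemented: the class name `StreamedSample`, the chunk size, and the use of a file-like object standing in for the disk are all my own assumptions.

```python
import io


class StreamedSample:
    """Toy model: the attack lives in RAM, the rest is read on demand."""

    def __init__(self, source, preload_frames=4410, frame_size=2):
        # source: any seekable file-like object (stands in for the disk).
        # 4410 frames is roughly 100 ms at 44.1 kHz (an assumed default).
        self.source = source
        self.frame_size = frame_size
        self.preload_bytes = preload_frames * frame_size
        source.seek(0)
        self.attack = source.read(self.preload_bytes)  # kept in RAM

    def play(self, chunk_frames=1024):
        """Yield the attack from RAM, then the remainder chunk by chunk."""
        yield self.attack
        self.source.seek(self.preload_bytes)
        chunk_bytes = chunk_frames * self.frame_size
        while True:
            chunk = self.source.read(chunk_bytes)
            if not chunk:
                break
            yield chunk  # the caller can drop each chunk once it has played


def demo_stream():
    data = bytes(range(256)) * 100  # 25,600 bytes of fake audio
    sample = StreamedSample(io.BytesIO(data), preload_frames=1000)
    chunks = list(sample.play())
    return data, sample, chunks
```

Only `attack` is held for the life of the patch; everything else passes through memory one chunk at a time, which is why the preload size is the main RAM knob.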
A piano patch with 3 levels of velocity (1 sample per level) will take up less RAM for its preloaded (non-streamed) portions than a patch with 7 levels of velocity, because fewer attacks have to be kept in memory. In fact, since attacks are where you get the greatest variation in "tone", some libraries use some tricks on the remainder of the sample (e.g., open the filter for higher velocity notes, but use the same underlying sample).
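A back-of-the-envelope estimate shows the difference. The numbers below are assumptions for illustration only: every one of the 88 keys has its own sample per velocity layer, audio is 16-bit stereo at 44.1 kHz, and the preload is 100 ms. Real libraries vary on all of these.

```python
def preload_bytes(num_keys, velocity_layers, preload_ms=100,
                  sample_rate=44100, channels=2, bytes_per_sample=2):
    """Estimate RAM used by preloaded attacks (all defaults are assumed)."""
    frames = sample_rate * preload_ms // 1000        # frames in the preload
    per_sample = frames * channels * bytes_per_sample
    return num_keys * velocity_layers * per_sample


three_layers = preload_bytes(88, 3)
seven_layers = preload_bytes(88, 7)
print(f"3 layers: {three_layers / 2**20:.1f} MiB")
print(f"7 layers: {seven_layers / 2**20:.1f} MiB")
```

The preload cost scales linearly with the number of layers, so under these assumptions the 7-layer patch needs 7/3 the attack memory of the 3-layer one.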
Some libraries come with "lite" versions of the patches.
A couple of tricks (and I'll use Kontakt as an example):
I have a split with bass (24 notes) on the bottom, a grand piano in the middle (36 notes), and violin on top (28 notes).
Setup A: I use Cantabile to manage the splits.
Result: I load the attack samples for all three instruments across their entire range, regardless of whether I can play them (e.g., high piano notes, low violin notes). Let's say all three instruments had an 88-note range with one velocity level - I'd have to load 264 samples for those attacks (more if you have velocity switching in any of the libraries).
Setup B: I edit all three patches so that I only have the samples for the ranges I need. I save them (must do that) and rename them. Cantabile simply maps to the channels without needing to have any range filtering.
Result: Only the initial samples for the ranges in the reduced patches are loaded. I only load 88 samples (24 + 36 + 28), reducing my RAM usage greatly.
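The arithmetic for the two setups, as a quick check. It uses the same simplification as the example above (one sample per key, one velocity level); the variable names are just for illustration.

```python
FULL_RANGE = 88  # assumed keys sampled per instrument in the unedited patches

# Notes actually reachable in the split
split = {"bass": 24, "grand piano": 36, "violin": 28}

# Setup A: the host filters the ranges, but every patch preloads its full range
setup_a = FULL_RANGE * len(split)

# Setup B: each patch is trimmed to the range that can actually be played
setup_b = sum(split.values())

print(f"Setup A preloads {setup_a} attacks, Setup B preloads {setup_b}")
print(f"Setup B needs {setup_b / setup_a:.0%} of Setup A's preload RAM")
```

With velocity switching, both counts get multiplied by the number of layers, so the ratio stays the same but the absolute savings grow.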
There are other tricks (that I remember from my Emulator II days) like:
Stretching a sample across multiple keys, so fewer samples need to be loaded
Stretching a sample across multiple keys at the extreme ends of the keyboard (where the notes are played less often and the stretching is less noticeable)
Turning off "release" samples or overtones, or using patches without "sostenuto" and other "overlaid" samples.
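For the stretching trick, the usual approach is to play the nearest sampled note at a different speed. Here's a minimal sketch assuming equal temperament and MIDI note numbers; `playback_ratio` and `nearest_root` are hypothetical helper names, not functions from any sampler.

```python
def playback_ratio(key, root):
    """Speed factor needed to play the sample recorded at `root` as `key`."""
    return 2.0 ** ((key - root) / 12.0)


def nearest_root(key, roots):
    """Pick the sampled note closest to the key being played."""
    return min(roots, key=lambda r: abs(key - r))


# Example: only every third note is sampled (MIDI 60, 63, 66)
roots = [60, 63, 66]
for key in range(60, 67):
    root = nearest_root(key, roots)
    print(key, "uses sample", root, "at ratio", round(playback_ratio(key, root), 4))
```

Sampling every third note cuts the number of preloaded attacks to about a third, at the cost of stretching each sample by up to a semitone or so in either direction.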
Here's an article on this if you want to read more.