Thanks, Grindstone, for the encouragement. I read Allen's post 2 days ago. But his approach works not by listening but by calculating inductor and capacitor values, which I may not want to do. My measured results and simulations can do a better job than a calculator.
I want to select the woofer/midrange/tweeter rolloff by listening, irrespective of how they measure. For example, the simulation/calculator may tell me to use a 0.8mH inductor for the low pass, but by listening I may prefer, say, a 0.35mH inductor as the correct rolloff. I know it won't be easy, as it also depends on the music used for designing, which itself can vary a lot. I am sure to have some fun this weekend.
I'm thinking that flying completely blind probably isn't going to be as efficient a use of your weekend (or, arguably, as useful longer-term) as it deserves. My sense is that bipolar radiation isn't going to clarify your insights but muddle them, unless your room is almost anechoic.
What I'd do in your position (applaud the inquiry, BTW) is start at the lowest level. Before that, though--a question & comment.
Are you using the same program material throughout? My suggestion is to keep the program material consistent and the selections short. Pick things that elicit the extremes as well; you only really need a handful of selections, or fewer. Pick things you sort of like but don't love--because this exercise will destroy your appreciation of things you completely love.
How fiddly it is depends on where you're crossing, as you know. If you're aiming to investigate between about 1.5kHz and 3.5kHz, you're sunk--leave that spot for last--that spot is crazy-touchy. Some humans can ferret out 1/2 dB changes around there--look at the high-end JBLs that ship to Japan--they have 1/2 dB user adjustments in the "presence region".
Converge from the top and bottom, binary-search. Just like the eye doctor--is it better or worse? "Next." Keep notes.
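To make the "converge from the top and bottom" idea concrete, here is a minimal sketch of that note-keeping loop. Everything in it is illustrative: the function name, the list of inductor values and the simulated listener are assumptions, and the halving step only holds up if your preference has a single sweet spot and falls off about evenly on either side of it.

```python
def converge_by_listening(candidates, prefers_first):
    """Narrow a sorted list of part values with A/B comparisons.

    candidates    -- part values sorted low to high (hypothetical values)
    prefers_first -- callable(a, b) -> True if the listener prefers a over b
    Returns (chosen_value, notes), where notes logs every comparison.
    """
    lo, hi = 0, len(candidates) - 1
    notes = []
    while lo < hi:
        low_val, high_val = candidates[lo], candidates[hi]
        if prefers_first(low_val, high_val):
            notes.append((low_val, high_val, "low end preferred"))
            hi = (lo + hi) // 2        # pull the top down to the midpoint
        else:
            notes.append((low_val, high_val, "high end preferred"))
            lo = (lo + hi + 1) // 2    # pull the bottom up to the midpoint
    return candidates[lo], notes


# Hypothetical inductor values in mH; the lambda stands in for a real
# listener whose (unknown) sweet spot happens to be 0.35 mH.
values_mh = [0.2, 0.27, 0.35, 0.47, 0.56, 0.68, 0.8, 1.0]
chosen, notes = converge_by_listening(
    values_mh, lambda a, b: abs(a - 0.35) < abs(b - 0.35)
)
for low_val, high_val, verdict in notes:
    print(f"{low_val} mH vs {high_val} mH: {verdict}")
print("settled on", chosen, "mH")
```

In a real session `prefers_first` is you answering "better or worse?" after each swap, and `notes` is the written record of what you compared.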
I assume you have many, many hours of ear-time on each driver, but I suggest you start with one exclusively to get your mental baseline for what each does (and doesn't do!). Listen to anything/everything on each for quite a while. And I wouldn't do it OB, even if that's your ultimate system config. This is about the process and making it efficient.
What I'd do is slam out a big, big test box and make provisions for mounting each AND both drivers. It can be styrofoam or whatever, too--just brace it & fill it up with polyfill or fiberglass or whatever so you're not focused on standing waves--just the drivers. If you have a choice (i.e. you're not scrounging existing materials), maybe find whichever driver has the biggest Vas and make that your minimum Vb...I'd go 25% larger than that if it's not insane, as a first thought. If they're old-old (big Vas), well, you have to fit it in the room, so do what you can tolerate. For "extra credit", keep adding blocks/volume eaters until you find the best size for each driver--there will be one. What this gets you is the best the driver can do in that box--it separates out those warts/deficiencies and drills the range of capabilities into your ears. Having an enclosure of some sort unloads a bunch of reverberant cues that could have you chasing down blind alleys. Your ears (and brain) already have a bunch of "free-air" knowledge about them--and you know it's not what you want because you're doing this experiment. Maybe you already have "standard" test boxes you can use, being an avid DIYer?
Some things just "don't go together", IMO. Kevlar and aluminum. Paper and plastic, etc. Just examples. The point is, be honest in your assessment of whether you are trying to achieve something that's approximately impossible.
Think about it like cooking. You are trying to make complementary things. In fact, if I gather correctly, the precise nature of the inquiry regards complementarity. Certain ingredient combinations just don't work. If you can identify those things first, you've saved a lot of time. I know you're hunting rolloffs, but to know desired rolloffs, you have to know the drivers really well by themselves. It may be that those two things just don't go together where you want them to for myriad reasons. Establishing if the objective is reasonably possible early saves you grief and time.
Next, give yourself a chance by doing the calculations. If you want to leave the mic alone, do so. My advice is to teach someone else how you want things measured and have them measure each step and save the files while you stay "blind" in your process. I think you have to calculate to have any chance at succeeding and making sense of it. Absent some structure, the conclusions become matters of chance. Pretend your task is a 2-way and start a fresh design at the ground level.
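As a rough sense of what "doing the calculations" buys you, here is a small sketch that turns the two inductor values quoted at the top of the thread (0.8 mH and 0.35 mH) into first-order low-pass corner frequencies. It assumes a single series inductor feeding a purely resistive 8 ohm load, which a real driver is not, so treat the numbers as ballpark only; the function name is made up for illustration.

```python
import math


def first_order_lowpass_corner_hz(inductance_mh, load_ohms=8.0):
    """-3 dB corner of a series inductor into a purely resistive load.

    Uses f = R / (2 * pi * L). The 8 ohm default is an assumption;
    a real driver's impedance varies with frequency.
    """
    return load_ohms / (2 * math.pi * inductance_mh * 1e-3)


for inductance_mh in (0.8, 0.35):
    corner_hz = first_order_lowpass_corner_hz(inductance_mh)
    print(f"{inductance_mh} mH -> about {corner_hz:.0f} Hz")
```

Under those assumptions the two parts sit more than an octave apart (roughly 1.6 kHz versus 3.6 kHz), which is the kind of structure that lets a listening impression mean something.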
Do the basics. How far does each stay mostly pistonic? Are they going to be mounted within a quarter wavelength of each other (at the XO frequency) in the end? If not, there's your first problem. What are their levels? Etc. If only the mics stay off-limits, sim your brains out from a fresh start like you just helicoptered in and are seeing it for the first time. Ask yourself--would you even use those 2 drivers together in a 2-way designed to cover your bandwidth target for both?
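The quarter-wavelength question above is quick to put numbers on. A minimal sketch, assuming 343 m/s for the speed of sound (dry air at about 20 °C) and using the 1.5 kHz and 3.5 kHz edges of the touchy region as example crossover points; the function name is just for illustration.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at about 20 degrees C


def quarter_wavelength_cm(crossover_hz):
    """Quarter of the acoustic wavelength at the crossover frequency, in cm."""
    wavelength_m = SPEED_OF_SOUND_M_S / crossover_hz
    return wavelength_m / 4 * 100


for crossover_hz in (1500, 3500):
    spacing_cm = quarter_wavelength_cm(crossover_hz)
    print(f"{crossover_hz} Hz -> quarter wavelength is about {spacing_cm:.1f} cm")
```

Compare the result with the center-to-center distance you can physically achieve between the two drivers on the baffle.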
As always, this is all your call, but that's what I'd do if I were in your shoes. It'd just be so easy to get lost otherwise. I'll accept argument about the transferability to OB, but you might be uniquely qualified to make it after such an exercise. Yes, the end-slopes might be different, but that, too, gets you working in a different realm and that has value, IMO. I still think it gets you a tiny bit of structure to investigate the blending--and whether it's even possible to suit your tastes with those parts. I assert that the process of the sims + the listening allows each impression to have a chance at meaning. If you can't hear up high and you're working up high, enlist a helper. I've used my friends' grandkids and they're ruthless.

On matters of a particular instrument's timbre/tone, I trust musicians more. FWIW & IMO.