>>> so here for each row in tab2 it is trying to find the matching record in tab1 - right?
Yep
>>> is that more likely as tables without indexes are not generally considered optimal as the inner table?
It doesn't really matter for this example. Remember, though, that there are multiple join types, and that a column can have statistics without an index (if they have been created manually). But yes, you are 'probably' correct.
>>> why??? if tab1 has a histogram, then the number of matching rows for each key value from tab2 in tab1 can be easily determined either from frequency weight or range weight or frequency density from the histogram of tab1?
I think this is the bit I need to be clearer on. We're talking about compilation here, and compilation takes place before execution.
At compilation time, it has to pick a query plan to pass to the execution phase. At that point (for the first example I gave, with no stats for tab2) it has no knowledge of what the values in tab2 are, so when it comes to picking a query plan it has to assume that the rows in tab2 could have any value for col1. By that I mean ANY value that is legal for the data type; it's not restricted to the values in the histogram for tab1.col1.
If they can have any value at all, then the best guess for each possible value is the total density.
The estimates originate during compilation. They are not amended during execution when the values in tab2 become known; those become the actual values.
Even with a histogram for tab1.col1, this is no help at all. Take the example below: how do you think it could estimate a row count using the histogram?
tab2.col1         tab1.col1
Rows=1000         Rows=1000, total density = 0.38
(no histogram)    1 = 0.5
                  3 = 0.2
                  6 = 0.3
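
To put numbers on the example above, here is a rough sketch in Python. It is only an illustration of the arithmetic, not how the server implements it. It assumes the total density is the sum of the squared weights (which does reproduce the 0.38 shown above), and that the per-row estimate is simply total density multiplied by the row count of tab1; the function names are made up for this sketch.

```python
# Histogram weights for tab1.col1, copied from the example above.
tab1_weights = {1: 0.5, 3: 0.2, 6: 0.3}
tab1_rows = 1000


def total_density(weights):
    """Sum of squared weights (assumed definition of total density)."""
    return sum(w * w for w in weights.values())


def estimated_matches_per_outer_row(weights, inner_rows):
    """Estimated tab1 rows matched by ONE tab2 row whose value is unknown.

    Assumption: with no stats on tab2, every outer row gets the same
    estimate, total density * rows in tab1, whatever its real value is.
    """
    return total_density(weights) * inner_rows


print(round(total_density(tab1_weights), 2))
print(round(estimated_matches_per_outer_row(tab1_weights, tab1_rows)))
```

Under those assumptions every tab2 row gets the same estimate at compile time, whether its value later turns out to be 1, 3, 6 or something that isn't in tab1 at all. The individual weights (0.5, 0.2, 0.3) can't be used because there is no way of knowing which one applies to any given tab2 row.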