Difference between DBA_TABLES.num_rows and count(*)

One thing DBAs regularly do is to gather information on the schemas in the database for reporting. So we want to know how many rows there are in the tables of the schema “SIMON”. When doing so, one is tempted to query the DBA_TABLES view:

SQL> SELECT table_name, tablespace_name, num_rows
  2  FROM dba_tables WHERE owner='SIMON' ORDER BY num_rows DESC;

TABLE_NAME                     TABLESPACE_NAME                  NUM_ROWS
------------------------------ ------------------------------ ----------
HUGE_TABLE                     DZ_DATA01                         8227990
[..]

However, if we query the table directly, we get a different result:

SQL> SELECT count(*) FROM simon.HUGE_TABLE;
COUNT(*) ---------- 8230310

Why is that so? When reading through the description of the ALL_TABLES view, one finds the following note:

Columns marked with an asterisk (*) are populated only if you collect statistics on the table with the ANALYZE statement or the DBMS_STATS package.

NUM_ROWS just happens to be one of those columns. This means that this value is updated by DBMS_STATS. So it does not contain the current number of rows in the table but an approximation calculated the last time DBMS_STATS ran.

So if we want to update the value in the DBA_TABLES view (because it is more convenient for our report to use this view instead of querying every table), we just have a look at the Oracle documentation for the DBMS_STATS package and find that the following command does what we want:

SQL> exec dbms_stats.gather_schema_stats(ownname => 'SIMON');

Note that by default, Oracle sets the “estimate_percent” parameter of the gather_schema_stats procedure to DBMS_STATS.AUTO_SAMPLE_SIZE, which means that Oracle will approximate the number of rows in most cases. This means that the actual number of rows is estimated and not calculated. Please refer to the Oracle documentation for more information about the parameters of this procedure.

So after we gathered the stats (I executed gather_schema_stats with the “estimate_percent => NULL” parameter for the exact number of rows), lets have a look at the DBA_TABLES view again:

SQL> SELECT table_name, tablespace_name, num_rows
  2  FROM dba_tables WHERE owner='SIMON' ORDER BY num_rows DESC;

TABLE_NAME                     TABLESPACE_NAME                  NUM_ROWS
------------------------------ ------------------------------ ----------
HUGE_TABLE                     DZ_DATA01                         8230310

VoilĂ  , now we can use this value for reporting!

Hello world

My name is Simon Krenger, I am a Technical Account Manager (TAM) at Red Hat. I advise our customers in using Kubernetes, Containers, Linux and Open Source.

Elsewhere

  1. GitHub
  2. LinkedIn
  3. GitLab