python-oracledb/samples/sample_notebooks/2-Queries.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![Banner](images/banner.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Queries - SELECT and WITH Statements"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import oracledb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "un       = \"pythondemo\"\n",
    "pw       = \"welcome\"\n",
    "\n",
    "adminun  = \"system\"\n",
    "adminpw  = \"oracle\"\n",
    "\n",
    "cs       = \"localhost/orclpdb1\"\n",
    "\n",
    "connection = oracledb.connect(user=un, password=pw, dsn=cs)\n",
    "\n",
    "systemconnection = oracledb.connect(user=adminun, password=adminpw, dsn=cs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fetching Rows\n",
    "\n",
    "Documentation reference link: [Fetch Methods](https://python-oracledb.readthedocs.io/en/latest/user_guide/sql_execution.html#fetch-methods)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fetching single rows\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "    \n",
    "    cursor.execute(\"select * from SampleQueryTab where id = :1\", [6])\n",
    "    \n",
    "    row = cursor.fetchone()\n",
    "    print(row)\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fetching all rows from a small table of not too big size\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "       \n",
    "    cursor.execute(\"select * from SampleQueryTab\")\n",
    "    \n",
    "    rows = cursor.fetchall()\n",
    "    \n",
    "    for r in rows:\n",
    "        print(r)\n",
    "      "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Fetching all rows from a big table\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    " \n",
    "    cursor.execute(\"select * from SampleQueryTab\")  # pretend this is big\n",
    "    \n",
    "    while True:\n",
    "        rows = cursor.fetchmany()  # get a batch rows (of size cursor.arraysize=100)\n",
    "        for r in rows:\n",
    "            print(r)\n",
    "        if len(rows) < cursor.arraysize:\n",
    "            break\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A key tuning goal is to reduce \"round-trips\"\n",
    "\n",
    "\n",
    "> \"A server round-trip is defined as the trip from the client to the server and back to the client.\"\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "\n",
    "![Round-trip](images/roundtrip.png)\n",
    "\n",
    "Round-trips affect performance and system scalability.\n",
    "\n",
    "Make every round-trip useful:\n",
    "\n",
    "- fetch multiple rows at a time\n",
    "- insert multiple rows at a time\n",
    "- don't select more data than needed\n",
    "- don't overuse commit or rollback"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Round-trip comparison - prefetchrows & arraysize\n",
    "\n",
    "Documentation reference link: [Tuning python-oracledb](https://python-oracledb.readthedocs.io/en/latest/user_guide/tuning.html)\n",
    "\n",
    "Regardless of which fetch method is used there are two tuning parameters that affect internal buffering of fetched rows. This alters the number of round-trips required to fetch all the query data from the database.  These parameters do not affect how, or when, rows are returned to the application itself.\n",
    "\n",
    "Here's a demo counting round-trips:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "def get_session_id(connection):\n",
    "    sid, = connection.cursor().execute(\"SELECT sys_context('userenv', 'sid') FROM dual\").fetchone()\n",
    "    return sid\n",
    "\n",
    "def get_round_trips(systemconnection, sid):\n",
    "    sql = \"\"\"SELECT ss.value\n",
    "             FROM v$sesstat ss, v$statname sn\n",
    "             WHERE ss.sid = :sid\n",
    "             AND ss.statistic# = sn.statistic#\n",
    "             AND sn.name LIKE '%roundtrip%client%'\"\"\"  \n",
    "    roundtrips, = systemconnection.cursor().execute(sql, [sid]).fetchone()\n",
    "    return roundtrips\n",
    "\n",
    "sid = get_session_id(connection)\n",
    "\n",
    "def do_query(connection, numrows, prefetchrows, arraysize):\n",
    "\n",
    "    roundtrips = get_round_trips(systemconnection, sid)\n",
    "\n",
    "    with connection.cursor() as cursor:\n",
    "        cursor.prefetchrows = prefetchrows\n",
    "        cursor.arraysize    = arraysize\n",
    "        cursor.execute(\"select * from all_objects where rownum <= :nr\", [numrows])\n",
    "        rows = cursor.fetchall()\n",
    "\n",
    "    roundtrips = get_round_trips(systemconnection, sid) - roundtrips\n",
    "\n",
    "    print(\"Number of rows: {:5}, prefetchrows {:4}  arraysize {:4}  roundtrips {:3} \".format(len(rows), prefetchrows, arraysize, roundtrips))\n",
    "\n",
    "\n",
    "# do_query(connection, number of rows, prefetchrows, arraysize)\n",
    "\n",
    "print(\"Default prefetch & arraysize values:\")\n",
    "do_query(connection,                1,            2,       100)\n",
    "do_query(connection,              100,            2,       100)\n",
    "do_query(connection,             1000,            2,       100)\n",
    "do_query(connection,            10000,            2,       100)\n",
    "\n",
    "print(\"\\n'Unknown' large number of rows:\")\n",
    "do_query(connection,            10000,            2,      1000)\n",
    "do_query(connection,            10000,         1000,      1000)\n",
    "\n",
    "print(\"\\n'Page' of rows:\")\n",
    "do_query(connection,               20,           20,        20)\n",
    "do_query(connection,               20,           21,        20)\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Your tuning goal is to reduce round-trips (thus improving performance) without using too much internal buffer memory.\n",
    "\n",
    "When selecting a huge number of rows, tuning `arraysize` is important. In general there is no need to set `prefetchrows` in this scenario.\n",
    "\n",
    "When selecting a known, small number of rows such as for 'paging' through a result set, then set `prefetchrows` to be one larger than the number of rows returned."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Querying LOB Columns\n",
    "\n",
    "Documentation reference link: [Using CLOB and BLOB Data](https://python-oracledb.readthedocs.io/en/latest/user_guide/lob_data.html)\n",
    "\n",
    "When fetching LOBs there are two modes that can be used:\n",
    "- Fetching as a \"locator\" for streaming use\n",
    "- Fetching directly as a string or buffer"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fetching LOBs for streaming\n",
    "\n",
    "- this is the default\n",
    "- it requires more round trips and has more overhead\n",
    "- it is useful when very long LOBs are being used"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with connection.cursor() as cursor:\n",
    "\n",
    "    id = 1\n",
    "    textData = \"The quick brown fox jumps over the lazy dog\"\n",
    "    cursor.execute(\"truncate table TestClobs\")\n",
    "    cursor.execute(\"insert into TestClobs (IntCol, ClobCol) values (:1, :2)\", [id, textData])\n",
    "\n",
    "    cursor.execute(\"select ClobCol from TestClobs where IntCol = :ic1\", [id])\n",
    "    c, = cursor.fetchone()\n",
    "    print(\"CLOB length:\", c.size())\n",
    "    print(\"CLOB data:\", c.read())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Fetching CLOBs as String\n",
    "\n",
    "- this is much faster than the streaming method\n",
    "- LOBs are limited to 1 GB (and obviously must all be in memory at once)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "oracledb.defaults.fetch_lobs = False\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "    id = 1\n",
    "    cursor.execute(\"select ClobCol from TestClobs where IntCol = :ic2\", [id])\n",
    "    c, = cursor.fetchone()\n",
    "\n",
    "    print(\"CLOB length:\", len(c))\n",
    "    print(\"CLOB data:\", c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The same `defaults` setting will return BLOBs as buffers."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sample benchmark\n",
    "A sample benchmark shows the performance benefit of querying using the direct String method compared with using the default locator stream method. \n",
    "![LOB Benchmark](images/lob-benchmark.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Fetching Numbers as Decimals\n",
    "\n",
    "Oracle's NUMBER format is Decimal but Python uses floating point so there can be conversion artifacts.\n",
    "\n",
    "Documentation reference link:  [Fetched Number Precision](https://python-oracledb.readthedocs.io/en/latest/user_guide/sql_execution.html#fetched-number-precision)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with connection.cursor() as cursor:\n",
    "    cursor.execute(\"select 0.1 as d1 from dual\")\n",
    "    v, = cursor.fetchone()\n",
    "    print('Value =', v, '\\tValue * 3 =', v * 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Fetching as Decimal may be useful."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import decimal\n",
    "\n",
    "oracledb.defaults.fetch_decimals = True\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "    cursor.execute(\"select 0.1 as d2 from dual\")\n",
    "    v, = cursor.fetchone()\n",
    "    print('Value =', v, '\\tValue * 3 =', v * 3)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Multiple Values in \"WHERE ... IN\" Clauses\n",
    "\n",
    "Documentation reference link: [Binding Multiple Values to a SQL WHERE IN Clause](https://python-oracledb.readthedocs.io/en/latest/user_guide/bind.html#binding-multiple-values-to-a-sql-where-in-clause)\n",
    "\n",
    "### A fixed number of binds\n",
    "\n",
    "It's important to use bind variables for security.  For small numbers of binds this is simple.  One placeholder must be used for each value in the IN clause."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sql = \"\"\"select name\n",
    "         from SampleQueryTab\n",
    "         where id in (:id1, :id2, :id3, :id4, :id5)\n",
    "         order by id\"\"\"\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "    cursor.execute(sql,\n",
    "        id1=5, id2=7, id3=1, id4=3, id5=2)\n",
    "    for row, in cursor:\n",
    "        print(row)       "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "If statements are re-executed but the number of values varies, then pass None (i.e. NULL) for 'missing' values.  This lets the database execute the same statement text as before, which helps performance and scalability. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "with connection.cursor() as cursor:\n",
    "    cursor.execute(sql,\n",
    "        id1=2, id2=4, id3=1, id4=None, id5=None)\n",
    "    for row, in cursor:\n",
    "        print(row) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "When using character data and the size of the values varies between statement execution, then using `setinputsizes()` with the column size or maximum expected input data size can help reduce the number of SQL 'versions' of the statement used by the database optimizer, although this is not often an issue and it's not uncommon to see SQL statements with tens or low hundreds of versions.\n",
    "\n",
    "For example with the query:\n",
    "```\n",
    "sql = \"\"\"select name\n",
    "         from SampleQueryTab\n",
    "         where name in (:n1, :n2, :n3, :n4, :n5)\n",
    "         order by id\"\"\"\n",
    "```\n",
    "the statement\n",
    "```\n",
    "cursor.setinputsizes(n1=20, n2=20, n3=20, n4=20, n5=20)\n",
    "```\n",
    "\n",
    "indicates that each value bound will be a string of no more than 20 characters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "sql = \"\"\"select name\n",
    "         from SampleQueryTab\n",
    "         where name in (:n1, :n2, :n3, :n4, :n5)\n",
    "         order by id\"\"\"\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "    cursor.setinputsizes(n1=20, n2=20, n3=20, n4=20, n5=20)\n",
    "    cursor.execute(sql,\n",
    "        n1='Anthony', n2='Barbie', n3='Bogus', n4=None, n5=None)\n",
    "    for row, in cursor:\n",
    "        print(row) "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Dynamically creating the statement\n",
    "\n",
    "If the number of bind values is not known, and the statement is never rexecuted, you can consider dynamically creating a statement."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "bind_values = [5, 7, 1, 3, 2, 6]\n",
    "bind_names = [\":\" + str(i + 1) for i in range(len(bind_values))]\n",
    "sql = \"\"\"select name from SampleQueryTab where id in (%s)\"\"\" % (\",\".join(bind_names))\n",
    "print(sql, \"\\n\")\n",
    "with connection.cursor() as cursor:\n",
    "    cursor.execute(sql, bind_values)\n",
    "    for row, in cursor:\n",
    "        print(row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Binding using an Oracle type \n",
    "\n",
    "One solution when matching a huge number of values is to use the SQL `table()` clause and an Oracle type.  This works for up to 32K values."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "type_obj = connection.gettype(\"SYS.ODCINUMBERLIST\")\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "    obj = type_obj.newobject()\n",
    "    obj.extend([3, 4, 7])\n",
    "    cursor.execute(\"\"\"select name\n",
    "                      from SampleQueryTab\n",
    "                      where id in (select * from table(:1))\"\"\",\n",
    "                [obj])\n",
    "    for row, in cursor:\n",
    "        print(row)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The presupplied types `SYS.ODCIVARCHAR2LIST` or `SYS.ODCIDATELIST` can similarly be used for VARCHAR2 and DATE, respectively."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Accessing object type information takes a few round-trips so the earlier methods are better for small numbers of values. If you use this solution in a more complex statement, check the optimizer plan is still OK. "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}