python-oracledb/samples/sample_notebooks/4-CSV.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "04c694cd",
   "metadata": {},
   "source": [
    "![Banner](images/banner.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b739ae65",
   "metadata": {},
   "source": [
    "# Loading and Unloading Data: Working with Comma Separated Values (CSV) files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2b027728",
   "metadata": {},
   "source": [
    "CSV is not a well-defined standard! "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "581e9486",
   "metadata": {},
   "source": [
    "\"Unhelpful\" (that's a joke) suggestions for Python programmers:\n",
    "- Don't use CSV files: Keep the data in the database.\n",
    "- Don't use Excel - use Oracle APEX\n",
    "- Use Oracle Data Pump to load CSV files into Oracle Database\n",
    "<hr>"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6cb0a244",
   "metadata": {},
   "source": [
    "Helpful suggestions:\n",
    "- Python's [\"csv\" module](https://docs.python.org/3/library/csv.html) has extensive reading and writing support"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9cda404e",
   "metadata": {},
   "outputs": [],
   "source": [
    "import oracledb"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "1d9c084a",
   "metadata": {},
   "outputs": [],
   "source": [
    "un = \"pythondemo\"\n",
    "pw = \"welcome\"\n",
    "cs = \"localhost/orclpdb1\"\n",
    "\n",
    "connection = oracledb.connect(user=un, password=pw, dsn=cs)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3ae66a62",
   "metadata": {},
   "source": [
    "## Reading CSV Files and Inserting Data into Oracle Database\n",
    "\n",
    "Set up the schema:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ed1517dd",
   "metadata": {},
   "outputs": [],
   "source": [
    "with connection.cursor() as cursor:\n",
    "    try:\n",
    "        cursor.execute(\"drop table t\")\n",
    "    except:\n",
    "        ;\n",
    "\n",
    "    cursor.execute(\"\"\"create table t (k number, \n",
    "                                      first_name varchar2(30), \n",
    "                                      last_name varchar2(30), \n",
    "                                      country varchar2(30))\"\"\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8dba72c1",
   "metadata": {},
   "source": [
    "Data in the external CSV file looks like:\n",
    "```\n",
    "1,Fred,Nurke,UK\n",
    "2,Henry,Crun,UK\n",
    "```\n",
    "\n",
    "The Python csv module has extensive functionality.  One sample is shown below.  For python-oracledb users the important points are to use `executemany()` and send batches of rows to the database.  Tuning in your environment will determine the best batch size."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bace97d8",
   "metadata": {},
   "outputs": [],
   "source": [
    "import csv\n",
    "\n",
    "# The batch size determines how many records are inserted at a time.\n",
    "# Adjust the size to meet your memory and performance requirements.\n",
    "batch_size = 10000\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "    \n",
    "    sql = \"insert into t (k, first_name, last_name, country) values (:1, :2, :3, :4)\"\n",
    "    \n",
    "    # Predefine memory areas to match the table definition (or max data) to avoid memory re-allocs\n",
    "    cursor.setinputsizes(None, 30, 30, 30)\n",
    "\n",
    "    with open(\"csv/data1.csv\", \"r\") as csv_file:\n",
    "        csv_reader = csv.reader(csv_file, delimiter=',')\n",
    "        data = []\n",
    "        for line in csv_reader:\n",
    "            data.append((line[0], line[1], line[2], line[3]))   # e.g [('1', 'Fred', 'Nurke', 'UK')]\n",
    "            if len(data) % batch_size == 0:\n",
    "                cursor.executemany(sql, data)\n",
    "                data = []\n",
    "        if data:\n",
    "            cursor.executemany(sql, data)\n",
    "        connection.commit()\n",
    "\n",
    "print(\"Done\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4a0b4215",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Check the results\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "    sql = \"select * from t order by k\"\n",
    "    for r in cursor.execute(sql):\n",
    "        print(r)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7ff76bf5",
   "metadata": {},
   "source": [
    "Tuning database features may also be beneficial. For example, disabling logging and/or indexes."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "08f9bc4e",
   "metadata": {},
   "source": [
    "## Writing CSV Files from Queried Data\n",
    "\n",
    "This example shows just one way to write CSV files.  The important point for python-oracledb users is to tune `cursor.arraysize` for your data and network."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "bbc9db48",
   "metadata": {},
   "outputs": [],
   "source": [
    "import time\n",
    "\n",
    "sql = \"select * from all_objects where rownum <= 10000\"\n",
    "\n",
    "with connection.cursor() as cursor:\n",
    "\n",
    "    start = time.time()\n",
    "\n",
    "    cursor.arraysize = 1000\n",
    "\n",
    "    with open(\"testwrite.csv\", \"w\", encoding=\"utf-8\") as outputfile:\n",
    "        writer = csv.writer(outputfile, lineterminator=\"\\n\")\n",
    "        results = cursor.execute(sql)\n",
    "        writer.writerows(results)\n",
    "\n",
    "    elapsed = time.time() - start\n",
    "    print(\"Writing CSV: 10000 rows in {:06.4f} seconds\".format(elapsed))  "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7b3ddf5e",
   "metadata": {},
   "source": [
    "If you change the arraysize and rerun the cell, the time taken may vary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "62a7ecfe",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Confirm the number of lines in the output file is correct\n",
    "\n",
    "import os\n",
    "\n",
    "r = os.system(\"wc -l testwrite.csv\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e812be8c-339e-4f14-9b5b-1ea5163cee93",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}