python-oracledb/samples/sample_notebooks/4-CSV.ipynb

262 lines
6.7 KiB
Plaintext

{
"cells": [
{
"cell_type": "markdown",
"id": "04c694cd",
"metadata": {},
"source": [
"![Banner](images/banner.png)"
]
},
{
"cell_type": "markdown",
"id": "b739ae65",
"metadata": {},
"source": [
"# Loading and Unloading Data: Working with Comma Separated Values (CSV) files"
]
},
{
"cell_type": "markdown",
"id": "2b027728",
"metadata": {},
"source": [
"CSV is not a well-defined standard! "
]
},
{
"cell_type": "markdown",
"id": "581e9486",
"metadata": {},
"source": [
"\"Unhelpful\" (that's a joke) suggestions for Python programmers:\n",
"- Don't use CSV files: Keep the data in the database.\n",
"- Don't use Excel - use Oracle APEX\n",
"- Use Oracle Data Pump to load CSV files into Oracle Database\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"id": "6cb0a244",
"metadata": {},
"source": [
"Helpful suggestions:\n",
"- Python's [\"csv\" module](https://docs.python.org/3/library/csv.html) has extensive reading and writing support"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cda404e",
"metadata": {},
"outputs": [],
"source": [
"import oracledb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d9c084a",
"metadata": {},
"outputs": [],
"source": [
"un = \"pythondemo\"\n",
"pw = \"welcome\"\n",
"cs = \"localhost/orclpdb1\"\n",
"\n",
"connection = oracledb.connect(user=un, password=pw, dsn=cs)"
]
},
{
"cell_type": "markdown",
"id": "3ae66a62",
"metadata": {},
"source": [
"## Reading CSV Files and Inserting Data into Oracle Database\n",
"\n",
"Set up the schema:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed1517dd",
"metadata": {},
"outputs": [],
"source": [
"with connection.cursor() as cursor:\n",
" try:\n",
" cursor.execute(\"drop table t\")\n",
" except:\n",
" ;\n",
"\n",
" cursor.execute(\"\"\"create table t (k number, \n",
" first_name varchar2(30), \n",
" last_name varchar2(30), \n",
" country varchar2(30))\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "8dba72c1",
"metadata": {},
"source": [
"Data in the external CSV file looks like:\n",
"```\n",
"1,Fred,Nurke,UK\n",
"2,Henry,Crun,UK\n",
"```\n",
"\n",
"The Python csv module has extensive functionality. One sample is shown below. For python-oracledb users the important points are to use `executemany()` and send batches of rows to the database. Tuning in your environment will determine the best batch size."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bace97d8",
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"\n",
"# The batch size determines how many records are inserted at a time.\n",
"# Adjust the size to meet your memory and performance requirements.\n",
"batch_size = 10000\n",
"\n",
"with connection.cursor() as cursor:\n",
" \n",
" sql = \"insert into t (k, first_name, last_name, country) values (:1, :2, :3, :4)\"\n",
" \n",
" # Predefine memory areas to match the table definition (or max data) to avoid memory re-allocs\n",
" cursor.setinputsizes(None, 30, 30, 30)\n",
"\n",
" with open(\"csv/data1.csv\", \"r\") as csv_file:\n",
" csv_reader = csv.reader(csv_file, delimiter=',')\n",
" data = []\n",
" for line in csv_reader:\n",
" data.append((line[0], line[1], line[2], line[3])) # e.g [('1', 'Fred', 'Nurke', 'UK')]\n",
" if len(data) % batch_size == 0:\n",
" cursor.executemany(sql, data)\n",
" data = []\n",
" if data:\n",
" cursor.executemany(sql, data)\n",
" connection.commit()\n",
"\n",
"print(\"Done\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a0b4215",
"metadata": {},
"outputs": [],
"source": [
"# Check the results\n",
"\n",
"with connection.cursor() as cursor:\n",
" sql = \"select * from t order by k\"\n",
" for r in cursor.execute(sql):\n",
" print(r)"
]
},
{
"cell_type": "markdown",
"id": "7ff76bf5",
"metadata": {},
"source": [
"Tuning database features may also be beneficial. For example, disabling logging and/or indexes."
]
},
{
"cell_type": "markdown",
"id": "08f9bc4e",
"metadata": {},
"source": [
"## Writing CSV Files from Queried Data\n",
"\n",
"This example shows just one way to write CSV files. The important point for python-oracledb users is to tune `cursor.arraysize` for your data and network."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bbc9db48",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"sql = \"select * from all_objects where rownum <= 10000\"\n",
"\n",
"with connection.cursor() as cursor:\n",
"\n",
" start = time.time()\n",
"\n",
" cursor.arraysize = 1000\n",
"\n",
" with open(\"testwrite.csv\", \"w\", encoding=\"utf-8\") as outputfile:\n",
" writer = csv.writer(outputfile, lineterminator=\"\\n\")\n",
" results = cursor.execute(sql)\n",
" writer.writerows(results)\n",
"\n",
" elapsed = time.time() - start\n",
" print(\"Writing CSV: 10000 rows in {:06.4f} seconds\".format(elapsed)) "
]
},
{
"cell_type": "markdown",
"id": "7b3ddf5e",
"metadata": {},
"source": [
"If you change the arraysize and rerun the cell, the time taken may vary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62a7ecfe",
"metadata": {},
"outputs": [],
"source": [
"# Confirm the number of lines in the output file is correct\n",
"\n",
"import os\n",
"\n",
"r = os.system(\"wc -l testwrite.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e812be8c-339e-4f14-9b5b-1ea5163cee93",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}