{
"cells": [
{
"cell_type": "markdown",
"id": "04c694cd",
"metadata": {},
"source": [
""
]
},
{
"cell_type": "markdown",
"id": "b739ae65",
"metadata": {},
"source": [
"# Loading and Unloading Data: Working with Comma-Separated Values (CSV) Files"
]
},
{
"cell_type": "markdown",
"id": "2b027728",
"metadata": {},
"source": [
"CSV is not a well-defined standard! Files differ in delimiters, quoting rules, and line endings."
]
},
{
"cell_type": "markdown",
"id": "581e9486",
"metadata": {},
"source": [
"\"Unhelpful\" (that's a joke) suggestions for Python programmers:\n",
"- Don't use CSV files: keep the data in the database.\n",
"- Don't use Excel: use Oracle APEX.\n",
"- Use Oracle SQL*Loader or external tables to load CSV files into Oracle Database.\n",
"<hr>"
]
},
{
"cell_type": "markdown",
"id": "6cb0a244",
"metadata": {},
"source": [
"Helpful suggestions:\n",
"- Python's [\"csv\" module](https://docs.python.org/3/library/csv.html) has extensive reading and writing support"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cda404e",
"metadata": {},
"outputs": [],
"source": [
"import oracledb"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1d9c084a",
"metadata": {},
"outputs": [],
"source": [
"un = \"pythondemo\"\n",
"pw = \"welcome\"\n",
"cs = \"localhost/orclpdb1\"\n",
"\n",
"connection = oracledb.connect(user=un, password=pw, dsn=cs)"
]
},
{
"cell_type": "markdown",
"id": "3ae66a62",
"metadata": {},
"source": [
"## Reading CSV Files and Inserting Data into Oracle Database\n",
"\n",
"Set up the schema:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "ed1517dd",
"metadata": {},
"outputs": [],
"source": [
"with connection.cursor() as cursor:\n",
"    try:\n",
"        cursor.execute(\"drop table t\")\n",
"    except oracledb.DatabaseError:\n",
"        pass  # the table may not exist yet\n",
"\n",
"    cursor.execute(\"\"\"create table t (k number,\n",
"                                      first_name varchar2(30),\n",
"                                      last_name varchar2(30),\n",
"                                      country varchar2(30))\"\"\")"
]
},
{
"cell_type": "markdown",
"id": "8dba72c1",
"metadata": {},
"source": [
"Data in the external CSV file looks like:\n",
"```\n",
"1,Fred,Nurke,UK\n",
"2,Henry,Crun,UK\n",
"```\n",
"\n",
"The Python csv module has extensive functionality. One sample is shown below. For python-oracledb users the important points are to use `executemany()` and send batches of rows to the database. Tuning in your environment will determine the best batch size."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bace97d8",
"metadata": {},
"outputs": [],
"source": [
"import csv\n",
"\n",
"# The batch size determines how many records are inserted at a time.\n",
"# Adjust the size to meet your memory and performance requirements.\n",
"batch_size = 10000\n",
"\n",
"with connection.cursor() as cursor:\n",
"\n",
"    sql = \"insert into t (k, first_name, last_name, country) values (:1, :2, :3, :4)\"\n",
"\n",
"    # Predefine memory areas to match the table definition (or max data) to avoid memory reallocations\n",
"    cursor.setinputsizes(None, 30, 30, 30)\n",
"\n",
"    # The csv module recommends opening files with newline=\"\"\n",
"    with open(\"csv/data1.csv\", \"r\", newline=\"\") as csv_file:\n",
"        csv_reader = csv.reader(csv_file, delimiter=',')\n",
"        data = []\n",
"        for line in csv_reader:\n",
"            data.append((line[0], line[1], line[2], line[3]))  # e.g. ('1', 'Fred', 'Nurke', 'UK')\n",
"            if len(data) >= batch_size:\n",
"                # Send a full batch to the database and start a new one\n",
"                cursor.executemany(sql, data)\n",
"                data = []\n",
"        if data:\n",
"            # Insert any rows left over from the final partial batch\n",
"            cursor.executemany(sql, data)\n",
"        connection.commit()\n",
"\n",
"print(\"Done\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "4a0b4215",
"metadata": {},
"outputs": [],
"source": [
"# Check the results\n",
"\n",
"with connection.cursor() as cursor:\n",
"    sql = \"select * from t order by k\"\n",
"    for r in cursor.execute(sql):\n",
"        print(r)"
]
},
{
"cell_type": "markdown",
"id": "7ff76bf5",
"metadata": {},
"source": [
"Tuning database features may also speed up loading, for example by disabling logging or indexes on the table during the load."
]
},
{
"cell_type": "markdown",
"id": "08f9bc4e",
"metadata": {},
"source": [
"## Writing CSV Files from Queried Data\n",
"\n",
"This example shows just one way to write CSV files. The important point for python-oracledb users is to tune `cursor.arraysize` for your data and network."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "bbc9db48",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"\n",
"sql = \"select * from all_objects where rownum <= 10000\"\n",
"\n",
"with connection.cursor() as cursor:\n",
"\n",
"    start = time.time()\n",
"\n",
"    cursor.arraysize = 1000\n",
"\n",
"    with open(\"testwrite.csv\", \"w\", encoding=\"utf-8\") as outputfile:\n",
"        writer = csv.writer(outputfile, lineterminator=\"\\n\")\n",
"        results = cursor.execute(sql)\n",
"        writer.writerows(results)\n",
"\n",
"    elapsed = time.time() - start\n",
"    print(\"Writing CSV: 10000 rows in {:06.4f} seconds\".format(elapsed))"
]
},
{
"cell_type": "markdown",
"id": "7b3ddf5e",
"metadata": {},
"source": [
"If you change the arraysize and rerun the cell, the time taken may vary."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62a7ecfe",
"metadata": {},
"outputs": [],
"source": [
"# Confirm the number of lines in the output file is correct\n",
"\n",
"import os\n",
"\n",
"r = os.system(\"wc -l testwrite.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e812be8c-339e-4f14-9b5b-1ea5163cee93",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}