Further work on adjusting attribute, method and parameter names to be
consistent and to comply with PEP 8 naming guidelines; also adjust
implementation of #385 (originally done in pull request #549) to use the
parameter name `bypass_decode` instead of `bypassencoding`.
Anthony Tuininga 2021-04-23 16:05:42 -06:00
parent ab6e6f06ef
commit 96f938286d
7 changed files with 157 additions and 161 deletions

View File

@ -52,7 +52,7 @@ Cursor Object
The DB API definition does not define this attribute.
.. method:: Cursor.arrayvar(data_type, value, [size])
.. method:: Cursor.arrayvar(typ, value, [size])
Create an array variable associated with the cursor of the given type and
size and return a :ref:`variable object <varobj>`. The value is either an
@ -587,19 +587,19 @@ Cursor Object
The DB API definition does not define this attribute.
.. method:: Cursor.var(dataType, [size, arraysize, inconverter, outconverter, \
typename, encodingErrors, bypassencoding])
.. method:: Cursor.var(typ, [size, arraysize, inconverter, outconverter, \
typename, encoding_errors, bypass_decode])
Create a variable with the specified characteristics. This method was
designed for use with PL/SQL in/out variables where the length or type
cannot be determined automatically from the Python object passed in or for
use in input and output type handlers defined on cursors or connections.
The dataType parameter specifies the type of data that should be stored in
the variable. This should be one of the
:ref:`database type constants <dbtypes>`, :ref:`DB API constants <types>`,
an object type returned from the method :meth:`Connection.gettype()` or one
of the following Python types:
The typ parameter specifies the type of data that should be stored in the
variable. This should be one of the :ref:`database type constants
<dbtypes>`, :ref:`DB API constants <types>`, an object type returned from
the method :meth:`Connection.gettype()` or one of the following Python
types:
.. list-table::
:header-rows: 1
@ -642,17 +642,29 @@ Cursor Object
specified when using type :data:`cx_Oracle.OBJECT` unless the type object
was passed directly as the first parameter.
The encodingErrors parameter specifies what should happen when decoding
The encoding_errors parameter specifies what should happen when decoding
byte strings fetched from the database into strings. It should be one of
the values noted in the builtin
`decode <https://docs.python.org/3/library/stdtypes.html#bytes.decode>`__
function.
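The error handler names accepted here are the same ones accepted by `bytes.decode()`. As a minimal sketch in plain Python (no database involved; the sample bytes and the helper `decode_with` are illustrative assumptions, not part of cx_Oracle), this is how the common handlers treat a byte sequence that is not valid UTF-8:

```python
# Bytes that are valid windows-1252 but not valid UTF-8
# (0xe9 is "é" in windows-1252). Sample data is hypothetical.
raw = b"Fianc\xe9"


def decode_with(errors):
    """Decode the sample bytes as UTF-8 using the named error handler."""
    return raw.decode("utf-8", errors=errors)


replaced = decode_with("replace")          # invalid byte becomes U+FFFD
ignored = decode_with("ignore")            # invalid byte is silently dropped
escaped = decode_with("backslashreplace")  # invalid byte becomes a \xNN escape

# "strict" is the default handler and raises instead of guessing.
try:
    decode_with("strict")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True
```

Every handler other than `strict` loses or alters information, which is why returning the raw bytes can be preferable when troubleshooting.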
The bypassencoding parameter, if specified, should be passed as
boolean. This feature allows results of database types CHAR, NCHAR,
LONG_STRING, NSTRING, STRING to be returned raw meaning cx_Oracle
won't do any decoding conversion. See
:ref:`Fetching raw data <fetching-raw-data>` for more information.
The bypass_decode parameter, if specified, should be passed as a
boolean value. Passing a `True` value causes values of database types
:data:`~cx_Oracle.DB_TYPE_VARCHAR`, :data:`~cx_Oracle.DB_TYPE_CHAR`,
:data:`~cx_Oracle.DB_TYPE_NVARCHAR`, :data:`~cx_Oracle.DB_TYPE_NCHAR` and
:data:`~cx_Oracle.DB_TYPE_LONG` to be returned as `bytes` instead of `str`,
meaning that cx_Oracle doesn't do any decoding. See :ref:`Fetching raw
data <fetching-raw-data>` for more information.
.. versionadded:: 8.2
The parameter `bypass_decode` was added.
.. versionchanged:: 8.2
For consistency and compliance with the PEP 8 naming style, the
parameter `encodingErrors` was renamed to `encoding_errors`. The old
name will continue to work as a keyword parameter for a period of time.
.. note::

View File

@ -68,6 +68,8 @@ if applicable. The most recent deprecations are listed first.
- Replace with parameter name `keyword_parameters`
* - `keywordParameters` parameter to :meth:`Cursor.callproc()`
- Replace with parameter name `keyword_parameters`
* - `encodingErrors` parameter to :meth:`Cursor.var()`
- Replace with parameter name `encoding_errors`
* - `Cursor.fetchraw()`
- Replace with :meth:`Cursor.fetchmany()`
* - `Queue.deqMany`

View File

@ -26,6 +26,12 @@ Version 8.2 (TBD)
:meth:`cx_Oracle.SessionPool()` in order to permit specifying the size of
the statement cache during the creation of pools and standalone
connections.
#) Added parameter `bypass_decode` to :meth:`Cursor.var()` in order to allow
the `decode` step to be bypassed when converting data from Oracle Database
into Python strings
(`issue 385 <https://github.com/oracle/python-cx_Oracle/issues/385>`__).
Initial work was done in `PR 549
<https://github.com/oracle/python-cx_Oracle/pull/549>`__.
#) Threaded mode is now always enabled when creating connection pools with
:meth:`cx_Oracle.SessionPool()`. Any `threaded` parameter value is ignored.
#) Eliminated a memory leak when calling :meth:`SodaOperation.filter()` with a

View File

@ -288,7 +288,7 @@ or the value ``None``. The value ``None`` indicates that the default type
should be used.
Examples of output handlers are shown in :ref:`numberprecision`,
:ref:`directlobs` and :ref:`fetching-raw-data`. Also see samples such as `samples/TypeHandlers.py
:ref:`directlobs` and :ref:`fetching-raw-data`. Also see samples such as `samples/type_handlers.py
<https://github.com/oracle/python-cx_Oracle/blob/master/samples/type_handlers.py>`__
.. _numberprecision:
@ -347,82 +347,73 @@ See `samples/return_numbers_as_decimals.py
.. _fetching-raw-data:
Fetching Raw Data
---------------------
-----------------
Sometimes cx_Oracle may have problems converting data to unicode and you may
want to inspect the problem closer rather than auto-fix it using the
encodingerrors parameter. This may be useful when a database contains
records or fields that are in a wrong encoding altogether.
Sometimes cx_Oracle may have problems converting data stored in the database to
Python strings. This can occur if the data stored in the database doesn't match
the character set defined by the database. The `encoding_errors` parameter to
:meth:`Cursor.var()` permits the data to be returned with some invalid data
replaced, but for additional control the parameter `bypass_decode` can be set
to `True` and cx_Oracle will bypass the decode step and return `bytes` instead
of `str` for data stored in the database as strings. The data can then be
examined and corrected as required. This approach should only be used for
troubleshooting and correcting invalid data, not for general use!
It is not recommended to use mixed encodings in databases.
This functionality is aimed at troubleshooting databases
that have inconsistent encodings for external reasons.
For these cases, you can pass the additional keyword argument
``bypassencoding = True`` into :meth:`Cursor.var()`. This needs
to be used in combination with :ref:`outputtypehandlers`
The following sample demonstrates how to use this feature:
.. code-block:: python
# define the output type handler method
def ConvertStringToBytes(cursor, name, defaultType, size, precision, scale):
    if defaultType == cx_Oracle.STRING:
        return cursor.var(str, arraysize=cursor.arraysize, bypassencoding = True)
# define output type handler
def return_strings_as_bytes(cursor, name, default_type, size,
                            precision, scale):
    if default_type == cx_Oracle.DB_TYPE_VARCHAR:
        return cursor.var(str, arraysize=cursor.arraysize,
                          bypass_decode=True)
# set the cursor's outputtypehandler to the method above
cursor = connection.cursor()
cursor.outputtypehandler = ConvertStringToBytes
# set output type handler on cursor before fetching data
with connection.cursor() as cursor:
    cursor.outputtypehandler = return_strings_as_bytes
    cursor.execute("select content, charset from SomeTable")
    data = cursor.fetchall()
This will produce output as::
[(b'Fianc\xc3\xa9', b'UTF-8')]
This will allow you to receive data as raw bytes.
Note that the final two bytes, \xc3\xa9, are the UTF-8 encoding of é. Since
this is valid UTF-8, you can then perform the decode step that was bypassed:
.. code-block:: python
statement = cursor.execute("select content, charset from SomeTable")
data = statement.fetchall()
value = data[0][0].decode("UTF-8")
This will return the value "Fiancé".
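When each row carries its own character set name, as in the query above, the bypassed decode step can be applied row by row. The following is a minimal sketch in plain Python, assuming both columns were fetched as bytes. The helper `decode_rows` and the sample rows are hypothetical and only stand in for the result of `fetchall()`:

```python
def decode_rows(rows):
    """Decode (content, charset) rows in which both columns are bytes."""
    decoded = []
    for content, charset in rows:
        # The charset name is plain ASCII, so decoding it is safe.
        encoding = charset.decode("ascii")
        decoded.append((content.decode(encoding), encoding))
    return decoded


# Hypothetical fetchall() result; the second row was stored in a
# different encoding from the first.
rows = [(b"Fianc\xc3\xa9", b"UTF-8"), (b"Fianc\xe9", b"windows-1252")]
result = decode_rows(rows)
```

Both rows decode to the same text here, even though their stored bytes differ.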
This will produce output as:
If you want to save ``b'Fianc\xc3\xa9'`` into the database directly without
using a Python string, you will need to create a variable using
:meth:`Cursor.var()` that specifies the type as
:data:`~cx_Oracle.DB_TYPE_VARCHAR` (otherwise the value will be treated as
:data:`~cx_Oracle.DB_TYPE_RAW`). The following sample demonstrates this:
.. code-block:: python
[(b'Fianc\xc3\xa9', b'UTF-8')]
Note that the final two bytes, \xc3\xa9, are é in UTF-8. You can then do the following:
.. code-block:: python
import codecs
# data = [(b'Fianc\xc3\xa9', b'UTF-8')]
unicodecontent = data[0][0].decode(data[0][1].decode()) # Assuming your charset encoding is UTF-8
This will revert it back to "Fiancé".
If you want to save ``b'Fianc\xc3\xa9'`` to database you will need to create
:meth:`Cursor.var()` that will tell cx_Oracle that the value is indeed
intended as a string:
.. code-block:: python
connection = cx_Oracle.connect("hr", userpwd, "dbhost.example.com/orclpdb1")
cursor = connection.cursor()
cursorvariable = cursor.var(cx_Oracle.STRING)
cursorvariable.setvalue(0, "Fiancé".encode("UTF-8")) # b'Fianc\xc3\xa9'
cursor.execute("update SomeTable set SomeColumn = :param where id = 1", param=cursorvariable)
At that point, the bytes will be assumed to be in the correct encoding and should insert as you expect.
with cx_Oracle.connect(user="hr", password=userpwd,
                       dsn="dbhost.example.com/orclpdb1") as conn:
    with conn.cursor() as cursor:
        var = cursor.var(cx_Oracle.DB_TYPE_VARCHAR)
        var.setvalue(0, b"Fianc\xc3\xa9")
        cursor.execute("""
            update SomeTable set
                SomeColumn = :param
            where id = 1""",
            param=var)
.. warning::
This functionality is "as-is": when saving strings like this,
the bytes will be assumed to be in the correct encoding and will
insert like that. Proper encoding is the responsibility of the user and
no correctness of any data in the database can be assumed
to exist by itself.
The database will assume that the bytes provided are in the character set
expected by the database so only use this for troubleshooting or as
directed.
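Because the database does not validate bytes bound this way, it can help to check them in Python before writing. The following is a minimal sketch, assuming the database character set corresponds to Python's `utf-8` codec. The helper `is_valid_for` is hypothetical and not part of cx_Oracle:

```python
def is_valid_for(data, encoding):
    """Return True if the bytes decode cleanly in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


utf8_ok = is_valid_for(b"Fianc\xc3\xa9", "utf-8")  # UTF-8 bytes for "Fiancé"
latin_ok = is_valid_for(b"Fianc\xe9", "utf-8")     # windows-1252 bytes

# Validity alone does not prove the text is the intended one: these bytes
# are well-formed UTF-8 but decode to "Fiancě" (U+011B), not "Fiancé".
other_text = b"Fianc\xc4\x9b".decode("utf-8")
```

A check like this only catches malformed sequences, so comparing the decoded text with the expected value is still worthwhile.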
.. _outconverters:

View File

@ -1,75 +0,0 @@
# -*- coding: utf-8 -*-

import cx_Oracle
import sample_env

"The test below verifies that the option to work around saving and reading of inconsistent encodings works"

def ConvertStringToBytes(cursor, name, defaultType, size, precision, scale):
    if defaultType == cx_Oracle.STRING:
        return cursor.var(str, arraysize=cursor.arraysize, bypassencoding = True)

connection = cx_Oracle.connect(sample_env.get_main_connect_string())
cursor = connection.cursor()
cursor.outputtypehandler = ConvertStringToBytes

sql = 'create table EncodingExperiment (content varchar2(100), encoding varchar2(15))'
print('Creating experiment table')
try:
    cursor.execute(sql)
    print('Success, will attempt to add records')
except Exception as err:
    # table already exists
    print('%s\n%s'%(err, 'EncodingExperiment table exists... Will attempt to add records'))

# variable that we will test encodings against
unicode_string = 'I bought a cafetière on the Champs-Élysées'

# First test
windows_1252_encoded = unicode_string.encode('windows-1252')
# Second test
utf8_encoded = unicode_string.encode('utf-8')

sqlparameters = [(windows_1252_encoded, 'windows-1252'), (utf8_encoded, 'utf-8')]
sql = 'insert into EncodingExperiment (content, encoding) values (:content, :encoding)'

# cx_Oracle string variable in which we will store byte value and insert it as such
content_variable = cursor.var(cx_Oracle.STRING)
print('Adding records to the table: "EncodingExperiment"')
for sqlparameter in sqlparameters:
    content, encoding = sqlparameter
    # setting content_variable value to a byte value and insert it as such
    content_variable.setvalue(0, content)
    cursor.execute(sql, content=content_variable, encoding=encoding)

sql = 'select * from EncodingExperiment'
print('Fetching records from table EncodingExperiment')
result = cursor.execute(sql).fetchall()
for dataset in result:
    content, encoding = dataset[0], dataset[1].decode()
    decodedcontent = content.decode(encoding)
    print('Is "%s" == "%s" ?\nResult: %s, (decoded from: %s)'%(decodedcontent, unicode_string, decodedcontent == unicode_string, encoding))

print('Finished testing, will attempt to drop the table "EncodingExperiment"')
# drop table after finished testing
sql = 'drop table EncodingExperiment'
try:
    cursor.execute(sql)
    print('Successfully dropped table "EncodingExperiment" from database.')
except Exception as err:
    print('Failed to drop table from the database, info: %s'%err)

View File

@ -0,0 +1,49 @@
#------------------------------------------------------------------------------
# Copyright (c) 2021, Oracle and/or its affiliates. All rights reserved.
#------------------------------------------------------------------------------

#------------------------------------------------------------------------------
# query_strings_as_bytes.py
#
# Demonstrates how to query strings as bytes (bypassing decoding of the bytes
# into a Python string). This can be useful when attempting to fetch data that
# was stored in the database in the wrong encoding.
#
# This script requires cx_Oracle 8.2 and higher.
#------------------------------------------------------------------------------

import cx_Oracle as oracledb
import sample_env

STRING_VAL = 'I bought a cafetière on the Champs-Élysées'

def return_strings_as_bytes(cursor, name, default_type, size, precision,
                            scale):
    if default_type == oracledb.DB_TYPE_VARCHAR:
        return cursor.var(str, arraysize=cursor.arraysize, bypass_decode=True)

with oracledb.connect(sample_env.get_main_connect_string()) as conn:

    # truncate table and populate with our data of choice
    with conn.cursor() as cursor:
        cursor.execute("truncate table TestTempTable")
        cursor.execute("insert into TestTempTable values (1, :val)",
                       val=STRING_VAL)
        conn.commit()

    # fetch the data normally and show that it is returned as a string
    with conn.cursor() as cursor:
        cursor.execute("select IntCol, StringCol from TestTempTable")
        print("Data fetched using normal technique:")
        for row in cursor:
            print(row)
        print()

    # fetch the data, bypassing the decode and show that it is returned as
    # bytes
    with conn.cursor() as cursor:
        cursor.outputtypehandler = return_strings_as_bytes
        cursor.execute("select IntCol, StringCol from TestTempTable")
        print("Data fetched using bypass decode technique:")
        for row in cursor:
            print(row)

View File

@ -1809,27 +1809,39 @@ static PyObject *cxoCursor_setOutputSize(cxoCursor *cursor, PyObject *args)
static PyObject *cxoCursor_var(cxoCursor *cursor, PyObject *args,
PyObject *keywordArgs)
{
static char *keywordList[] = { "type", "size", "arraysize",
"inconverter", "outconverter", "typename", "encodingErrors", "bypassencoding",
NULL };
static char *keywordList[] = { "typ", "size", "arraysize", "inconverter",
"outconverter", "typename", "encoding_errors", "bypass_decode",
"encodingErrors", NULL };
Py_ssize_t encodingErrorsLength, encodingErrorsDeprecatedLength;
const char *encodingErrors, *encodingErrorsDeprecated;
PyObject *inConverter, *outConverter, *typeNameObj;
Py_ssize_t encodingErrorsLength;
int size, arraySize, bypassDecode;
cxoTransformNum transformNum;
const char *encodingErrors;
cxoObjectType *objType;
int size, arraySize, bypassEncoding;
PyObject *type;
cxoVar *var;
// parse arguments
size = bypassEncoding = 0;
encodingErrors = NULL;
size = bypassDecode = 0;
arraySize = cursor->bindArraySize;
encodingErrors = encodingErrorsDeprecated = NULL;
inConverter = outConverter = typeNameObj = NULL;
if (!PyArg_ParseTupleAndKeywords(args, keywordArgs, "O|iiOOOz#p",
if (!PyArg_ParseTupleAndKeywords(args, keywordArgs, "O|iiOOOz#pz#",
keywordList, &type, &size, &arraySize, &inConverter, &outConverter,
&typeNameObj, &encodingErrors, &encodingErrorsLength, &bypassEncoding))
&typeNameObj, &encodingErrors, &encodingErrorsLength,
&bypassDecode, &encodingErrorsDeprecated,
&encodingErrorsDeprecatedLength))
return NULL;
if (encodingErrorsDeprecated) {
if (encodingErrors) {
cxoError_raiseFromString(cxoProgrammingErrorException,
"encoding_errors and encodingErrors cannot both be "
"specified");
return NULL;
}
encodingErrors = encodingErrorsDeprecated;
encodingErrorsLength = encodingErrorsDeprecatedLength;
}
// determine the type of variable
if (cxoTransform_getNumFromType(type, &transformNum, &objType) < 0)
@ -1861,10 +1873,9 @@ static PyObject *cxoCursor_var(cxoCursor *cursor, PyObject *args,
strcpy((char*) var->encodingErrors, encodingErrors);
}
// Flag that manually changes transform type to bytes
if (bypassEncoding) {
// if the decode step is to be bypassed, use the binary transform instead
if (bypassDecode)
var->transformNum = CXO_TRANSFORM_BINARY;
}
return (PyObject*) var;
}
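The handling of the deprecated keyword in `cxoCursor_var` can be sketched in Python as follows. This is only an illustration of the logic visible in the C code above, not the driver's implementation. The function name `resolve_encoding_errors` is hypothetical, and `ValueError` stands in for the `ProgrammingError` that the C code raises:

```python
def resolve_encoding_errors(encoding_errors=None, encodingErrors=None):
    """Return the effective error handler, honoring the deprecated name."""
    if encodingErrors is not None:
        if encoding_errors is not None:
            # Mirrors the C code: supplying both names is rejected.
            raise ValueError(
                "encoding_errors and encodingErrors cannot both be specified"
            )
        # Only the deprecated name was given, so fall back to its value.
        encoding_errors = encodingErrors
    return encoding_errors


new_style = resolve_encoding_errors(encoding_errors="replace")
old_style = resolve_encoding_errors(encodingErrors="ignore")
unset = resolve_encoding_errors()
```

Accepting the old name as an alias keeps existing callers working during the deprecation period, while rejecting both names together avoids silently choosing one.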