Feed aggregator

Email Spoofing

Yann Neuhaus - Mon, 2019-07-15 11:07

Have you ever had that unhealthy feeling of being accused of something that has nothing to do with you? Of feeling helpless in the face of an accusing mail which, with its imperative and accusatory tone, manages to heap opprobrium on you?

This is the purpose of this particular kind of sextortion mail that uses spoofing to try to extort money from you: a message from a supposed “hacker” who claims to have broken into your computer. He threatens to publish compromising images taken without your knowledge with your webcam and, most of the time, asks you for a ransom in virtual currency.

Something like that:

 

Date:  Friday, 24 May 2019 at 09:19 UTC+1
Subject: oneperson
Your account is hacked! Renew the pswd immediately!
You do not heard about me and you are definitely wondering why you’re receiving this particular electronic message, proper?
I’m ahacker who exploitedyour emailand digital devicesnot so long ago.
Do not waste your time and make an attempt to communicate with me or find me, it’s not possible, because I directed you a letter from YOUR own account that I’ve hacked.
I have started malware to the adult vids (porn) site and suppose that you watched this website to enjoy it (you understand what I mean).
Whilst you have been keeping an eye on films, your browser started out functioning like a RDP (Remote Control) that have a keylogger that gave me authority to access your desktop and camera.
Then, my softaquiredall data.
You have entered passcodes on the online resources you visited, I intercepted all of them.
Of course, you could possibly modify them, or perhaps already modified them.
But it really doesn’t matter, my app updates needed data regularly.
And what did I do?
I generated a reserve copy of every your system. Of all files and personal contacts.
I have managed to create dual-screen record. The 1 screen displays the clip that you were watching (you have a good taste, ha-ha…), and the second part reveals the recording from your own webcam.
What exactly must you do?
So, in my view, 1000 USD will be a reasonable amount of money for this little riddle. You will make the payment by bitcoins (if you don’t understand this, search “how to purchase bitcoin” in Google).
My bitcoin wallet address:
1816WoXDtSmAM9a4e3HhebDXP7DLkuaYAd
(It is cAsE sensitive, so copy and paste it).
Warning:
You will have 2 days to perform the payment. (I built in an exclusive pixel in this message, and at this time I understand that you’ve read through this email).
To monitorthe reading of a letterand the actionsin it, I utilizea Facebook pixel. Thanks to them. (Everything thatis usedfor the authorities may helpus.)

In the event I do not get bitcoins, I shall undoubtedly give your video to each of your contacts, along with family members, colleagues, etc?

 

Users who fall victim to these scams receive a message from a stranger who presents himself as a hacker. This alleged “hacker” claims to have taken control of the victim’s computer after the victim visited a pornographic site (or any other site that morality would condemn). The cybercriminal then claims to have compromising videos of the victim made with the victim’s webcam, and threatens to publish them to the victim’s personal or even professional contacts if the victim does not pay a ransom. This ransom, which ranges from a few hundred to several thousand dollars, is demanded in a virtual currency (usually, but not only, Bitcoin).

To scare the victim even more, the cybercriminals sometimes go so far as to write to the victim from his or her own email address, in order to make him or her believe that they have actually taken control of the account.

First of all, there is no need to be afraid of it. While the “hack” announced by the cybercriminals is not impossible to achieve in theory, in practice it remains technically complex and, above all, time-consuming to implement. Since scammers target their victims by the thousands, we can deduce that they would not have the time to do what they claim to have done.

These messages are just an attempted scam. In other words, if you receive such a blackmail message and do not pay, obviously nothing more will happen.

Furthermore, there is no need to change your email credentials. Your email address is usually already known and circulates on the Internet, because you use it regularly on different sites to identify yourself and communicate. Some of these sites have resold or exchanged their address files with more or less scrupulous partners for marketing purposes.

If the cybercriminals have written to you from your own email address to make you believe that they have taken control of it: be aware that the sender’s address of a message is just a display field that can very easily be spoofed without requiring much technical skill.

In any case, the way to go is simple: don’t panic, don’t answer, don’t pay, just throw this mail in the trash (and don’t forget to empty it regularly). 

On the mail server side, setting up certain elements can help to prevent this kind of mail from spreading in the organization. This involves deploying the following measures on your mail server:

  • SPF (Sender Policy Framework): a standard for verifying the domain name of the sender of an email (standardized in RFC 7208 [1]). Adopting this standard is likely to reduce spam. It addresses the fact that SMTP (Simple Mail Transfer Protocol) does not provide a sender verification mechanism. SPF aims to reduce the possibility of spoofing by publishing a record in the DNS (Domain Name System) indicating which IP addresses are allowed or forbidden to send mail for the domain in question.
  • DKIM (DomainKeys Identified Mail): a reliable authentication standard for the domain name of the sender of an email that provides effective protection against spam and phishing (standardized in RFC 6376 [2]). DKIM works with a cryptographic signature, verifies the authenticity of the sending domain and also guarantees the integrity of the message.
  • DMARC (Domain-based Message Authentication, Reporting and Conformance): a technical specification to help reduce email misuse by providing a solution for deploying and monitoring authentication issues (standardized in RFC 7489 [3]). DMARC standardizes the way recipients perform email authentication using the SPF and DKIM mechanisms.

 

REFERENCES

[1] S. Kitterman, “Sender Policy Framework (SPF)”, RFC 7208, 2014, https://tools.ietf.org/html/rfc7208

[2] D. Crocker, T. Hansen, M. Kucherawy, “DomainKeys Identified Mail (DKIM) Signatures”, RFC 6376, 2011, https://tools.ietf.org/html/rfc6376

[3] M. Kucherawy, E. Zwicky, “Domain-based Message Authentication, Reporting and Conformance (DMARC)”, RFC 7489, 2015, https://tools.ietf.org/html/rfc7489


Migrating your users from md5 to scram authentication in PostgreSQL

Yann Neuhaus - Thu, 2019-07-11 03:43

One of the new features in PostgreSQL 10 was the introduction of stronger password authentication based on SCRAM-SHA-256. How can you migrate your existing users that currently use md5 authentication to the new method without any interruption? Actually that is quite easy, as you will see in a few moments, but there is one important point to consider: not every client/driver supports SCRAM-SHA-256 authentication yet, so you need to check that first. Here is the list of the drivers and their support for SCRAM-SHA-256.

The default method that PostgreSQL uses to encrypt passwords is defined by the “password_encryption” parameter:

postgres=# show password_encryption;
 password_encryption 
---------------------
 md5
(1 row)

Let’s assume we have a user that was created like this in the past:

postgres=# create user u1 login password 'u1';
CREATE ROLE

With the default method of md5 the hashed password looks like this:

postgres=# select passwd from pg_shadow where usename = 'u1';
               passwd                
-------------------------------------
 md58026a39c502750413402a90d9d8bae3c
(1 row)

As you can see the hash starts with md5, so we know that this hash was generated by the md5 algorithm. When we want this user to use scram-sha-256 instead, what do we need to do? The first step is to change the “password_encryption” parameter:

postgres=# alter system set password_encryption = 'scram-sha-256';
ALTER SYSTEM
postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
postgres=# select current_setting('password_encryption');
 current_setting 
-----------------
 scram-sha-256
(1 row)

From now on the server will use scram-sha-256 and no longer md5. But what happens when our user wants to connect to the instance once we have changed that? Currently this is defined in pg_hba.conf:

postgres=> \! grep u1 $PGDATA/pg_hba.conf
host    postgres        u1              192.168.22.1/24         md5

Even though the default is no longer md5, the user can still connect to the instance because the password hash did not change for that user:

postgres=> \! grep u1 $PGDATA/pg_hba.conf
host    postgres        u1              192.168.22.1/24         md5

postgres@rhel8pg:/home/postgres/ [PGDEV] psql -h 192.168.22.100 -p 5433 -U u1 postgres
Password for user u1: 
psql (13devel)
Type "help" for help.

postgres=> 

Once the user changed the password:

postgres@rhel8pg:/home/postgres/ [PGDEV] psql -h 192.168.22.100 -p 5433 -U u1 postgres
Password for user u1: 
psql (13devel)
Type "help" for help.

postgres=> \password
Enter new password: 
Enter it again: 
postgres=> 

… the hash of the new password is not md5 but SCRAM-SHA-256:

postgres=# select passwd from pg_shadow where usename = 'u1';
                                                                passwd                               >
----------------------------------------------------------------------------------------------------->
 SCRAM-SHA-256$4096:CypPmOW5/uIu4NvGJa+FNA==$PNGhlmRinbEKaFoPzi7T0hWk0emk18Ip9tv6mYIguAQ=:J9vr5CQDuKE>
(1 row)

One could expect that from now on the user is no longer able to connect, as we have not yet changed pg_hba.conf:

postgres@rhel8pg:/home/postgres/ [PGDEV] psql -h 192.168.22.100 -p 5433 -U u1 postgres
Password for user u1: 
psql (13devel)
Type "help" for help.

postgres=> 

But in reality that still works, because the md5 authentication method in pg_hba.conf also allows SCRAM authentication when the stored password is a SCRAM hash (the server then negotiates SCRAM automatically). So once all the users have changed their passwords you can safely switch the rule in pg_hba.conf and you’re done:

postgres=> \! grep u1 $PGDATA/pg_hba.conf
host    postgres        u1              192.168.22.1/24         scram-sha-256

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

You just need to make sure that no user is left with a hash starting with md5; they should all have the new hash starting with SCRAM-SHA-256.
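A quick check before switching pg_hba.conf: any user returned by this query (against the same pg_shadow view used above) still has an md5 hash and has not yet changed the password:

select usename from pg_shadow where passwd like 'md5%';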


Huawei Dorado 6000 V3 benchmark

Yann Neuhaus - Wed, 2019-07-10 02:39

I had the opportunity to test the new Dorado 6000 V3 All-Flash storage system.
See what the all-new Dorado 6000 V3 All-Flash storage system is capable of as storage for your database system.

Before you read

This is a series of different blog posts:
In the first blog post, I talk about “What you should measure on your database storage and why”.
The second blog post will talk about “How to do database storage performance benchmark with FIO”.
The third blog post will show “How good is the new HUAWEI Dorado 6000 V3 All-Flash System for databases”, measured with the methods and tools from posts one and two (aka this one here).

The first two posts give you the theory to understand all the graphics and numbers I will show in the third blog post.

So in this post we see what the results are when we test a Huawei Dorado 6000 V3 All-Flash storage system with these techniques.

I uploaded all the files to a github repository: Huawei-Dorado6000V3-Benchmark.

Foreword

The setup was provided by Huawei in Shenzhen, China. I had remote access with a timeout at a certain point. Every test run runs for 10 hours; because of the timeout I was sometimes not able to capture all performance view pictures, which is why some of the pictures are missing. The storage array and servers were provided free of charge, and Huawei did not exercise any influence on the results or conclusions in any way.

Setup

4 servers were provided, each with 4x 16 GBit/s FC adapters directly connected to the storage system.
Each server has 256 GByte of memory and 2x 14-core 2.6 GHz Intel E5-2690 CPUs.
Hyperthreading is disabled.
The 10 GBit/s network interfaces are irrelevant for this test because all storage traffic runs over FC.

The Dorado 6000 V3 System has 1 TByte of cache and 50x 900 GByte SSD from Huawei.
Deduplication was disabled.
Tests were made with and without compression.

Theoretical max speed

With 4x16GBit/s a maximal throughput of 64 GBit/s or 8 GByte/s is possible.
In IOPS this means we can transmit 8192 IOPS with 1 MByte block size or 1’048’576 IOPS with 8 KByte block size.
As mentioned in the title, this is the theoretical or raw bandwidth; the usable bandwidth or payload is, of course, smaller: an FC frame is 2112 bytes with 36 bytes of protocol overhead.
So in a 64 GBit/s FC network we can transfer: 64 GBit/s / 8 ==> 8 GByte/s * 1024 ==> 8192 MByte/s (raw) * (100-(36/2.112))/100 ==> 6795 MByte/s (payload).

So we end up with a maximum of 6795 IOPS@1MByte or 869’841 IOPS@8KByte (payload). Not included is the effect of using multipathing* across 4x 16 GBit/s paths, which will also consume some of that bandwidth.

*If somebody out there has a method to calculate the overhead of multipathing in such a setup, please contact me!

Single-Server Results

General

All single server tests were made on devices with enabled data compression. Unfortunately, I do not have the results from my tests with uncompressed devices for single server anymore, but you can see the difference in the multi-server section.

8 KByte block size

The 8 KByte block size tests on a single server were very performant.
What we can already tell: the higher the parallelity, the better the storage performs. This is not really a surprise; most storage systems work better the higher the parallel access is.
Especially for 1 thread, we see the difference between having one disk in a diskgroup and being able to use 3967 IOPS, or using e.g. 5 disks with 1 thread and being able to use 16700 IOPS.
The latency for all tests was great, with 0.25 ms to 0.4 ms for read operations and 0.1 to 0.4 ms for write operations.
The 0.1 ms for writes is not that impressive, because it is mainly the performance of the write cache, but even when we exceeded the write cache we were not higher than 0.4 ms.

1 MByte block size

On the 1 MByte tests, we see that we already hit the max speed with 6 devices (parallelity of 6) to 9 devices (parallelity of 2).

As an example to interpret the graphic, when you have a look at the green line (6 devices), we reach the peak performance at a parallelity of 6.
For the dark blue line (7 devices) we hit the max peak at parallelity 4 and so on.

If we increase the parallelity over this point, the latency will grow or even the throughput will decrease.
For the 1 MByte tests, we hit a limitation at around 6280 IOPS. This is around 90% of the calculated maximum speed.

So if we go with Oracle ASM, we should bundle at least 5 devices together into a diskgroup.
We also see that when we run a diskgroup rebalance we should go for a small rebalance power. A value smaller than 4 should be chosen; every value over 8 is counterproductive and will consume all available I/O on your system and slow down all databases on this server.
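As an illustration (the diskgroup name is hypothetical), limiting the rebalance power looks like this:

alter diskgroup DATA rebalance power 4;

A rebalance started without an explicit POWER clause uses the ASM_POWER_LIMIT parameter, so keeping that parameter low is a sensible default on a system like this.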
Monitoring / Verification

To verify the results, I am using Oracle I/O calibration (DBMS_RESOURCE_MANAGER.CALIBRATE_IO) on the very same devices the performance test was running on. The expectation is that we will see more or less the same results.
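For reference, a calibration run is started roughly like this (a sketch; the num_physical_disks and max_latency values are only illustrative and have to match your environment):

SET SERVEROUTPUT ON
DECLARE
  l_max_iops  PLS_INTEGER;
  l_max_mbps  PLS_INTEGER;
  l_latency   PLS_INTEGER;
BEGIN
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO(
    num_physical_disks => 10,   -- number of physical disks behind the tested devices (illustrative)
    max_latency        => 20,   -- acceptable latency in milliseconds (illustrative)
    max_iops           => l_max_iops,
    max_mbps           => l_max_mbps,
    actual_latency     => l_latency);
  DBMS_OUTPUT.PUT_LINE('max_iops = ' || l_max_iops);
  DBMS_OUTPUT.PUT_LINE('max_mbps = ' || l_max_mbps);
  DBMS_OUTPUT.PUT_LINE('latency  = ' || l_latency);
END;
/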

On large IO, the 6231 IOPS measured by I/O calibration is almost the same as measured by FIO (+/- 1%).
I/O calibration measured 604k IOPS for small IO, which is significantly more than the roughly 340k IOPS measured with FIO. This is explainable because I/O calibration derives its parallelity from the number of disks, and I did this test with 20 disks instead of 10. Sadly, when I realized my mistake I no longer had access to the system.

In the following pictures you see the performance view of the storage system with the data measured by FIO as an overlay. As we can see, the values for the IOPS match perfectly.
The value for latency was lower on the storage side, which is explained by the different points where we are measuring (once on the storage side, once on the server side).
All screenshots of the live performance view of the storage can be found in the git repository. The values for queue depth, throughput, and IOPS matched the measured results perfectly.


Multi-Server Results with compression

General

The tests for compressed and uncompressed devices were made with 3 parallel servers.

8 KByte block size

For random read with 8 KByte blocks, the IOPS increased almost linearly from 1 to 3 nodes and we hit a peak of 655’000 IOPS with 10 devices / 10 threads. The response time was between 0.3 and 0.45 ms.
For random write, we hit some kind of limitation at around 250k IOPS. We could not get a higher value than that, which was kind of surprising for me; I would have expected better results here.
From the point where we hit the maximum number of IOPS we see the same behavior as with 1 MByte blocks: more threads only increase the response time but do not get you better performance.
So for random write with 8 KByte blocks, the maximum is around 3 devices and 10 threads, or 10 devices and 3 threads, i.e. a parallelity of 30.
As long as we stay under this limit we see response times between 0.15 and 0.5 ms; over this limit the response times can increase towards 10 ms.
1 MByte block size

The multi-server tests show some interesting behavior with large reads on this storage system.
We hit a limitation at around 7500 to 7800 IOPS. For sequential write, we could achieve almost double this result with up to 14.5k IOPS.

Of course, I discussed all the results with Huawei to get their view on my tests.
The explanation for the much better write than read performance was that writes go straight to the 1 TByte cache, while for reads the system had to fetch everything from disk. This beta firmware version did not have a read cache yet, which is why the results were lower. All firmware versions from the end of February onwards also have a read cache.
I go with this answer and hope to retest in the future with the newest firmware, though I still think 7500 IOPS is a little low even without a read cache.
Multi-Server Results without compression

Comparing the results for compressed devices to uncompressed devices, we see an increase in IOPS of up to 30% and a corresponding decrease in latency for the 8 KByte block size.
For 1 MByte sequential read, the difference was smaller at around 10%; for 1 MByte sequential write we gained around 15-20%.

Multi-Server Results with high parallelity

General

Because the tests with 3 servers did not max out the storage at the 8 KByte block size, I decided to do a max test with 4 parallel servers and a parallelity from 1-100 instead of 1-10.
The steps were 1, 5, 10, 15, 20, 30, 40, 50, 75 and 100.
These tests were only performed on uncompressed devices.

8 KByte block size

It took 15 threads (per server) with 10 devices, 60 processes in total, to reach the peak performance of the Dorado 6000 V3 system.
At this point we reached, for 8 KByte random read, 940k IOPS @ 0.637 ms. Remembering the answer that this firmware version does not have a read cache, this performance is achieved completely from the SSDs and could theoretically be even better with a read cache enabled.
If we increase the parallelity further, we see the same effect as with 1 MByte blocks: the response time increases (dramatically) and the throughput decreases.

Depending on the number of parallel devices, we need between 60 parallel processes (with 10 devices) and 300 parallel processes (with 3 parallel devices).

1 MByte block size

For the large IOs, we see the same picture as with 1 or 3 servers: a combined parallelity of 20-30 can max out the storage system. So be very careful that your large IO tasks do not affect the other operations on the storage system.

Mixed Workload

After these tests, we know the upper limits of this storage for the individual test cases. In a normal workload, we will never see only one kind of IO: there will always be a mixture of 8 KByte read & write IOPS side by side with 1 MByte IO. To simulate this, we create two FIO job files. One creates approx. 40k-50k IOPS with random read and random write in a 50/50 split.
This will be our baseline; then we add approx. 1000 1 MByte IOPS every 60 seconds and see how the response time reacts.


As seen in this picture from the performance monitor of the storage system, the 1 MByte IOs had two effects on the smaller IOs:
  • The throughput of the small IOs decreased.
  • Their latency increased.
In the middle of the test, we stopped the small IOs to see the latency of just the 1 MByte IOs.

Both effects are expected and within the expected parameters: Test passed.

So with a base workload of 40k-50k IOPS, we can, for example, run backups in parallel with a bandwidth of up to 5.5 GByte/s without interfering with the database work, or we can do up to 5 active duplicates on the same storage without interfering with the other databases.

Summary

This storage system showed fantastic performance at the 8 KByte block size with very low latency. Especially the high number of parallel processes we can run against it before we hit peak performance makes it a good choice for serving a large number of Oracle databases.

The large IO (1 MByte) performance for write operations was good, but not that good compared with the excellent 8 KByte performance. The sequential read side badly misses the read cache compared to the performance that is possible for writes. But even the write performance is not top of the line compared to other storage systems: I have seen other storage systems with a comparable configuration that were able to deliver up to 12k IOPS@1MByte.

Remember the questions from the first blog post:
-How many devices should I bundle into a diskgroup for best performance?
As many as possible.

-How many backups/duplicates can I run in parallel to my normal database workload without interfering with it?
You can run 5 parallel backups/duplicates with 1000 IOPS each without interfering with a baseline of 40-50k IOPS@8KByte.

-What is the best rebalance power I can use on my system?
2-4 is absolutely enough for this system. More will slow down the other operations on the server.


Storage performance benchmarking with FIO

Yann Neuhaus - Wed, 2019-07-10 02:18

Learn how to do storage performance benchmarks for your database system with the open source tool FIO.

Before you read

This is a series of different blog posts:
In the first blog post, I talk about “What you should measure on your database storage and why”.
The second blog post will talk about “How to do database storage performance benchmark with FIO” (aka this one here).
The third blog post will show “How good is the new HUAWEI Dorado 6000 V3 All-Flash System for databases”, measured with the methods and tools from posts one and two.

The first two posts give you the theory to understand all the graphics and numbers I will show in the third blog post.

Install FIO

Many distributions have FIO in their repositories. On a Fedora/RHEL system, you can just use
yum install fio
and you are ready to go.

Start a benchmark with FIO

There are mainly two different ways to start a benchmark with FIO

Command line

Starting from the command line is the way to go when you just want to get a quick feel for the system performance.
For more complex setups I prefer job files; they are easier to create and debug.
Here is a small example of how to start a benchmark directly from the command line:
fio --filename=/dev/xvdf --direct=1 --rw=randwrite --refill_buffers --norandommap \
--randrepeat=0 --ioengine=libaio --bs=128k --rate_iops=1280 --iodepth=16 --numjobs=1 \
--time_based --runtime=86400 --group_reporting --name=benchtest

FIO Job files

A FIO job file holds a [GLOBAL] section and one or many [JOBS] sections. The [GLOBAL] section holds the shared parameters which are used for all the jobs unless you override them in the job sections.
Here is what a typical GLOBAL section from my files looks like:
[global]
ioengine=libaio    #ASYNCH IO
invalidate=1       #Invalidate buffer-cache for the file prior to starting I/O.
                   #Should not be necessary because of direct IO but just to be sure ;-)
ramp_time=5        #First 5 seconds do not count to the result.
iodepth=1          #Number of I/O units to keep in flight against the file
runtime=60         #Runtime for every test
time_based         #If given, run for the specified runtime duration even if the files are completely read or written.
                   #The same workload will be repeated as many times as runtime allows.
direct=1           #Use non buffered I/O.
group_reporting=1  #If set, display per-group reports instead of per-job when numjobs is specified.
per_job_logs=0     #If set, this generates bw/clat/iops log with per file private filenames.
                   #If not set, jobs with identical names will share the log filename.
bs=8k              #Block size
rw=randread        #I/O Type

Now that we have defined the basics, we can start with the JOBS section:
Example of single device test with different parallelity:


#
#Subtest: 1
#Total devices = 1
#Parallelity = 1
#Number of processes = devices*parallelity ==> 1*1 ==> 1
#
[test1-subtest1-blocksize8k-threads1-device1of1]     #Parallelity 1, Number of device: 1/1
stonewall                               #run this test until the next [JOB SECTION] with the “stonewall” keyword
filename=/dev/mapper/device01           #Device to use
numjobs=1                               #Create the specified number of clones of this job.
                                        #Each clone of job is spawned as an independent thread or process.
                                        #May be used to setup a larger number of threads/processes doing the same thing.
                                        #Each thread is reported separately: to see statistics for all clones as a whole
                                        #use group_reporting in conjunction with new_group.
#
#Subtest: 5
#Total devices = 1
#Parallelity = 5
#Number of processes = devices*parallelity ==> 1*5 ==> 5
#
[test1-subtest5-blocksize8k-threads5-device1of1]     #Parallelity 5, Number of device: 1/1
stonewall
numjobs=5
filename=/dev/mapper/device01

Example of multi device test with different parallelity:

#Subtest: 1
#Total devices = 4
#Parallelity = 1
#Number of processes = devices*parallelity ==> 4
#
[test1-subtest1-blocksize8k-threads1-device1of4]     # Parallelity 1, Number of device 1/4
stonewall
numjobs=1
filename=/dev/mapper/device01
[test1-subtest1-blocksize8k-threads1-device2of4]     # Parallelity 1, Number of device 2/4
numjobs=1
filename=/dev/mapper/device02
[test1-subtest1-blocksize8k-threads1-device3of4]     # Parallelity 1, Number of device 3/4
numjobs=1
filename=/dev/mapper/device03
[test1-subtest1-blocksize8k-threads1-device4of4]     # Parallelity 1, Number of device 4/4
numjobs=1
filename=/dev/mapper/device04
#
#Subtest: 5
#Total devices = 3
#Parallelity = 5
#Number of processes = devices*parallelity ==> 5
#
[test1-subtest5-blocksize8k-threads5-device1of3]     # Parallelity 5, Number of device 1/3
stonewall
numjobs=5
filename=/dev/mapper/device01
[test1-subtest5-blocksize8k-threads5-device2of3]     # Parallelity 5, Number of device 2/3
filename=/dev/mapper/device02
[test1-subtest5-blocksize8k-threads5-device3of3]     # Parallelity 5, Number of device 3/3
filename=/dev/mapper/device03

You can download a complete set of FIO job files for running the described test case from my github repository.
Job files list

To run a complete test with my job files you have to replace the device names. There is a small shell script called "replaceDevices.sh" to do that:

#!/bin/bash
######################################################
# dbi services michael.wirz@dbi-services.com
# Vesion: 1.0
#
# usage: ./replaceDevices.sh
#
# todo before use: modify newname01-newname10 with
# the name of your devices
######################################################
sed -i -e 's_/dev/mapper/device01_/dev/mapper/newname01_g' *.fio
sed -i -e 's_/dev/mapper/device02_/dev/mapper/newname02_g' *.fio
sed -i -e 's_/dev/mapper/device03_/dev/mapper/newname03_g' *.fio
sed -i -e 's_/dev/mapper/device04_/dev/mapper/newname04_g' *.fio
sed -i -e 's_/dev/mapper/device05_/dev/mapper/newname05_g' *.fio
sed -i -e 's_/dev/mapper/device06_/dev/mapper/newname06_g' *.fio
sed -i -e 's_/dev/mapper/device07_/dev/mapper/newname07_g' *.fio
sed -i -e 's_/dev/mapper/device08_/dev/mapper/newname08_g' *.fio
sed -i -e 's_/dev/mapper/device09_/dev/mapper/newname09_g' *.fio
sed -i -e 's_/dev/mapper/device10_/dev/mapper/newname10_g' *.fio

!!! After you have replaced the filenames you should double-check that you have the correct devices, because when you start the test, all data on these devices will be lost !!!

grep filename *.fio|awk -F '=' '{print $2}'|sort -u
/dev/mapper/device01
/dev/mapper/device02
/dev/mapper/device03
/dev/mapper/device04
/dev/mapper/device05
/dev/mapper/device06
/dev/mapper/device07
/dev/mapper/device08
/dev/mapper/device09
/dev/mapper/device10

To start the test run:

for job_file in $(ls *.fio)
do
    fio ${job_file} --output /tmp/bench/${job_file%.fio}.txt
done

Multiple Servers

FIO supports running tests on multiple servers in parallel, which is very nice! Often a single server cannot max out a modern all-flash storage system, either because of bandwidth limits (e.g. not enough adapters per server) or because one server alone is just not powerful enough.

You need to start FIO in server mode on all machines you want to test:
fio --server

Then you start the test with
fio --client=serverA,serverB,serverC /path/to/fio_jobs.file

Should you have a lot of servers you can put them in a file and use this as input for your fio command:


cat fio_hosts.list
serverA
serverB
serverC
serverD
...

fio --client=fio_hosts.list /path/to/fio_jobs.file

Results

The output files are not really human readable, so you can use my getResults.sh script, which formats the output so it is ready to copy & paste into Excel:


cd /home/user/Huawei-Dorado6000V3-Benchmark/TESTRUN5-HOST1_3-COMPR/fio-benchmark-output
bash ../../getResults.sh
###########################################
START :Typerandread-BS8k
FUNCTION: getResults
###########################################
Typerandread-BS8k
LATENCY IN MS
.399 .824 1.664 2.500 3.332 5.022 6.660 8.316 12.464 16.683
.392 .826 1.667 2.495 3.331 4.995 6.680 8.344 12.474 16.637
.397 .828 1.661 2.499 3.330 4.992 6.656 8.329 12.505 16.656
.391 .827 1.663 2.493 3.329 5.002 6.653 8.330 12.482 16.656
.398 .827 1.663 2.497 3.327 5.005 6.660 8.327 12.480 16.683
.403 .828 1.662 2.495 3.326 4.995 6.663 8.330 12.503 16.688
.405 .825 1.662 2.496 3.325 4.997 6.648 8.284 12.369 16.444
.417 .825 1.661 2.497 3.326 4.996 6.640 8.256 12.303 16.441
.401 .826 1.661 2.500 3.327 4.999 6.623 8.273 12.300 16.438
.404 .826 1.661 2.500 3.327 4.993 6.637 8.261 12.383 16.495
IOPS
2469 6009 5989 5986 5991 5966 5998 6006 6012 5989
5004 12000 11000 11000 11000 11000 11000 11000 12000 12000
7407 17000 18000 17000 17000 18000 18000 17000 17000 17000
10000 23000 23000 24000 23000 23000 24000 23000 24000 23000
12300 29000 29000 29000 30000 29900 29000 29000 30000 29900
14600 35900 35000 35000 36000 35000 35000 35000 35000 35900
16000 42100 41000 41000 42000 41000 42100 42200 42400 42500
16500 42100 41000 41900 42000 41000 42100 42400 42600 42500
19600 48000 47000 47900 47000 47900 48300 48300 48700 48600
21900 54000 53000 53900 53000 53000 54200 54400 54400 54400
###########################################
START :Typerandwrite-BS8k
FUNCTION: getResults
###########################################
Typerandwrite-BS8k
LATENCY IN MS
.461 .826 1.662 2.501 3.332 5.022 6.660 8.317 12.467 16.676
.457 .826 1.668 2.495 3.330 5.002 6.681 8.346 12.473 16.635
.449 .826 1.662 2.499 3.327 4.991 6.664 8.326 12.497 16.649
.456 .828 1.661 2.496 3.331 4.997 6.663 8.329 12.477 16.651
.460 .827 1.663 2.495 3.327 5.001 6.660 8.333 12.484 16.676
.463 .830 1.663 2.495 3.325 4.997 6.661 8.330 12.503 16.684
.474 .827 1.661 2.495 3.324 4.999 6.665 8.334 12.451 16.580
.469 .828 1.661 2.497 3.324 5.002 6.668 8.322 12.489 16.594
.471 .827 1.660 2.499 3.327 4.998 6.663 8.335 12.481 16.609
.476 .825 1.675 2.500 3.328 4.992 6.675 8.334 12.480 16.623
IOPS
2137 5997 5990 5985 5991 5966 5998 6005 6010 5992
4306 12000 11900 11000 11000 11000 11000 11000 12000 12000
6571 17000 17000 17000 18000 18000 17000 17000 17000 18000
8635 23900 23000 23000 23000 23000 23000 23000 24000 24000
10700 29000 29000 29000 30000 29900 29000 29000 30000 29000
12800 35900 35000 35000 36000 35000 35000 35000 35000 35900
14500 41000 41000 41000 42000 41000 41000 41000 42100 42200
14700 41000 41000 41900 42000 41900 41900 42000 42000 42100
16700 48000 48000 47900 47000 47000 47000 47900 47000 48100
18600 54100 53500 53900 53000 54000 53900 53900 53000 54100
...

Copy & paste the result into the Excel template and you get an easy overview of the results:
fio summary excel

Troubleshooting

If you’ve got a libaio error you have to install the libaio libraries:

fio: engine libaio not loadable
fio: failed to load engine
fio: file:ioengines.c:89, func=dlopen, error=libaio: cannot open shared object file: No such file or directory

yum install libaio-devel


Converting columns from one data type to another in PostgreSQL

Yann Neuhaus - Mon, 2019-07-08 00:19

Usually you should use the data type that best fits the representation of your data in a relational database. But how many times have you seen applications that store dates or numbers as text, or dates as integers? This is not as uncommon as you might think, and fixing it can be quite a challenge, as you need to cast from one data type to another when you want to change the data type used for a specific column. Depending on the current format of the data it might be easy to fix or it might become more complicated. PostgreSQL has a quite clever way of doing that.

Frequent readers of our blog might know that already: We start with a simple, reproducible test setup:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

What do we have here? A simple table with two columns: column “a” is an integer and column “b” is of type text. For a human it is obvious that the second column really contains a date, just stored as text. What options do we have to fix that? We could try something like this:

postgres=# alter table t1 add column c date default (to_date(b,'YYYYMMDD'));
psql: ERROR:  cannot use column reference in DEFAULT expression

That obviously does not work. Another option would be to add another column with the correct data type, populate that column and then drop the original one:

postgres=# alter table t1 add column c date;
ALTER TABLE
postgres=# update t1 set c = to_date(b,'YYYYMMDD');
UPDATE 3
postgres=# alter table t1 drop column b;
ALTER TABLE

But what is the downside of that? This will probably break the application as the column name changed and there is no way to avoid that. Is there a better way of doing that? Let’s start from scratch:

postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values ( 1, '20190101');
INSERT 0 1
postgres=# insert into t1 values ( 2, '20190102');
INSERT 0 1
postgres=# insert into t1 values ( 3, '20190103');
INSERT 0 1
postgres=# select * from t1;
 a |    b     
---+----------
 1 | 20190101
 2 | 20190102
 3 | 20190103
(3 rows)

The same setup as before. What other options do we have to convert “b” to a real date without changing the name of the column? Let's try the most obvious way and let PostgreSQL decide what to do:

postgres=# alter table t1 alter column b type date;
psql: ERROR:  column "b" cannot be cast automatically to type date
HINT:  You might need to specify "USING b::date".

This does not work, as PostgreSQL in this case cannot know how to go from one data type to another. But the “HINT” already tells us what we might need to do:

postgres=# alter table t1 alter column b type date using (b::date);
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# 

That works for the data in the “b” column. But consider you have data like this:

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'01-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (2,'02-JAN-2019');
INSERT 0 1
postgres=# insert into t1 values (3,'03-JAN-2019');
INSERT 0 1
postgres=# select * from t1;
 a |      b      
---+-------------
 1 | 01-JAN-2019
 2 | 02-JAN-2019
 3 | 03-JAN-2019
(3 rows)

Would that still work?

postgres=# alter table t1 alter column b type date using (b::date);;
ALTER TABLE
postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-02
 3 | 2019-01-03
(3 rows)

Yes, but in this case it will not:

DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 values (1,'First--January--19');
INSERT 0 1
postgres=# insert into t1 values (2,'Second--January--19');
INSERT 0 1
postgres=# insert into t1 values (3,'Third--January--19');
INSERT 0 1
postgres=# select * from t1;
 a |          b           
---+---------------------
 1 | First--January--19
 2 | Second--January--19
 3 | Third--January--19
(3 rows)

postgres=# alter table t1 alter column b type date using (b::date);;
psql: ERROR:  invalid input syntax for type date: "First--January--19"
postgres=# 

As PostgreSQL has no idea how to do the conversion, this will fail, no surprise here. But you can still do it by providing a function that does the conversion in exactly the way you want:

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
begin
  return date('20190101');
end;
$$ language plpgsql;

Of course you would add logic to parse the input string so that the function returns the matching date and not a constant as in this example. For demonstration purposes we will go with this fake function:

postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));;
ALTER TABLE
postgres=# \d t1
                 Table "public.t1"
 Column |  Type   | Collation | Nullable | Default 
--------+---------+-----------+----------+---------
 a      | integer |           |          | 
 b      | date    |           |          | 

postgres=# select * from t1;
 a |     b      
---+------------
 1 | 2019-01-01
 2 | 2019-01-01
 3 | 2019-01-01
(3 rows)

… and here we go. The column was converted from text to date and we provided the exact way of doing that by calling a function that contains the conversion logic. As long as the output of the function conforms to the target data type and you did not make any mistakes, you can potentially go from any source data type to any target data type.
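For the sample strings used above, a real conversion function could look something like this (just a sketch assuming the “Nth--Month--YY” format from the example; the ordinal mapping would of course have to cover all values that can appear in the data):

create or replace function f_convert_to_date ( pv_text in text ) returns date
as $$
declare
  lv_day int;
begin
  -- map the ordinal word to a day number (extend as needed)
  lv_day := case split_part(pv_text,'--',1)
              when 'First'  then 1
              when 'Second' then 2
              when 'Third'  then 3
            end;
  -- rebuild a string the date parser understands: day, full month name, four digit year
  return to_date(lv_day::text || '-' || split_part(pv_text,'--',2) || '-20' || split_part(pv_text,'--',3), 'DD-Month-YYYY');
end;
$$ language plpgsql;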

There is one remaining question: Will that block other sessions selecting from the table while the conversion is ongoing?

postgres=# drop table t1;
DROP TABLE
postgres=# create table t1 ( a int, b text );
CREATE TABLE
postgres=# insert into t1 select a, '20190101' from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create index i1 on t1(a);
CREATE INDEX

In one session we will do the conversion and in the other session we will do a simple select that goes over the index:

-- first session
postgres=# alter table t1 alter column b type date using (f_convert_to_date(b));

Second one at the same time:

-- second session
postgres=# select * from t1 where a = 1;
-- blocks

Yes, that will block, so you should plan such actions carefully when you have a busy system. But this is still better than adding a new column.
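If you want to verify that, a third session can easily see the waiter in pg_stat_activity (the wait_event columns are available since PostgreSQL 9.6):

select pid, wait_event_type, wait_event, state, query
from pg_stat_activity
where wait_event_type = 'Lock';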


Telling the PostgreSQL optimizer more about your functions

Yann Neuhaus - Sun, 2019-07-07 05:29

When you reference or call functions in PostgreSQL, the optimizer does not really know much about their cost nor the number of rows they return. This is not really surprising, as it is hard to predict what a function is doing and how many rows will be returned for a given set of input parameters. What you might not know is that you can indeed tell the optimizer a bit more about your functions.

As usual let’s start with a little test setup:

postgres=# create table t1 ( a int, b text, c date );
CREATE TABLE
postgres=# insert into t1 select a,a::text,now() from generate_series(1,1000000) a;
INSERT 0 1000000
postgres=# create unique index i1 on t1(a);
CREATE INDEX
postgres=# analyze t1;
ANALYZE

A simple table containing 1’000’000 rows and one unique index. In addition let’s create a simple function that will return exactly one row from that table:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql;

What is the optimizer doing when you call that function?

postgres=# explain (analyze) select f_tmp (1);
                                         QUERY PLAN                                         
--------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..5.27 rows=1000 width=32) (actual time=0.654..0.657 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.047 ms
 Execution Time: 0.696 ms
(4 rows)

We know that only one row will be returned but the optimizer assumes that 1000 rows will be returned. This is the documented default. So, no matter how many rows will really be returned, PostgreSQL will always estimate 1000. But you have some control and can tell the optimizer that the function will return one row only:

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1;

Looking again at the execution plan again:

postgres=# explain (analyze) select f_tmp (1);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 ProjectSet  (cost=0.00..0.27 rows=1 width=32) (actual time=0.451..0.454 rows=1 loops=1)
   ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.003..0.004 rows=1 loops=1)
 Planning Time: 0.068 ms
 Execution Time: 0.503 ms
(4 rows)

Instead of 1000 rows we now see that only 1 row was estimated, which is what we specified when we created the function. Of course this is a very simple example and in reality you often might not be able to tell exactly how many rows will be returned from a function. But at least you can provide a better estimate than the default of 1000. In addition you can also specify a cost for your function (based on cpu_operator_cost):

create or replace function f_tmp ( a_id in int ) returns setof t1
as $$
declare
begin
  return query select * from t1 where a = $1;
end;
$$ language plpgsql
   rows 1
   cost 1;

If you use functions, remember that you can give the optimizer more information, and that the default estimate is 1000 rows.
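To see what the planner currently assumes for a given function, you can look at the corresponding columns in pg_proc (procost is expressed in units of cpu_operator_cost, prorows only applies to set-returning functions):

select proname, prorows, procost from pg_proc where proname = 'f_tmp';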


SQL Server containers and docker network driver performance considerations

Yann Neuhaus - Fri, 2019-07-05 01:45

A few months ago I attended Franck Pachot's session about microservices and databases at SOUG Romandie in Lausanne on May 21st, 2019. He covered some of the performance challenges that can be introduced by a microservices architecture design, especially when database components come into the game with chatty applications. One year ago, I was in a situation where a customer installed some SQL Server 2017 Linux containers in a Docker infrastructure with the user applications located outside of this infrastructure. It is probably an uncommon way to start with containers, but when you immerse yourself in the Docker world you quickly notice there are a lot of network drivers and considerations you should be aware of. For the sake of curiosity, I proposed to my customer to perform some network benchmarks to get a clear picture of these network drivers and their related overhead, in order to design the Docker infrastructure correctly from a performance standpoint.

The initial customer scenario included a standalone Docker infrastructure, and we considered different approaches to the application network configuration from a performance perspective. We did the same for the second scenario, which concerned a Docker Swarm infrastructure we installed in a second step.

The Initial reference – Host network and Docker host network

The first point was to get an initial reference with no network management overhead, directly from the network host. We used the iperf3 tool for the tests. This is the kind of tool I also use with virtual environments to ensure the network throughput is what we really expect, and sometimes I get surprises on this topic. So, let's go back to the container world: each test was performed from a Linux host outside the Docker infrastructure concerned, according to the customer scenario.

The network card link speed of the Docker host is supposed to be 10 GBit/s …

$ sudo ethtool eth0 | grep "Speed"
        Speed: 10000Mb/s

 

… and it is confirmed by the first iperf3 output below:

Let’s say that we tested the Docker host driver as well and we got similar results.

$ docker run  -it --rm --name=iperf3-server  --net=host networkstatic/iperf3 -s

 

Docker bridge mode

The default modus operandi for a Docker host is to create a virtual ethernet bridge (called docker0), attach each container's network interface to the bridge, and use network address translation (NAT) when containers need to make themselves visible to the Docker host and beyond. Unless specified otherwise, a Docker container will use the bridge by default, and this is exactly the network driver used by the containers in the context of my customer. In fact, we used a user-defined bridge network, but I would say it doesn't matter for the tests performed here.

$ ip addr show docker0
5: docker0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:70:0a:e8:7a brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
    inet6 fe80::42:70ff:fe0a:e87a/64 scope link
       valid_lft forever preferred_lft forever

 

The iperf3 Docker container I ran for my tests uses the default bridge network, as shown below. The interface with index 24 corresponds to the veth0bfc2dc peer of the container concerned.

$ docker run  -d --name=iperf3-server -p 5204:5201 networkstatic/iperf3 -s
…
$ docker ps | grep iperf
5c739940e703        networkstatic/iperf3              "iperf3 -s"              38 minutes ago      Up 38 minutes                0.0.0.0:5204->5201/tcp   iperf3-server
$ docker exec -ti 5c7 ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
24: eth0@if25: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

[clustadmin@docker1 ~]$ ethtool -S veth0bfc2dc
NIC statistics:
     peer_ifindex: 24

 

Here is the output after running the iperf3 benchmark:

It’s worth noting that the “Bridge” network adds some overhead, with an impact of about 13% in my tests, but in fact this is an expected outcome, especially if we refer to the Docker documentation:

Compared to the default bridge mode, the host mode gives significantly better networking performance since it uses the host’s native networking stack whereas the bridge has to go through one level of virtualization through the docker daemon.

 

When the docker-proxy comes into play

The next scenario we wanted to test concerned the closest network proximity we may have between the user applications and the SQL Server containers in the Docker infrastructure. In other words, we assumed the application resides on the same host as the SQL Server container, and we got some surprises from the docker-proxy itself.

Before showing the iperf3 result, I think we have to answer the million-dollar question here: what is the docker-proxy? Have you ever paid attention to this process on your Docker host? Let's run a pstree command:

$ pstree
systemd─┬─NetworkManager───2*[{NetworkManager}]
        ├─agetty
        ├─auditd───{auditd}
        ├─containerd─┬─containerd-shim─┬─npm─┬─node───9*[{node}]
        │            │                 │     └─9*[{npm}]
        │            │                 └─12*[{containerd-shim}]
        │            ├─containerd-shim─┬─registry───9*[{registry}]
        │            │                 └─10*[{containerd-shim}]
        │            ├─containerd-shim─┬─iperf3
        │            │                 └─9*[{containerd-shim}]
        │            └─16*[{containerd}]
        ├─crond
        ├─dbus-daemon
        ├─dockerd─┬─docker-proxy───7*[{docker-proxy}]
        │         └─20*[{dockerd}]

 

Well, if I understand the Docker documentation correctly, the purpose of this process is to enable a service consumer to communicate with the service-providing container … but it's only used in particular circumstances. Just bear in mind that controlling access to a container's service is mostly done through the host netfilter framework, in both the NAT and filter tables, and the docker-proxy mechanism is only required when this method of control is not available:

  • When the Docker daemon is started with --iptables=false or --ip-forward=false, or when the Linux host cannot act as a router because the Linux kernel parameter ipv4.ip_forward is set to 0. This is not my case here.
  • When you use localhost in the connection string of your application, which implies using the loopback interface (127.0.0.0/8), and the kernel doesn't allow routing traffic from it. In that case it's not possible to apply netfilter NAT rules; instead, netfilter sends packets through the filter table's INPUT chain to a local process listening on the published port: the docker-proxy.
$ sudo iptables -L -n -t nat | grep 127.0.0.0
DOCKER     all  --  0.0.0.0/0           !127.0.0.0/8          ADDRTYPE match dst-type LOCAL

 

In the picture below you will notice I'm using the localhost keyword in my connection string, so the docker-proxy comes into play.

A huge performance impact for sure, about 28%. This performance drop may be explained by the fact that the docker-proxy process is consuming 100% of my CPUs:

The docker-proxy operates in userland and I could simply disable it with the docker daemon parameter "userland-proxy": false, but I would say this is a case we would not encounter in practice, because applications will never use localhost in their connection strings. By the way, changing the connection string from localhost to the IP address of the container host gives a very different outcome, similar to the Docker bridge network scenario.

 

Using an overlay network

Using a single Docker host doesn't fit well with HA or scalability requirements, and in a mission-critical environment I strongly doubt any customer will go this way. I recommended that my customer consider using an orchestrator like Docker Swarm or K8s to anticipate the future container workload coming from upcoming projects. The customer picked Docker Swarm for its easier implementation compared to K8s.

 

After implementing a proof of concept for testing purposes (3 nodes: one manager and two worker nodes), we took the opportunity to measure the potential overhead implied by the overlay network, which is the common driver used by containers through stacks and services in such a situation. Referring to the Docker documentation, overlay networks manage communication among the Docker daemons participating in the swarm and are used by the services deployed on it. Here are the Docker nodes in the swarm infrastructure:

$ docker node ls
ID                            HOSTNAME                    STATUS              AVAILABILITY        MANAGER STATUS      ENGINE VERSION
vvdofx0fjzcj8elueoxoh2irj *   docker1.dbi-services.test   Ready               Active              Leader              18.09.5
njq5x23dw2ubwylkc7n6x63ly     docker2.dbi-services.test   Ready               Active                                  18.09.5
ruxyptq1b8mdpqgf0zha8zqjl     docker3.dbi-services.test   Ready               Active                                  18.09.5

 

An ingress overlay network is created by default when setting up a swarm cluster. User-defined overlay networks may be created afterwards and extend to the other nodes only when needed by containers.

$ docker network ls | grep overlay
NETWORK ID    NAME              DRIVER   SCOPE
ehw16ycy980s  ingress           overlay  swarm

 

Here is the result of the iperf3 benchmark:

Well, the same result than the previous test with roughly 30% of performance drop. Compared to the initial reference, this is again an expected outcome but I didn’t imagine how important could be the impact in such case.  Overlay network introduces additional overhead by putting together behind the scene a VXLAN tunnel (virtual Layer 2 network on top of an existing Layer 3 infrastructure), VTEP endpoints for encapsulation/de-encapsulation stuff and traffic encryption by default.

Here is a summary of the different scenarios and their performance impact:

Scenario                   Throughput (GB/s)   Performance impact
Host network               10.3                -
Docker host network        10.3                -
Docker bridge network      8.93                0.78
Docker proxy               7.37                0.71
Docker overlay network     7.04                0.68

 

In the particular case of my customer, where SQL Server instances sit on the Docker infrastructure and the applications reside outside of it, it's clear that using the Docker host network directly may be a good option from a performance standpoint, assuming this infrastructure remains simple with a few SQL Server containers. But in this case we have to change the default SQL Server listening port with the MSSQL_TCP_PORT parameter, because Docker host networking doesn't provide port mapping capabilities. According to our tests, we didn't see any evidence of performance improvement in terms of application response time between the Docker network drivers, probably because those applications are not network-bound here. I can imagine scenarios where they would be. Finally, the kind of scenario encountered here is likely uncommon, and I see containerized apps with database components outside the Docker infrastructure more often, but that doesn't change the game at all and the same considerations apply. Today I'm very curious to test real microservices scenarios where database and application components all sit on a Docker infrastructure.

See you!

 


Using DbVisualizer to work with #Oracle, #PostgreSQL and #Exasol

The Oracle Instructor - Tue, 2019-07-02 09:01

As a Database Developer or Database Administrator, it becomes increasingly unlikely that you will work with only one platform.

It’s quite useful to have one single tool to handle multiple different database platforms. And that’s exactly the ambition of DbVisualizer.

As a hypothetical scenario, let’s assume you are a database admin who works on a project to migrate from Oracle to EDB Postgres and Exasol.

The goal might be to replace the corporate Oracle database landscape, moving the OLTP part to EDB Postgres and the DWH / Analytics part to Exasol.

Instead of having to switch constantly between say SQL Developer, psql and EXAplus, a more efficient approach would be using DbVisualizer for all three.

I created one connection for each of the three databases here for my demo. Now let’s see if statements I do in Oracle also work in EDB Postgres and in Exasol:

Oracle

EDB

Exasol

Works the same for all three! The convenient thing here is that I just had to select the Database Connection from the pull down menu while leaving the statement as it is. No need to copy & paste even.
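The statements themselves are in the screenshots; as an illustration of the kind of thing that runs unchanged (my own example, not the exact demo code), something like this behaves identically on Oracle, EDB Postgres and Exasol:

create table customers (id integer, name varchar(50));
insert into customers values (1, 'Smith');
select count(*) from customers;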

What about schemas and tables?

Oracle

In EDB, I need to create a schema accordingly:

EDB

 

In Exasol, schema and table can be created in the same way:

Exasol

Notice that the data types got silently translated into the proper Exasol data types:

Exasol
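As a sketch of what that translation looks like (the schema, table and column names are my own, and the exact mapping is worth double-checking against the Exasol documentation), Oracle-style type names are accepted but stored as Exasol’s native types:

create schema demo;
create table demo.t (id number(10), name varchar2(50));
-- Exasol reports these columns as DECIMAL(10,0) and VARCHAR(50)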

There is no DBA_TABLES in Exasol, though:

Exasol
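The dictionary information is still available, it just lives in Exasol’s own EXA_* system views; a query along these lines should list the new table:

select * from exa_all_tables;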

Of course, there’s much more to check and test upon migration, but I think you get an idea of how a universal SQL client like DbVisualizer might help for such purposes.

 

Categories: DBA Blogs

opt_estimate 4

Jonathan Lewis - Mon, 2019-07-01 07:18

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specific query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given a query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case, where the hint is inside the query block it applies to, the “@mat_cte” isn’t necessary.)

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint and pointed out that there were cases where you might also want to include the query_block option as well. Here’s a final example query showing the effect, with the scale_rows feature, after creating a table t2 as a copy of t1 but setting pctfree 75 (to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour but to be safe in a production environment that’s a hint that I should have included.

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat – hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and, thanks to a Freudian slip while typing, “not sage”). You probably don’t know how to do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I did on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values is 14 (roughly half the actual number in my case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan, which makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complex view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard, you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.

 

Modifying pg_hba.conf from inside PostgreSQL

Yann Neuhaus - Sat, 2019-06-29 07:14

During one of the sessions at the last Swiss PGDay there was a question which could not be answered during the talk: Is it possible to modify pg_hba.conf from inside PostgreSQL without having access to the operating system? What everybody agreed on is that there currently is no built-in function for doing this.

When you are on a recent version of PostgreSQL there is a view you can use to display the rules in pg_hba.conf:

postgres=# select * from pg_hba_file_rules ;
 line_number | type  |   database    | user_name |  address  |                 netmask                 | auth_method | options | error 
-------------+-------+---------------+-----------+-----------+-----------------------------------------+-------------+---------+-------
          84 | local | {all}         | {all}     |           |                                         | trust       |         | 
          86 | host  | {all}         | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          88 | host  | {all}         | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          91 | local | {replication} | {all}     |           |                                         | trust       |         | 
          92 | host  | {replication} | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          93 | host  | {replication} | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          94 | host  | {all}         | {mydb}    | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
(7 rows)

But there is nothing which allows you to directly modify that. When you are lucky and have enough permissions there is a way to do it, though. First, let’s check where pg_hba.conf is located:

postgres=# select setting from pg_settings where name like '%hba%';
           setting           
-----------------------------
 /u02/pgdata/DEV/pg_hba.conf

Having that information we can load the file into a table:

postgres=# create table hba ( lines text ); 
CREATE TABLE
postgres=# copy hba from '/u02/pgdata/DEV/pg_hba.conf';
COPY 93

Once it is loaded we have the whole content in our table (skipping the comments and empty lines here):

postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
(6 rows)

As this is a normal table we can of course add a row:

postgres=# insert into hba (lines) values ('host  all mydb  ::1/128                 trust');
INSERT 0 1
postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
 host  all mydb  ::1/128                 trust
(7 rows)

And now we can write it back:

postgres=# copy hba to '/u02/pgdata/DEV/pg_hba.conf';
COPY 94

Reading the whole file confirms that our new rule is there:

postgres=# select pg_read_file('pg_hba.conf');
                               pg_read_file                               
--------------------------------------------------------------------------
 # PostgreSQL Client Authentication Configuration File                   +
 # ===================================================                   +
 #                                                                       +
 # Refer to the "Client Authentication" section in the PostgreSQL        +
 # documentation for a complete description of this file.  A short       +
 # synopsis follows.                                                     +
 #                                                                       +
 # This file controls: which hosts are allowed to connect, how clients   +
 # are authenticated, which PostgreSQL user names they can use, which    +
 # databases they can access.  Records take one of these forms:          +
 #                                                                       +
 # local      DATABASE  USER  METHOD  [OPTIONS]                          +
 # host       DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostssl    DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostnossl  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 #                                                                       +
 # (The uppercase items must be replaced by actual values.)              +
 #                                                                       +
 # The first field is the connection type: "local" is a Unix-domain      +
 # socket, "host" is either a plain or SSL-encrypted TCP/IP socket,      +
 # "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a     +
 # plain TCP/IP socket.                                                  +
 #                                                                       +
 # DATABASE can be "all", "sameuser", "samerole", "replication", a       +
 # database name, or a comma-separated list thereof. The "all"           +
 # keyword does not match "replication". Access to replication           +
 # must be enabled in a separate record (see example below).             +
 #                                                                       +
 # USER can be "all", a user name, a group name prefixed with "+", or a  +
 # comma-separated list thereof.  In both the DATABASE and USER fields   +
 # you can also write a file name prefixed with "@" to include names     +
 # from a separate file.                                                 +
 #                                                                       +
 # ADDRESS specifies the set of hosts the record matches.  It can be a   +
 # host name, or it is made up of an IP address and a CIDR mask that is  +
 # an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that     +
 # specifies the number of significant bits in the mask.  A host name    +
 # that starts with a dot (.) matches a suffix of the actual host name.  +
 # Alternatively, you can write an IP address and netmask in separate    +
 # columns to specify the set of hosts.  Instead of a CIDR-address, you  +
 # can write "samehost" to match any of the server's own IP addresses,   +
 # or "samenet" to match any address in any subnet that the server is    +
 # directly connected to.                                                +
 #                                                                       +
 # METHOD can be "trust", "reject", "md5", "password", "scram-sha-256",  +
 # "gss", "sspi", "ident", "peer", "pam", "ldap", "radius" or "cert".    +
 # Note that "password" sends passwords in clear text; "md5" or          +
 # "scram-sha-256" are preferred since they send encrypted passwords.    +
 #                                                                       +
 # OPTIONS are a set of options for the authentication in the format     +
 # NAME=VALUE.  The available options depend on the different            +
 # authentication methods -- refer to the "Client Authentication"        +
 # section in the documentation for a list of which options are          +
 # available for which authentication methods.                           +
 #                                                                       +
 # Database and user names containing spaces, commas, quotes and other   +
 # special characters must be quoted.  Quoting one of the keywords       +
 # "all", "sameuser", "samerole" or "replication" makes the name lose    +
 # its special character, and just match a database or username with     +
 # that name.                                                            +
 #                                                                       +
 # This file is read on server startup and when the server receives a    +
 # SIGHUP signal.  If you edit the file on a running system, you have to +
 # SIGHUP the server for the changes to take effect, run "pg_ctl reload",+
 # or execute "SELECT pg_reload_conf()".                                 +
 #                                                                       +
 # Put your actual configuration here                                    +
 # ----------------------------------                                    +
 #                                                                       +
 # If you want to allow non-local connections, you need to add more      +
 # "host" records.  In that case you will also need to make PostgreSQL   +
 # listen on a non-local interface via the listen_addresses              +
 # configuration parameter, or via the -i or -h command line switches.   +
                                                                         +
 # CAUTION: Configuring the system for local "trust" authentication      +
 # allows any local user to connect as any PostgreSQL user, including    +
 # the database superuser.  If you do not trust all your local users,    +
 # use another authentication method.                                    +
                                                                         +
                                                                         +
 # TYPE  DATABASE        USER            ADDRESS                 METHOD  +
                                                                         +
 # "local" is for Unix domain socket connections only                    +
 local   all             all                                     trust   +
 # IPv4 local connections:                                               +
 host    all             all             127.0.0.1/32            trust   +
 # IPv6 local connections:                                               +
 host    all             all             ::1/128                 trust   +
 # Allow replication connections from localhost, by a user with the      +
 # replication privilege.                                                +
 local   replication     all                                     trust   +
 host    replication     all             127.0.0.1/32            trust   +
 host    replication     all             ::1/128                 trust   +
 host  all mydb  ::1/128                 trust                           +
(1 row)

All you need to do from now on is to reload the configuration and you’re done:

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

Of course: Use with caution!
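What “enough permissions” means here: reading and writing server-side files with COPY is restricted to superusers or, from PostgreSQL 11 on, to members of the predefined roles used below (the user name is just an example):

postgres=# grant pg_read_server_files, pg_write_server_files to config_admin;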

The post Modifying pg_hba.conf from inside PostgreSQL appeared first on Blog dbi services.

opt_estimate 3

Jonathan Lewis - Fri, 2019-06-28 07:12

This is just a quick note to throw out a couple of the lesser-known options for the opt_estimate() hint – and they may be the variants that are likely to be most useful since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by options I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it – with autotrace enabled here are three queries, first the unhinted baseline (which uses the standard 5% on my having clause) then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4 (which means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3 with the nett effect of producing a cardinality of 6. Here are the plans.

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of the lines for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

Should the Oracle APEX Community Care About Autonomous Database?

Joel Kallman - Fri, 2019-06-28 06:14


This past week, Oracle announced the availability of Oracle APEX, SQL Developer Web and Oracle REST Data Services on Oracle Autonomous Database.  If you're in the APEX community, should you care?  I say "absolutely yes!", but not for the reasons you might suspect.

Autonomous Database is strategic to Oracle.  Just read the transcript from the recent Oracle quarterly earnings conference call and it will be obvious to you.  Autonomous is an advancement in technology that has significant investment from Oracle and very real benefits for customers.  It's a clear market differentiator - I do truly believe this, it's not merely my marketing spin.  And now, with the addition of Oracle APEX & SQL Developer Web & Oracle REST Data Services, I think this combination of technologies provides even more capabilities to this platform and even greater differentiation.  What other service provides elastic, autonomous capabilities, application design and proven low code application development, out-of-the-box?  Did I mention that this also happens to include the world's most popular database, Oracle Database?

The benefits of low code application development are real.  And Low Code + Autonomous Database is the ideal combination.  Low code is about reducing costs, delivering faster, with greater consistency, and being usable by a broader range of skill sets.  Some of the benefits of Autonomous Database are equivalent - less cost, instant availability, usable by others who may not be world-class experts.  It has been a long multi-year confluence of events that has brought us together here.

The APEX community is the envy of others at Oracle.  Even people who aren't APEX fans recognize the APEX community's passion.  But where did this come from?  Do people really get excited about a tool?  No.  They get excited about what they can do with a tool - how it helps them deliver a solution, and be successful.  A carpenter doesn't get passionate about his dual-slide compound miter saw because it's a cool tool.  He gets satisfaction about what he can actually do with that tool versus a hand saw.  When you get a pay raise or praise or a promotion because of what you've been able to deliver with APEX and ORDS and Oracle Database, that's a reason to get excited!  And I think that is ultimately the real story behind the enviable, tangible energy in the APEX community.  Countless people have had many great successes with this combination of technologies, and success begets success.

Let's say you're in the APEX community, you saw this announcement about APEX on Autonomous, but you're not interested in cloud.  Or, as Andre de Souza so eloquently stated on Twitter, "I know it’s big news, just does not affect 99,9% of current #orclapex developers I’m guessing."  Should you care?  I say yes, and here's why.  The great APEX community that I mention above, which has been so successful with APEX & ORDS & Oracle Database over the years, has become very large across the globe, and with not a lot of help from Oracle.  Make no mistake - Oracle does invest in APEX, millions of dollars every year.  But I still come across Oracle Database customers who have simply never heard of APEX.  This is because there has not been much promotion from Oracle marketing or public relations or even sales.  All of this is about to change.  Why?  Because APEX is on Autonomous Database, and Autonomous Database is strategic to Oracle.  You will probably see more communication and discussion from Oracle about APEX than probably the last 20 years combined.  Low code resonates with customers, APEX is proven, and everyone has application development needs.

How does this benefit someone in the APEX community?  Simple:

  1. Awareness and interest will rise by people who have never heard about APEX before, both existing on-premises customers and net new customers.
  2. There will be greater demand for APEX and database development talent.  If you have experience with APEX, with a proven track record of delivering solutions with APEX, you're a very attractive person.  Perhaps the rate you charge has now gotten a bit higher.  You'll certainly gain upward mobility.
  3. You'll no longer have to introduce someone to APEX for the very first time, or counter the claim that "it's not strategic."
  4. As our friends from Explorer UK say, with APEX, they "develop cloud ready applications".  And you've been doing this for years.  Don't be afraid to make this claim.  When and if you're ready for cloud, you're already out of the gate.  The same APEX apps you developed on-premises run and look and feel exactly the same in the cloud.  Who has been developing cloud-ready apps for years?  You!

So. Even if you’re not into “cloud” but into APEX, this announcement and these capabilities on Autonomous Database have a material impact on you and everyone else in the APEX community. Your skills and experience will become more valued, and we should expect the market, interest and demand to grow.

Everything is not perfect, and we on the APEX team still have a lot of very hard work ahead of us.  But these are exciting times and it's what we've labored on for the past 20 years, to get to this point.  For those who have been with the APEX community for so many years, congratulations!  You've bet on the right horse.  Just fasten your seat belt.

Windocks and K8s support

Yann Neuhaus - Fri, 2019-06-28 00:33

I recently got the 4.08 update from the Windocks team and I was very excited to evaluate some of the new features. The first cool one I want to present in this blog concerns the Kubernetes support for deploying Windocks containers, which will definitely make my application deployments easier. Let’s say you want to deploy an application that is tied to a Windocks container for SQL Server. In a previous blog post I explained why we are using Windocks in our context. With previous versions of Windocks we had to write custom scripts to deploy applications on K8s that are tied to a Windocks container. With the new version 4.08 this process may be simplified, because both the applications and their related Windocks containers are directly deployable on K8s by using a YAML deployment file.

In fact, the new way consists in deploying a Windocks SQL Server proxy on K8s that works in conjunction with a Windocks server. Once the SQL Server proxy is deployed, a corresponding Windocks container spins up with its specific parameters, as shown in the picture below:

 

First of all, in order to secure access between K8s and the Windocks server, authentication is required and we need to provide credential information that will be stored in a K8s secret (proxy-secrets below). The SA password is also included in this secret and will be used to set up the SA account when the Windocks container spins up.

$ kubectl create secret generic proxy-secrets --from-literal=WINDOCKS_REQUIRED_USERNAME='clustadmin' --from-literal=WINDOCKS_REQUIRED_PASSWORD='StrongPassword' --from-literal=WINDOCKS_REQUIRED_CONTAINER_SAPASSWORD='sa_password'

 

The next step consists in deploying the Windocks SQL proxy with the specific environment variables, including WINDOCKS_REQUIRED_HOSTNAME (Windocks server name or IP address), WINDOCKS_REQUIRED_IMAGE_NAME (Windocks-based image used for the container) and WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT (optional).

  • The Windocks SQL Proxy YAML file
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: windocks-sql-proxy-secure 
  labels:
    app: sqlproxy-secure 
spec:
  replicas: 1 
  template:
    metadata:
      labels:
        app: sqlproxy-secure 
        tier: frontend
    spec:
      containers:
      - name: sqlproxy-secure-app 
        image: windocks/windocks-sql-server-proxy 
        imagePullPolicy: Always
        ports:
        - name: tcp-proxy
          containerPort: 3087
        - name: tls-proxy
          containerPort: 3088
        envFrom:
          - secretRef:
              name: proxy-secrets
        env:
          - name: PROJECT_ID
            value: project_id_for_GKE_deployment_optional
          - name: WINDOCKS_REQUIRED_HOSTNAME
            value: xx.xxx.xxx.xxx
          - name: WINDOCKS_REQUIRED_IMAGE_NAME
            value: 2012_ci
          - name: WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT
            value: "3087"

 

If we want to make the SQL Proxy pod accessible from outside, a service is needed, but this is not mandatory depending on the context. Note that you may also use a TLS connection to secure the network traffic between K8s and the Windocks server.

  • The Windocks service YAML file
apiVersion: v1
kind: Service
metadata:
  name: windocks-sql-proxy-secure
  labels:
    app: sqlproxy-secure
    tier: frontend
spec:
  sessionAffinity: ClientIP
  type: LoadBalancer
  ports:
  - port: 3087
    name: tcp-proxy-secure-service
    targetPort: 3087
  - port: 3088
    name: tls-proxy-secure-service
    targetPort: 3088
  selector:
    app: sqlproxy-secure
    tier: frontend

 

Let’s give it a try on my Azure infrastructure, which includes an AKS cluster and a Windocks server installed in an Azure VM. I also took the opportunity to create my own helm chart from the YAML files provided by the Windocks team. It will definitely make my deployments easier. Here is the command I used to deploy my Windocks helm chart on my AKS cluster:

$ helm install --name windocks2012 --namespace dmk --set Windocks.Image=2012_ci --set Windocks.Port=3089 --set Windocks.PortSSL=3090 .

 

The deployment is performed in a specific namespace named dmk and the 2012_ci image is used as the base image for my Windocks container. I will be able to connect to my Windocks container through the SQL Proxy deployed on K8s by using port 3089. After a few seconds the following resources were deployed within my dmk namespace, including a Windocks SQL Proxy pod and the Windocks SQL Proxy service.

$ kubectl get all -n dmk
NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fb8694m   1/1     Running   0          13m

NAME                                                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)
                 AGE
service/backend                                                 ClusterIP      10.0.126.154   <none>          80/TCP
                 8d
service/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   LoadBalancer   10.0.252.235   xx.xx.xxx.xxx   3089:30382/TCP,3090:30677/TCP   44m

NAME                                                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   1         1         1            1           44m

NAME                                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fbdb5c96   1         1         1       44m

 

Once deployed, the SQL proxy redirects all connections from port 3089 to the container port after spinning up the corresponding Windocks container on the Windocks server. We may get some details by taking a look at the SQL Proxy logs on K8s. As a reminder, the container port is allocated dynamically by the Windocks server by default, and the SQL proxy gets it automatically for the connection redirection.

…
Valid response for creating Windocks container
Container id is b1201aaaba3b4cd047953b624e541e26500024e42e6381936fc7b526b5596a99
Container port is 10001
Setting up tcp server
redirecting connections from 127.0.0.1:3089 to xx.xxx.xxx.xxx:10001 
…

 

Let’s try to connect by using mssql-cli with the external IP of the SQL Proxy service and port 3089. The connection redirect is effective and I can interact with my Windocks container on local port 10001, as the query below shows.
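A connection of this kind can be opened with something like the following sketch (the address is masked here and the SA password from the proxy secret is entered at the prompt):

$ mssql-cli -S xx.xx.xxx.xxx,3089 -U sa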

master> SELECT top 1 c.local_net_address, c.local_tcp_port
....... FROM sys.dm_exec_connections as c; 
+---------------------+------------------+
| local_net_address   | local_tcp_port   |
|---------------------+------------------|
| 172.18.0.5          | 10001            |
+---------------------+------------------+

 

The Windocks container for SQL Server came up with my 3 test databases as expected:

master> \ld+
+-------------------+-------------------------+-----------------------+------------------------------+
| name              | create_date             | compatibility_level   | collation_name               |
|-------------------+-------------------------+-----------------------+------------------------------|
| master            | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| tempdb            | 2019-06-27 20:04:04.273 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| model             | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| msdb              | 2012-02-10 21:02:17.770 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| AdventureWorksDbi | 2019-06-27 20:04:03.537 | 100                   | Latin1_General_100_CS_AS     |
| ApplixEnterprise  | 2019-06-27 20:04:04.477 | 90                    | SQL_Latin1_General_CP1_CI_AS |
| dbi_tools         | 2019-06-27 20:04:05.153 | 100                   | French_CS_AS                 |
+-------------------+-------------------------+-----------------------+------------------------------+

 

From the Windocks server, I may get a picture of provisioned containers. The interesting one in our case is referenced by the name k8s-windocks2012/xxxx:

PS F:\WINDOCKS\SQL2012> docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
e9dbe5556b2f        2012_ci             ""                  29 minutes ago      Stopped             10002/              dab/Windocks-id:31432367-c744-4ae3-8248-cb3fb3d2792e
b1201aaaba3b        2012_ci             ""                  13 minutes ago      Started             10001/              k8s-windocks2012/Windocks-id:cfa58c38-d168-4c04-b4c8-12b0552b93ad

 

Well, in a nutshell, this is a feature we will certainly consider integrating into our Azure DevOps pipeline. Stay tuned, other blog posts will come later.

See you!

 

 

 

The post Windocks and K8s support appeared first on Blog dbi services.

work Agile in a GxP-regulated environment

Yann Neuhaus - Thu, 2019-06-27 09:24

On 4 June 2019 I followed an invitation to

wega-it’s Know-how & Networking Breakfast 2 2019 on “Agile Validation in GxP Projects”.

So they were to discuss Agility in the context of GxP regulation.

I had some earlier exposure to various kinds of compliance roles and topics, and my current work environment is in the highly regulated Pharma industry. So I was really wondering (and hoping for learning) how you can possibly bring the two points of view of Agility and GxP regulation together. The Literaturhaus Basel was to see some very special type of literature that day. Not a formal presentation but a role play performance between these two viewpoints, represented by Evelyne Daniel, an experienced GxP validation expert, and Mathias Fuchs, an equally experienced Scrum master, both from wega-IT. A very nice idea, very appropriate for the topic!

What is Compliance (GxP) and what is Agile?

Typically in GxP compliance we work along the so-called V-model. In its rigidness and wanted plannability of course it corresponds largely to the waterfall model of traditional software development. Opposed to this the Agile Manifesto (2001) criticizes the very basics of these traditional ways of working. Remember the iconic claims like “Working software over comprehensive Documentation” and “Responding to change over following a Plan”. But how would you ever get acceptance in the compliance world without full documentation and planning!?! When I quickly browsed the internet, I found a multitude of statements and proposals which would certainly merit a separate blog post. For this time, I will give a quick summary of the wega breakfast presentation and some touch points with our current working environment in the Pharma industry.

Although in my current work environment we are not actually driving GxP Validation projects, we are still subject to the very tight GxP regulation. In the processes of Change and Release Management, this is reflected in the rigid rules of testing and documentation, to just name the most obvious examples. Background, of course, is the definition of Compliance and its Validation: the goal is to “establish documented evidence” to assure compliance and quality etc. These requirements hold independently of the quality, completeness or even up-to-date status of the pre-defined processes and rules! Call this inflexible and cumbersome! Any adaptation (update!) of the processes and rules is very formal through the complicated administrative processes to be used and hence very slow. Consider this in our fast-moving (not only IT-) world!

What is an MVP?

A nice play of words was interjected in the stage discussion: the acronym MVP has a very clear meaning as a basic concept for both sides, just it is not the same: MVP = Master Validation Plan (in GxP Validation) versus Minimal Viable Product (in Agile or Lean Software Development).

How to bring them together?

Now how to bring the core aspirations of Agile Development like Customer focus, Flexibility and Speed into the Compliance world? A first inevitable step in the V-model world: break up the (dead?) lock between a complete finalization of the User Requirements Specification and the setup of a complete Validation Plan prescribing all Qualification criteria (IQ, OQ, PQ). Definition of Done (DoD) plays a major role when trying to cut the end-to-end Development-Validation elephant into smaller pieces. Including Validation in the “daily” Development activities is another must, instead of adding Validation only at the end of the Development phases. Yet another core principle from the Agile side is ensuring team Maturity and Mindset. Much-hailed Diversity is opposed to purely compliance-oriented expert teams, striving for innovation and creativity in the team.

WEGA breakfast - Agile Validation in GxP projects

Some basic approaches

The final answer on how to – methodically – combine or maybe rather “emulsify” Agility and Compliance Validation comes as no surprise: there is no one-size-fits-all method. Rather three obvious basic approaches were presented.

  1. introducing Agility right between the left (Specifications) and the right (Qualifications) arms of the V-model, probably using some kind of piloting or prototyping
  2. including Validation into the Agile Development, almost doing Validation in each Agile sprint
  3. appending V-model Validation at the end of an Agile development.

The above-mentioned end-to-end Development-to-Validation elephant has to be broken into smaller, more manageable units. Each specific project situation will have its own possible and best way to do it.

Think innovative and creative!

Thanks to wega-informatik (www.wega-it.com)  for organizing this creative and informative event.

 

The article work Agile in a GxP-regulated environment appeared first on Blog dbi services.

Glitches

Jonathan Lewis - Wed, 2019-06-26 11:11

Here’s a question just in from Oracle-L that demonstrates the pain of assuming things work consistently when sometimes Oracle development hasn’t quite finished a bug fix or enhancement. Here’s the problem – which starts from the “scott.emp” table (which I’m not going to create in the code below):

rem
rem     Script:         fbi_fetch_first_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

-- create and populate EMP table from SCOTT demo schema

create index e_sort1 on emp (job, hiredate);
create index e_low_sort1 on emp (lower(job), hiredate);

set serveroutput off
alter session set statistics_level = all;
set linesize 156
set pagesize 60

select * from emp where job='CLERK'         order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

select * from emp where lower(job)='clerk' order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));
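
As an aside: if the SCOTT demo schema isn't installed, the following is a minimal sketch of an emp table that the script above could run against. The column definitions follow the standard Oracle demo scripts, but the three sample rows are just a token data set I've made up for illustration, so any row counts and plan statistics you get from it will differ from the ones reported below.

rem
rem     Hedged sketch only - a stand-in for scott.emp if the demo schema is missing
rem

create table emp (
        empno           number(4)       primary key,
        ename           varchar2(10),
        job             varchar2(9),
        mgr             number(4),
        hiredate        date,
        sal             number(7,2),
        comm            number(7,2),
        deptno          number(2)
);

insert into emp values (7369, 'SMITH',  'CLERK',    7902, date '1980-12-17',  800, null, 20);
insert into emp values (7934, 'MILLER', 'CLERK',    7782, date '1982-01-23', 1300, null, 10);
insert into emp values (7499, 'ALLEN',  'SALESMAN', 7698, date '1981-02-20', 1600,  300, 30);
commit;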

Both queries use the 12c “fetch first” feature to select two rows from the table. We have an index on (job, hiredate) and a similar index on (lower(job), hiredate), and given the similarity of the queries and the respective indexes (get the first two rows by hiredate where job/lower(job) is ‘CLERK’/’clerk’) we might expect to see the same execution plan in both cases with the only change being the choice of index used. But here are the plans:


select * from emp where job='CLERK'         order by hiredate fetch
first 2 rows only

Plan hash value: 92281638

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     2 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |         |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |      1 |      3 |     2   (0)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP     |      1 |      3 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_SORT1 |      1 |      3 |     1   (0)|      3 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("JOB"='CLERK')


select * from emp where lower(job)='clerk' order by hiredate fetch
first 2 rows only

Plan hash value: 4254915479

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |     1 (100)|      2 |00:00:00.01 |       2 |       |       |          |
|*  1 |  VIEW                                 |             |      1 |      2 |     1   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK             |             |      1 |      1 |     1   (0)|      2 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMP         |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       2 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | E_LOW_SORT1 |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')


As you can see, with the “normal” index Oracle is able to walk the index “knowing” that the data is appearing in order, and stopping as soon as possible (almost) – reporting the WINDOW operation as “WINDOW NOSORT STOPKEY”. On the other hand with the function-based index Oracle retrieves all the data by index, sorts it, then applies the ranking requirement – reporting the WINDOW operation as “WINDOW SORT PUSHED RANK”.

Clearly it’s not going to make a lot of difference to performance in this tiny case, but there is a threat that the whole data set for ‘clerk’ will be accessed – and that’s the first performance threat, with the additional threat that the optimizer might decide that a full tablescan would be more efficient than the index range scan.

Can we fix it ?

Yes, Bob, we can. The problem harks back to a limitation that probably got fixed some time between 10g and 11g – here are two, simpler, queries against the emp table and the two new indexes, each with the resulting execution plan when run under Oracle 10.2.0.5:


select ename from emp where       job  = 'CLERK' order by hiredate;
select ename from emp where lower(job) = 'clerk' order by hiredate;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     3 |    66 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | E_SORT1 |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("JOB"='CLERK')


--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     3 |    66 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |     3 |    66 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP         |     3 |    66 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | E_LOW_SORT1 |     3 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(LOWER("JOB")='clerk')

The redundant SORT ORDER BY is present in 10g even for a simple index range scan. By 11.2.0.4 the optimizer was able to get rid of the redundant step, but clearly there’s a little gap in the code relating to the over() clause that hasn’t acquired the correction – even in 18.3.0.0 (or 19.2 according to a test on https://livesql.oracle.com).

To fix the 10g problem you just had to include the first column of the index in the order by clause: the result doesn't change, of course, because you're simply prefixing the required columns with a column which holds the single value you were probing the index for, but suddenly the optimizer realises that it can do a NOSORT operation – so the “obvious” guess was to do the same for this “first fetch” example:

select * from emp where lower(job)='clerk' order by lower(job), hiredate fetch first 2 rows only;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |      1 |        |     3 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |             |      1 |      2 |     3  (34)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |             |      1 |      1 |     3  (34)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP         |      1 |      1 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_LOW_SORT1 |      1 |      1 |     1   (0)|      3 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."SYS_NC00009$","EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')

It’s just one of those silly little details where you can waste a HUGE amount of time (in a complex case) because it never crossed your mind that something that clearly ought to work might need testing for a specific use case – and I’ve lost count of the number of times I’ve been caught out by this type of “not quite finished” anomaly.

Footnote

If you follow the URL to the Oracle-L thread you’ll see that Tanel Poder has supplied a couple of MoS Document Ids discussing the issue and warning of other bugs with virtual column / FBI translation, and has shown an alternative workaround that takes advantage of a hidden parameter.

 

Oracle Event: Inside the Mind of a Hacker & How to Protect Your Sensitive Data

Yann Neuhaus - Wed, 2019-06-26 10:23

For once, a post in French, since it concerns an Oracle event held in French.
Today at Oracle Switzerland in Geneva there was a presentation on Oracle security.
The idea was to put ourselves in a hacker's shoes in order to better understand how they operate, and then to look at how to protect the data in an Oracle database.
Understanding the intentions and the means hackers use to achieve their ends helps us fight them better.

The session was presented by Hakim Loumi, EMEA DB Security PM.
First, the speaker showed an estimate of data growth between now and 2025. Wow, 175 ZB!

And how sensitive that data can be

The number of attacks worldwide is impressive

Hakim also showed why hacking has become one of the biggest scourges. In the slide below, for an investment of less than 10$, a complete identity sells for around 240$ on average. What a return on investment!

Obviously the database is a prime target of these attacks

The speaker then presented the main tools provided by Oracle, most of which require the Advanced Security Option:

Data Masking
Data Redaction (a short sketch follows this list)
Oracle Firewall
Database Vault
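
To make one of these tools concrete, here is a minimal, hypothetical sketch of a Data Redaction policy. It was not shown at the event; the HR.EMPLOYEES table, the SALARY column and the policy name are assumptions chosen purely for illustration.

begin
        dbms_redact.add_policy(
                object_schema => 'HR',               -- assumed demo schema
                object_name   => 'EMPLOYEES',        -- assumed demo table
                column_name   => 'SALARY',           -- column whose values should be hidden
                policy_name   => 'redact_salary',    -- arbitrary policy name
                function_type => dbms_redact.full,   -- full redaction: numeric values are returned as 0
                expression    => '1=1'               -- apply the redaction to every querying session
        );
end;
/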

And to finish, this nice slide

Conclusion

The event was really interesting. Through simple examples and anecdotes the speaker managed to hold the audience's attention. The main thing to take away from this presentation is that data security is not just the DBA's business: it is a whole chain of procedures involving everyone in the company.

The article Oracle Event: Inside the Mind of a Hacker & How to Protect Your Sensitive Data appeared first on Blog dbi services.

Oracle Developer Tools - Do They Still Exist?

Andrejus Baranovski - Wed, 2019-06-26 01:55
People are frustrated with @OracleADF @JDeveloper on social media - "ADF boat has no captain", etc. I agree @Oracle is to blame big time for such lame handling of its own Developer Tools stack. @Oracle please wake up and spend some budget on @OracleADF. Read more:

Oracle VBCS - right now this tool gets most of Oracle's focus. It is supposed to offer a declarative #JavaScript development experience in the Cloud. It has not been well received by the community. Are there any VBCS customers? Please respond if yes.

Oracle APEX - comes with a very strong community (mostly backed by DB folks), but it is not strategic for Oracle. It is more likely to be used by PL/SQL folks than by Java or Web developers.

Oracle JET - highly promoted by Oracle. A set of open source #JavaScript libs glued together by an Oracle layer. Nice, but it can't be used as a direct replacement for @OracleADF, since JET is a UI layer only. Oracle folks often confuse the community by saying that Oracle JET is a great option to replace ADF.

Oracle Forms - still alive, but obviously can't be a strategic Oracle platform. A few years ago, Oracle was promoting Forms modernization to @OracleADF.

Summary - Oracle's developer tools offering is weak. The lack of Oracle investment in development tools makes the Oracle developer community shrink.

opt_estimate 2

Jonathan Lewis - Tue, 2019-06-25 14:22

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You'll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let's hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I've done two slightly odd things here – I've set a hidden parameter to disable nlj batching and I've used a hint to block nlj prefetching. This doesn't affect the arithmetic but it does mean the nested loop appears in the original pre-9i form, which happens to make it a little easier to see costs and cardinalities adding and multiplying their way through the plan.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2: the optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15*15 = 225 and the cost comes to 15 (t1 rows) * 16 (t2 unit cost) + 2 (t1 cost) = 242.

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name nlj_index_scan, table_alias (list of possible driving tables), target_index, numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two-columns before it can be used.

The things to note in this plan are: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that's 0.06 * the original 15). This hasn't changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn't dropped, so now it's not consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is 15 (t1 rows) * 2 (t2 unit cost) + 2 (t1 cost).
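
As an aside, and purely as a sketch that I haven't run for this post, the rows= form of the same hint would simply replace the scaling factor with an absolute cardinality for the index range scan; the value 1 below is just an assumed target figure:

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, rows=1)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;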

Let's try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I'll dictate a scaling factor that will make it easy to see the effect – let's tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we've told the optimizer that it should scale the estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide not to follow the path we expected, and that factor of 0.2 will be applied whenever t2 is accessed. Notice in this plan that the join cardinality in operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table, the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side note: if you have access to MoS you'll find that Doc ID 2402821.1, “How To Use Optimizer Hints To Specify Cardinality For Join Operation”, seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins.
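
For illustration only, here is a hedged sketch of the two forms applied to the join in this post, each dictating an absolute join cardinality of 20 rows. The number 20 is arbitrary, neither statement was run for this article, and the space-separated alias list in the cardinality() hint is my assumption about its multi-table form:

select
        /*+ qb_name(main) cardinality(t1 t2, 20) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

select
        /*+ qb_name(main) opt_estimate(@main join(t2 t1), rows=20) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;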

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

With a little engineering on the optimizer estimates we've managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn't appear in any of the other examples): I've left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up, as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

SQLcl ALIAS – because you can’t remember everything.

The Anti-Kyte - Tue, 2019-06-25 08:47

I want to find out which file is going to hold any trace information generated by my database session. Unfortunately, I keep forgetting the query that I need to run to find out.
Fortunately I’m using SQLcl, which includes the ALIAS command.
What follows is a quick run-through of this command including :

  • listing the aliases that are already set up in SQLcl
  • displaying the code that an alias will execute
  • creating your own alias interactively
  • deleting an alias
  • using files to manage custom aliases

Whilst I’m at it, I’ll create the alias for the code to find that pesky trace file too.

In the examples that follow, I'm connected to an Oracle XE 18c PDB using SQLcl 18.4 from my Ubuntu 16.04 LTS laptop via the Oracle Thin Client. Oh, and the Java details are :

Meet the ALIAS command

As so often in SQLcl, it’s probably a good idea to start with the help :

help alias

…which explains that :

“Alias is a command which allows you to save a sql, plsql or sqlplus script and assign it a shortcut command.”

A number of aliases are already included in SQLcl. To get a list of them simply type :

alias

…which returns :

locks
sessions
tables
tables2

If we want to see the code that will be run when an alias is invoked, we simply need to list the alias :

alias list tables

tables - tables <schema> - show tables from schema
--------------------------------------------------

select table_name "TABLES" from user_tables

Connected as HR, I can run the alias to return a list of tables that I own in the database :

Creating an ALIAS

To create an alias of my own, I simply need to specify the alias name and the statement I want to associate it with. For example, to create an alias called whoami :

alias whoami =
select sys_context('userenv', 'session_user')
from dual;

I can now confirm that the alias has been created :

alias list whoami
whoami
------

select sys_context('userenv', 'session_user')
from dual

…and run it…

I think I want to tidy up that column heading. I could do this by adding a column alias in the query itself. However, the ALIAS command does support the use of SQL*Plus commands…

alias whoami =
column session_user format a30
select sys_context('userenv', 'session_user') session_user
from dual;

…which can make the output look slightly more elegant :

A point to note here is that, whilst it is possible to include SQL*Plus statements in an alias for a PL/SQL block (well, sort of)…

alias whoami=set serverout on
exec dbms_output.put_line(sys_context('userenv', 'session_user'));

…when the alias starts with a SQL*Plus statement, it will terminate at the first semi-colon…

Where you do have a PL/SQL alias that contains multiple statement terminators (‘;’) you will need to run any SQL*Plus commands required prior to invoking it.
Of course, if you find setting output on to be a bit onerous, you can save valuable typing molecules by simply running :

alias output_on = set serverout on size unlimited

I can also add a description to my alias so that there is some documentation when it’s listed :

alias desc whoami The current session user

When I now list the alias, the description is included…more-or-less…

I'm not sure if the inclusion of the text desc whoami is simply a quirk of the version and OS that I'm running on. In any case, we'll come to a workaround for this minor annoyance in due course.

In the meantime, I’ve decided that I don’t need this alias anymore. To remove it, I simply need to run the alias drop command :

alias drop whoami


At this point, I know enough about the alias command to implement my first version of the session tracefile alias that started all this.
The query, that I keep forgetting, is :

select value
from v$diag_info
where name = 'Default Trace File'
/

To create the new alias :

alias tracefile =
select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File';

I’ll also add a comment at this point :

alias desc tracefile The full path and filename on the database server of the tracefile for this session

My new alias looks like this :

The aliases.xml file

Unlike the pre-supplied aliases, the code for any alias you create will be held in a file called aliases.xml.

On Windows, this file will probably be somewhere under your OS user’s AppData directory.
On Ubuntu, it’s in $HOME/.sqlcl

With no custom aliases defined the file looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases/>

Note that, even though I have now defined a custom alias, it won’t be included in this file until I end the SQLcl session in which it was created.

Once I disconnect from this session, the file includes the new alias definition :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases>
<alias name="tracefile">
<description><![CDATA[desc tracefile The full path and filename on the database server of the tracefile for this session
]]></description>
<queries>
<query>
<sql><![CDATA[select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File']]></sql>
</query>
</queries>
</alias>
</aliases>

Incidentally, if you’ve played around with SQLDeveloper extensions, you may find this file structure rather familiar.

The file appears to be read by SQLcl once on startup. Therefore, before I run SQLcl again, I can tweak the description of my alias to remove the extraneous text…

<description><![CDATA[The full path and filename on the database server of the tracefile for this session]]></description>

Sure enough, next time I start an SQLcl session, this change is now reflected in the alias definition :

Loading an alias from a file

The structure of the aliases.xml file gives us a template we can use to define an alias in the comfort of a text editor rather than on the command line. For example, we have the following PL/SQL block, which reads a bind variable :

declare
v_msg varchar2(100);
begin
if upper(:mood) = 'BAD' then
if to_char(sysdate, 'DAY') != 'MONDAY' then
v_msg := q'[At least it's not Monday!]';
elsif to_number(to_char(sysdate, 'HH24MI')) > 1200 then
v_msg := q'[At least it's not Monday morning!]';
else
v_msg := q'[I'm not surprised. It's Monday morning !]';
end if;
elsif upper(:mood) = 'GOOD' then
v_msg := q'[Don't tell me West Ham actually won ?!]';
else
v_msg := q'[I'm just a simple PL/SQL block and I can't handle complex emotions, OK ?!]';
end if;
dbms_output.new_line;
dbms_output.put_line(v_msg);
end;
/

Rather than typing this in on the command line, we can create a file (called pep_talk.xml) which looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases>
<alias name="pep_talk">
<description><![CDATA[How are you feeling ? Usage is pep_talk <emotion>]]></description>
<queries>
<query>
<sql><![CDATA[
declare
v_msg varchar2(100);
begin
if upper(:mood) = 'BAD' then
if to_char(sysdate, 'DAY') != 'MONDAY' then
v_msg := q'[At least it's not Monday!]';
elsif to_number(to_char(sysdate, 'HH24MI')) > 1200 then
v_msg := q'[At least it's not Monday morning!]';
else
v_msg := q'[I'm not surprised. It's Monday morning !]';
end if;
elsif upper(:mood) = 'GOOD' then
v_msg := q'[Don't tell me West Ham actually won ?!]';
else
v_msg := q'[I'm just a simple PL/SQL block and I can't handle complex emotions, OK ?!]';
end if;
dbms_output.new_line;
dbms_output.put_line(v_msg);
end;
]]></sql>
</query>
</queries>
</alias>
</aliases>

Now, we can load this alias from the file as follows :

alias load pep_talk.xml
Aliases loaded

We can now execute our new alias. First though, we need to remember to turn serveroutput on before we invoke it :
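
Here's a hedged sketch of what that invocation might look like. It assumes, as with the parameterised built-in aliases, that SQLcl binds the first argument of the call to the :mood bind variable:

set serverout on size unlimited
pep_talk BAD

The exact message you get back depends, of course, on the day and time you run it. On a Tuesday afternoon, for example, you might see something along the lines of "At least it's not Monday!".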

Once you’ve terminated your SQLcl session, the new alias will be written to aliases.xml.

Exporting custom aliases

There may come a time when you want to share your custom aliases with your colleagues. After all, it’s always useful to know where the trace file is and who doesn’t need a pep talk from time-to-time ?

To “export” your aliases, you can issue the following command from SQLcl :

alias save mike_aliases.xml

This writes the file to the same location as your aliases.xml :

You can then import these aliases to another SQLcl installation simply by sharing the file and then using the alias load command.
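
Putting the two commands together, a sketch of that round trip might look like this (the assumption being that your colleague copies mike_aliases.xml into the directory they start SQLcl from):

-- on your machine
alias save mike_aliases.xml

-- on your colleague's machine, after copying the file across
alias load mike_aliases.xml
alias

The final alias command simply lists the available aliases so your colleague can confirm that tracefile and pep_talk have arrived.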

References

As you can imagine, there are a wide variety of possible uses for the ALIAS command.

Kris Rice is the original author of this feature, so this post of his is probably worth a read.
Jeff Smith has written on this topic several times including :

Menno Hoogendijk has an example which employs some Javascript wizardry which he has published on GitHub.

Right, back to my trace files.
