Tuesday, January 30, 2007

Various thought balloons #1

Not too many people visit this site, which is fine. It is an ongoing experiment in learning blogging, mucking around with ads etc.

You might have noticed the AdSense ads are gone. I did a quick check on Google, and even the pages which used to be indexed are gone now.
It seems I must be a spam blog, or splog, in Google's eyes. Oh well. Does this look and feel like a splog?

I was struck over the weekend by the various blogs I read about databases. Sometimes it feels like a pissing contest, i.e. look how deeply we have delved into the internals, look at the complexity we understand.
The reality is that most people are looking for information, mostly to make their job easier, to find a distraction from their everyday life, or to see what is happening elsewhere.

Enough ranting.

There was a note in passing about Jim Gray on a blog I read; apparently he is missing after going sailing. I clicked the link and bingo: what an awesome site. I already have more than 10 presentations and documents to read and some more websites to visit.

Go check it out. There was a recent debate about disk bandwidth; there is plenty of fuel on Jim Gray's website to rekindle that fire.

Have Fun

Paul

Tuesday, January 16, 2007

Just works right.

In my day job I spend a reasonable percentage of my time checking the same things, and in some cases fixing or monitoring the same problem.
I have a need for automation, so I spend time writing scripts so I don't have to type; dynamic SQL is a good example. I dislike GUIs which force too much of the workflow onto you by not handling the general case well.

If you don't know the power of dynamic SQL: it is SQL which generates SQL.

Example:


select 'alter system kill session '''||sid||','||serial#||''';'
from v$session
where username is not null and last_call_et > 3600;

This produces a list of SQL commands to kill all non-background sessions (username is not null) which have been idle for more than an hour. Note: use sqlnet.expire_time=60 instead.

alter system kill session '24,34567';
etc...

So my script library is a sprawling bunch of sql and sh scripts. I often can't remember the exact syntax of a command (though I know the table/view and command I need), so I just grep for the command or table, and 9 times out of 10 I find a script I have already written.

If I combine several scripts into one, I now have a tool. Suddenly what took 2 mins to run and diagnose takes 10 secs, and I am investigating the why sooner. No stuffing around getting the info I need.

When I am working I tend to listen to dance music; apparently the specific genre is vocal trance. As I have mentioned in the past, I use Pandora, as their range of music is superior to anything I have here. Other times I listen to podcasts; they just go on in the background while I continue working, occasionally stopping and rewinding a little (using the slider) to catch a bit again.

This podcast was good. I would recommend you listen to it, a couple of times.

I love unix and the power of the pipe. locate and grep can find files and contents faster than anything Windows XP can do.
I hate how desktop Linux still exposes too many underlying hardware/software conflicts. The expectation is that you will drop everything and debug the device driver, or even better, patch and recompile the kernel.
I am a techie and can handle it; no one else in my family, including my wife (who is a techie as well), can be bothered with it. I don't use Linux now; I use cygwin, which has all the unix I need.

The message from the podcast was really to ask what the user wants, what task is the user trying to achieve. Just make it work.

Have Fun

Wednesday, January 03, 2007

Great article about Enron and disclosure

I found this article on Enron and puzzles via econlog, and whilst it is a long read it is extremely interesting.

Basically, the information that Enron was in trouble was in plain sight.

Here is the student report that Malcolm Gladwell mentions; a little bit of Googling produced the PDF.

Perhaps a good rule of thumb is to always check cashflow against profit, and also profit against tax paid.

Whilst Malcolm mentions that Enron was unaffected by short-sellers, it was in the end. A short seller who spent hours checking 10-Q and 10-K filings could have ridden Enron into insolvency, keeping in effect the whole amount which was short-sold.

Update:

I read the student report on Enron (ENE) and it reads like any other financial advice/analysis, i.e. slightly slippery, hedging bets; the short term recommendation was to sell. However the students remained convinced of Enron's strategy and remained neutral longer term.

They did in fact discover the possibility of earnings manipulation, however they chose to disregard it. In the report's words, "no cause for concern".

Unfortunately any analysis now is tainted by hindsight. Enron was not so much an energy company as an "energy bank", where 88.9% of its revenue (but only 28.6% of its income) came from its financial arm. So like a bank it lived and died by risk and the cost of capital; unlike a bank, though, there was no reserve bank to provide cash when the reserves were gone.

Have Fun

Paul

Monday, December 04, 2006

MySQL Recipes: promoting a slave to master or changing masters

In the corporate world, promoting a slave to a master requires many years of hard work, a suitable business degree, an MBA, and a taste for promotion. The alternative is getting that break as an entrepreneur, where you start as the master of your own destiny and end up being the master when the money runs out, or you take the money and move on.

In MySQL land, failovers for redundancy, disaster recovery or load balancing are performed with master and slave databases, the most popular method being binlog replication.

Note: These commands are valid for MySQL 3.23, 4.0, 4.1, 5.0 and 5.1.

For older versions:

  1. replace RESET MASTER with FLUSH MASTER.
  2. replace RESET SLAVE with FLUSH SLAVE.

In a simple configuration you might have one master and one slave.

Recipe to promote a slave to a master for simple replication.

On Master:

  1. FLUSH LOGS;

On Slave:

  1. STOP SLAVE;
  2. RESET MASTER;
  3. Optional: CHANGE MASTER TO MASTER_HOST='';

More complex setups:

You might have a chain of replication, using a slave as a pseudo master for another slave. Kind of like a king with a local sheriff taking orders and then telling the serf.

eg: master M1 -> slave S1 -> slave S2

Note: Slave S1 has --log-slave-updates enabled

Recipe to promote a slave to a master for simple chained replication.

On Master M1:

  1. FLUSH LOGS;

On Slave S1:

  1. STOP SLAVE;
  2. RESET MASTER;
  3. Optional: CHANGE MASTER TO MASTER_HOST='';

On Slave S2:

  1. STOP SLAVE;
  2. RESET SLAVE;
  3. START SLAVE;

You might also have multiple-slave replication, with one master and two or more slaves.

eg: slave S2 <- master M1 -> slave S1

Recipe to promote a slave to a master for multiple slave replication.

Note: Slave S2 is going to use the newly promoted slave S1 as its master.

On Master M1:

  1. FLUSH LOGS;

On Slave S1:

  1. STOP SLAVE;
  2. RESET MASTER;
  3. Optional: CHANGE MASTER TO MASTER_HOST='';

On Slave S2:

  1. STOP SLAVE;
  2. CHANGE MASTER TO MASTER_HOST = 'SLAVE S1 hostname';
  3. RESET SLAVE;
  4. START SLAVE;

Verification:

Once you have used one of the recipes, and you are keen to taste the results, run this command on any remaining slave:

SHOW SLAVE STATUS;

Make sure that the binlog for the master is actually the correct one.
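As a rough checklist (these are standard SHOW SLAVE STATUS fields; S1 and S2 refer to the last recipe):

  1. Master_Host should be the new master's hostname, i.e. S1 for slave S2.
  2. Master_Log_File and Read_Master_Log_Pos should line up with what SHOW MASTER STATUS reports on the new master.
  3. Slave_IO_Running and Slave_SQL_Running should both say Yes.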

One of the joys of using MySQL is the simple commands required to perform tasks which can require some fancy footwork on other RDBMSs.

There are a couple more replication methods which aren't covered here. Also see the MySQL Replication FAQ; the methods are the same, but the formatting of the procedures is less than ideal.

Have Fun

Paul

Monday, October 16, 2006

SQLserver performance tricks

I came across this post at Adam's website on using TOP and ORDER BY to achieve intermediate materialization of an inline view.
I promised Adam in the comments I would pass it on, and I have: via email to my DBA mates, who I am sure haven't seen this technique, and now here.

This is a very good technique. I was going to type trick, however a trick seems to conjure up an image of magic; once you use this trick yourself enough, it will become a skill.

Adam's article, whilst not mentioning it, also shows the iterative process of tuning; at times the steps can be large and seem weird to the outsider, i.e. how did you go from here to there without all those intermediate steps?
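For reference, the SQL Server shape of the technique looks roughly like this; a sketch only, assuming a ratings table like the one in my MySQL tests below (the TOP with a huge row count plus ORDER BY is what coaxes the optimizer into materializing the inline view):

select a.userid, b.theCount
from ratings a
join (select top 2147483647 movieid, count(*) as theCount
      from ratings
      group by movieid
      order by movieid) b
on a.movieid = b.movieid;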

I tried Adam's technique on MySQL 5.0; maybe I am missing something with MySQL, however I couldn't get MySQL to use the derived table on the join... sigh.
I noticed the documentation (MySQL calls them derived tables, not inline views) mentions that they are always evaluated for optimisation!?!



Simple join

mysql> explain select a.userid,count(*) from ratings a
-> join ratings b on a.movieid = b.movieid
-> group by a.userid;
+----+-------+--------------------+--------------------+---------+------------------+--------+--------------------------+
| id | type  | possible_keys      | key                | key_len | ref              | rows   | Extra                    |
+----+-------+--------------------+--------------------+---------+------------------+--------+--------------------------+
|  1 | index | ix_ratings_movieid | ix_ratings_userid  | 5       | NULL             | 100165 |                          |
|  1 | ref   | ix_ratings_movieid | ix_ratings_movieid | 5       | movie1.a.movieid |     28 | Using where; Using index |
+----+-------+--------------------+--------------------+---------+------------------+--------+--------------------------+

2 rows in set (0.00 sec)

Using INLINE view

mysql> explain select a.userid,b.theCount from ratings a
-> join
-> (select movieid,count(*) theCount from ratings b group by movieid) b on a.movieid = b.movieid
-> ;
+----+-------+--------------------+--------------------+---------+-----------+--------+-------------+
| id | type  | possible_keys      | key                | key_len | ref       | rows   | Extra       |
+----+-------+--------------------+--------------------+---------+-----------+--------+-------------+
|  1 |       | NULL               | NULL               | NULL    | NULL      |   1682 |             |
|  1 | ref   | ix_ratings_movieid | ix_ratings_movieid | 5       | b.movieid |     28 | Using where |
|  2 | index | NULL               | ix_ratings_movieid | 5       | NULL      | 100165 | Using index |
+----+-------+--------------------+--------------------+---------+-----------+--------+-------------+
3 rows in set (0.11 sec)


Using INLINE view with ORDER BY ... LIMIT n

mysql> explain select a.userid,b.theCount from ratings a
-> join
-> (select movieid,count(*) theCount
-> from ratings group by movieid order by movieid limit 2147483647) b on a.movieid = b.movieid
-> ;
+----+-------+--------------------+--------------------+---------+-----------+--------+-------------+
| id | type  | possible_keys      | key                | key_len | ref       | rows   | Extra       |
+----+-------+--------------------+--------------------+---------+-----------+--------+-------------+
|  1 |       | NULL               | NULL               | NULL    | NULL      |   1682 |             |
|  1 | ref   | ix_ratings_movieid | ix_ratings_movieid | 5       | b.movieid |     28 | Using where |
|  2 | index | NULL               | ix_ratings_movieid | 5       | NULL      | 100165 | Using index |
+----+-------+--------------------+--------------------+---------+-----------+--------+-------------+
3 rows in set (0.11 sec)



Have Fun

Friday, September 29, 2006

Infrequent blog posting

It seems a while since I have posted here, and recently I haven't posted anything here about databases. But I continue to write and post blog articles; I have been writing some articles for my new employer over at Pythian.

I went through a period where I didn't have a lot of time to keep up-to-date with what has been going on, and some of the sources which I had been using as a quick summary have descended badly. Ahh, you get what people vote for.

Prepare for a brain dump...

It is interesting to see the change in content, and also what it takes to get an article promoted to the first couple of pages; very interesting how quickly people form blocs or cliques which provide almost guaranteed high rank.

In a way purchase decisions are like voting, however unlike the slight peer pressure of voting (hard to define: is it being cool, being a "friend"?), your personal money is on the line. There has been plenty of work done on purchasing decisions and money management where your own money is on the line as opposed to other people's money.

A long line of economists from Adam Smith to Hayek used the idea that it is impossible to determine how people act or what role priorities or self-interest play in buying and selling, hence the impossibility of trying to set a price. I agree; even agent-based modelling, whilst better, does not capture the essence of each purchase or sale decision. Stop, however, before taking that idea on to the relativist position that everything is subjective. What they were saying is that the price contains many, many decisions reflecting the opinions of many people.
If at a point in time you can model and predict the purchase decisions of a majority of people, or a sufficient number for a profitable outcome, you have gained an edge.

I wonder if this is still true where the decision has no effect on the hip-pocket (free voting), or where the cost is hidden, for example taxpayer subsidies or taxpayer funded services. There is plenty of real evidence that you get the lowest common denominator or excessive freeriding (tragedy of the commons).

Would a subscription model give news aggregator websites a cleaner picture? Would the voting be limited to the subscribers (who are paying) and therefore fit a certain demographic model? What about places like Yahoo which use the concept of futures or shares for different areas to get a feel for the buzz?

Can we use data from purchasing decisions, a la Amazon, to track what is hot? Most books, whilst not too expensive, provide at least one insight into what is happening. I am not sure about you, but Amazon is still a personal thing: most of my purchases from Amazon are my purchases, not a family member's or friend's.

What other sources exist?
Do new academic papers provide a clue to what could be hot or what has been hot? The purchase decision isn't so much monetary, more fundamental: normally a large investment of time by the writer(s) and also the reviewers.
Unfortunately those decisions are guided, like any media, by editors.

So what other sources exist?

Have Fun

Saturday, September 02, 2006

Little gem found whilst websurfing

Do you still websurf?

Or is it all google and RSS and aggregators?

Is what you read presented to you as the result of a popularity contest at digg, reddit, or even Google (PageRank is a popularity contest)?
I am beginning to understand, partially as some of those sources' signal-to-noise ratios decrease, and partially from watching how articles become popular and how there is a huge demand for Search Engine Optimization (SEO).

I was bouncing from one article to another, seeing something interesting and opening a firefox tab to read or bookmark for later digestion, and came across an interesting transcript of a Bruce Sterling talk about pervasive computing. A gem!

I have listened to a couple of Bruce Sterling's presentations via podcasts from ITconversations, and old Bruce has a way with words. I rate his presentations highly and feel they are just as effective when merely hearing the voice as, I guess, when seeing the presentation.

I am in mind expansion mode at the moment, and it is time to learn more again.

Ever seen people use the word "grok" before? If you don't know the background to the word, check out Robert Heinlein's "Stranger in a Strange Land". I am not grokking stuff at the moment; I am purely sucking in new concepts before I can begin to understand.

Have Fun

Monday, August 14, 2006

Great complex systems thinking diagram

Go check out this website called Idiagram: The Art of Insight and Action, specifically this page, which not only shows a great diagram of complex systems thinking but also an impressive way to blend JavaScript with graphics to present a freeform, user-driven diagram.

I found this great website via the Anecdote blog entry

Have Fun

Paul

Thursday, August 03, 2006

Design or Evolve?

Here is an interesting couple of articles from Steve Jurvetson about weighing up the benefits of designing your algorithms versus letting them evolve. Read the comments as well; they are pretty good.

  1. The Dichotomy of Design and Evolution

  2. Recapitulation in Nested Evolutionary Dynamics
More work is required in using Bayesian reasoning on evolved algorithms: essentially using the power of Bayesian inference to determine the set of information which you don't know, and continuing to whittle that set down.
Directed evolution would potentially help as well, dependent on understanding to the point of exhaustion what your assumptions are in setting the direction or guidelines or rules (framework).

Have Fun

Paul

Howto: copy of a production mysql database using mysqlhotcopy backup

Do you need to see what the data in your tables was like yesterday, or last week?

Here are the steps to create a copy of a mysql database on the same machine, with no downtime.

Assumption: You have made backups using mysqlhotcopy and are using myisam tables.
Caveat: Make sure you copy the backup into the new directory, otherwise you will nuke the current database. You have been warned.
  1. Create a new directory in your mysql data directory (/var/lib/mysql): mkdir new_db
  2. Copy the backup you want into that new directory: cp /backup_dir/current_db/* new_db/
  3. Log into mysql and create the new database: CREATE DATABASE IF NOT EXISTS new_db;
  4. If any tables have been created since the backup you will have to create them in new_db, using this command: CREATE TABLE IF NOT EXISTS new_db.table_name LIKE current_db.table_name;
  5. Check/Validate all the tables in new_db are ok using command: CHECK TABLE new_db.table_name;
You can use this method to create a copy to check for changes between now and the last backup; this might be for recovery or audit reasons.
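Putting the steps together, a minimal session might look like this (a sketch assuming the backup of current_db lives in /backup_dir/current_db, the data directory is /var/lib/mysql, and the tables are MyISAM; the shell steps are shown as comments):

-- shell: mkdir /var/lib/mysql/new_db
-- shell: cp /backup_dir/current_db/* /var/lib/mysql/new_db/
CREATE DATABASE IF NOT EXISTS new_db;
-- only needed for tables created since the backup was taken
CREATE TABLE IF NOT EXISTS new_db.table_name LIKE current_db.table_name;
-- verify each restored table
CHECK TABLE new_db.table_name;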

The reason I created this post was that I spent at least 30 minutes searching Google for a solution to this, and whilst there were plenty of articles about backups, nothing mentioned making a copy locally, on the same system.

Have Fun

Paul

Wednesday, August 02, 2006

How inflation happens or What inflation is

So what is inflation, how does inflation happen?

It is not increasing prices; those are the effect, not the cause.

The cause is an increase in the amount of money in the system.

Think of prices as ratios, e.g. a kilo bag of potatoes costs a certain number of dollars.

Prices are signals, signals of demand and supply.

Recently Australian bananas have been really expensive due to a couple of reasons:
  1. A cyclone destroyed a large growing region in North Queensland.
  2. Imports are restricted due to quarantine requirements.
Hence supply was restricted but demand remained the same, so what happens to the price, the ratio? It increases; it took more dollars to buy the limited bananas.

OK, so what is inflation, and what do banana prices have to do with the increase in prices of a whole swag of goods and services? Prices are still ratios, so something has happened to either the supply of goods and services or, on the other side, the supply of money.

An increase in the money in the system (monetary inflation) causes the same or fewer goods to increase in price.

Where is this money coming from? There is a demand for money which the Reserve Bank of Australia (RBA) is supplying in order to maintain its monetary policy (the interest rate).

Australia doesn't live in a closed world; we are affected by other countries' demand for our goods as well. Hence when China, Japan, South Korea, Europe and the US want our commodities, they bid the price up, sending the signal that supply is restricted. This leads to a long chain of price signalling all the way through the supply chain.

Why is there such a demand for money at the moment? The RBA has had to increase the interest rate again to try to dampen the demand. See this statement.

This is rational behaviour from investors and everyone else: if you believe that prices are going to be higher in the future, then goods and services are cheaper now than in the future...
The money you have now is going to be worth less in the future, so you exchange it for something else.

The price signal mechanism may be constrained by regulation or monopolies, or the supply may be inelastic, e.g. it takes 5-10 years to get a mine going, and 10 years to become a good doctor.
Then the rational thing to do is to demand as much money as you can now, to buy any goods and services which are available now, or to borrow to build to satisfy that demand. This is arbitrage at its best.

I have said this in the past: the RBA is acting like an overbearing nanny! It believes it can calm down the arbitrage process by making money more expensive. They are looking into the future and predicting that some of the current demand is transient (caused by other arbitrageurs) and that there will be a massive over-investment in supply.

Why is it rational to take as much money as you can get and buy goods or services?
Why is it rational to get rid of any money you get as quickly as possible?

Steps for being rational:
  1. I download this spreadsheet from the RBA.
  2. I add some percentage changes over the last year for M1 (currency and currency in the bank), i.e. raw money.
  3. If I start at Jan 2006, I see that over the last year M1 increased by 6.95%!!
  4. Interest rates are 5.50%!!
  5. That is a negative real interest rate: 5.50% - 6.95% is roughly -1.45%. You leave the money in the bank and you will get nothing; worse, you will have something which is potentially worth less at the end of the year.
More recently the month-over-month percentages have been crazy!
If you check, in May 2006 year-over-year M1 increased by 8.98%!!

Have Fun

Paul

Friday, July 21, 2006

Gapminder visualization of demographic info

Hat-tip to Catallarchy for this awesome visualizer of demographic info.

The ability to move through time turns what would be a snapshot into a good indication of behaviour.

Try a movie (playing the data) of children per woman (fertility rate) against life expectancy.

or

Child Mortality against Fertility rate.

People are rational.

Have Fun

Paul


Monday, July 17, 2006

RSS and the subscribe model

There has been an idea (a meme) building for a while around the use of RSS (or Atom) as a possible way of sharing and propagating information, both internally and externally, in a business.

Adam Bosworth has a great podcast on this subject, from when he was talking about Google and his version of a distributed DNS-like database.

As I was reviewing my blog aggregator (Sage, for firefox) I found another link to a page dedicated to RSS in the enterprise.

RSS builds on the acceptance of XML as a basis for sharing information between companies; there are various formats out there for your standard B2B solution. With Oracle getting into the vertical integration game and almost every major software vendor pushing Service Oriented Architecture (SOA), I am sure it won't be long before RSS in some form appears as a solution for retrieving data from diverse data sources.

This is the database replication model, where RSS tells you what has changed so you get everything incrementally, which is faster. You could also get only what you want to receive, pulling the data you want, or even only what you are allowed to see, which provides security and privacy.

The possibilities are big... what about taking one or more feeds, adding some value, and passing the result on as a new RSS feed? This is what people call mashups or remixing.
For all the database people, this is like a view on one or more tables: your VIEW of the data which is fed to you.
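To stretch the metaphor in SQL, a mashup of two feeds is just a view over two tables; a sketch, with hypothetical tables feed_a and feed_b holding items pulled from two different RSS feeds:

create view my_mashup as
select a.item_id, a.title, a.published, b.rating
from feed_a a
join feed_b b on a.item_id = b.item_id;

The view adds value (the join) without copying the underlying data, much as a mashup republishes feeds it doesn't own.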

From a monitoring point of view, wouldn't it be nice to be able to pull the data you want and forget the rest?

The last interesting idea is based on a metaphor from this article: if you can pass around XML, you could pass S-expressions, and if you pass around S-expressions you can pass not only data but code. After reading that article, I had the thought that maybe the data in a database is really code after all, the tables being just functions (or relations in the original sense), the data model actually reflecting the function of the business (the business model) in abstraction.

So you could in fact use RSS not only to share data or a view of data but also share code or functions.

Have Fun

Friday, July 07, 2006

SQL Server 2005: Creating V$ACTIVE_SESSION_HISTORY

Are you a DBA like me who works on both Oracle and SQL Server (and MySQL)?

Do you miss Oracle performance views like v$session, v$sql and, in Oracle 10G, v$active_session_history?

I have created a series of tables and procedures which provide roughly the same functionality in SQL Server 2005, basically mirroring the view that is in Oracle 10G. I will backport the stuff to SQL Server 2000 at some stage.

These scripts use the new dynamic management views (DMVs) available in SQL Server 2005.
Specifically:
  1. sys.dm_exec_requests
  2. sys.dm_exec_sessions
I have tried to map the columns in v$active_session_history to columns from the DMVs, whilst keeping the overhead of selecting the data under control, which means limiting the number of joins. So unfortunately there is no row-wait data included in this iteration.


The steps to install this on your SQL Server 2005 instance:
  1. Copy and paste these scripts to a sql file.
  2. Open up SQL Server Management Studio.
  3. Select the instance and database where you want to create the tables. I suggest MSDB as the database or a dba specific database if you already have one.
  4. Open the file as a query.
  5. Execute the query. This will create the view, table and two stored procedures.
  6. Create a job which executes the stored procedure usp_ins_sql2005_ASH every n secs. It would be prudent to start with 3-5 secs and see how much overhead that causes.
  7. Create a job which executes the stored procedure usp_purge_sql2005_ASH. This stored procedure takes a parameter for the number of hours to keep.

Note: Check the growth of your table to determine how much data you want to keep, as busy systems are likely to have many rows per sample.

E.g. sampling every 2 secs will create 30 samples per minute, or 1800 samples per hour. If you have on average 5 sessions running or waiting per sample, that will be 9000 rows per hour.

I will provide some useful reports from the data you have collected in the next article.

The scripts are attached at the end of this article. Due to HTML parsing < as a tag, you will have to check the purge script before running it.


/* Recreating V$ACTIVE_SESSION_HISTORY in SQL Server 2005
Version: 0.1
Created: Paul Moen 2006
*/


drop view uv_active_session_history
go
create view uv_active_session_history
as
select
getdate() as sample_time,
req.session_id,
req.sql_handle as sql_id,
req.plan_handle as sql_plan_hashvalue,
req.database_id,
req.user_id,
req.command as sql_opcode,
req.status as session_state,
req.blocking_session_id as blocking_session,
req.wait_type as event,
req.wait_time,
sess.program_name as program,
sess.client_interface_name as module
from sys.dm_exec_requests req join sys.dm_exec_sessions sess
on req.session_id = sess.session_id
where req.user_id <> 1
go

drop table active_session_history
go
select top 0 * into active_session_history
from uv_active_session_history
go

drop procedure usp_ins_sql2005_ASH
go
create procedure usp_ins_sql2005_ASH
as
insert into active_session_history
select * from uv_active_session_history
go

drop procedure usp_purge_sql2005_ASH
go

create procedure usp_purge_sql2005_ASH
@hour int
as
delete from active_session_history
where sample_time < dateadd(hh,-1*@hour,getdate())
go
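To use them (a sketch built on the objects created above): schedule one job that runs the insert procedure every few seconds, and another that runs the purge, e.g. keeping 24 hours of history:

exec usp_ins_sql2005_ASH
go

exec usp_purge_sql2005_ASH @hour = 24
go

A quick look at the most recent samples is then just:

select top 100 * from active_session_history order by sample_time desc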

Thursday, June 29, 2006

SQLserver 2005 query efficiency sql scripts

Until recently, if you wanted to determine which queries or stored procedures were the least efficient, you would have to capture all SQL, either via SQL Profiler as described in these early posts on automating SQL Profiler and tuning using SQL Profiler, or via a scheduled job which dumped the running SQL from sysprocesses.

Now, with the new dynamic management views available in SQLserver 2005, you can run queries which determine these poor performers without much extra work.

As I noted in my last entry, the SQLserver 2005 team have a series of blogs, and this inspired me to read up and port my existing Oracle query-efficiency scripts to SQLserver 2005.
One of my favourite Oracle scripts uses the number of logical reads per execution as a good sign of poorly performing SQL. Logical reads per execution is also a reasonable estimate of CPU per execution.

Normally on a poorly performing system I tend to follow these steps, in this case rewritten to use the SQLserver 2005 dynamic management views:
  1. Quickly check taskmanager or perfmon to verify that the CPU or IO hog is in fact SQLserver and not IIS or SQL fulltext indexing services (or something else).
  2. Check for contention, is there one process blocking all others.
  3. Run the script to find sql with the highest elapsed time per execution.
  4. Run the script to find sql with the highest physical reads (PIO) per execution.
  5. Run the script to find sql with the highest logical reads (LIO) per execution.
So here are the efficiency scripts:

-- SQL Efficiency by Elapsed Time. Paul Moen 2006
-- (the 1.0* below forces decimal division, avoiding integer truncation)

select qs.creation_time
, qs.execution_count "Exec"
, qs.total_elapsed_time "Elapsed"
, total_physical_reads "PIO"
, total_logical_reads "LIO"
, round(1.0*qs.total_elapsed_time/qs.execution_count,1) "Time/Exec"
, round(1.0*qs.total_physical_reads/qs.execution_count,1) "PIO/Exec"
, round(1.0*qs.total_logical_reads/qs.execution_count,1) "LIO/Exec"
, st.text
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
where qs.execution_count > 0
order by "Time/Exec" desc

-- SQL Efficiency by Physical Reads per execution. Paul Moen 2006

select qs.creation_time
, qs.execution_count "Exec"
, qs.total_elapsed_time "Elapsed"
, total_physical_reads "PIO"
, total_logical_reads "LIO"
, round(1.0*qs.total_elapsed_time/qs.execution_count,1) "Time/Exec"
, round(1.0*qs.total_physical_reads/qs.execution_count,1) "PIO/Exec"
, round(1.0*qs.total_logical_reads/qs.execution_count,1) "LIO/Exec"
, st.text
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
where qs.execution_count > 0
order by "PIO/Exec" desc


-- SQL Efficiency by Logical Reads per execution,
-- which is a good estimate for CPU/execution. Paul Moen 2006

select qs.creation_time
, qs.execution_count "Exec"
, qs.total_elapsed_time "Elapsed"
, total_physical_reads "PIO"
, total_logical_reads "LIO"
, round(1.0*qs.total_elapsed_time/qs.execution_count,1) "Time/Exec"
, round(1.0*qs.total_physical_reads/qs.execution_count,1) "PIO/Exec"
, round(1.0*qs.total_logical_reads/qs.execution_count,1) "LIO/Exec"
, st.text
from sys.dm_exec_query_stats as qs
cross apply sys.dm_exec_sql_text(qs.sql_handle) as st
where qs.execution_count > 0
order by "LIO/Exec" desc

Sunday, June 25, 2006

SQLserver 2005 team blogs

I have been scanning some blogs by the different teams involved with SQLserver 2005.
Here is a short list:
  1. SQLserver 2005 Database engine tips
  2. SQLserver 2005 Query Optimizer
  3. SQLserver Query optimization - talking about SQL/TSQL optimization
  4. SQLserver storage Engine
There is a bit of cross-linking happening, so some articles are repeated on one or more blogs; nothing like helping the SEO along.

This article from the database engine blog about finding the top N worst SQL brought a (bemused) smile to my face. Finally SQLserver has views like Oracle; no more do you need to run SQL Profiler 24x7, or some script which captures the running SQL every n secs, to have an historical record of what has been running.

Guess this means I can start porting my Oracle scripts from using v$sql, v$sqlarea and in Oracle 10G R2 v$active_session_history to using sys.dm_exec_query_stats, sys.dm_exec_sql_text and sys.dm_exec_query_plan.

The trouble with plenty of relational databases has been the lack of exposure of the metadata/catalog of the database and the data within that catalog. Until recently plenty of the internal stuff in SQLserver had to be queried using various DBCC calls. MySQL versions prior to 5 have the same problem with SHOW TABLE STATUS, SHOW FULL PROCESSLIST etc.

There is no nice way to see what the spread of data within a column is. It is good that these vendors are exposing this to the outside world via SQL, rather than via a tool which requires the output to be further parsed.
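For instance, the spread of data in an indexed column is still only visible via one of those DBCC calls; a sketch, borrowing the table and index names from the earlier ratings example:

DBCC SHOW_STATISTICS ('ratings', 'ix_ratings_movieid');

It returns the statistics header, density vector and histogram as separate result sets, exactly the kind of output you have to parse rather than query.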

Have Fun

Friday, June 16, 2006

Interesting web articles

I am getting back into the groove, catching up on all those interesting blog entries and web articles which I missed in Canada; not because I didn't have internet access, just that my whole RSS feed reader wasn't there, and neither were my bookmarks/favourites. My setup is the result of many hours of filtering which I don't need to spend time remembering (unless I need to).

I am working from home in the interim, which has been great. Depending on what I am doing, I enjoy working or studying to some music. As I have mentioned in the past, Pandora provides a great resource of music. I tend to listen to a genre called vocal trance, which is basically dance music; other times it might be classical, jazz or even metal in various forms. I like my music to be packed with plenty of action and depth.

Here are a couple of articles worth reading:
  1. Mosh Pit Learning
  2. Python Sudoku Solver
I find the area of constraint programming interesting; there is a Prolog Sudoku solver as well, which uses a similar method, using Prolog constraints to solve the puzzles. The idea that the program moves forward and reduces the problem set might be useful in tuning situations, where the need to eliminate non-issues (sometimes in a hurry) is important.

Lots of developers have issues with using constraints in databases, mainly relating to error handling. That is kinda weird, as the database model is really an expression of a business or application model, i.e. a sale must have a product and a customer.
Foreign key, unique and check constraints allow the database to police those business rules (or constraints) so that the data remains accurate, without depending on any application.
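For example, the sale rule above expressed as constraints; a sketch with hypothetical tables:

create table product (
  product_id   int primary key,
  name         varchar(100) not null
);

create table customer (
  customer_id  int primary key,
  name         varchar(100) not null
);

create table sale (
  sale_id      int primary key,
  product_id   int not null references product(product_id),   -- a sale must have a product
  customer_id  int not null references customer(customer_id), -- and a customer
  quantity     int not null check (quantity > 0)              -- and a sensible quantity
);

The database now polices the rules no matter which application does the insert.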

In some sense the data model provides insight into the business. What problem (for the customer) is business trying to solve...

Have Fun

Sunday, June 11, 2006

Back to Sydney

My three weeks in Canada are almost over.

I am killing a couple of hours in Vancouver before heading back to Sydney.

If you get a chance, go and have a read of the Pythian blog. Like I said before, these blokes and gals are smart. Christo is a damn fine Starcraft player as well; I thought I was reasonable, but hey, 6 minutes and it's over is really really good.

I am going to be spending time getting the Australian office happening, and then Pythian goes 24x5!!

My time is limited again.

Craig Sham. Orapref has done some interesting work on performance forecasting. This I believe is the next tuning wave, after ratios, waits and response time.
Hotsos is doing some fine stuff with queuing theory as well.

Enough for now. I will be back on Aussie time in about 20 hours.

Have Fun

Sunday, June 04, 2006

Kaplan-Meier analysis

The blokes at Pythian are damn smart.
Paul Vallee mentioned the use of a statistical analysis of survival, specifically Kaplan-Meier survival analysis.
After Paul explained how he was using it, I thought of other areas where you could use it. It comes down to how or what you class as survival and death.
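For background (my summary of the method, not Paul's): the Kaplan-Meier estimate of the survival function is

\hat{S}(t) = \prod_{t_i \le t} \left( 1 - \frac{d_i}{n_i} \right)

where n_i is the number still "alive" (at risk) just before time t_i, and d_i is the number of "deaths" observed at t_i. The definitions of alive and dead are exactly the choice mentioned above.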

  1. Database performance. At what point does performance hang (or die), and what is the mean survival time? Survival could be a reasonable response time for a known query.
  2. Stock survival. The ASX, whilst having large blue chip stocks, contains a lot of speculative mining and biotech startups. It would be interesting to run some survival analysis over them.
This tool doesn't tell you what is required to survive; that would be heavily influenced by the choice of survival indicators.

Anyway I have to go. Going to play computer games all afternoon. Should be good.

Have Fun

Sunday, May 28, 2006

First week in Canada or cleaning internet cafe computers

Finally I have a chance to use a computer without the threat of spyware, adware and rootkits. I thought I could use an internet cafe a couple of blocks from where I am staying, but the last visit required 40 minutes of downloading Spybot, Ad-Aware, firewalls and the Sysinternals rootkit revealer just to make the machine OK to use. It was badly infested with spyware and adware.

Anyway, I chose to come into the Pythian office and use the computer I have been given for the duration of my stay, as I know it is secure.
So it is Saturday morning, Canadian (EDT) time, just before lunchtime.

The first week has been busy and the learning curve steep, but I have jumped into handling the daily processes and alerts as a way to get up to speed as quickly as possible. Doing actual work, as opposed to reading documentation and having meetings (whilst important), is a much faster way to "get a grip" on the new work environment, processes, clients etc.

The system that Pythian has for handling the large number of clients and the larger number of databases (client -> databases is a one to many relationship :) ) is very, very good.
Like any successful company, systems and processes are one of the keys to allowing the company to expand, and they also provide a method of storing or maintaining what is called organizational learning or knowledge. This way, if someone leaves the company, the knowledge doesn't leave as well.

I won't post much in the way of new-direction and database related posts until I get back to Australia.

More later, have fun.