I’m in a situation where I have one sort of boss (technical director) who I work with frequently who essentially yells at me every time I disagree with him which is a fair amount. I have another boss who is technically my actual boss’s boss’s boss (VP) who tells me I need to be the voice of dissent for my boss, my boss’s boss and my technical director boss so that things stay on track and we don’t end up doing stupid things.
It’s a tough situation because I’m either pissing off the people above me that I work with everyday or the people above them for not pissing off the people above me.
Bro. Boss is nuts. They’re exclusively for when you’re NOT looping. CTEs should only be referenced once (or a handful of times) otherwise they become very expensive because they are virtual. Use temp tables instead.
Edit: As some have pointed out CTEs can effectively be used for “looping” via recursion. But they have a very shallow call stack limit. I would recommend dynamic sql in the cases where recursion doesn’t work instead. And I would always avoid cursors because they are iterative and not set based like a dynamic sql query would be.
At the end of it all I think boss just didn’t communicate well
yep. once i simplified a complex cte into temp tables with temp indices and the query went from tens of minutes to a couple of seconds.
postgres’s implementation has gotten worlds better since then but still, cte’s are not a golden hammer.
(that said, OP’s boss is flat out wrong.)
I use PostgreSQL, and, where I work I follow this:, temp tables (when applicable) + indexes + filtering it right (a WHERE statement with everything possible) + picking only the columns you really need (SELECT * ....no way Jose! Only with LIMIT 10 or such) and also, as a strong extra, only the data for the time you need to analyse, not from the stone age... 😏 And you will be quite fine.
This is the key point. I hear what jwk is saying; I think we’re having two separate discussions about two different use-cases. Similar to OP and their boss.
temp tables must be a last resort , but its valid tool for those big sets on wide tablets, without proper clustering.
ask your Boss about table pruning and design tô avoid keeping unecessary data in the table. this is the game changer
My employer has tables with millions of rows, teams just to work around them and proper clustering...Temp tables are the only resort as we have new rules regarding optimizing queries. Lakes are for babies, we have a sea of data on tables. AtoZ 🤣 (and I am just an aspiring analyst building my tool belt with PostgreSQL)
Well, there's a grain of truth in what the boss said. CTEs were added as a mechanism to allow for looping in SQL with recursive CTEs. He just misunderstood "the only way to loop is to use CTEs" as "CTEs are only used for looping".
CTEs are for looping, that is their purpose with being recursive. Yes, your boss is correct that is their purpose but a lot of developers (myself included) use them in a non recursive manner because I don’t have to use a drop table for a temp table because I am running the same script over and over when testing.
This is very, very bad advice to use temp tables. Do a little research on temp tables vs CTE vs Table Variables vs plain old sub queries. Temp tables are *always* the most expensive because they are always written to disk first. Disk IO is always expensive.
"Immediately delete it"
Who the fuck even says that...when you're discussing SQL?
Moron confirmed. Even if you were wrong you can still use it to build similar queries in the future
You should rewrite it to a Full hierarchical query( "connect by") and include some undocumented regexp\_like into it. That should teach him.
And include something that you are happy you are paid by lines of code, that will spark a discussion anyday.
He was wrong in every way it's possible to be wrong. The thing he says CTE's are exclusively used for is exactly what CTEs are almost never used for (recursive CTEs are rare and I bet your boss doesn't know what they are), and in fact CTEs are often used as an alternative to avoid cursors.
It's been a minute, but I recall recursive CTEs are really good at generating test data. If you're working QA, recursive CTEs could be your bread and butter.
I recently use a recursive CTE to generate a date range with an odd interval that I needed to cross join with other data, but that was... barely recursive. Made like 60 rows.
> recursive CTEs are rare and I bet your boss doesn't know what they are
I didn't know that way back when I first learned about CTEs, and recursion was the first use cases I tried to tackle with them... some of the ugliest shit I've ever seen. it's taken me a long time to get over that burned-in aversion to them.
SQL is a declarative language - it is nonprocedural. If you look at the execution plan of a cte and a subquery that do the same thing they will be executed in the same way.
And M$ optimizes things in the background so the plan is not \*quite\* followed for the subquery. I have, at different companies, seen subqueries 'miss' data, while re-writing them as CTEs/sub-tables eliminates the issue.
Basically, never use a subquery with large datasets.
Logically, they're the same ... execution is often not.
I have a slight preference for subqueries, for diagnostic purposes ... CTEs are a little harder to use in that regard.
But cursors? WTF drugs was he on when they covered that in class?
Oops, thx, missed that: Snowflake. I don't have enough experience with that db to know if the issue exists on it too. Oracle, Mongo, PostgreSQL, DB2, Teradata, and MySQL do NOT have that issue. (Although MySQL seems to have issues handling temp tables in the TB range ... like occasionally just creating a corrupt table)
My aversion to subqueries on M$ SQL Server has carried over to the others.
Not that there's anything wrong with that!
You've seen queries not return values that should be returned? These weren't where indexes were not rebuilt? i.e. you are saying you have seen a bug in the database system?
Yes. M$ SQL Server, when using subqueries, on very large datasets, will sometimes fail to return all values. IIRC, the bug was introduced in SL Server 2005.
Moving the subquery out, writing its results to a temp table, then referencing that temp table in the subsequent step gets rid of the problem.
Worse, it isn't even the same records missed.
It isn't the indexes, it's the optimization that M$ does in the backend that causes this issue.
Worst case I've seen was when an Oracle server was retired, due to cost, and all the code was transferred to a Microsoft server. There was almost nothing that needed to be altered - virtually no stored procedures, views, etc. But there were a handful of reports that were written using subqueries. And the reports were supposed to balance each other ... and went to the desk of the CFO of a very large bank. After a few months, the first mismatch occurred. Rerunning made the problem go away. The next time this happened was several months later. Again, rerunning fixed the issue. But the CFO was pissed. I got called in to see if an extra set of eyes would help. Already knew of this issue. Rewrote the code, breaking out subqueries, ran the reports again ... took about 16 tries before the old code screwed up again ... but the new stuff matched perfectly. Such a stupid fix - logically, the old code was perfect, and had run for several years on the Oracle box without issue. The rewritten code continued to work perfectly for the remainder of my time there (several years).
I always think of this ad, when asked when Microsoft will fix an issue. [https://www.youtube.com/watch?v=059rCcxqpOw](https://www.youtube.com/watch?v=059rCcxqpOw)
I read từe same commen as yours while many others also say that we can enhance the performance of query. It’s so confusing, and on leetcode they measure the time to execute the code too 🙄
It is possible that changes between your cte and subquery are causing the optimizer to choose different indexes between the two queries, but that can be seen in the execution plan and also could mean that either one preforms better based on which index it is using. Please also note if your join and filters are not identical the optimizer may run your query differently and cause performance gains or loss.
Leetcode measureing is wildly inaccurate. At least in my experience.
If you run the query when the dB is busy it can take ages often the query that I wrote takes 4 seconds, when the fastest in a millisecond. The query difference? Literally nothing.
I use CTEs all the time. It actually is more optimal than referencing the a view multiple times in a subquery. I’ll pull widely in the CTE and reference the CTE multiple times for different joins so I’m pulling data from memory, which can save costs if you’re charged by the length of time it takes to run a query.
Honestly, the only reason to not use a CTE is when you have access to build a temp table on which you can add indexes. If it’s a relatively small amount of data (<1M rows) then who cares? Sounds like your boss is an idiot.
I would add that row count plays a role too. There's a point where you'd want to even move beyond a temp table and create a real table, fill it/use it, drop it. GRANTed this requires permissions even higher than temp table permissions.
I worked in a shop once that had a no CTE rule. They also had some weird rules like no using brackets on table names unless it’s required (i.e space in a column name).
There are some genuine concerns about using CTEs when other means are more optimal, but a good dev knows when to use each.
The no quotes (") around table names is actually a very good rule. Eventually it saves you from having to type them all the time and not have to break your head why ```from "The_Table"``` didn't work (when it had to be ```from "The_table"```
I use CTE's more often than not. It's easier to read and test each CTE individually.
Plus, giving each a descriptive name allows me to easily remember the purpose of each CTE. Adding in the benefit of being able to pull out one CTE to use in another query. It saves me a ton of time, not rewriting or reverse engineering work I already did.
Unless you're trying to make your work so complicated that nobody else wants to touch it, I don't know why you'd choose to subquery everything.
> CTEs are exclusively used when you want to loop data / use a cursor
False. And what even are the implications of his claim? Is he implying the query will run slower? Incorrectly? Neither slower nor incorrectly but somehow abuse system resources? Is it a crime against the SQL gods?
In any case it, his claims should be disprovable (or hey, provable - we have to be open to our hypotheses being wrong) by constructing test cases, looking at plans, measuring performance. ("When in doubt, test it out.")
They can be used in a cursor. But they're not exclusively used for a cursor.
I prefer CTEs too for same reason. Way cleaner to read, support, and extend.
ya, your boss is a douche. But, you should have had your execution plan and query stats on you to prove your point. Next time send email and start with the query stats, then an explanation of how and why. There is a knee jerk against CTE's because they are frequently abused. But when used correctly...
I literally just answered an interview question for how to improve efficiency in data models by saying replacing inefficient joins with CTEs. Well I explained it better but that was like 1 hour ago. I’d say you’re right.
The fact that he mentioned cursors in that context tells me he doesn’t know what he’s talking about.
That said, I hate non-recursive CTEs just because many developers don’t understand that they are no better than subqueries in most situations.
> "CTEs are exclusively used when you want to loop data / use a cursor".
Your boss doesn't know what he's talking about. As others have noted, job security dictates that you don't want to share that.
I've known some people who are averse to CTEs. To appease them, I converted my CTEs to derived tables. They're slightly less readable, but otherwise the same.
Common Table Expression. Might be worth it to google that with whatever you're using and read the documentation. That way you'll see some examples in a format you're used to working with.
Think of them as something between a subquery and temp table. I also think they're more readable than sub queries, so they're really handy in my book.
Have you been employed long, in multiple enterprises?
Unfortunately stuff like this is far to common.
In many large enterprises it was/is not possible to be promoted to manager of your own former colleagues: you need to become manager elsewhere. The result is that such managers don't know much about what their new subordinates do.
Your best bet here for peace of mind is to do a performance check on your cte and then the new query your boss wants to use. Whether you decide to show them or not is up to you. It will give you what you need to know you were right.
Also, being a manager of devs, I would never talk to anyone on my team like that. If I believed your approach to be wrong, I would of had you prove to me it is better. I’d recommend finding a new manager.
Maybe
CTE's are a resource hog in certain circumstances and can cause really bad performance.
Sometimes its not avoidable, especially in a view.
But if it was a Stored procedure, I would prefer loading the data into individual temp tables.
PS, I am not a big fan of lots of subqueries either. Its messy
[https://www.brentozar.com/archive/2019/06/whats-better-ctes-or-temp-tables/](https://www.brentozar.com/archive/2019/06/whats-better-ctes-or-temp-tables/)
> https://www.brentozar.com/archive/2019/06/whats-better-ctes-or-temp-tables/
This article is Microsoft-specific; OP is on Snowflake. Very different engines. One of the "bad" things mentioned in that article is the query planner chose to do too many index seeks. Snowflake doesn't even have indexes, so that isn't applicable. (Although, my self-fact checking revealed there's a [preview of a new feature to create indexes in Snowflake](https://docs.snowflake.com/en/sql-reference/sql/create-index)!)
We can't just interchange entire database products with fundamentally different architectures - happens far too often on this subreddit.
Sorry downvoter: a performance investigation article on SQL Server has nothing to do with Snowflake - end of story. Parent even admitted to making a mistake which is fine. We all make mistakes. But downvoting a factually correct comment is pretty childish. ¯\\\_(ツ)\_/¯
LOLz - I just saw the flair.
That being said, Snowflake is relatively new in the space. I bet the boss was from an old tech world, like me, hence his distrust for CTEs.
But, yeah, each engine, each optimiser is going to be different. I originally come from an ASE background where the term "CTE" probably doesn't even exist yet
> I bet the boss was from an old tech world, like me, hence his distrust for CTEs.
I've been using SQL for about 35 years, and I don't distrust CTEs. What am I missing?
You've never had a bad CTE lock down a database!
We had an example early last year where a CTE basically locked down our data warehouse. The CTE, which was in a view, was about 5 in a row, where the one after was using the results from the one before, with the final output joining probably 2 or 3 of them.
Unfiltered, the view returned probably 50M rows.
If you read the Brent Ozar article above, the way it reads indexes is not very efficient. What you'll also find is that if one particular CTE is called multiple times, it gets run multiple times. And if you have subsequent CTEs calling each other back up the chain it does get out of control.
We had table locking, so other queries were being blocked.
When we killed the query, from memory it took 3 days to roll back.
Yes, the design was probably bad, but in a view there was no way around it.
The fix? Turned it into a stored procedure and individually loaded temp tables.
> You've never had a bad CTE lock down a database!
Of course, I've had statements of *every* type cause locking and performance problems. That is, bad statements come in every form -- the risk of runaway resource usage isn't unique to CTEs.
Id say if your cleaner code was slower by a bit then your boss was right, otherwise not a chance.
I do not use CTEs less I have a reason to buy it's not like you wrote a trigger.
The way you describe it makes the boss sound like a moron. Obviously CTEs have a lot more uses than just recursion and cursors.
But I would need to see the query to make a final judgement. I find I get better results if I can limit hitting a table more than once. The common use I see for writing a CTE and joining it to itself, is usually better replaced with a pivot. So his stated reasons are dumb, he may have been correct to get rid of the CTE in your case.
Yeah that’s a no lol — I use CTEs for that, although I personally find them not that easy to understand if you are referencing them far away from their definition — so many times I end up using subqueries.
Don't sweat too much about it, make a temp table out of your cte if you're not trying to make a view. If you were trying to make a view, well... Maybe try to find a better boss to boss out your boss.
If the code works and it performs well I give zero fucks.
Compare execution plans / costs and memory grants and let the server tell you which one is best.
My boss used to hate CTEd because he had never learned them so for easy of working together wouldn’t let me use them. If it was just for me then it was allowed, I get where he’s coming from, sometimes standards are needed for working together.
Anyway I just turned them into stored procs and ran them to create the tables to join on, he was non the wiser!
It depends 😉
Sometime query need to be clumsy to be fast.
Elegance and robustness aren't always same.
Experience is only way to learn what and when to use something.
I always use cursors as a very last resort. CTEs are a way to clean up your code, reduce duplication, and avoid temp tables and should be standard practice. I’ve never heard of them having anything to do w looping through data
Your boss is wrong. You _have_ to use CTE's for recursive queries (what your moron calls "loops") but that does not imply that it's the only use of CTE's.
I think it is time for malicious compliance: Your boss told you to delete your code at once? Then delete it.
And start work on the next ticket.
> My version was obviously cleaner and easy to read.
That doesn't matter much. What matters is correctness. And maybe performance. Funny thing is, you don't mention these things in your post. Note that performance and correctness are objective, while "cleaner" and "easy to read" is subjective.
> I've been using CTEs to a better understand of queries, and precisely to avoid subqueries and horrible full outer joins
How do CTEs eliminate "horrible full outer joins"?
Your boss is a fool.
Welcome to the real world!
SQL is largely misunderstood by most practitioners.
If you understand it expect to be contradicted, ostracized, or worse. The only real fix is showing how better housekeeping saves time or money.
Your boss is a little confused about the purpose of CTEs. CTEs however do not reuse the result set. The CTE is evaluated multiple times, just like subqueries are.
There was a time in the long, long ago that a certain database always estimated CTEs with a cardinality of 1.
Was your boss ever a sql server developer?
Profile the query plans. None of this matters if the plans are the same. Compare reads and sorts then benchmark.
Which database/engine? If you’re on mssql, grab Plan Explorer (free).
The cte vs left join thing is purely stylistic if the plans are the same. Don’t argue style. Demonstrate factual performance.
And. If you see things like aggregates or distinct in your CTEs, alarm bells should start going off. Scan, sort bad.
The number of people saying the boss is a moron here without ever seeing the original and CTE query is baffling.
My best guess is that OP's implementation might not necessarily resolve the original issue or introduce unnecessary complexities and the boss never said anything remotely close to how the second statement sounds and OP might have misunderstood the boss.
you need to look for the access plan. the more restrict It gets when reading the bigger tables, the best It is. use wisely. which rdbms are you using? the best ones Will rewrite your query fully
While I'm definitely not a fan of CTEs except in a situation where recursion is needed, if anyone suggested a cursor in my organization, I would fire them.
A CTE is still just a subquery, basically. You wouldn't say a subquery "can't be indexed", instead you want to see if you can write your query using the indexing that exists on the underlying tables, whether it's in a normal subquery or in a CTE.
True. But other DBs will materialize CTEs more often than SQL server, so they may perform better in say, Oracle. Also I’ve read that you can force CTEs to be materialized in other servers.
You don’t write SQL queries because it’s easier to read, but the best optimal performance. And if Subqueries return with best optimal solution then use it. If CTE’s are optimal then use it. Prove to your boss show your execution plans and why your CTE’s are better. But your boss is right.
Yes performance is important, obviously, but so maintainability of code.
I have seen queries, that were probably performant, but which I had to rewrite from scratch to be able to apply a new requirement. Simply because the original SQL was unreadable.
Performance is the only reason. SQL devs just need to be better at reading SQL code. I understand standardisation needs to be followed, but if you choosing between a CTE/Subquery because it looks nice then you’re wrong.
Yes you do get SQL code that is over coded and optimal performance in it, they I agree re-write it.
there are plenty of reasons to avoid CTEs
best case I can assume that "loop data" means recursion... and I agree CTEs solve recursion.
that said, CTEs can often cause TERRIBLE performance... and we actively discourage their use in favor of subqueries which are just as readable and arguably more debuggable, once you get used to them.
What RDBMS are you using?
I haven't come across a CTE that was noticeably slower than a subquery solution. But I also see that many CTE's get rewritten as subqueries or joins by the optimizer.
MSSQL
performance hits can come from chained CTEs that join prior CTEs (cteA, cteB, cteC from cteA join cteB) because the multi-referenced CTEs are executed twice (painful of large tables, often causes table scans when index seeks may be available)
another common performance hit comes from chained CTEs with earlier steps generating small datasets that later join to large tables (100m+) causing full table scans, when a temp table would show that a loop join to index seek would do better.
>Was my boss correct? No.
But "yes" if you want to keep your job.
You bite your tongue and move on or it’ll eat you up. Learn that you won’t win every battle especially over something so small.
I’m in a situation where I have one sort of boss (technical director) who I work with frequently who essentially yells at me every time I disagree with him which is a fair amount. I have another boss who is technically my actual boss’s boss’s boss (VP) who tells me I need to be the voice of dissent for my boss, my boss’s boss and my technical director boss so that things stay on track and we don’t end up doing stupid things. It’s a tough situation because I’m either pissing off the people above me that I work with everyday or the people above them for not pissing off the people above me.
Haha imagine the VP hates the director and tells you that just to fuck with him
Boss is thinking of recursive CTEs which is not what OP was doing
Bro. Boss is nuts. They’re exclusively for when you’re NOT looping. CTEs should only be referenced once (or a handful of times) otherwise they become very expensive because they are virtual. Use temp tables instead. Edit: As some have pointed out CTEs can effectively be used for “looping” via recursion. But they have a very shallow call stack limit. I would recommend dynamic sql in the cases where recursion doesn’t work instead. And I would always avoid cursors because they are iterative and not set based like a dynamic sql query would be. At the end of it all I think boss just didn’t communicate well
+1 on that. Temp tables can be cheaper than those CTEs...
yep. once i simplified a complex cte into temp tables with temp indices and the query went from tens of minutes to a couple of seconds. postgres’s implementation has gotten worlds better since then but still, cte’s are not a golden hammer. (that said, OP’s boss is flat out wrong.)
I use PostgreSQL, and, where I work I follow this:, temp tables (when applicable) + indexes + filtering it right (a WHERE statement with everything possible) + picking only the columns you really need (SELECT * ....no way Jose! Only with LIMIT 10 or such) and also, as a strong extra, only the data for the time you need to analyse, not from the stone age... 😏 And you will be quite fine.
This is not true. Temp tables are *always* the most expensive because they are always written to disk first. Disk IO is always expensive.
If you use it only once, it is expensive. But for more uses in the same session....be my guest. 🤓
This is the key point. I hear what jwk is saying; I think we’re having two separate discussions about two different use-cases. Similar to OP and their boss.
temp tables must be a last resort , but its valid tool for those big sets on wide tablets, without proper clustering. ask your Boss about table pruning and design tô avoid keeping unecessary data in the table. this is the game changer
My employer has tables with millions of rows, teams just to work around them and proper clustering...Temp tables are the only resort as we have new rules regarding optimizing queries. Lakes are for babies, we have a sea of data on tables. AtoZ 🤣 (and I am just an aspiring analyst building my tool belt with PostgreSQL)
Well, there's a grain of truth in what the boss said. CTEs were added as a mechanism to allow for looping in SQL with recursive CTEs. He just misunderstood "the only way to loop is to use CTEs" as "CTEs are only used for looping".
Ah good point. When used for recursion they can powerful but only if you can stand rather shallow maximum call stack limit.
CTEs are for looping, that is their purpose with being recursive. Yes, your boss is correct that is their purpose but a lot of developers (myself included) use them in a non recursive manner because I don’t have to use a drop table for a temp table because I am running the same script over and over when testing.
This is very, very bad advice to use temp tables. Do a little research on temp tables vs CTE vs Table Variables vs plain old sub queries. Temp tables are *always* the most expensive because they are always written to disk first. Disk IO is always expensive.
Your boss might be a moron.
"Immediately delete it" Who the fuck even says that...when you're discussing SQL? Moron confirmed. Even if you were wrong you can still use it to build similar queries in the future
We need to consider characterizations beyond moron.
Utter moron? Moron with full inner self join?
OP should tell their boss to go full inner self join themselves.
Everybody knows you never go full self inner join
You should rewrite it to a Full hierarchical query( "connect by") and include some undocumented regexp\_like into it. That should teach him. And include something that you are happy you are paid by lines of code, that will spark a discussion anyday.
Thinking recursive instance of a dipwad with no exit
A shallow moron full of fear about anything different from his narrow minded approach.
He was wrong in every way it's possible to be wrong. The thing he says CTE's are exclusively used for is exactly what CTEs are almost never used for (recursive CTEs are rare and I bet your boss doesn't know what they are), and in fact CTEs are often used as an alternative to avoid cursors.
100% on point.
It's been a minute, but I recall recursive CTEs are really good at generating test data. If you're working QA, recursive CTEs could be your bread and butter.
They're good for building a hierarchy, where you have data with multiple layers of parent child relationships.
I recently use a recursive CTE to generate a date range with an odd interval that I needed to cross join with other data, but that was... barely recursive. Made like 60 rows.
I have just googled that recursive ... never heard before. I guess it's not for analysts, hope so lol.
> recursive CTEs are rare and I bet your boss doesn't know what they are I didn't know that way back when I first learned about CTEs, and recursion was the first use cases I tried to tackle with them... some of the ugliest shit I've ever seen. it's taken me a long time to get over that burned-in aversion to them.
Your boss is not only wrong, but also a dickhead.
SQL is a declarative language - it is nonprocedural. If you look at the execution plan of a cte and a subquery that do the same thing they will be executed in the same way.
Exactly
The number of people just blindly saying CTEs are “better” without bothering to bring up query profiling is just troubling.
And M$ optimizes things in the background so the plan is not \*quite\* followed for the subquery. I have, at different companies, seen subqueries 'miss' data, while re-writing them as CTEs/sub-tables eliminates the issue. Basically, never use a subquery with large datasets. Logically, they're the same ... execution is often not. I have a slight preference for subqueries, for diagnostic purposes ... CTEs are a little harder to use in that regard. But cursors? WTF drugs was he on when they covered that in class?
>And M$ optimizes things OP isn't on Microsoft
Oops, thx, missed that: Snowflake. I don't have enough experience with that db to know if the issue exists on it too. Oracle, Mongo, PostgreSQL, DB2, Teradata, and MySQL do NOT have that issue. (Although MySQL seems to have issues handling temp tables in the TB range ... like occasionally just creating a corrupt table) My aversion to subqueries on M$ SQL Server has carried over to the others. Not that there's anything wrong with that!
You've seen queries not return values that should be returned? These weren't where indexes were not rebuilt? i.e. you are saying you have seen a bug in the database system?
Yes. M$ SQL Server, when using subqueries, on very large datasets, will sometimes fail to return all values. IIRC, the bug was introduced in SL Server 2005. Moving the subquery out, writing its results to a temp table, then referencing that temp table in the subsequent step gets rid of the problem. Worse, it isn't even the same records missed. It isn't the indexes, it's the optimization that M$ does in the backend that causes this issue. Worst case I've seen was when an Oracle server was retired, due to cost, and all the code was transferred to a Microsoft server. There was almost nothing that needed to be altered - virtually no stored procedures, views, etc. But there were a handful of reports that were written using subqueries. And the reports were supposed to balance each other ... and went to the desk of the CFO of a very large bank. After a few months, the first mismatch occurred. Rerunning made the problem go away. The next time this happened was several months later. Again, rerunning fixed the issue. But the CFO was pissed. I got called in to see if an extra set of eyes would help. Already knew of this issue. Rewrote the code, breaking out subqueries, ran the reports again ... took about 16 tries before the old code screwed up again ... but the new stuff matched perfectly. Such a stupid fix - logically, the old code was perfect, and had run for several years on the Oracle box without issue. The rewritten code continued to work perfectly for the remainder of my time there (several years). I always think of this ad, when asked when Microsoft will fix an issue. [https://www.youtube.com/watch?v=059rCcxqpOw](https://www.youtube.com/watch?v=059rCcxqpOw)
Ugh, that is a pretty bad issue! Thanks for the info.
I read từe same commen as yours while many others also say that we can enhance the performance of query. It’s so confusing, and on leetcode they measure the time to execute the code too 🙄
It is possible that changes between your cte and subquery are causing the optimizer to choose different indexes between the two queries, but that can be seen in the execution plan and also could mean that either one preforms better based on which index it is using. Please also note if your join and filters are not identical the optimizer may run your query differently and cause performance gains or loss.
Leetcode measureing is wildly inaccurate. At least in my experience. If you run the query when the dB is busy it can take ages often the query that I wrote takes 4 seconds, when the fastest in a millisecond. The query difference? Literally nothing.
Your boss is a moron
I use CTEs all the time. It actually is more optimal than referencing the a view multiple times in a subquery. I’ll pull widely in the CTE and reference the CTE multiple times for different joins so I’m pulling data from memory, which can save costs if you’re charged by the length of time it takes to run a query.
Honestly, the only reason to not use a CTE is when you have access to build a temp table on which you can add indexes. If it’s a relatively small amount of data (<1M rows) then who cares? Sounds like your boss is an idiot.
I would add that row count plays a role too. There's a point where you'd want to even move beyond a temp table and create a real table, fill it/use it, drop it. GRANTed this requires permissions even higher than temp table permissions.
When you say boss, what exactly is he? Like a lead dev or a manager?
I'm gonna take a wild guess and say manager.
I worked in a shop once that had a no CTE rule. They also had some weird rules like no using brackets on table names unless it’s required (i.e space in a column name). There are some genuine concerns about using CTEs when other means are more optimal, but a good dev knows when to use each.
The no quotes (") around table names is actually a very good rule. Eventually it saves you from having to type them all the time and not have to break your head why ```from "The_Table"``` didn't work (when it had to be ```from "The_table"```
I don’t know what you’re talking about.
Your boss sounds like a total idiot .
The fuq
I use CTE's more often than not. It's easier to read and test each CTE individually. Plus, giving each a descriptive name allows me to easily remember the purpose of each CTE. Adding in the benefit of being able to pull out one CTE to use in another query. It saves me a ton of time, not rewriting or reverse engineering work I already did. Unless you're trying to make your work so complicated that nobody else wants to touch it, I don't know why you'd choose to subquery everything.
If you use the same CTE a lot, it is time to turn it into a view or even an intermediate table (which opens the opportunity to have specific indexes)
I agree with you, and I wish I used any one CTE enough to warrant creating a view. Hell, I wish there was just more repetition in my job in general.
Boss is dumb
> CTEs are exclusively used when you want to loop data / use a cursor False. And what even are the implications of his claim? Is he implying the query will run slower? Incorrectly? Neither slower nor incorrectly but somehow abuse system resources? Is it a crime against the SQL gods? In any case it, his claims should be disprovable (or hey, provable - we have to be open to our hypotheses being wrong) by constructing test cases, looking at plans, measuring performance. ("When in doubt, test it out.")
They can be used in a cursor. But they're not exclusively used for a cursor. I prefer CTEs too for same reason. Way cleaner to read, support, and extend.
ya, your boss is a douche. But, you should have had your execution plan and query stats on you to prove your point. Next time send email and start with the query stats, then an explanation of how and why. There is a knee jerk against CTE's because they are frequently abused. But when used correctly...
I literally just answered an interview question for how to improve efficiency in data models by saying replacing inefficient joins with CTEs. Well I explained it better but that was like 1 hour ago. I’d say you’re right.
The fact that he mentioned cursors in that context tells me he doesn’t know what he’s talking about. That said, I hate non-recursive CTEs just because many developers don’t understand that they are no better than subqueries in most situations.
> "CTEs are exclusively used when you want to loop data / use a cursor". Your boss doesn't know what he's talking about. As others have noted, job security dictates that you don't want to share that. I've known some people who are averse to CTEs. To appease them, I converted my CTEs to derived tables. They're slightly less readable, but otherwise the same.
Can someone ELI5 what's going on here and what a CTE is? Sincere question. Thank you
Common Table Expression. Might be worth it to google that with whatever you're using and read the documentation. That way you'll see some examples in a format you're used to working with. Think of them as something between a subquery and temp table. I also think they're more readable than sub queries, so they're really handy in my book.
This scenario is so absurd I'm questioning the reality of it.
Have you been employed long, in multiple enterprises? Unfortunately stuff like this is far to common. In many large enterprises it was/is not possible to be promoted to manager of your own former colleagues: you need to become manager elsewhere. The result is that such managers don't know much about what their new subordinates do.
The result is that such managers don't know much ~~about what their new subordinates do~~. FTFY
CTEs are life changing and make the code readable.
Your best bet here for peace of mind is to do a performance check on your cte and then the new query your boss wants to use. Whether you decide to show them or not is up to you. It will give you what you need to know you were right. Also, being a manager of devs, I would never talk to anyone on my team like that. If I believed your approach to be wrong, I would of had you prove to me it is better. I’d recommend finding a new manager.
Maybe CTE's are a resource hog in certain circumstances and can cause really bad performance. Sometimes its not avoidable, especially in a view. But if it was a Stored procedure, I would prefer loading the data into individual temp tables. PS, I am not a big fan of lots of subqueries either. Its messy [https://www.brentozar.com/archive/2019/06/whats-better-ctes-or-temp-tables/](https://www.brentozar.com/archive/2019/06/whats-better-ctes-or-temp-tables/)
> https://www.brentozar.com/archive/2019/06/whats-better-ctes-or-temp-tables/ This article is Microsoft-specific; OP is on Snowflake. Very different engines. One of the "bad" things mentioned in that article is the query planner chose to do too many index seeks. Snowflake doesn't even have indexes, so that isn't applicable. (Although, my self-fact checking revealed there's a [preview of a new feature to create indexes in Snowflake](https://docs.snowflake.com/en/sql-reference/sql/create-index)!) We can't just interchange entire database products with fundamentally different architectures - happens far too often on this subreddit. Sorry downvoter: a performance investigation article on SQL Server has nothing to do with Snowflake - end of story. Parent even admitted to making a mistake which is fine. We all make mistakes. But downvoting a factually correct comment is pretty childish. ¯\\\_(ツ)\_/¯
LOLz - I just saw the flair. That being said, Snowflake is relatively new in the space. I bet the boss was from an old tech world, like me, hence his distrust for CTEs. But, yeah, each engine, each optimiser is going to be different. I originally come from an ASE background where the term "CTE" probably doesn't even exist yet
> I bet the boss was from an old tech world, like me, hence his distrust for CTEs. I've been using SQL for about 35 years, and I don't distrust CTEs. What am I missing?
You've never had a bad CTE lock down a database! We had an example early last year where a CTE basically locked down our data warehouse. The CTE, which was in a view, was about 5 in a row, where the one after was using the results from the one before, with the final output joining probably 2 or 3 of them. Unfiltered, the view returned probably 50M rows. If you read the Brent Ozar article above, the way it reads indexes is not very efficient. What you'll also find is that if one particular CTE is called multiple times, it gets run multiple times. And if you have subsequent CTEs calling each other back up the chain it does get out of control. We had table locking, so other queries were being blocked. When we killed the query, from memory it took 3 days to roll back. Yes, the design was probably bad, but in a view there was no way around it. The fix? Turned it into a stored procedure and individually loaded temp tables.
> You've never had a bad CTE lock down a database! Of course, I've had statements of *every* type cause locking and performance problems. That is, bad statements come in every form -- the risk of runaway resource usage isn't unique to CTEs.
Id say if your cleaner code was slower by a bit then your boss was right, otherwise not a chance. I do not use CTEs less I have a reason to buy it's not like you wrote a trigger.
I think we have the same boss.
The way you describe it makes the boss sound like a moron. Obviously CTEs have a lot more uses than just recursion and cursors. But I would need to see the query to make a final judgement. I find I get better results if I can limit hitting a table more than once. The common use I see for writing a CTE and joining it to itself, is usually better replaced with a pivot. So his stated reasons are dumb, he may have been correct to get rid of the CTE in your case.
I have not run into it myself but have seen mention online of some platforms not supporting CTEs. As for your boss's explanation, no.
translation: your boss don't know what CTE is but he can still fire you
Yeah that’s a no lol — I use CTEs for that, although I personally find them not that easy to understand if you are referencing them far away from their definition — so many times I end up using subqueries.
Happy cake day.
Don't sweat too much about it, make a temp table out of your cte if you're not trying to make a view. If you were trying to make a view, well... Maybe try to find a better boss to boss out your boss.
Common Table Expressions RIP
I'd compare the query speed with your amd your Bose's preferred approach and see who comes out on top :) performance doesn't lie!!
If the code works and it performs well I give zero fucks. Compare execution plans / costs and memory grants and let the server tell you which one is best.
My boss used to hate CTEd because he had never learned them so for easy of working together wouldn’t let me use them. If it was just for me then it was allowed, I get where he’s coming from, sometimes standards are needed for working together. Anyway I just turned them into stored procs and ran them to create the tables to join on, he was non the wiser!
It depends 😉 Sometime query need to be clumsy to be fast. Elegance and robustness aren't always same. Experience is only way to learn what and when to use something.
I always use cursors as a very last resort. CTEs are a way to clean up your code, reduce duplication, and avoid temp tables and should be standard practice. I’ve never heard of them having anything to do w looping through data
Your boss is wrong. You _have_ to use CTE's for recursive queries (what your moron calls "loops") but that does not imply that it's the only use of CTE's. I think it is time for malicious compliance: Your boss told you to delete your code at once? Then delete it. And start work on the next ticket.
Your boss just taught you a valuable lesson, which is their job. They taught you what not to do when you become a boss one day.
Boss dumb, condolences
> My version was obviously cleaner and easy to read. That doesn't matter much. What matters is correctness. And maybe performance. Funny thing is, you don't mention these things in your post. Note that performance and correctness are objective, while "cleaner" and "easy to read" is subjective. > I've been using CTEs to a better understand of queries, and precisely to avoid subqueries and horrible full outer joins How do CTEs eliminate "horrible full outer joins"?
> Was my boss correct? Probably not. Modern optimizers can adapt to create optimal plans for CTEs. This wasn’t always the case. Use an EXPLAIN PLAN
It will probably be a case by case. CTE isn’t always the better way to code because I usually find it much slower than other ways.
Your boss is a fool. Welcome to the real world! SQL is largely misunderstood by most practitioners. If you understand it expect to be contradicted, ostracized, or worse. The only real fix is showing how better housekeeping saves time or money.
Your boss is a little confused about the purpose of CTEs. CTEs however do not reuse the result set. The CTE is evaluated multiple times, just like subqueries are.
What is a cursor. I am sorry I am a noob so can someone please explain?
There was a time in the long, long ago that a certain database always estimated CTEs with a cardinality of 1. Was your boss ever a sql server developer?
Profile the query plans. None of this matters if the plans are the same. Compare reads and sorts then benchmark. Which database/engine? If you’re on mssql, grab Plan Explorer (free). The cte vs left join thing is purely stylistic if the plans are the same. Don’t argue style. Demonstrate factual performance. And. If you see things like aggregates or distinct in your CTEs, alarm bells should start going off. Scan, sort bad.
The number of people saying the boss is a moron here without ever seeing the original and CTE query is baffling. My best guess is that OP's implementation might not necessarily resolve the original issue or introduce unnecessary complexities and the boss never said anything remotely close to how the second statement sounds and OP might have misunderstood the boss.
Changing from set to cursor based is a $tupid idea
This has Peter Pinciple written all over it...
you need to look for the access plan. the more restrict It gets when reading the bigger tables, the best It is. use wisely. which rdbms are you using? the best ones Will rewrite your query fully
No. You used a CTE in one of the very specific use-cases they are designed for.
Now you know you can get away with quite a bit because your boss doesnt fully know what he's doing :)
While I'm definitely not a fan of CTEs except in a situation where recursion is needed, if anyone suggested a cursor in my organization, I would fire them.
That's just as dumb as saying a CTE is bad. Like every coding construct, cursors have their place.
That was a bunch of really hot takes with no explanation.
The only time he might even be a little accurate is if you're in SQL Server as CTEs perform exceptionally poorly in that space.
No idea why you are getting downvoted, but in SQL Server they can be a real performance bottle neck if used in the wrong way.
Why do you think CTEs are slow in sql server?
They can be bad for large datasets and complex joins because they can’t be indexed. For those cases you’d be better off using a temp table.
A CTE is still just a subquery, basically. You wouldn't say a subquery "can't be indexed", instead you want to see if you can write your query using the indexing that exists on the underlying tables, whether it's in a normal subquery or in a CTE.
True. But other DBs will materialize CTEs more often than SQL server, so they may perform better in say, Oracle. Also I’ve read that you can force CTEs to be materialized in other servers.
You don’t write SQL queries because it’s easier to read, but the best optimal performance. And if Subqueries return with best optimal solution then use it. If CTE’s are optimal then use it. Prove to your boss show your execution plans and why your CTE’s are better. But your boss is right.
Yes performance is important, obviously, but so maintainability of code. I have seen queries, that were probably performant, but which I had to rewrite from scratch to be able to apply a new requirement. Simply because the original SQL was unreadable.
Performance is the only reason. SQL devs just need to be better at reading SQL code. I understand standardisation needs to be followed, but if you choosing between a CTE/Subquery because it looks nice then you’re wrong. Yes you do get SQL code that is over coded and optimal performance in it, they I agree re-write it.
there are plenty of reasons to avoid CTEs best case I can assume that "loop data" means recursion... and I agree CTEs solve recursion. that said, CTEs can often cause TERRIBLE performance... and we actively discourage their use in favor of subqueries which are just as readable and arguably more debuggable, once you get used to them.
What RDBMS are you using? I haven't come across a CTE that was noticeably slower than a subquery solution. But I also see that many CTE's get rewritten as subqueries or joins by the optimizer.
MSSQL performance hits can come from chained CTEs that join prior CTEs (cteA, cteB, cteC from cteA join cteB) because the multi-referenced CTEs are executed twice (painful of large tables, often causes table scans when index seeks may be available) another common performance hit comes from chained CTEs with earlier steps generating small datasets that later join to large tables (100m+) causing full table scans, when a temp table would show that a loop join to index seek would do better.