Friday, March 30, 2012

One large database or a few smaller databases?

Does it make sense to break up data into a few smaller (identical)
databases to keep from having hundreds of millions of rows in a few
tables in one single database?
I ask because I've got a website that's pumping about 500k new records
into 3 or 4 tables per day. In several months I'll probably be near 400
or 500 million rows in some of them.
Users to the website query the data, and so far, it's working well
(with only a few million rows in each). My indexes look good and my
queries only join 2 or 3 tables at most, but I am mostly doing
aggregations on lots of data (AVGs, SUMs, etc.) so I only return a few
rows at a time.
My app is set up to handle the data being in separate identical
databases, but I'd like to keep it all in the same place to keep things
more simple (for backups & upgrades, etc..)
But my fear is when I start to reach half a billion rows my query times
are going to suck, even if I cram more memory into my database server
(dual xeon 3.2 with 2GB of memory, room to grow to 8GB).
Anyone have any experience with this much data?
Thanks!!
You should not split up your solution into more than one database. A
good approach would be to use the analysis services from Sql Server as
you are requireing a lot of aggregation data which could be
precalculated. Therefore these cubes will provide a much feaster
response than quering just an OLTP database.
HTH, Jens Suessmeyer.
|||Jens, thanks for the quick reply. I'm using SQL 2005 standard... Does
that I mean need to upgrade to Enterprise?
Forgive my lack of knowledge on AS and data cubes, but are they kept in
real-time? Ideally the queries would be doing in "near-time" to when
the data has been inserted.
Thanks for your help.
|||No, AS is already included in SQL Server 2k5, but has to be instaleld
if not.
You sure can design the cube that it can fallback on non aggregated
realtime data , but perhaps the Analysis Services newsgroup can help
you further Within this they should know was is possible in your case.
HTH, Jens Suessmeyer.
|||On 28 Jan 2006 11:29:29 -0800, Jeff Turner wrote:

>Does it make sense to break up data into a few smaller (identical)
>databases to keep from having hundreds of millions of rows in a few
>tables in one single database?
>I ask because I've got a website that's pumping about 500k new records
>into 3 or 4 tables per day. In several months I'll probably be near 400
>or 500 million rows in some of them.
>Users to the website query the data, and so far, it's working well
>(with only a few million rows in each). My indexes look good and my
>queries only join 2 or 3 tables at most, but I am mostly doing
>aggregations on lots of data (AVGs, SUMs, etc.) so I only return a few
>rows at a time.
>My app is set up to handle the data being in separate identical
>databases, but I'd like to keep it all in the same place to keep things
>more simple (for backups & upgrades, etc..)
>But my fear is when I start to reach half a billion rows my query times
>are going to suck, even if I cram more memory into my database server
>(dual xeon 3.2 with 2GB of memory, room to grow to 8GB).
>Anyone have any experience with this much data?
>Thanks!!
Hi Jeff,
I agree with Jens - keep it in one database.
In addition to Jens' suggestion to use analysis server, here are some
other ideas that might help you gain speed as the amount of data grows.
1. Use indexed views. If your queries are usually for aggregates, they
could prove to be an excellent way to speed up the process, since they
are precalculated. Because SQL Server has to modify each indexed view as
the base data changes, indexed views do cause a slowdown for data
modifications, though. Don't use indexed views if you need millisecond
performance on insert and update operations.
2. Consider the use of partitioned views (SQL Server 2000) or table
partitioning (SQL Server 2005).
Hugo Kornelis, SQL Server MVP
|||Hi Jeff,
Some extra thoughts:
1. The optimizer is going to do better work for you if it's in one database.
2. Backups/restores become much more complicated if you have to keep
multiple databases in sync. Consider what you might have to do to restore to
a point in time. There are methods but it just starts to complicate things.
HTH,
Greg
"Jeff Turner" <zigjst@.gmail.com> wrote in message
news:1138476569.362520.240210@.f14g2000cwb.googlegr oups.com...
> Does it make sense to break up data into a few smaller (identical)
> databases to keep from having hundreds of millions of rows in a few
> tables in one single database?
> I ask because I've got a website that's pumping about 500k new records
> into 3 or 4 tables per day. In several months I'll probably be near 400
> or 500 million rows in some of them.
> Users to the website query the data, and so far, it's working well
> (with only a few million rows in each). My indexes look good and my
> queries only join 2 or 3 tables at most, but I am mostly doing
> aggregations on lots of data (AVGs, SUMs, etc.) so I only return a few
> rows at a time.
> My app is set up to handle the data being in separate identical
> databases, but I'd like to keep it all in the same place to keep things
> more simple (for backups & upgrades, etc..)
> But my fear is when I start to reach half a billion rows my query times
> are going to suck, even if I cram more memory into my database server
> (dual xeon 3.2 with 2GB of memory, room to grow to 8GB).
> Anyone have any experience with this much data?
> Thanks!!
>
|||Thanks to everyone for replying.
One thing I should point out is the data in these tables is broken down
by users to the website.
For example, if I did split this up by database, I could have one
database that holds all of the records for users with last names of
A-G. They would never have a need to query against the data in the
other databases.
So the practical minded side of me says, why bother running queries for
an A-G user in tables that hold everyone's data, when I could
physically have it narrowed down already.
Does that make sense? Or is SQL just smart & efficient enough with
indexing that the performance gain would be negligable compared to my
backup/maintenance costs.
Thanks again!

No comments:

Post a Comment