Discussion:
Table to Hold Data in a centralized way.
(too old to reply)
Shapper
2012-11-27 01:50:08 UTC
Permalink
Hello,

On a SQL 2008 database I need a table to hold information as follows:

1 - Files (PDF, TXT, HTML, ZIP, PNG, MP4, ...)

2 - Text (Insert through a form in the application)

3 - HTML or other types of markup (Insert through a form in the application)

4 - VIDEOS (HTML 5 Video (1 to 3 files), You Tube Video (Markup)), ...

5 - IMAGES (PNG, GIF, ...)

6 - AUDIO (HTML 5 AUDIO (1 to 2 files).

So I ended up with some problems:

A - One item can have 3 files (An HTML 5 Video should have 3 files);

B - One type can be inserted in different ways (A TEXT can be inserted through a form or a file)

And when later, that item is edited, I need to be able to know how it was inserted to display the same UI.

C - One item can be local or from a online service:

Local Video: HTML 5 VIDEO with 3 files.

Remote Video: HTML Markup for HTML Video.

So I have a few types of content, different sources and different ways of being inserted (UI).

The approach I am planing is the following:

create table dbo.FilesPacks
(
Id int identity not null
constraint FilesPacks_Id_PK primary key clustered (Id),
Created datetime not null,
);

create table dbo.Files
(
Id int identity not null
constraint Files_Id_PK primary key clustered (Id),
PackOfFilesId int not null,
Data varbinary (max) filestream null
constraint Files_Data_DF default (0x),
[Key] uniqueidentifier rowguidcol not null
constraint Files_Key_U unique,
MimeAndInfo nvarchar (200) not null
) filestream_on [STORAGE];

So I have:

A) FilesPacks.

A file pack will allows me to create a Video with 3 Files or a PDF with one file.

B) Files - Using FileStream.

This allows me to hold small files, large files, text and html (in byte form), etc.

The data does not count to the database size.

As disadvantage I see one: Poor update performance.

But in fact the files, after being inserted, will be very rarely updated.

C) Files - Column MimeAndInfo

I have many types of sources and data as mentioned before.

- A VIDEO can be a file or HTML code for YouTube, Vimeo, etc.

- A Text can be a file or a text inserted from a form.

- ...

My idea is to hold that data in this column. Something like:

text/plain_f (Text inserted by f - form)

text/plain_u (Text inserted by u - uploading a file)

video/mp4_u (Video inserted by u - uploading file)

video/youtube_c (FAKE MIME: video from you tube using c - code)


So basically, this column will give me the MIME type and how was inserted into the database.

For a few contents which don't have a mime type I create fake mime types:

"video/youtube", "map/google" ... Of course this is internal to the application.


I have considered having many tables for classification but I am not sure ...

I wanted something flexible ... And that could be kind of controlled by the application.

NOTE: Each item also has an access level according to users ... but I left it out for sake of simplicity.

Well this is just a sketch but any advice is welcome ...

Thank You,

Miguel
Erland Sommarskog
2012-11-27 22:15:20 UTC
Permalink
Post by Shapper
text/plain_f (Text inserted by f - form)
text/plain_u (Text inserted by u - uploading a file)
video/mp4_u (Video inserted by u - uploading file)
video/youtube_c (FAKE MIME: video from you tube using c - code)
So basically, this column will give me the MIME type and how was
inserted into the database.
"video/youtube", "map/google" ... Of course this is internal to the application.
I always find data modelleing questions in forums difficult, because a
correct decision depends on so many details, and in a forum I rarely
get the full picture.

But I don't like that you overload different sort of information in
one column. I think you should keep the MIME column as the MIME type
and nothing else. Then add source, text/form etc in separate columns.
Add constraints like this:

CHECK (MIME = 'text/plain' AND textsource IS NOT NULL OR
MIME <> 'text/plain' AND textsource IS NULL)

One can also debate whether you should store the complex MIME tags
directly in the column, or have a lookup table for them. At very least,
it seems to me that you should have a table that defines the MIME
types you support and have an FK constraint that table. This prevents
you from getting things like 'text\plain' in the database.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Shapper
2012-11-28 10:47:38 UTC
Permalink
Post by Shapper
Hello,
1 - Files (PDF, TXT, HTML, ZIP, PNG, MP4, ...)
2 - Text (Insert through a form in the application)
3 - HTML or other types of markup (Insert through a form in the application)
4 - VIDEOS (HTML 5 Video (1 to 3 files), You Tube Video (Markup)), ...
5 - IMAGES (PNG, GIF, ...)
6 - AUDIO (HTML 5 AUDIO (1 to 2 files).
A - One item can have 3 files (An HTML 5 Video should have 3 files);
B - One type can be inserted in different ways (A TEXT can be inserted through a form or a file)
And when later, that item is edited, I need to be able to know how it was inserted to display the same UI.
Local Video: HTML 5 VIDEO with 3 files.
Remote Video: HTML Markup for HTML Video.
So I have a few types of content, different sources and different ways of being inserted (UI).
create table dbo.FilesPacks
(
Id int identity not null
constraint FilesPacks_Id_PK primary key clustered (Id),
Created datetime not null,
);
create table dbo.Files
(
Id int identity not null
constraint Files_Id_PK primary key clustered (Id),
PackOfFilesId int not null,
Data varbinary (max) filestream null
constraint Files_Data_DF default (0x),
[Key] uniqueidentifier rowguidcol not null
constraint Files_Key_U unique,
MimeAndInfo nvarchar (200) not null
) filestream_on [STORAGE];
A) FilesPacks.
A file pack will allows me to create a Video with 3 Files or a PDF with one file.
B) Files - Using FileStream.
This allows me to hold small files, large files, text and html (in byte form), etc.
The data does not count to the database size.
As disadvantage I see one: Poor update performance.
But in fact the files, after being inserted, will be very rarely updated.
C) Files - Column MimeAndInfo
I have many types of sources and data as mentioned before.
- A VIDEO can be a file or HTML code for YouTube, Vimeo, etc.
- A Text can be a file or a text inserted from a form.
- ...
text/plain_f (Text inserted by f - form)
text/plain_u (Text inserted by u - uploading a file)
video/mp4_u (Video inserted by u - uploading file)
video/youtube_c (FAKE MIME: video from you tube using c - code)
So basically, this column will give me the MIME type and how was inserted into the database.
"video/youtube", "map/google" ... Of course this is internal to the application.
I have considered having many tables for classification but I am not sure ...
I wanted something flexible ... And that could be kind of controlled by the application.
NOTE: Each item also has an access level according to users ... but I left it out for sake of simplicity.
Well this is just a sketch but any advice is welcome ...
Thank You,
Miguel
1 - Having MIME types in a lockup table is something I am considering.

But I was told that, in case of blobs, I should reduce the number of joins as possible because reading blobs is not cheap ...

2 - FilesPack ... I think there is a way to avoid this table.
The objective of this table was only to group different types of the same file, eg, a Video might have MP4, WEBM, ... versions.

So my idea would be to have only Files table as follows:

create table dbo.Files
(
Id int identity not null
constraint Files_Id_PK primary key clustered (Id),
Pack uniqueidentifier rowguidcol not null,
Data varbinary (max) filestream null
constraint Files_Data_DF default (0x),
[Key] uniqueidentifier rowguidcol not null
constraint Files_Key_U unique,
MimeAndInfo nvarchar (200) not null,
Created datetime not null
) filestream_on [STORAGE];

Note the PACK column. It is an uniqueidentifier but not unique.
A new video would have 3 files all with the same Pack GUID.
This allows me to identify them as different versions of same content.

I also moved the Created column to Files. Now I can remove FilesPacks.

What do you think?

3 - About the textsource column you suggested.
When the file source is text you would place "text" here?
Otherwise, you would leave it null?

Why not using only a column Source and Save to it "Text", "Code", ...

Or why not using BIT columns? True or False?

And it is possible to have a few columns, for example:

CODE, TEXT, FILE ...

All BIT columns but with a constraint which would allow only one to be true?

Does this make sense or is even possible?

Thank You,
Miguel
Erland Sommarskog
2012-11-28 21:15:08 UTC
Permalink
Post by Shapper
1 - Having MIME types in a lockup table is something I am considering.
A lockup table? "This system is designed to go slow!" Sorry, I couldn't
resist.
Post by Shapper
But I was told that, in case of blobs, I should reduce the number of
joins as possible because reading blobs is not cheap ...
Reading blobs takes longer time, because they are big. And TDS is not the
optimal channel, why we have FILESTREAM and the OpenSqlFilestream API.

But that has very little to do with joins to other tables, so that is a
void argument.
Post by Shapper
2 - FilesPack ... I think there is a way to avoid this table.
The objective of this table was only to group different types of the
same file, eg, a Video might have MP4, WEBM, ... versions.
I can't speak to that, but the idea appeared sound to me.
Post by Shapper
create table dbo.Files
(
Id int identity not null
constraint Files_Id_PK primary key clustered (Id),
Pack uniqueidentifier rowguidcol not null,
Data varbinary (max) filestream null
constraint Files_Data_DF default (0x),
[Key] uniqueidentifier rowguidcol not null
constraint Files_Key_U unique,
MimeAndInfo nvarchar (200) not null,
Created datetime not null
) filestream_on [STORAGE];
Note the PACK column. It is an uniqueidentifier but not unique.
A new video would have 3 files all with the same Pack GUID.
This allows me to identify them as different versions of same content.
That on the other sounds messy to me. A better alternative would be
to have three FILSTREAM columns and then constraints which says
for which MIME types they can be populated. (If you can have multiple
FILESTREAM columns in a table; I haven't tested.) Then again, you would
have less flexibility if a new type requires four files.
Post by Shapper
3 - About the textsource column you suggested.
When the file source is text you would place "text" here?
Otherwise, you would leave it null?
Depends. It seemed me that some of the other types had a source as well.
But maybe it would be optional for other types.
Post by Shapper
Why not using only a column Source and Save to it "Text", "Code", ...
Or why not using BIT columns? True or False?
CODE, TEXT, FILE ...
All BIT columns but with a constraint which would allow only one to be true?
There are a couple of possibilities, and I did not study all combinations
to say what is best. I just wanted to warn you from overloading to much
information into the MIME column.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Shapper
2012-11-29 02:11:27 UTC
Permalink
Post by Shapper
Hello,
1 - Files (PDF, TXT, HTML, ZIP, PNG, MP4, ...)
2 - Text (Insert through a form in the application)
3 - HTML or other types of markup (Insert through a form in the application)
4 - VIDEOS (HTML 5 Video (1 to 3 files), You Tube Video (Markup)), ...
5 - IMAGES (PNG, GIF, ...)
6 - AUDIO (HTML 5 AUDIO (1 to 2 files).
A - One item can have 3 files (An HTML 5 Video should have 3 files);
B - One type can be inserted in different ways (A TEXT can be inserted through a form or a file)
And when later, that item is edited, I need to be able to know how it was inserted to display the same UI.
Local Video: HTML 5 VIDEO with 3 files.
Remote Video: HTML Markup for HTML Video.
So I have a few types of content, different sources and different ways of being inserted (UI).
create table dbo.FilesPacks
(
Id int identity not null
constraint FilesPacks_Id_PK primary key clustered (Id),
Created datetime not null,
);
create table dbo.Files
(
Id int identity not null
constraint Files_Id_PK primary key clustered (Id),
PackOfFilesId int not null,
Data varbinary (max) filestream null
constraint Files_Data_DF default (0x),
[Key] uniqueidentifier rowguidcol not null
constraint Files_Key_U unique,
MimeAndInfo nvarchar (200) not null
) filestream_on [STORAGE];
A) FilesPacks.
A file pack will allows me to create a Video with 3 Files or a PDF with one file.
B) Files - Using FileStream.
This allows me to hold small files, large files, text and html (in byte form), etc.
The data does not count to the database size.
As disadvantage I see one: Poor update performance.
But in fact the files, after being inserted, will be very rarely updated.
C) Files - Column MimeAndInfo
I have many types of sources and data as mentioned before.
- A VIDEO can be a file or HTML code for YouTube, Vimeo, etc.
- A Text can be a file or a text inserted from a form.
- ...
text/plain_f (Text inserted by f - form)
text/plain_u (Text inserted by u - uploading a file)
video/mp4_u (Video inserted by u - uploading file)
video/youtube_c (FAKE MIME: video from you tube using c - code)
So basically, this column will give me the MIME type and how was inserted into the database.
"video/youtube", "map/google" ... Of course this is internal to the application.
I have considered having many tables for classification but I am not sure ...
I wanted something flexible ... And that could be kind of controlled by the application.
NOTE: Each item also has an access level according to users ... but I left it out for sake of simplicity.
Well this is just a sketch but any advice is welcome ...
Thank You,
Miguel
I would not like to have 3 FileStream columns on Files Table.

Basically, I would like it to be flexible enough for future changes.

Well, I could always go for something like this:

create table dbo.Packs
(
Pack_Id int identity not null
constraint Packs_Pack_Id_PK primary key clustered (Id),
Source_Id int not null
Pack_CreatedDate datetime not null,
);

create table dbo.Files
(
File_Id int identity not null
constraint Files_File_Id_PK primary key clustered (Id),
Pack_Id int not null,
Mime_Id int not null,
File_Data varbinary (max) filestream null
constraint Files_File_Data_DF default (0x),
File_Key uniqueidentifier rowguidcol not null
constraint Files_File_Key_U unique
) filestream_on [STORAGE];

create table dbo.Mimes
(
Mime_Id int identity not null
constraint Mimes_Mime_Id_PK primary key clustered (Id),
Mime_Value nvarchar(80) not null
);

create table dbo.Sources
(
Source_Id int identity not null
constraint Sources_Source_Id_PK primary key clustered (Id),
Sources_Value nvarchar(80) not null
);

So this would work as follows:

1 - Packs table is used to group files.
And it holds common file data such Created date and Source.
Source would be a table with values "CODE", "TEXT", "FILE" ...

Or maybe I could have a simple column in Packs.
Something that would accept "C", "T", "F" for Code, Text, File, ...
Would be possible to have a constraint such as this?

Between the Sources table or such a column which one would you use?

2 - Files would hold the files;
Each file as a Mime_Id which comes from a Mimes table as you suggested.


Finally, I would use this in such a way: Consider a Post table:

create table dbo.Posts
(
Post_Id int identity not null
constraint Posts_Post_Id_PK primary key clustered (Id),
Post_Tile nvarchar(80) not null,
Post_Body nvarchar(4000) not null,
)

create table dbo.PostsPacks
(
Post_Id int not null,
Pack_Id int not null,
constraint Post_Id_Pack_Id_PK primary key clustered (Post_Id, Pack_Id)
);

If I have other tables needing access to files I can have BooksPacks, UsersPacks, ...

And I could also restrict access to files in a pack using something like:

create table dbo.PacksRoles
(
Pack_Id int not null,
Role_Id int not null,
constraint Pack_Id_Role_Id_PK primary key clustered (Pack_Id, Role_Id)
);

Well, that is my "full table" approach ...

What is your opinion?

Thank you,
Miguel
Erland Sommarskog
2012-11-29 21:44:20 UTC
Permalink
Post by Shapper
What is your opinion?
That you should trim the quotes before you post!
Post by Shapper
1 - Packs table is used to group files.
And it holds common file data such Created date and Source.
Source would be a table with values "CODE", "TEXT", "FILE" ...
Or maybe I could have a simple column in Packs.
Something that would accept "C", "T", "F" for Code, Text, File, ...
Would be possible to have a constraint such as this?
Yes:

CHECK (Source IN ('C', 'T', 'F'))

It's always a trade-off between using codes in the table, or have a lookup
table. One advantage with codes is that you more easily have table
constraints to specify legal/illegal combinations. This can be done if you
have a lookup table as well, but then the check constraints only has a
number of ids, which, well are less nice to read.

Then again, your look-up tables could use the codes as key and the full name
as data.
Post by Shapper
Between the Sources table or such a column which one would you use?
Not sure that I understand, but one constraint could be:

CHECK (NOT (Source IN ('T', 'F') AND Mime NOT LIKE 'text/%'))

Yes, the double constraints are bit difficult, but you often have to write
them that way to permit for NULL values.
--
Erland Sommarskog, SQL Server MVP, ***@sommarskog.se

Links for SQL Server Books Online:
SQL 2008: http://msdn.microsoft.com/en-us/sqlserver/cc514207.aspx
SQL 2005: http://msdn.microsoft.com/en-us/sqlserver/bb895970.aspx
Continue reading on narkive:
Loading...