logging - Database design or architecture suitable for storing logs, real-time reporting, and use as a log correlation engine


The problem I face is related to storing and retrieving millions of logs reasonably fast. I work on collecting everyday logs from firewalls, intrusion detection and prevention systems, application logs, user activity, etc., storing them in a database, performing real-time reporting, and correlating them to identify intrusions. After building a system on syslog and MySQL, I found out that the bottleneck at the moment is the database. I have experience with relational databases; on the other hand, I am totally lost among the newer technologies that exist and have only recently come to my attention in the database field.

So are NoSQL databases (Mongo, Cassandra, etc.) better, and do they outperform traditional databases (MySQL, Oracle, MSSQL, etc.)? I have read that so far they have no aggregation functions, and consequently reporting is not feasible. Is that right?

Are data warehouses better suited to my needs? I know they are used for reporting, not real time. Is that true, or are there implementations today that support something near real time, which might be acceptable? I found out that a data warehouse is more or less a different way of designing the database schema, and that traditional databases are excellent candidates for it. Is that true?

It has also been proposed to me to create table partitions without using the partitioning feature that exists in databases. The idea is to use separate tables based on size, create procedures that store and update indexes across the separated tables, and manipulate them to speed things up whenever I need to perform a join or aggregation (a sketch of the idea follows below). Has anyone heard of or used something similar? Because at first such a solution seemed totally inapplicable to me.
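
For concreteness, here is a minimal sketch of what I mean, in MySQL (our current database) with hypothetical names: one table per day created by hand, and a view stitching the pieces back together for joins and aggregations.

    -- Manual "partitioning": one physical table per day,
    -- created and indexed by hand instead of by the database.
    CREATE TABLE logs_20240101 (
        log_time TIMESTAMP   NOT NULL,
        source   VARCHAR(32) NOT NULL,
        message  TEXT
    );
    CREATE INDEX idx_logs_20240101 ON logs_20240101 (source, log_time);

    -- Each new day gets a clone of the structure (indexes included).
    CREATE TABLE logs_20240102 LIKE logs_20240101;

    -- Queries that span days go through a view over all the pieces.
    CREATE VIEW logs_all AS
        SELECT * FROM logs_20240101
        UNION ALL
        SELECT * FROM logs_20240102;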

In the end, is it possible to migrate to any of the above technologies and get better, more balanced results?

I know this is a big issue. I can see that my knowledge and experience with RDBMSs to date are not enough to solve the problem, and since there are so many technologies, I need to hear opinions, discuss, and be guided by people who have had experience with them in the past, weighing the pros and cons of each approach. Are there forums where I can ask that might be helpful? One last thing: as an order of magnitude, my data volume is in terabytes, not petabytes, which might exclude some of these technologies, such as Hadoop.

Before you settle on a storage method, question what type of analysis you want to do.

For aggregation-oriented workloads at the volume you're talking about, a traditional RDBMS such as Oracle, SQL Server, or PostgreSQL running on a beefy server should do. They have native support for partitioning and other DWH techniques (such as materialized views), which saves you the time of cobbling these together yourself. For example, the Oracle query optimizer takes partitioning into account when generating a new query plan.
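
To make that concrete, here is a minimal sketch of native partitioning plus a materialized view in PostgreSQL (declarative partitioning needs version 10 or later); the table and column names are hypothetical:

    -- Log table partitioned by day; the planner prunes partitions
    -- that a query's time range cannot touch.
    CREATE TABLE logs (
        log_time timestamptz NOT NULL,
        source   text        NOT NULL,  -- e.g. 'firewall', 'ids', 'app'
        severity smallint    NOT NULL,
        message  text
    ) PARTITION BY RANGE (log_time);

    CREATE TABLE logs_2024_01_01 PARTITION OF logs
        FOR VALUES FROM ('2024-01-01') TO ('2024-01-02');
    CREATE INDEX ON logs_2024_01_01 (source, log_time);

    -- Precomputed hourly counts for reporting; refresh on a schedule.
    CREATE MATERIALIZED VIEW hourly_counts AS
    SELECT date_trunc('hour', log_time) AS hour,
           source,
           count(*) AS events
    FROM logs
    GROUP BY 1, 2;

    REFRESH MATERIALIZED VIEW hourly_counts;

The reporting front-end then reads from hourly_counts instead of scanning the raw logs on every report.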

As a reporting front-end, you can go with one of the commercially available ones or create your own. Options include OBIEE, SQL Server Reporting Services, Cognos, and Pentaho (free); they support cross-DB reporting (combining the DWH and the operational store) to an extent.

If you need instant answers to arbitrary queries involving aggregations over large volumes (billion-row datasets), look at Teradata, Netezza, Vertica, and the like. These tend to cost quite a lot.

If you want instant answers to arbitrary queries involving aggregations on smaller datasets, have a look at […]. It is a powerful in-memory analysis tool, and I believe it's free for single-person usage.

If it's not a matter of adding up numbers but of analyzing complex relationships (graph analysis) over large volumes, you're somewhat out of luck: the old solutions don't scale or are expensive, and the new ones are hit and miss. It's going to be expensive either way. Without knowing how you want to correlate events, it's hard to recommend anything; I'm not aware of a general solution.

Personally, I'd go with Postgres (backend) plus Pentaho and […] (both front-end), Kettle for traditional ETL, and Hadoop or custom code to precalculate results for the more complicated analysis. In Postgres, I would split the data into an operational store and a DWH.
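
As a minimal sketch of that split, assuming a single PostgreSQL (9.5+) cluster with hypothetical schema and table names (in practice Kettle would drive the load step):

    -- "ops" holds raw incoming events; "dwh" holds aggregated facts.
    CREATE SCHEMA ops;
    CREATE SCHEMA dwh;

    CREATE TABLE ops.events (
        event_time timestamptz NOT NULL,
        source     text        NOT NULL,
        payload    jsonb
    );

    CREATE TABLE dwh.event_facts (
        day    date   NOT NULL,
        source text   NOT NULL,
        events bigint NOT NULL,
        PRIMARY KEY (day, source)
    );

    -- Minimal daily ETL step: roll yesterday's events into the DWH.
    INSERT INTO dwh.event_facts (day, source, events)
    SELECT event_time::date, source, count(*)
    FROM ops.events
    WHERE event_time >= date_trunc('day', now() - interval '1 day')
      AND event_time <  date_trunc('day', now())
    GROUP BY 1, 2
    ON CONFLICT (day, source) DO UPDATE SET events = EXCLUDED.events;

Reports then hit dwh.event_facts, while inserts land only in ops.events, so the two workloads don't contend with each other.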

