000 06432cam a2200601Ii 4500
001 ocn944961030
003 OCoLC
005 20190328114814.0
006 m o d
007 cr cnu|||unuuu
008 160317t20162016mau ob 001 0 eng d
040 _aN$T
_beng
_erda
_epn
_cN$T
_dEBLCP
_dN$T
_dOPELS
_dOCLCF
_dYDXCP
_dCDX
_dUMI
_dAZK
_dTOH
_dSTF
_dDEBBG
_dCOO
_dDEBSZ
_dVGM
_dIUL
_dVT2
_dU3W
_dD6H
_dUOK
_dCEF
_dKSU
_dOCLCQ
_dAU@
_dOCLCQ
_dWYU
_dTKN
019 _a961332310
_a961514762
020 _a9780128038543
_q(electronic bk.)
020 _a0128038543
_q(electronic bk.)
020 _a0128037814
020 _a9780128037812
020 _z9780128037812
035 _a(OCoLC)944961030
_z(OCoLC)961332310
_z(OCoLC)961514762
050 4 _aQA76.76.S46
072 7 _aCOM
_x051390
_2bisacsh
072 7 _aCOM
_x051440
_2bisacsh
072 7 _aCOM
_x051230
_2bisacsh
082 0 4 _a005.3
_223
100 1 _aBerman, Jules J.,
_eauthor.
245 1 0 _aData simplification : taming information with open source tools /
_h[electronic resource]
_cJules J. Berman.
264 1 _aCambridge, MA :
_bMorgan Kaufmann is an imprint of Elsevier,
_c2016.
264 4 _c©2016
300 _a1 online resource
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
504 _aIncludes bibliographical references and index.
588 0 _aOnline resource; title from PDF title page (EBSCO, viewed March 21, 2016).
520 _aData Simplification: Taming Information With Open Source Tools addresses the simple fact that modern data is too big and complex to analyze in its native form. Data simplification is the process whereby large and complex data is rendered usable. Complex data must be simplified before it can be analyzed, but the process of data simplification is anything but simple, requiring a specialized set of skills and tools. This book provides data scientists from every scientific discipline with the methods and tools to simplify their data for immediate analysis or long-term storage in a form that can be readily repurposed or integrated with other data. Drawing upon years of practical experience, and using numerous examples and use cases, Jules Berman discusses the principles, methods, and tools that must be studied and mastered to achieve data simplification, open source tools, free utilities and snippets of code that can be reused and repurposed to simplify data, natural language processing and machine translation as a tool to simplify data, and data summarization and visualization and the role they play in making data useful for the end user.
505 0 _aFront cover; Data Simplification: Taming Information With Open Source Tools; Copyright; Dedication; Contents; Foreword; Preface; Organization of this book; Chapter Organization; How to Read this Book; Nota Bene; Glossary; References; Author Biography; Chapter 1: The Simple Life; 1.1. Simplification Drives Scientific Progress; 1.2. The Human Mind is a Simplifying Machine; 1.3. Simplification in Nature; 1.4. The Complexity Barrier; 1.5. Getting Ready; Open Source Tools; Perl; Python; Ruby; Text Editors; OpenOffice; LibreOffice; Command Line Utilities; Cygwin, Linux Emulation for Windows.
505 8 _aDOS Batch Scripts; Linux Bash Scripts; Interactive Line Interpreters; Package Installers; System Calls; Glossary; References; Chapter 2: Structuring Text; 2.1. The Meaninglessness of Free Text; 2.2. Sorting Text, the Impossible Dream; 2.3. Sentence Parsing; 2.4. Abbreviations; 2.5. Annotation and the Simple Science of Metadata; 2.6. Specifications Good, Standards Bad; Open Source Tools; ASCII; Regular Expressions; Format Commands; Converting Nonprintable Files to Plain-Text; Dublin Core; Glossary; References; Chapter 3: Indexing Text; 3.1. How Data Scientists Use Indexes.
505 8 _a3.2. Concordances and Indexed Lists; 3.3. Term Extraction and Simple Indexes; 3.4. Autoencoding and Indexing with Nomenclatures; 3.5. Computational Operations on Indexes; Open Source Tools; Word Lists; Doublet Lists; Ngram Lists; Glossary; References; Chapter 4: Understanding Your Data; 4.1. Ranges and Outliers; 4.2. Simple Statistical Descriptors; 4.3. Retrieving Image Information; 4.4. Data Profiling; 4.5. Reducing Data; Open Source Tools; Gnuplot; MatPlotLib; R, for Statistical Programming; Numpy; Scipy; ImageMagick; Displaying Equations in LaTex; Normalized Compression Distance.
505 8 _aPearson's Correlation; The Ridiculously Simple Dot Product; Glossary; References; Chapter 5: Identifying and Deidentifying Data; 5.1. Unique Identifiers; 5.2. Poor Identifiers, Horrific Consequences; 5.3. Deidentifiers and Reidentifiers; 5.4. Data Scrubbing; 5.5. Data Encryption and Authentication; 5.6. Timestamps, Signatures, and Event Identifiers; Open Source Tools; Pseudorandom Number Generators; UUID; Encryption and Decryption with OpenSSL; One-Way Hash Implementations; Steganography; Glossary; References; Chapter 6: Giving Meaning to Data; 6.1. Meaning and Triples.
505 8 _a6.2. Driving Down Complexity With Classifications; 6.3. Driving Up Complexity With Ontologies; 6.4. The Unreasonable Effectiveness of Classifications; 6.5. Properties That Cross Multiple Classes; Open Source Tools; Syntax for Triples; RDF Schema; RDF Parsers; Visualizing Class Relationships; Glossary; References; Chapter 7: Object-oriented Data; 7.1. The Importance of Self-Explaining Data; 7.2. Introspection and Reflection; 7.3. Object-Oriented Data Objects; 7.4. Working With Object-Oriented Data; Open Source Tools; Persistent Data; SQLite Databases; Glossary; References.
650 0 _aOpen source software.
650 0 _aData mining.
650 0 _aDatabase management.
650 7 _aCOMPUTERS
_xProgramming
_xOpen Source.
_2bisacsh
650 7 _aCOMPUTERS
_xSoftware Development & Engineering
_xTools.
_2bisacsh
650 7 _aCOMPUTERS
_xSoftware Development & Engineering
_xGeneral.
_2bisacsh
650 7 _aData mining.
_2fast
_0(OCoLC)fst00887946
650 7 _aDatabase management.
_2fast
_0(OCoLC)fst00888037
650 7 _aOpen source software.
_2fast
_0(OCoLC)fst01046097
655 4 _aElectronic books.
776 0 8 _iPrint version:
_aBerman, Jules J.
_tData simplification : taming information with open source tools.
_dCambridge, MA : Elsevier, [2016]
_z9780128037812
_w(DLC) 18934818
856 4 0 _3ScienceDirect
_uhttp://www.sciencedirect.com/science/book/9780128037812
999 _c247302
_d247302