Data Management for Natural Scientists is a practical guide to the scientific processing of data. It covers the path from “getting hands on” experimental results to making them usable for addressing various scientific questions. Code snippets introduce the proposed workflow and demonstrate how it can be adapted to specific challenges.
This book does not require in-depth prior knowledge of Python scripting or data management, but it assumes familiarity with basic concepts such as creating folders and files. Some experience with scripting languages is helpful but not mandatory for understanding the key ideas of this book and following the provided code snippets. The examples presented here were created using Python 3 on a Windows platform.
Getting access to the data part of an experimentally obtained results file is key to all of the downstream steps in data processing. So, let’s start our Python scripting endeavour at the beginning and tackle the problem at the source. The following topics will be addressed:
– How can data, metadata and information be extracted and separated from an experimental results file?
– How can regular expressions assist us in this task?
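As a rough illustration of these two questions, the following minimal sketch separates metadata from data in a made-up results file using regular expressions. The file layout (a header of "key: value" lines, a "[Data]" marker, then comma-separated numeric rows) and the names RAW, META_PATTERN, ROW_PATTERN and split_results are assumptions chosen for this example and are not taken from the book.

```python
import re

# Hypothetical results file content; the layout is an assumption for illustration.
RAW = """\
Instrument: Spectrometer X
Operator: J. Doe
Date: 2023-01-15
[Data]
wavelength_nm,absorbance
400,0.12
410,0.15
420,0.19
"""

# A metadata line looks like "key: value".
META_PATTERN = re.compile(r"^(?P<key>[^:\n]+):\s*(?P<value>.+)$")
# A data row is two comma-separated numbers (integer or decimal).
ROW_PATTERN = re.compile(r"^(?P<x>-?\d+(?:\.\d+)?),(?P<y>-?\d+(?:\.\d+)?)$")


def split_results(text):
    """Return (metadata dict, list of numeric rows) from the raw file text."""
    metadata = {}
    rows = []
    in_data = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "[Data]":
            # Everything after this marker belongs to the data part.
            in_data = True
            continue
        if in_data:
            match = ROW_PATTERN.match(line)
            if match:  # non-numeric lines such as the column header are skipped
                rows.append((float(match["x"]), float(match["y"])))
        else:
            match = META_PATTERN.match(line)
            if match:
                metadata[match["key"].strip()] = match["value"].strip()
    return metadata, rows


metadata, rows = split_results(RAW)
print(metadata)
print(rows)
```

Real instrument files differ widely in layout, so the two patterns and the section marker would need to be adapted to the file at hand.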
Throughout this book, Python is the tool of choice for all tasks associated with data treatment, manipulation and basic visualization. Let me explain why this might also be the right approach for your project. The Python homepage describes it as a programming language that lets you work quickly and integrate systems more effectively. Anyone who has read a few lines of Python will agree that the project’s philosophy of focusing on code readability and readily understandable syntax has clearly been met, at least in comparison to other programming languages. The term pythonic has been coined to reflect this characteristic. This feature gives Python a rather low entry barrier, both for programming beginners and for more experienced programmers who want to learn a new language. Furthermore, Python is an open-source project with a large community of active users and contributors. At the time of writing, Python consistently ranks as one of the most popular programming languages.
Further benefits of Python include its versatility, provided by dedicated libraries for data science, scientific computing, image processing, computer vision, web programming and web scraping, among many others. The availability of extensive documentation is another point in favor of Python.
Author(s): Matthias Hofmann
Publisher: De Gruyter
Year: 2023
Language: English
Pages: 216
City: Berlin
Acknowledgment
Contents
1 Presenting the challenge
2 Python quick start
3 The steps of data processing
4 From experimental files to data
5 From data to information
6 Where to put data and information
7 How to visualize data and information
8 Responding to lessons learned
9 Where to go from here
10 Conclusion
A Packaging a custom module
B Comments to tables and columns via SQLAlchemy
C A word on version control systems
D Overview of utilized Python and package versions
E Extracting data via pd.read_csv
F Installation of an ODBC driver
List of Figures
Bibliography
Index