fiedsch/datamanagement

Datamanagement Tools

PHP classes and helpers for managing data read from text files

  • Data\File\Reader reads text files
  • Data\File\CsvReader reads CSV files
  • Data\File\FixedWidthReader reads text files that contain data in fixed-width columns
  • Data\File\Helper provides helper functions such as SC(), which converts a spreadsheet column name to the index of the corresponding element in the array returned by, e.g., CsvReader->getLine()
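
To illustrate the kind of mapping SC() performs, here is a minimal sketch in plain PHP. It is not the library's implementation: the function name spreadsheetColumnToIndex() is hypothetical, and the sketch assumes that indexes are zero-based (column A maps to 0), as they would be for a plain PHP array.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-in for SC(): map a spreadsheet column name to a
 * zero-based array index ('A' => 0, 'Z' => 25, 'AA' => 26).
 * Assumption: the index is zero-based, like a plain PHP list.
 */
function spreadsheetColumnToIndex(string $column): int
{
    $column = strtoupper(trim($column));
    if (preg_match('/^[A-Z]+$/', $column) !== 1) {
        throw new InvalidArgumentException("Invalid column name: '$column'");
    }

    // Column names are a base-26 number without a zero digit (A=1 ... Z=26).
    $index = 0;
    foreach (str_split($column) as $letter) {
        $index = $index * 26 + (ord($letter) - ord('A') + 1);
    }

    // Shift from the 1-based spreadsheet position to a 0-based array index.
    return $index - 1;
}

// Usage: address a field by the column letter shown in the spreadsheet.
$row = ['id-1', 'Alice', 'alice@example.com'];
echo $row[spreadsheetColumnToIndex('C')], "\n";
```

Addressing fields by column letter keeps a script in step with what is visible when the same file is opened in a spreadsheet application.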

Examples

Work on CSV data

<?php

require __DIR__ . '/vendor/autoload.php';

use Fiedsch\Data\File\CsvReader;
 
try {
 
  $reader = new CsvReader("testdata.csv", ";");

  // Read and handle all lines containing data.

  while (($line = $reader->getLine()) !== null) {
    // ignore empty lines (i.e. lines containing no data)
    if (!$reader->isEmpty($line)) {
      print_r($line);
    }
  }
  // $reader->close(); // not needed as it will be automatically called when there are no more lines

} catch (Exception $e) {
    print $e->getMessage() . "\n";
}

Features

As of v0.3.2, the typical boilerplate "open file, read every non-empty line, close file" can be written more concisely: pass the optional flag Reader::SKIP_EMPTY_LINES to getLine() and the explicit isEmpty() check is no longer needed:

<?php

  // Reader refers to Fiedsch\Data\File\Reader; import it with
  // use Fiedsch\Data\File\Reader;
  while (($line = $reader->getLine(Reader::SKIP_EMPTY_LINES)) !== null) {
      print_r($line);
  }
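
For readers who want to see what "skipping empty lines" amounts to, the following sketch reproduces the idea with PHP's built-in str_getcsv() only. The helpers isEmptyRow() and readNonEmptyRows() are hypothetical names, and the definition of "empty" used here (every field is null or whitespace) is an assumption, not necessarily the rule the library applies.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: a parsed CSV row counts as empty when every
 * field is null or consists only of whitespace (an assumed definition).
 */
function isEmptyRow(array $row): bool
{
    foreach ($row as $field) {
        if ($field !== null && trim((string) $field) !== '') {
            return false;
        }
    }
    return true;
}

/**
 * Parse CSV text line by line and return only the rows that contain data.
 * Note: splitting on line breaks does not support quoted multi-line fields.
 */
function readNonEmptyRows(string $csv, string $delimiter = ';'): array
{
    $rows = [];
    foreach (preg_split('/\R/', $csv) as $rawLine) {
        // Enclosure and escape are passed explicitly instead of relying on defaults.
        $row = str_getcsv($rawLine, $delimiter, '"', '\\');
        if (!isEmptyRow($row)) {
            $rows[] = $row;
        }
    }
    return $rows;
}

// The ";;" line and the blank line are dropped; three data rows remain.
$csv = "name;age\n;;\nAlice;30\n\nBob;25\n";
print_r(readNonEmptyRows($csv));
```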
  

Data augmentation

<?php
 
require __DIR__ . '/vendor/autoload.php';
 
use Fiedsch\Data\File\CsvReader;
use Fiedsch\Data\File\Reader; // needed for Reader::SKIP_EMPTY_LINES below
use Fiedsch\Data\Augmentation\Augmentor;
use Fiedsch\Data\Augmentation\Provider\TokenServiceProvider;
use Fiedsch\Data\File\CsvWriter;
  
try {

  $augmentor = new Augmentor();

  $augmentor->register(new TokenServiceProvider());

  // Rule 'token': add a unique token to every input line.
  $augmentor->addRule('token', function (Augmentor $augmentor, $data) {
    return [ 'token' => $augmentor['token']->getUniqueToken() ];
  });

  $reader = new CsvReader("testdata.csv", ";");

  $writer = new CsvWriter("testdata.augmented.txt", "\t");

  $header_written = false;

  while (($line = $reader->getLine(Reader::SKIP_EMPTY_LINES)) !== null) {
    $result = $augmentor->augment($line);
    // Write the header once: line number, augmented columns, original columns.
    if (!$header_written) {
      $writer->printLine(array_merge(['input_line'], array_keys($result), $reader->getHeader()));
      $header_written = true;
    }
    $writer->printLine(array_merge([$reader->getLineNumber()], $result, $line));
  }

  $writer->close();

} catch (Exception $e) {
  print $e->getMessage() . "\n";
}

Creating Tokens

Method one: let the TokenCreator ensure that the tokens are unique:

<?php
 
require __DIR__ . '/vendor/autoload.php';
 
use Fiedsch\Data\Utility\TokenCreator;
use Fiedsch\Data\File\Writer;


$creator = new TokenCreator(10, TokenCreator::UPPER);

$output = new Writer('mytokens.txt');
$numTokens = 1000;

while ($numTokens-- > 0) {
 $token = $creator->getUniqueToken();
 $output->printLine($token);
}
$output->close();

Method two: generate the tokens first and check afterwards that they are unique. This might be faster and use fewer resources when generating large numbers of tokens:

 // same as above, except that
 // $token = $creator->getUniqueToken();
 // becomes
 $token = $creator->createToken();
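
The "generate first, check afterwards" idea can be sketched without the library. The example below is only an illustration under stated assumptions: createRandomToken() and createUniqueTokens() are hypothetical names, the alphabet of upper-case letters is an assumption, and the code is not how TokenCreator works internally.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical token generator: $length random upper-case letters.
 * random_int() is PHP's built-in cryptographically secure source.
 */
function createRandomToken(int $length): string
{
    $alphabet = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ';
    $token = '';
    for ($i = 0; $i < $length; $i++) {
        $token .= $alphabet[random_int(0, strlen($alphabet) - 1)];
    }
    return $token;
}

/**
 * Generate tokens without a per-token check, then remove duplicates in
 * one pass and top up until exactly $count unique tokens remain.
 */
function createUniqueTokens(int $count, int $length): array
{
    if ($length < 1 || $count > 26 ** $length) {
        throw new InvalidArgumentException('Not enough distinct tokens of this length.');
    }

    $tokens = [];
    while (count($tokens) < $count) {
        $missing = $count - count($tokens);
        for ($i = 0; $i < $missing; $i++) {
            $tokens[] = createRandomToken($length);
        }
        // Deduplicate the whole batch at once instead of checking each token.
        $tokens = array_values(array_unique($tokens));
    }
    return $tokens;
}

// Usage: 1000 unique tokens of length 10, one per line.
$tokens = createUniqueTokens(1000, 10);
echo implode("\n", array_slice($tokens, 0, 3)), "\n";
```

With a token space as large as 26^10, collisions are rare, so the top-up loop usually runs only once.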

Check that the generated tokens are unique

echo "If both lines show the same number, there were no duplicate tokens."
wc -l mytokens.txt
sort mytokens.txt | uniq | wc -l
