Thomas - Thomas Gericke

HOWTO grab and thumbnail websites

February 9, 2009 |

Author Thomas

Hi there!

Some of you asked how I implemented the grabbing and thumbnailing of whole websites (here’s an example and I wrote about that in this post), so here is a brief HOWTO.

Imagine you have a Linux system without a graphical environment. How do you render complex graphical content and take a screenshot of it? Here it comes: grabbing websites on a Linux system is quite simple.

Prerequisites:

a Linux operating system (Debian is fine)
khtml2png (I used khtml2png_2.7.6_i386.deb from here)
a running X server (Xvfb does it for me)
kdelibs4c2a
libkonq4

This is it!

The trick now is: on a system working as a server, you usually don’t want to have a running X server. So, I just installed Xvfb, which is a “Virtual Framebuffer ‘fake’ X server”. It is running in the background and khtml2png uses its display.

First, install Xvfb and several libs:

apt-get install xvfb kdelibs4c2a libkonq4

Hit ‘y’ when apt-get asks whether to install the additional dependencies!

Now, get khtml2png from http://sourceforge.net/projects/khtml2png/ and install it:

dpkg -i khtml2png_2.7.6_i386.deb

Then, start your ‘fake’ X server:

/usr/bin/Xvfb :2 -screen 0 1920x1200x24

Of course, you may reduce the resolution to your needs. But remember the display number (:2) you set for Xvfb.
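
Before starting Xvfb on a given display number, it can help to check that the number is still free. The following is only a minimal sketch, not part of the original setup: it relies on the lock file that X servers normally create under /tmp, and the function name display_in_use and the LOCK_DIR variable are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if an X server appears to own display number $1.
# X servers (Xvfb included) normally create a lock file /tmp/.X<number>-lock.
# LOCK_DIR is an assumption added here so the check can be pointed elsewhere.
display_in_use() {
    [ -e "${LOCK_DIR:-/tmp}/.X$1-lock" ]
}

# Possible use (commented out so that sourcing this file starts nothing):
# display_in_use 2 || /usr/bin/Xvfb :2 -screen 0 1920x1200x24 &
```

If the check succeeds for :2, simply pick another display number and pass the same number to khtml2png later.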

And finally, you may use khtml2png to fetch any website you like:

/usr/bin/khtml2png2 --display :2 --width 1024 --height 768 http://www.thomasgericke.de/ /tmp/website.png

Don’t worry about the fact that the package is named khtml2png and the binary is called khtml2png2. It’s okay!

I have a little magical wrapper around that stuff which gets URLs out of a database and performs some checks. Images are saved with wget and converted to PNG, websites are fetched with khtml2png. Both are saved and thumbnailed on-the-fly with PHP.

I call khtml2png via cron like this:

/usr/bin/khtml2png2   --display :2 \
                      --width 1024 \
                      --height 768 \
                      --time 42 \
                      --disable-js \
                      --disable-java \
                      --disable-plugins \
                      --disable-redirect \
                      --disable-popupkiller \
                      http://www.thomasgericke.de/ \
                      /tmp/website.png

My script is started every minute and checks if new URLs have to be fetched. It also checks if existing PNGs are older than 24 hours and, if so, the URL will be fetched and the PNG overwritten.
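
The wrapper itself is not shown here, but the 24-hour check described above could look roughly like this. It is a minimal sketch under assumptions: the function name needs_fetch is made up, and the age test uses find with -mmin, which may differ from what the real script does.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) if the PNG is missing
# or was last modified more than 24 hours (1440 minutes) ago.
needs_fetch() {
    png="$1"
    # No file yet: the URL has never been fetched.
    [ -e "$png" ] || return 0
    # find prints the path only if the file is older than 1440 minutes.
    [ -n "$(find "$png" -mmin +1440 -print)" ]
}

# Possible use inside the wrapper (commented out, paths as in the post):
# if needs_fetch /tmp/website.png; then
#     /usr/bin/khtml2png2 --display :2 --width 1024 --height 768 \
#         http://www.thomasgericke.de/ /tmp/website.png
# fi
#
# A crontab line such as "* * * * * /path/to/wrapper.sh" (hypothetical path)
# would run the wrapper every minute.
```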

Just let me know if you have any further questions.

Posted in tech |

Tags: html, linux, unix |


http://unfake.it/ goes magic!

February 8, 2009 |

Author Thomas

Hi there!

http://unfake.it/ now provides a tiny JavaScript bookmarklet which may be saved in your browser. While surfing on the web, you can click this bookmark and the page you’re currently viewing will automatically be faked/shortened.

But the real new feature is the preview function: by adding an asterisk ( * ) at the end of any faked URL, you’ll get a preview page which shows an image of the final destination. And this is really magical! Destination URLs may either be pages or images; they are initially fetched by a job that runs once a minute and refreshed once a day. Previews of images will, of course, remain untouched. Whole websites are magically grabbed as images and displayed as thumbnails.

An example?

http://unfake.it/XQh*
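
How the site recognises the trailing asterisk is not described in this post. Purely as an illustration of the idea, and assuming the part after the host name (here XQh or XQh*) is available as a string, the decision could be sketched like this; the function name classify_request and its output format are made up.

```shell
#!/bin/sh
# Hypothetical helper: prints "preview <key>" if the request ends in an
# asterisk, otherwise "redirect <key>".
classify_request() {
    case "$1" in
        # Trailing literal asterisk: strip it and ask for the preview page.
        *\*) printf 'preview %s\n' "${1%\*}" ;;
        # Anything else is an ordinary short URL.
        *)   printf 'redirect %s\n' "$1" ;;
    esac
}
```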

Check it out, this is really cool.

Posted in tech |

Tags: linux, unfake, url shortening |


Guide to LPIC-1 certification

February 7, 2009 |

Author Thomas

Since I lead a team of highly specialized IT personnel, who are mostly certified LPI level 1, level 2 or even level 3, I decided it was my turn to give it a try. I passed LPI 101 (part one of LPIC-1) on 2009-01-28 and LPI 102 (part two of LPIC-1) on 2009-02-02.

Here’s what you might wanna know about how to get LPIC-1 certified:

I did not have to learn a lot, because I have been doing Unix administration since about 1995. But I can highly recommend buying and reading the following book: LPIC-1 by Peer Heinlein. If you’re experienced, most parts may sound very familiar, but do you know all switches and all parameters of certain commands? Do you know which commands and which parameters LPI might ask about? No? I guessed so.

Here’s (very briefly) what you should know before trying to get certified:

Hardware & Architecture
- Do you know what interrupts are and where information is stored during the system’s runtime?
- What is IO?
- Do you know how to configure PCI expansion cards manually?
- Do you know how to connect and, if necessary, configure a USB device?
- What is a Winmodem?
- Can you configure serial settings for a modem or a sound card?
- Do you know the difference between SATA, SCSI or even external USB drives?
- How does the system connect, name and use them?
Linux Installation & Package Management
- Do you know how to set up partitions?
- And why exactly this way?
- Why can some directories be on their own partition? Why should some be? Why must some not be?
- Can you install and use LILO and Grub?
- What are the differences?
- Can you unpack, compile and install program sources?
- Do you know what shared libraries are and why they exist?
- Do you know the difference between RedHat and Debian package management?
- Do you know the most important switches and parameters (nearly all of them) of those packaging tools?
GNU & Unix Commands
- Do you know what a shell environment is?
- When and how will an environment be inherited?
- Can you set and unset variables?
- Do you know all of the following tools, including their usage? cat, cut, expand, fmt, head, hexdump, join, nl, paste, pr, sed, sort, split, tac, tail, tr, unexpand, uniq, wc. If not, get used to them!
- Do you know how to create, delete, read, move and copy files?
- Have you understood what STDIN, STDOUT and STDERR are and how they may be (re)directed?
- Can you kill processes, stop them, send them into the background or get them back to the foreground?
- Do you know what the process priority means and how it can be manipulated?
- Do you know how regular expressions work?
- Do you have problems using vi? If so, get used to this editor!
Devices, Linux Filesystems, Filesystem Hierarchy Standard
- Do you know how to set up a partition manually?
- Even a swap partition?
- Can you add new swap space while your system is running?
- Can you check, tune and repair a filesystem?
- Do you know when and how partitions are mounted? And why?
- Can you set up and use quotas?
- Can you get and read quota reports?
- Do you know what file and folder permissions are? And how they usually should be?
- Can you modify them?
- Do you know what special bits (files/folders) are?
- Can you explain the difference between hard and soft links? Can you create, use and delete both of them?
- Do you know how find, which and locate work? How and why?
The X Window System
- Do you know where the X configuration is stored?
- Have you understood the difference between an X server, a window manager and a display manager?
- Can you redirect the graphical output of a program onto another machine?
- Do you know in which part of the X server’s configuration fonts are defined?
Kernel
- Do you know what kernel modules are?
- How can you query them?
- How do you add or remove them?
- Do you know how to reconfigure a kernel?
- And how to compile and install it?
- What about LILO and Grub? Anything to do after you installed a new kernel? Why? Why not?
Boot, Initialization, Shutdown and Runlevels
- What are runlevels?
- Where are they defined?
- What about logfiles during a system boot?
- Do you know how to shut down a system safely? Even if users are still logged in?
Printing
- (Just my opinion: printing has always been a freakin’ show!)
- What printing systems do you know?
- What commands are used to print or to (re)configure the printing queue?
- Do you know what happens to your print job after you started it?
- Do you know when (and when not) it has to be converted? And how this happens?
- Can you set up a remote printer?
- Even a remote printer on a Windows system?
Documentation
- Do you know the different types of man pages?
- And have you ever heard of whatis or apropos?
- Do you know which messages users get before or after they log on to a system? And why?
Shells, Scripting, Programming and Compiling
- Do you know all BASH fundamentals including profile, variables, environment or built-in functions?
- Can you write your own (simple) BASH scripts?
- Can you let your script perform a loop?
- Do you know how to compare strings or expressions (using test or if [ …])?
Administrative Tasks
- Do you (exactly!) know what a shadow system is? What is it? How does it work? And why is it used?
- Can you add, remove or reconfigure users or groups using tools?
- And can you even do it right within the config files?
- Do you know what the skeleton is used for?
- Do you know what syslog is used for and how it works?
- Can you read and/or rotate logfiles?
- Even if they are still locked by a process?
- Do you know how to set up certain jobs which should be started once in the future or even regularly?
- Do you know the exact syntax of a cron job?
- Do you know how to make backups? How to store them and how to restore and use them?
- Can you set the correct time and date on your system?
- Even for the hardware clock?
Networking Fundamentals
- Have you understood what an IP address is and what it looks like?
- Can you calculate the netmask and broadcast of a network?
- Do you know what IP, TCP, UDP, ICMP mean? And the difference?
- Do you know the ports of the most common services?
- Can you change the IP address of a running system?
- Could you debug TCP connection failures?
- Do you know what routing means and when it is needed?
- Can you configure a modem as your PPP device?
Networking Services
- Do you know how incoming connections are handled?
- What is the difference between inetd and xinetd?
- How could you migrate?
- Can you block or allow certain connections using simple files under /etc/?
- Do you know how mail aliases are handled?
- And what if they have changed?
- Can you start, stop, restart and even configure an Apache webserver?
- Do you know how to mount or export filesystems using NFS?
- Can you mount a Samba share?
- Do you know how to configure your system using DNS servers? How?
- Can you maintain an SSH server?
Security
- Do you know how incoming packets are handled by the kernel?
- Can you add new iptables rules?
- What are sockets?
- Do you know netstat?
- Do you know how to secure your system using xinetd, iptables, syslog, etc.?
- Can you block logins to your machine using a single file?
- Do you know how to secure access to files using permissions?
- Do you know what the umask is?

This is wow, isn’t it?!

If you’ve been using a Linux/Unix system for quite a while, most of the questions should sound easy. But, indeed, nearly every tool has switches and/or parameters you’ve never heard of. So I highly suggest you read at least the man pages of every program, tool or task I mentioned above.
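
As one worked example for the networking questions above, here is a minimal sketch of the netmask and broadcast calculation using nothing but shell arithmetic. The function names are made up for illustration, and the sketch assumes a shell with at least 64-bit arithmetic and octets written without leading zeros.

```shell
#!/bin/sh
# Convert a dotted IPv4 address (a.b.c.d) into a single integer.
ip_to_int() {
    old_ifs=$IFS; IFS=.
    set -- $1            # split the address on dots into $1..$4
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Convert an integer back into dotted notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print netmask and broadcast address for an address and a prefix length.
net_info() {
    addr=$(ip_to_int "$1")
    prefix=$2
    # The netmask has $prefix leading one-bits out of 32.
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    # Broadcast = network address with all host bits set to one.
    bcast=$(( ($addr & $mask) | (~$mask & 0xFFFFFFFF) ))
    echo "netmask $(int_to_ip "$mask") broadcast $(int_to_ip "$bcast")"
}

# net_info 192.168.1.10 24
# prints: netmask 255.255.255.0 broadcast 192.168.1.255
```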

If you know the (correct, of course) answer to every question, you should have no problems passing the tests for LPIC-1 certification.

Posted in tech |

Tags: certification, linux, lpi, lpic-1, unix |


New theme!

February 7, 2009 |

Author Thomas

I did not like the old theme anymore, so I switched to a new one tonight.

Posted in daily life |

Tags: blog |


Web 2.0 chaos

January 25, 2009 |

Author Thomas

This evening, I thought about setting up a flickr and a twitter account. Okay, okay… I did not just think about it, I did it. What a freakin’ show…

After setting up my accounts, which was very simple, I began to link ’em to facebook and back, I uploaded photos and back… And as I did this, I began wondering. “What the hell…?! What am I doing? All the information on every site and every picture on and in each account?”

I drew a very simple diagram and now I have a question:

Is this what Web 2.0 is supposed to be?

What do we have?

a homepage
a facebook account
a blog
a twitter account
a flickr account
maybe an online diary
maybe a guestbook
maybe a photo blog
maybe a gallery
maybe a Google account (grabbing feeds and more…)
maybe this and
maybe that

And they all talk to each other. Am I the only one on this planet who is very, very confused by all those sites and applications? 🙂

Posted in daily life, tech |

Tags: blog, facebook, internet, pixelpost, studivz, twitter, web2.0 |


URL shortener / (un)faker

January 25, 2009 |

Author Thomas

There are lots of URL un-shorteners (I call them un-fakers) all over the planet. It totally pisses me off that I can never ever remember any of those f*ckin’ providers. That’s (I guess) the main reason why I simply never used such URL redirectors.

So, I started my very own URL (un)faker. The site’s ready, the database is set up. You may shorten, fake and post your URLs now using:

http://unfake.it/

Enjoy! 😉

Posted in tech |

Tags: unfake, url shortening |


Google rating and ranking

January 21, 2009 |

Author Thomas

Over the last few days, I spent a lot of time trying to optimize my websites for the Googlebot.

We all know that Google (as an empire worth billions and billions of dollars) is the leading website for searching and finding information throughout the World Wide Web. That’s okay. I, as a user, really like Google. Most of the time, I find the information I’m looking for with Google. So it has earned its status.

As a maintainer of various websites, however, I find that Google really stresses me out. It’s quite easy to announce a website. It also ain’t no problem to set up a Google account, use the webmaster tools, submit a sitemap (or even more) and play with the settings.

I truly like the way Google’s algorithm works. PageRank is no easy thing and Google does a great job. For me, as a maintainer of very small websites, it’s unfortunately very hard to get a high ranking. I’m quite often on the very first page of search results, as long as the searches are explicit enough. To get a higher ranking for less specific searches, I would need lots of websites linking back to mine. Okay, that’s how Google works. A link to website X on website Y is taken as a vote by website Y for website X. That is hard to manipulate.

While checking the ranking and indexed pages of my sites nearly every day, I discovered a few things which really stress me out:

1. Googlebot-Images

Two images of one of my websites are in the index of Google Images. These images are very old, probably indexed in November 2007. Over the last few days, I studied lots and lots of websites and blogs and discovered: you have almost no chance to influence Googlebot-Images. Lucky you if it comes around. Some people on some websites stated that it may take up to 24 months to get your images indexed.

What a pity.

2. dynamically generated websites

Once you have a website which serves its content dynamically, you either need Google to behave just like a real user and click from one page to another until it has indexed all your pages, or you need to submit sitemaps which are also dynamically generated.

And that’s what I did the last few days. I wrote quite simple scripts which (depending on the type of website) fetch URLs from the database and generate a listing of links. Two such types are WordPress blogs and PixelPost photoblogs.
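
The scripts themselves are not shown here. As a rough sketch of the last step only, and assuming the URLs have already been pulled from the database as one URL per line, an XML sitemap in the sitemaps.org format could be produced like this. The function name urls_to_sitemap is made up, and the real scripts may well generate a different kind of listing.

```shell
#!/bin/sh
# Read one URL per line on stdin and print a minimal XML sitemap on stdout.
urls_to_sitemap() {
    echo '<?xml version="1.0" encoding="UTF-8"?>'
    echo '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    # Drop empty lines, escape the characters that are special in XML,
    # then wrap every remaining line in <url><loc>...</loc></url>.
    sed -e '/^$/d' \
        -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' \
        -e 's|^\(.*\)$|  <url><loc>\1</loc></url>|'
    echo '</urlset>'
}

# Possible use (urls.txt is a hypothetical export from the database):
# urls_to_sitemap < urls.txt > sitemap.xml
```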

Another example: Gallery2. Unfortunately, I found no easy way to extract the links from the database, but Gallery2 comes with a built-in sitemap.

But here comes the next thing, which stresses me out: about 4 days ago, I submitted a Gallery2 sitemap with more than 900 URLs. Until today, only 220 URLs have been indexed.

As fast as Google’s search engine is, its indexing is slow.

3. sitelinks

Sitelinks are a very nice idea and look quite impressive. But how do you get sitelinks for your websites? Google says there’s an algorithm which calculates whether or not sitelinks would help users and automatically decides if sitelinks are generated. But so far, I have found no information about how this algorithm works.

Damn, I’d love to have sitelinks.

Conclusion: the best and easiest way to get your websites on top of search results is to have them indexed (and be patient with that), to have specific content on your site and to rely on good search strings of interested users.

Good night so far…

Posted in tech |

Tags: Google |


photoblog: photo art

January 15, 2009 |

Author Thomas

Inspired by a co-worker and good friend of mine (http://www.oliver-schaef.de/), I set up a photoblog last night. Its base system is http://www.pixelpost.org/. I installed a theme, various addons and reconfigured some PHP and HTML files to make it fit my needs.

This photoblog “photo art” is now accessible via http://photoart.thomasgericke.de/ or even http://www.thomasgericke.de/v4/interactive/pixelpost/ – it makes nearly no difference 🙂

http://photoart.thomasgericke.de/

Hopefully, I will manage to upload a few more pictures and, even more hopefully, I will take more interesting photos in the future.

Posted in photography, tech |

Tags: photography |


The passion of photography

August 15, 2008 |

Author Thomas

For a few days now, I have been (not very) frequently visiting http://www.fotolism.us/, which is basically a non-commercial website of co-workers and friends. They share a passion for photography, and while surfing that site today, I got sort of re-addicted to photography, too.

I was never really good at taking pictures, but I always loved it.

So, I grabbed my camera: a FUJIFILM FinePix S6500fd, which I bought about a year ago.

Back then, I decided to buy that one simply because I wanted a better cam than my old Olympus. And I needed a device with high ISO values and a very short shutter lag.

This evening, I got inspired by the great pictures shown at http://www.fotolism.us/. So I took my cam, went upstairs and simply took some shots down the street out of a stairway window (5th floor).

This is what I got:

Parameters were: 15s, ISO 100, F/8, 25mm

Keep on takin’ pictures, keep on bloggin’

Posted in photography |

Tags: photography |


Store Short Messages (SMS) in Database

July 16, 2008 |

Author Thomas

Since I am an “everything has to be stored” freak, I created a way to extract short messages (SMS) from my cellphone and store them in a MySQL database. I also wrote a simple PHP frontend which can display messages, filter them by sender’s or receiver’s name and arrange them into so-called conversations.

Conversations may be very useful if you write lots of messages with a certain person and some short messages are about a specific topic.

Here’s what you need:

a cellphone with Symbian (e. g. UIQ3, S60v3)
BestMessageStorer
a Perl script
a MySQL database
an Apache webserver
a PHP interpreter
lots and lots of short messages 🙂

This is how to set up the database tables:

Database tables sms and sms_conversations:

CREATE TABLE sms (
  ID int(5),
  conversation int(3),
  src text,
  dest text,
  weekday text,
  year int(4),
  month text,
  day int(2),
  time text,
  text text,
  timestamp timestamp,
  PRIMARY KEY(ID)
);

CREATE TABLE sms_conversations (
  ID int(3),
  name text,
  PRIMARY KEY(ID)
);

I won’t post the Perl scripts (one for reversing the order of extracted short messages, one for storing them in the database) here. Nor will I post the PHP file. You may get in touch with me to obtain them.

Once you have this database up and running and have collected some short messages on your cellphone, getting them stored is as easy as 1-2-3. Here we go:

save messages to a file using BestMessageStorer
transfer file onto a computer
save file with encoding “ANSI” (or anything else, whatever fits your needs)
transfer file onto your server (on which you have the Perl scripts installed)
process the file using: cat FILENAME | ./reverse_file | ./process_sms
finished
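
Since the Perl scripts are not published here, the following is only a hypothetical stand-in for the first stage of that pipeline. It assumes that every exported message occupies exactly one line, which the real export format may not guarantee; the function name reverse_lines is made up.

```shell
#!/bin/sh
# Hypothetical stand-in for ./reverse_file: print stdin with its lines
# in reverse order, so the oldest message comes first.
reverse_lines() {
    # awk keeps every line in memory and prints them back to front.
    awk '{ line[NR] = $0 } END { for (i = NR; i >= 1; i--) print line[i] }'
}

# Possible use, mirroring the pipeline above:
# reverse_lines < FILENAME | ./process_sms
```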

Have fun! 🙂

Posted in tech |
