Feed aggregator

Dirk Eddelbuettel: Rcpp 0.12.2: More refinements

Planet Debian - Sun, 15/11/2015 - 21:45

The second update in the 0.12.* series of Rcpp is now on the CRAN network for GNU R. As usual, I will also push a Debian package. This follows the 0.12.0 release from late July which started to add some serious new features, and builds upon the 0.12.1 release in September. It also marks the sixth release this year where we managed to keep a steady bi-montly release frequency.

Rcpp has become the most popular way of enhancing GNU R with C or C++ code. As of today, 512 packages on CRAN depend on Rcpp for making analytical code go faster and further. That is up by more than fifty package from the last release in September (and we recently blogged about crossing 500 dependents).

This release once again features pull requests from two new contributors with Nathan Russell and Tianqi Chen joining in. As shown below, other recent contributors (such as such as Dan) are keeping at it too. Keep'em coming! Luke Tierney also email about a code smell he spotted and which we took care of. A big Thank You! to everybody helping with code, bug reports or documentation. See below for a detailed list of changes extracted from the NEWS file.

Changes in Rcpp version 0.12.2 (2015-11-14)
  • Changes in Rcpp API:

    • Correct return type in product of matrix dimensions (PR #374 by Florian)

    • Before creating a single String object from a SEXP, ensure that it is from a vector of length one (PR #376 by Dirk, fixing #375).

    • No longer use STRING_ELT as a left-hand side, thanks to a heads-up by Luke Tierney (PR #378 by Dirk, fixing #377).

    • Rcpp Module objects are now checked more carefully (PR #381 by Tianqi, fixing #380)

    • An overflow in Matrix column indexing was corrected (PR #390 by Qiang, fixing a bug reported by Allessandro on the list)

    • Nullable types can now be assigned R_NilValue in function signatures. (PR #395 by Dan, fixing issue #394)

    • operator<<() now always shows decimal points (PR #396 by Dan)

    • Matrix classes now have a transpose() function (PR #397 by Dirk fixing #383)

    • operator<<() for complex types was added (PRs #398 by Qiang and #399 by Dirk, fixing #187)

  • Changes in Rcpp Attributes:

    • Enable export of C++ interface for functions that return void.

  • Changes in Rcpp Sugar:

    • Added new Sugar function cummin(), cummax(), cumprod() (PR #389 by Nathan Russell fixing #388)

    • Enabled sugar math operations for subsets; e.g. x[y] + x[z]. (PR #393 by Kevin and Qiang, implementing #392)

  • Changes in Rcpp Documentation:

    • The NEWS file now links to GitHub issue tickets and pull requests.

    • The Rcpp.bib file with bibliographic references was updated.

Thanks to CRANberries, you can also look at a diff to the previous release As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Categories: Elsewhere

Lunar: Reproducible builds: week 29 in Stretch cycle

Planet Debian - Sun, 15/11/2015 - 18:51

What happened in the reproducible builds effort this week:

Toolchain fixes

Emmanuel Bourg uploaded eigenbase-resgen/ which uses of the scm-safe comment style by default to make them deterministic.

Mattia Rizzolo started a new thread on debian-devel to ask a wider audience for issues about the -Wdate-time compile time flag. When enabled, GCC and clang print warnings when __DATE__, __TIME__, or __TIMESTAMP__ are used. Having the flag set by default would prompt maintainers to remove these source of unreproducibility from the sources.

Packages fixed

The following packages have become reproducible due to changes in their build dependencies: bmake, cyrus-imapd-2.4, drobo-utils, eigenbase-farrago, fhist, fstrcmp, git-dpm, intercal, libexplain, libtemplates-parser, mcl, openimageio, pcal, powstatd, ruby-aggregate, ruby-archive-tar-minitar, ruby-bert, ruby-dbd-odbc, ruby-dbd-pg, ruby-extendmatrix, ruby-rack-mobile-detect, ruby-remcached, ruby-stomp, ruby-test-declarative, ruby-wirble, vtprint.

The following packages became reproducible after getting fixed:

Some uploads fixed some reproducibility issues, but not all of them:

Patches submitted which have not made their way to the archive yet:

  • #804729 on pbuilder by Reiner Herrmann: tell dblatex to build in a deterministic path.

The fifth and sixth armhf build nodes have been set up, resulting in five more builder jobs for armhf. More than 10,000 packages have now been identified as reproducible with the “reproducible” toolchain on armhf. (Vagrant Cascadian, h01ger)

Helmut Grohne and Mattia Rizzolo now have root access on all 12 build nodes used by reproducible.debian.net and jenkins.debian.net. (h01ger)

reproducible-builds.org is now linked from all package pages and the reproducible.debian.net dashboard. (h01ger)

profitbricks-build5-amd64 and profitbricks-build6-amd64, responsible for running amd64 tests now run 398.26 days in the future. This means that one of the two builds that are being compared will be run on a different minute, hour, day, month, and year. This is not yet the case for armhf. FreeBSD tests are also done with 398.26 days difference. (h01ger)

The design of the Arch Linux test page has been greatly improved. (Levente Polyak)

diffoscope development

Three releases of diffoscope happened this week numbered 39 to 41. It includes support for EPUB files (Reiner Herrmann) and Free Pascal unit files, usually having .ppu as extension (Paul Gevers).

The rest of the changes were mostly targetting at making it easier to run diffoscope on other systems. The tlsh, rpm, and debian modules are now all optional. The test suite will properly skip tests that need optional tools or modules when they are not available. As a result, diffosope is now available on PyPI and thanks to the work of Levente Polyak in Arch Linux.

Getting these versions in Debian was a bit cumbersome. Version 39 was uploaded with an expired key (according to the keyring on ftp.debian.org which will hopefully be updated soon) which is currently handled by keeping the files in the queue without REJECTing them. This prevented any other Debian Developpers to upload the same version. Version 40 was uploaded as a source-only upload… but failed to build from source which had the undesirable side effect of removing the previous version from unstable. The package faild to build from source because it was built passing -I to debbuild. This excluded the ELF object files and static archives used by the test suite from the archive, preventing the test suite to work correctly. Hopefully, in a nearby future it will be possible to implement a sanity check to prevent such mistakes in the future.

It has also been identified that ppudump outputs time in the system timezone without considering the TZ environment variable. Zachary Vance and Paul Gevers raised the issue on the appropriate channels.

strip-nondeterminism development

Chris Lamb released strip-nondeterminism version 0.014-1 which disables stripping Mono binaries as it is too aggressive and the source of the problem is being worked on by Mono upstream.

Package reviews

133 reviews have been removed, 115 added and 103 updated this week.

Chris West and Chris Lamb reported 57 new FTBFS bugs.


The video of h01ger and Chris Lamb's talk at MiniDebConf Cambridge is now available.

h01ger gave a talk at CCC Hamburg on November 13th, which was well received and sparked some interest among Gentoo folks. Slides and video should be available shortly.

Frederick Kautz has started to revive Dhiru Kholia's work on testing Fedora packages.

Your editor wish to once again thank #debian-reproducible regulars for reviewing these reports weeks after weeks.

Categories: Elsewhere

Simon McVittie: Discworld Noir in a Windows 98 virtual machine on Linux

Planet Debian - Sun, 15/11/2015 - 18:19

Discworld Noir was a superb adventure game, but is also notoriously unreliable, even in Windows on real hardware; using Wine is just not going to work. After many attempts at bringing it back into working order, I've settled on an approach that seems to work: now that qemu and libvirt have made virtualization and emulation easy, I can run it in a version of Windows that was current at the time of its release. Unfortunately, Windows 98 doesn't virtualize particularly well either, so this still became a relatively extensive yak-shaving exercise.

These instructions assume that /srv/virt is a suitable place to put disk images, but you can use anywhere you want.

The emulated PC

After some trial and error, it seems to work if I configure qemu to emulate the following:

  • Fully emulated hardware instead of virtualization (qemu-system-i386 -no-kvm)
  • Intel Pentium III
  • Intel i440fx-based motherboard with ACPI
  • Real-time clock in local time
  • No HPET
  • 256 MiB RAM
  • IDE primary master: IDE hard disk (I used 30 GiB, which is massively overkill for this game; qemu can use sparse files so it actually ends up less than 2 GiB on the host system)
  • IDE primary slave, secondary master, secondary slave: three CD-ROM drives
  • PS/2 keyboard and mouse
  • Realtek AC97 sound card
  • Cirrus video card with 16 MiB video RAM

A modern laptop CPU is an order of magnitude faster than what Discworld Noir needs, so full emulation isn't a problem, despite being inefficient.

There is deliberately no networking, because Discworld Noir doesn't need it, and a 17 year old operating system with no privilege separation is very much not safe to use on the modern Internet!

Software needed
  • Windows 98 installation CD-ROM as a .iso file (cp /dev/cdrom windows98.iso) - in theory you could also use a real optical drive, but my laptop doesn't usually have one of those. I used the OEM disc, version 4.10.1998 (that's the original Windows 98, not the Second Edition), which came with a long-dead PC, and didn't bother to apply any patches.
  • A Windows 98 license key. Again, I used an OEM key from a past PC.
  • A complete set of Discworld Noir (English) CD-ROMs as .iso files. I used the UK "Sold Out Software" budget release, on 3 CDs.
  • A multi-platform Realtek AC97 audio driver.
Windows 98 installation

It seems to be easiest to do this bit by running qemu-system-i386 manually:

qemu-img create -f qcow2 /srv/virt/discworldnoir.qcow2 30G qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \ -drive media=cdrom,format=raw,file=/srv/virt/windows98.iso \ -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime

Don't start the installation immediately. Instead, boot the installation CD to a DOS prompt with CD-ROM support. From here, run


and create a single partition filling the emulated hard disk. When finished, hard-reboot the virtual machine (press Ctrl+C on the qemu-system-i386 process and run it again).

The DOS FORMAT.COM utility is on the Windows CD-ROM but not in the root directory or the default %PATH%, so you'll have to run:

d:\win98\format c:

to create the FAT filesystem. You might have to reboot again at this point.

The reason for doing this the hard way is that the Windows 98 installer doesn't detect qemu as supporting ACPI. You want ACPI support, so that Windows will issue IDLE instructions from its idle loop, instead of occupying a CPU core with a busy-loop. To get that, boot to a DOS prompt again, and use:

setup /p j /iv

/p j forces ACPI support (Thanks to "Richard S" on the Virtualbox forums for this tip.) /iv is unimportant, but it disables the annoying "billboards" during installation, which advertised once-exciting features like support for dial-up modems and JPEG wallpaper.

I used a "Typical" installation; there didn't seem to be much point in tweaking the installed package set when everything is so small by modern standards.

Windows 98 has built-in support for the Cirrus VGA card that we're emulating, so after a few reboots, it should be able to run in a semi-modern resolution and colour depth. Discworld Noir apparently prefers a 640 × 480 × 16-bit video mode, so right-click on the desktop background, choose Properties and set that up.

Audio drivers

This is the part that took me the longest to get working. Of the sound cards that qemu can emulate, Windows 98 only supports the SoundBlaster 16 out of the box. Unfortunately, the Soundblaster 16 emulation in qemu is incomplete, and in particular version 2.1 (as shipped in Debian 8) has a tendency to make Windows lock up during boot.

I've seen advice in various places to emulate an Eqsonic ES1370 (SoundBlaster AWE 64), but that didn't work for me: one of the drivers I tried caused Windows to lock up at a black screen during boot, and the other didn't detect the emulated hardware.

The next-oldest sound card that qemu can emulate is a Realtek AC97, which was often found integrated into motherboards in the late 1990s. This one seems to work, with the "A400" driver bundle linked above. For Windows 98 first edition, you need a driver bundle that includes the old "VXD" drivers, not just the "WDM" drivers supported by Second Edition and newer.

The easiest way to get that into qemu seems to be to turn it into a CD image:

genisoimage -o /srv/virt/discworldnoir-drivers.iso WDM_A400.exe qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \ -drive media=cdrom,format=raw,file=/srv/virt/windows98.iso \ -drive media=cdrom,format=raw,file=/srv/virt/discworldnoir-drivers.iso \ -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime -soundhw ac97

Run the installer from E:, then reboot with the Windows 98 CD inserted, and Windows should install the driver.

Installing Discworld Noir

Boot up the virtual machine with CD 1 in the emulated drive:

qemu-system-i386 -hda /srv/virt/discworldnoir.qcow2 \ -drive media=cdrom,format=raw,file=/srv/virt/DWN_ENG_1.iso \ -no-kvm -vga cirrus -m 256 -cpu pentium3 -localtime -soundhw ac97

You might be thinking "... why not insert all three CDs into D:, E: and F:?" but the installer expects subsequent disks to appear in the same drive where CD 1 was initially, so that won't work. Instead, when prompted for a new CD, switch to the qemu monitor with Ctrl+Alt+2 (note that this is 2, not F2). At the (qemu) prompt, use info block to see a list of emulated drives, then issue a command like

change ide0-cd1 /srv/virt/DWN_ENG_2.iso

to swap the CD. Then switch back to Windows' console with Ctrl+Alt+1 and continue installation. I used a Full installation of Discworld Noir.

Transferring the virtual machine to GNOME Boxes

Having finished the "control freak" phase of installation, I wanted a slightly more user-friendly way to run this game, so I transferred the virtual machine to be used by libvirtd, which is the backend for both GNOME Boxes and virt-manager:

virsh create discworldnoir.xml

Here is the configuration I used. It's a mixture of automatic configuration from virt-manager, and hand-edited configuration to make it match the qemu-system-i386 command-line.

Running the game

If all goes well, you should now see a discworldnoir virtual machine in GNOME Boxes, in which you can boot Windows 98 and play the game. Have fun!

Categories: Elsewhere

Daniel Pocock: Migrating data from Windows phones

Planet Debian - Sat, 14/11/2015 - 22:18

Many of the people who have bought Windows phones seek relief sooner or later. Sometimes this comes about due to peer pressure or the feeling of isolation, in other cases it is the frustration of the user interface or the realization that they can't run cool apps like Lumicall.

Frequently, the user has been given the phone as a complimentary upgrade when extending a contract without perceiving the time, effort and potential cost involved in getting their data out of the phone, especially if they never owned a smartphone before.

When a Windows phone user does decide to cut their losses, they are usually looking to a friend or colleague with technical expertise to help them out. Personally, I'm not sure that anybody I would regard as an IT expert has ever had a Windows phone though, meaning that many experts are probably also going to be scratching their heads when somebody asks them for help. Therefore, I've put together this brief guide to help deal with these phones more expediently when they are encountered.

The Windows phones have really bad support for things like CalDAV and WebDAV so don't get your hopes up about using such methods to backup the data to any arbitrary server. Searching online you can find some hacks that involve creating a Google or iCloud account in the phone and then modifying the advanced settings to send the data to an arbitrary server. These techniques vary a lot between specific versions of the Windows Phone OS and so the techniques I've described below are probably easier.

Identify the Windows Live / Hotmail account

The user may not remember or realize that a Microsoft account was created when they first obtained the phone. It may have been created for them by the phone, a friend or the salesperson in the phone shop.

Look in the settings (Accounts) to find the account ID / email address. If the user hasn't been using this account, they may not recognize it and probably won't know the password for it. It is essential to try and obtain (or reset) the password before going any further, so start with the password recovery process. Microsoft may insist on sending a password reset email to some other email address that the user has previously provided or linked to their phone.

Extracting data from the phone

In many cases, the easiest way to extract the data is to download it from Microsoft live.com rather than extracting it from the phone. Even if the user doesn't realize it, the data is probably all replicated in live.com and so there is no further loss of privacy by logging in there to extract it.

Set up an IMAP mail client

An IMAP client will be used to download the user's emails (from the live.com account they may never have used) and SMS.

Install Mozilla Thunderbird (IceDove on Debian), GNOME Evolution or a similar program on the user's PC.

Configure the IMAP mail client to connect to the live.com account. Some clients, like Thunderbird, will automatically set up all the server details when you enter the live.com account ID. For manual account setup, the details here may help.

Email backup

If the user was not using the live.com account ID for email correspondence, there may not be a lot of mail in it. There may be some billing receipts or other things that are worth keeping though.

Create a new folder (or set of folders) in the user's preferred email account and drag and drop the messages from the live.com Inbox to the new folder(s).

SMS backup

SMS backup can also be done through live.com. It is slightly more complicated than email backup, but similar.

  • In the live.com Outlook email index page, look for the settings button and click Manage Categories.
  • Enable the Contacts and Photos categories with a tick in each of them.
  • Go back to the main Inbox page and look for the categories section on the bottom left-hand side of the screen, under the folder list. Click the Contacts category.
  • The page may now appear blank. That is normal.
  • On the top right-hand corner of the page, click the Arrange menu and choose Conversation.
  • All the SMS messages should now appear on the screen.
  • Under the mail folders list on the left-hand side of the page, click to create a new folder with a name like SMS.
  • Select all the SMS messages and look for the option to move them to a folder. Send them to the SMS folder you created.
  • Now use the IMAP mail client to locate the SMS folder and copy everything from there to a new folder in the user's preferred mail server or local disk.
Contacts backup

On the top left-hand corner of the live.com email page, there is a chooser to select other live.com applications. Select People.

You should now see a list of all the user's contacts. Look for the option to export them to Outlook and other programs. This will export them as a CSV file.

You can now import the CSV file into another application. GNOME Evolution has an import wizard with an option for Outlook file format. To load the contacts into a WebDAV address book, such as DAViCal, configure the address book in Evolution and then select it as the destination when running the CSV import wizard.

WARNING: beware of using the Mozilla Thunderbird address book with contact data from mobile devices and other sources. It can't handle more than two email addresses per contact and this can lead to silent data loss if contacts are not fully saved.

Calendar backup

Now go to the live.com application chooser again and select the calendar application. Microsoft provides instructions to extract the calendar, summarised here:

  • Look for the Share button at the top somewhere and click it.
  • On the left-hand side of the page, click Get a link
  • On the right-hand side, choose Show event details to ensure you get a full calendar and then click Create underneath it.
  • Look for the link with a webcals prefix. If you are downloading with a tool like wget, change the scheme prefix to https. Fetch the file from this link and save it with an ics extension.
  • Inspect the ics calendar file to make sure it looks like real iCalendar data.

You can now import the ics file into another application. GNOME Evolution has an import wizard with an option for iCalendar file format. To load the calendar entries into a CalDAV server, such as DAViCal, configure the calendar server in Evolution and then select it as the destination when running the import wizard.

Backup the user's photos, videos and other data files

Hopefully you will be able to do this step without going through live.com. Try enabling the MTP or PTP mode in the phone and attach it to the computer using the USB cable. Hopefully the computer will recognize it in at least one of those modes.

Use the computer's file manager or another tool to simply backup the entire directory structure.

Reset the phone to factory defaults

Once the user has their hands on a real phone, it is likely they will never want to look at that Windows phone again. It is time to erase the Windows phone, there is no going back.

Go to the Settings and About and tap the factory reset option. It is important to do this before obliterating the live.com account, otherwise there are scenarios where you could be locked out of the phone and unable to erase it.

Erasing may take some time. The phone will reboot and then display an animation of some gears spinning around for a few minutes and then reboot again. Wait for it to completely erase.

Permanently close the Microsoft live.com account

Keeping track of multiple accounts and other services is tedious and frustrating for most people, especially with services that try to force the user to receive email in different places.

You can help eliminate user fatigue by helping them permanently close the live.com account so they never have to worry about it again.

Follow the instructions on the Microsoft site.

At some point it will suggest certain actions you should take before closing the account, most can be ignored. One thing you should do is remove the link between the live.com account ID and the phone. It is a good idea to do this as otherwise you may have problems erasing the device, if you haven't already done so. Before completely closing the account, also verify that the factory reset of the phone completed successfully.

Dispose of the Windows phone safely

If you can identify any faults with the phone, the user may be able to return it under the terms of the warranty. Some phone companies may allow the user to exchange it for something more desirable when it fails under warranty.

It may be tempting to sell the phone to a complete stranger on eBay or install a custom ROM on it. In practice, neither option may be worth the time and effort involved. You may be tempted to put it beyond use so nobody else will suffer with it, but please try to do so in a way that is respectful of the environment.

Putting the data into a new phone

Prepare the new phone with a suitable ROM such as Replicant or Cyanogenmod.

Install the F-Droid app on the new phone.

From F-droid, install the DAVdroid app. DAVdroid will allow you to quickly sync the new phone against any arbitrary CalDAV and WebDAV server to populate it with the user's calendar and contact / address book data.

Now is a good time to install other interesting apps like Lumicall, Conversations and K-9 Mail.

Categories: Elsewhere

Paul Johnson: Tell me your Celebr8D8 plans

Planet Drupal - Sat, 14/11/2015 - 12:40

I am spearheading the Drupal 8 release celebrations on social media Thursday 19th November. Perhaps, like me, you have been working behind the scenes on a personal project to mark this significant occasion. If you have a website, special party, publicity stunt planned I'm keen to know about it. Come the big day I will use Drupal's social media and @Celebr8D8 to tell the world about your event, site, stunt.

Use my Drupal.org contact form to let me know about your release day plans. Thanks!

Categories: Elsewhere

agoradesign: Including image styles in your module or theme in Drupal 8

Planet Drupal - Sat, 14/11/2015 - 11:29
Today I have a small practical tipp for those, who want to include a custom image style in their module or theme in Drupal 8, including a best practice proposal.
Categories: Elsewhere

Juliana Louback: PaperTrail - Powered by IBM Watson

Planet Debian - Sat, 14/11/2015 - 11:06

On the final semester of my MSc program at Columbia SEAS, I was lucky enough to be able to attend a seminar course taught by Alfio Gliozzo entitled Q&A with IBM Watson. A significant part of the course is dedicated to learning how to leverage the services and resources available on the Watson Developer Cloud. This post describes the course project my team developed, the PaperTrail application.

Project Proposal

Create an application to assist in the development of future academic papers. Based on a paper’s initial proposal, Paper Trail predicts publications to be used as references or acknowledgement of prior art and provides a trend analysis of major topics and methods.

The objective is to speed the discovery of relevant papers early in the research process, and allow for early assessment of the depth of prior research concerning the initial proposal.

Meet the Team

Wesley Bruning, Software Engineer, MSc. in Computer Science

Xavier Gonzalez, Industrial Engineer, MSc. in Data Science

Juliana Louback, Software Engineer, MSc. in Computer Science

Aaron Zakem, Patent Attorney, MSc. in Computer Science

Prior Art

A significant amount of attention has been given to this topic over the past few decades. The table below shows the work the team deemed most relevant due to recency, accuracy and similarity of functionality.

The variation in accuracy displayed is a result of experimentation with different dataset sizes and algorithm variations. More information and details can be found in the prior art report.

The main differential of PaperTrail is providing a form of access to the citation prediciton and trend analysis algorithm. With the exception of the project by McNee et al., these algorithmns aren’t currently available for general use. The application on researchindex.net is open to use but its objective is to rank publications and authors for given topics.


Citation Prediction: PaperTrail builds on the work done by Wolski’s team in Fall 2014. This algorithmn builds a reference graph used to define research communities, with an associated vector of topic scores generated by an LDA model. The papers in each research community are then ranked by importance within the community with a custom ranking algorithm. When a target document is given to algorithm as input, the LDA model is used to generate a vector of topics that are present in the document. The communities with the most similar topic vectors are selected and the publications within these communities with highest rank and greatest similarity to the input document are recommended as references. A more detailed description can be found here.

Trend Analysis: Initially, the idea was to use the AlchemyData News API to obtain statistics pertaining to the amount of publications on a given topic over time. However, with the exception of buzz-words (i.e. ‘big data’), many more specialized topics appeared very infrequently in news articles, if at all. This isn’t entirely surprising given the target audience of PaperTrail. As a work around, we use the Alchemy Language API to extract keywords from the abstracts in the dataset, in addition to relevance scores. The PaperTrail database could then be queried for entry counts for a given year and keyword to provide an indication of publication trends in academia. Note that the Alchemy Language API extracts multiple-word ‘keywords’ as well as single words.


To maintain consistency with Wolski’s project, we are using the DBLP data as made available on aminer.org. The DBLP-Citation-network V5 dataset contains 1,572,277 entries; we are limited to the use of entries that contain both abstracts and citations, bringing the dataset size down to 265,865 entries.


A high-level visualization of the project architecture is displayed below. Before launching PaperTrail, it’s necessary to train Wolski’s algorithm offline. Currently any documentation with regard to the performance of said algorithm is unavailable; the PaperTrail project will include an evaluation phase and report the findings made.

The PaperTrail app and database will be hosted on the Bluemix Platform.

Status Report

Phases completed:

  • Project design

  • Prior art research

  • Data cleansing

  • Development and deployment of an alpha version of the PaperTrail app

Phases under development:

  • Algorithm training and evaluation

  • Keyword extraction

  • MapReduce of publication frequency by year and topic

  • Data visualization component

Categories: Elsewhere

Craig Small: Mixing pysnmp and stdin

Planet Debian - Sat, 14/11/2015 - 08:04

Depending on the application, sometimes you want to have some socket operations going (such as loading a website) and have stdin being read. There are plenty of examples for this in python which usually boil down to making stdin behave like a socket and mixing it into the list of sockets select() cares about.

A while ago I asked an email list could I have pysnmp use a different socket map so I could add my own sockets in (UDP, TCP and a zmq to name a few) and the Ilya the author of pysnmp explained how pysnmp can use a foreign socket map.

This sample code below is merely an mixture of Ilya’s example code and the way stdin gets mixed into the fold.  I have also updated to the high-level pysnmp API which explains the slight differences in the calls.

  1. from time import time
  2. import sys
  3. import asyncore
  4. from pysnmp.hlapi import asyncore as snmpAC
  5. from pysnmp.carrier.asynsock.dispatch import AsynsockDispatcher
  8. class CmdlineClient(asyncore.file_dispatcher):
  9. def handle_read(self):
  10. buf = self.recv(1024)
  11. print "you said {}".format(buf)
  14. def myCallback(snmpEngine, sendRequestHandle, errorIndication,
  15. errorStatus, errorIndex, varBinds, cbCtx):
  16. print "myCallback!!"
  17. if errorIndication:
  18. print(errorIndication)
  19. return
  20. if errorStatus:
  21. print('%s at %s' % (errorStatus.prettyPrint(),
  22. errorIndex and varBinds[int(errorIndex)-1] or '?')
  23. )
  24. return
  26. for oid, val in varBinds:
  27. if val is None:
  28. print(oid.prettyPrint())
  29. else:
  30. print('%s = %s' % (oid.prettyPrint(), val.prettyPrint()))
  32. sharedSocketMap = {}
  33. transportDispatcher = AsynsockDispatcher()
  34. transportDispatcher.setSocketMap(sharedSocketMap)
  35. snmpEngine = snmpAC.SnmpEngine()
  36. snmpEngine.registerTransportDispatcher(transportDispatcher)
  37. sharedSocketMap[sys.stdin] = CmdlineClient(sys.stdin)
  39. snmpAC.getCmd(
  40. snmpEngine,
  41. snmpAC.CommunityData('public'),
  42. snmpAC.UdpTransportTarget(('', 161)),
  43. snmpAC.ContextData(),
  44. snmpAC.ObjectType(
  45. snmpAC.ObjectIdentity('SNMPv2-MIB', 'sysDescr', 0)),
  46. cbFun=myCallback)
  48. while True:
  49. asyncore.poll(timeout=0.5, map=sharedSocketMap)
  50. if transportDispatcher.jobsArePending() or transportDispatcher.transportsAreWorking():
  51. transportDispatcher.handleTimerTick(time())

Some interesting lines from the above code:

  • Lines 8-11 are the stdin class that is called (or rather its handle_read method is) when there is text available on stdin.
  • Line 34 is where pysnmp is told to use our socket map and not its inbuilt one
  • Line 37 is where we have used the socket map to say if we get input from stdin, what is the handler.
  • Lines 39-46 are sending a SNMP query using the high-level API
  • Lines 48-51 are my simple socket poller

With all this I can handle keyboard presses and network traffic, such as a simple SNMP poll.

Categories: Elsewhere

ActiveLAMP: Visual Regression Testing with Shoov.io

Planet Drupal - Sat, 14/11/2015 - 03:00

Shoov.io is a nifty website testing tool created by Gizra. We at ActiveLAMP were first introduced to Shoov.io at DrupalCon LA, in fact, Shoov.io is built on, you guessed it, Drupal 7 and it is an open source visual regression toolkit.

Categories: Elsewhere

Drupal core announcements: Drupal core security release window on Wednesday, November 18

Planet Drupal - Fri, 13/11/2015 - 22:24
Start:  2015-11-18 (All day) America/New_York Organizers:  David_Rothstein Event type:  Online meeting (eg. IRC meeting)

The monthly security release window for Drupal 6 and Drupal 7 core will take place on Wednesday, November 18.

This does not mean that a Drupal core security release will necessarily take place on that date for either the Drupal 6 or Drupal 7 branches, only that you should prepare to look out for one (and be ready to update your Drupal sites in the event that the Drupal security team decides to make a release).

There will be no bug fix/feature release on this date; the next window for a Drupal core bug fix/feature release is Wednesday, December 2 (and before that, on November 19, Drupal 8.0.0 is scheduled to be released).

For more information on Drupal core release windows, see the documentation on release timing and security releases, and the discussion that led to this policy being implemented.

Categories: Elsewhere

Francois Marier: How Tracking Protection works in Firefox

Planet Debian - Fri, 13/11/2015 - 21:40

Firefox 42, which was released last week, introduced a new feature in its Private Browsing mode: tracking protection.

If you are interested in how this list is put together and then used in Firefox, this post is for you.

Safe Browsing lists

There are many possible ways to download URL lists to the browser and check against that list before loading anything. One of those is already implemented as part of our malware and phishing protection. It uses the Safe Browsing v2.2 protocol.

In a nutshell, the way that this works is that each URL on the block list is hashed (using SHA-256) and then that list of hashes is downloaded by Firefox and stored into a data structure on disk:

  • ~/.cache/mozilla/firefox/XXXX/safebrowsing/mozstd-track* on Linux
  • ~/Library/Caches/Firefox/Profiles/XXXX/safebrowsing/mozstd-track* on Mac
  • C:\Users\XXXX\AppData\Local\mozilla\firefox\profiles\XXXX\safebrowsing\mozstd-track* on Windows

This sbdbdump script can be used to extract the hashes contained in these files and will output something like this:

$ ~/sbdbdump/dump.py -v . - Reading sbstore: mozstd-track-digest256 [mozstd-track-digest256] magic 1231AF3B Version 3 NumAddChunk: 1 NumSubChunk: 0 NumAddPrefix: 0 NumSubPrefix: 0 NumAddComplete: 1696 NumSubComplete: 0 [mozstd-track-digest256] AddChunks: 1445465225 [mozstd-track-digest256] SubChunks: ... [mozstd-track-digest256] addComplete[chunk:1445465225] e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5 ... [mozstd-track-digest256] MD5: 81a8becb0903de19351427b24921a772

The name of the blocklist being dumped here (mozstd-track-digest256) is set in the urlclassifier.trackingTable preference which you can find in about:config. The most important part of the output shown above is the addComplete line which contains a hash that we will see again in a later section.

List lookups

Once it's time to load a resource, Firefox hashes the URL, as well as a few variations of it, and then looks for it in the local lists.

If there's no match, then the load proceeds. If there's a match, then we do an additional check against a pairwise allowlist.

The pairwise allowlist (hardcoded in the urlclassifier.trackingWhitelistTable pref) is designed to encode what we call "entity relationships". The list groups related domains together for the purpose of checking whether a load is first or third party (e.g. twitter.com and twimg.com both belong to the same entity).

Entries on this list (named mozstd-trackwhite-digest256) look like this:


which translates to "if you're on the twitter.com site, then don't block resources from twimg.com.

If there's a match on the second list, we don't block the load. It's only when we get a match on the first list and not the second one that we go ahead and cancel the network load.

If you visit our test page, you will see tracking protection in action with a shield icon in the URL bar. Opening the developer tool console will expose the URL of the resource that was blocked:

The resource at "https://trackertest.org/tracker.js" was blocked because tracking protection is enabled.

Creating the lists

The blocklist is created by Disconnect according to their definition of tracking.

The Disconnect list is on their Github page, but the copy we use in Firefox is the copy we have in our own repository. Similarly the Disconnect entity list is from here but our copy is in our repository. Should you wish to be notified of any changes to the lists, you can simply subscribe to this Atom feed.

To convert this JSON-formatted list into the binary format needed by the Safe Browsing code, we run a custom list generation script whenever the list changes on GitHub.

If you run that script locally using the same configuration as our server stack, you can see the conversion from the original list to the binary hashes.

Here's a sample entry from the mozstd-track-digest256.log file:

[m] twimg.com >> twimg.com/ [canonicalized] twimg.com/ [hash] e48768b0ce59561e5bc141a52061dd45524e75b66cad7d59dd92e4307625bdc5

and one from mozstd-trackwhite-digest256.log:

[entity] Twitter >> (canonicalized) twitter.com/?resource=twimg.com, hash a8e9e3456f46dbe49551c7da3860f64393d8f9d96f42b5ae86927722467577df

This in combination with the sbdbdump script mentioned earlier, will allow you to audit the contents of the local lists.

Serving the lists

The way that the binary lists are served to Firefox is through a custom server component written by Mozilla: shavar.

Every hour, Firefox requests updates from shavar.services.mozilla.com. If new data is available, then the whole list is downloaded again. Otherwise, all it receives in return is an empty 204 response.

Should you want to play with it and run your own server, follow the installation instructions and then go into about:config to change these preferences to point to your own instance:

browser.trackingprotection.gethashURL browser.trackingprotection.updateURL

Note that on Firefox 43 and later, these prefs have been renamed to:

browser.safebrowsing.provider.mozilla.gethashURL browser.safebrowsing.provider.mozilla.updateURL Learn more

If you want to learn more about how tracking protection works in Firefox, you can find all of the technical details on the Mozilla wiki or you can ask questions on our mailing list.

Thanks to Tanvi Vyas for reviewing a draft of this post.

Categories: Elsewhere

Drupal core announcements: CHANGELOG.txt is being updated to prepare for D8 release. What changes to you want highlighted?

Planet Drupal - Fri, 13/11/2015 - 20:52

CHANGELOG.txt is being updated to prepare for D8 release. What changes to you want highlighted?

See issue: https://www.drupal.org/node/2606334

Categories: Elsewhere

John Goerzen: Memories of a printer

Planet Debian - Fri, 13/11/2015 - 19:42

I have a friend who hates printers. I’ll call him “Mark”, because that, incidentally, is his name. His hatred for printers is partly my fault, but that is, ahem, a story for another time that involves him returning from a battle with a printer with a combination of weld dust, toner, and a deep scowl on his face.

I also tend to hate printers. Driver issues, crinkled paper, toner spilling all over the place…. everybody hates printers.

But there is exactly one printer that I have never hated. It’s almost 20 years old, and has some stories to tell.

Nearly 20 years ago, I was about to move out of my parents’ house, and I needed a printer. I bought a LaserJet 6MP. This printer ought to have been made by Nokia. It’s still running fine, 18 years later. It turned out to be one of the best investments in computing equipment I’ve ever made. Its operating costs, by now, are cheaper than just about any printer you can buy today — less than one cent per page. It has been supported by every major operating system for years.

PostScript was important, because back then running Ghostscript to convert to PCL was both slow and a little error-prone. PostScript meant I didn’t need a finicky lpr/lprng driver on my Linux workstation to print. It just… printed. (Hat tip to anyone else that remembers the trial and error of constructing an /etc/printcap that would print both ASCII and PostScript files correctly!)

Out of this printer have come plane and train tickets, taking me across the country to visit family and across the world to visit friends. It’s printed resumes and recipes, music and university papers. I even printed wedding invitations and envelopes on them two years ago, painstakingly typeset in LaTeX and TeXmacs. I remember standing at the printer in the basement one evening, feeding envelope after envelope into the manual feed slot. (OK, so it did choke on a couple of envelopes, but overall it all worked great.)

The problem, though, is that it needs a parallel port. I haven’t had a PC with one of those in a long while. A few years ago, in a moment of foresight, I bought a little converter box that has an Ethernet port and a parallel port, with the idea that it would be pay for itself by letting me not maintain some old PC just to print. Well, it did, but now the converter box is dying! And they don’t make them anymore. So I finally threw in the towel and bought a new LaserJet.

It cost a third of what the 6MP did, has a copier, scanner, prints in color, does duplexing, has wifi… and, yes, still supports PostScript — strangely enough, a deciding factor in going with HP over Brother once again. (The other was image quality)

We shall see if I am still using it when I’m 50.

Categories: Elsewhere

Daniel Pocock: How much video RAM for a 4k monitor?

Planet Debian - Fri, 13/11/2015 - 18:49

I previously wrote about my experience moving to a 4K monitor.

I've been relatively happy with it except for one thing: I found that 1GB video RAM simply isn't sufficient for a stable system. This wasn't immediately obvious as it appeared to work in the beginning, but over time I've observed that it was not sufficient.

I'm not using it for gaming or 3D rendering. My typical desktop involves several virtual workspaces with terminal windows, web browsers, a mail client, IRC and Eclipse. Sometimes I use vlc for playing media files.

Using the nvidia-settings tool, I observed that the Used Dedicated memory statistic would frequently reach the maximum, 1024MB. On a few occasions, X crashed with errors indicating it was out of memory.

After observing these problems, I put another card with 4GB video RAM into the system and I've observed it using between 1024 MB and 1300 MB at any one time. This leaves me feeling that people with only basic expectations for their desktop should aim for at least 2GB video RAM for 4k.

That said, I've continued to enjoy many benefits of computing with a 4K monitor. In addition to those mentioned in my previous blog, here are some things that were easier for me with 4K:

  • Using gitk to look through many commits on the master branch of reSIProcate and cherry-pick some things to the resiprocate-1.9 branch. gitk only used half the screen and I was able to use the right hand side of the screen to look at the code in an editor in more detail.
  • Simultaneously monitoring logs from two Android devices running Lumicall and a repro SIP proxy server in three terminal windows arranged side by side, up to 125 lines of text in each.
  • Using WebRTC sites in the Mozilla browser while having a browser console window, source code and SIP proxy logs all open at the same time, none of them overlapping.

You can do much of this with a pair of monitors, but there is something quite nice about doing it all on a single 4K screen.

Categories: Elsewhere

Wunderkraut Belgium: Captain Drupal returns

Planet Drupal - Fri, 13/11/2015 - 15:46
In the run up to Drupal 8’s launch on the 19th we’ll be posting a Captain Drual cartoon every day from Monday 16th November so you can follow our hero’s journey to enter the era of Drupal 8!
Categories: Elsewhere

Daniel Pocock: Building teams around SIP and XMPP in Debian and Fedora

Planet Debian - Fri, 13/11/2015 - 12:52

I've recently started a discussion on the Fedora devel mailing list about building a team to collaborate on RTC services (SIP, XMPP, TURN and WebRTC) for the Fedora community. We already started a similar team within Debian.

This isn't only for developers or package maintainers and virtually anybody with a keen interest in free software can help. Testing different softphones and putting screenshots on the wiki can help a lot (the Debian wiki already provides some examples). The FedRTC.org site is not intended to be an advertisement for my web design skills and anybody with expertise in design would be very welcome to contribute.

Teamwork in this endeavor can provide many benefits:

  • Sharing knowledge about RTC, for use within our communities and also for other communities using the free and open technology
  • Engaging with collaborators who are not involved in packaging teams, for example, the Debian RTC team has also had interest from upstream developers who are not on other Debian or Fedora mailing lists
  • Minimizing the effort required by the system administrators (the DSA team in Debian or Infrastructure team in Fedora) by triaging user problems and planning and testing any proposed changes.
  • Freeing up developer time to work on new features, such as the exciting work I'm doing on telepathy-resiprocate.

There are also many opportunities for project work that go beyond traditional packaging responsibilities. Wouldn't it be interesting to find ways to integrate the publish/subscribe capabilities of SIP and XMPP with the Fedmsg infrastructure?

Bringing XMPP to FedoraProject.org

We recently launched XMPP for debian.org and it would not be hard to replicate for FedoraProject.org users. Sure, some people are happy running their own XMPP servers. There are just as many people who prefer to focus on development and have something like XMPP provided for them.

With the strong emphasis on building a roster/buddy-list, XMPP can also help to facilitate long-term engagement in the community and users may identify more closely with the project.

I haven't offered XMPP on the FedRTC.org trial service because it would be inconvenient for people to migrate buddy lists to the FedoraProject.org domain when the service is officially adopted.

Collaboration across communities

There are various other places where we can share knowledge between teams in different communities and people are invited to participate.

The Free-RTC mailing list is a great place to discuss free RTC strategies and initiatives.

The XMPP operators mailing list provides a forum to discuss operational issues in the XMPP space, such as keeping out the spammers.

Would you like to participate?

Please consider joining some of the mailing lists I've mentioned, replying to the thread on the Fedora devel mailing list, volunteering for the Debian RTC team or emailing me personally.

Categories: Elsewhere

Krimson: Drupal 8 will be released on November 19

Planet Drupal - Fri, 13/11/2015 - 12:36
We're ready to celebrate and build (even more) amazing Drupal 8 websites.  On November 19 we'll put our Drupal 8 websites in the spotlight...be sure to come back and check out our website.
Categories: Elsewhere

Krimson: 77 of us are going

Planet Drupal - Fri, 13/11/2015 - 12:36
People from across the globe who use, develop, design and support the Drupal platform will be brought together during a full week dedicated to networking, Drupal 8 and sharing and growing Drupal skills. As we have active hiring plans we’ve decided that this year’s approach should have a focus on meeting people who might want to work for Wunderkraut and getting Drupal 8 out into the world. As Signature Supporting Partner we wanted as much people as possible to attend the event. We managed to get 77 Wunderkrauts on the plane to Barcelona!  From Belgium alone we have an attendance of 17 people. The majority of our developers will be participating in sprints (a get-together for focused development work on a Drupal project) giving all they got together with all other contributors at DrupalCon. We look forward to an active DrupalCon week.   If you're at DrupalCon and feel like talking to us. Just look for the folks with Wunderkraut carrot t-shirts or give Jo a call at his cell phone +32 476 945 176.
Categories: Elsewhere

Krimson: Watch our epic Drupal 8 promo video

Planet Drupal - Fri, 13/11/2015 - 12:36
Drupal 8 is coming and everyone is sprinting hard to get it over the finish line. To boost contributor morale we’ve made a motivational Drupal 8 video that will get them into the zone and tackling those last critical issues in no time.
Categories: Elsewhere


Subscribe to jfhovinne aggregator